Skip to content

compare pandas NA ambiguous objects correct #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 15, 2025

Conversation

KRRT7
Copy link
Contributor

@KRRT7 KRRT7 commented Mar 15, 2025

User description

when trying to compare pandas NA values, which have ambiguous boolean behavior, codeflash crashed, fix


PR Type

Bug fix, Tests


Description

  • Fix NA comparison with pandas.isna check

  • Prevent crashes on comparing pd.NA values

  • Add extensive tests for pd.NA edge cases


Changes walkthrough 📝

Relevant files
Bug fix
comparator.py
NA handling using pandas.isna check                                           

codeflash/verification/comparator.py

  • Added condition to check both values with pandas.isna
  • Returns True when both NA values are detected
  • +2/-0     
    Tests
    test_comparator.py
    Add tests for pd.NA and None comparisons                                 

    tests/test_comparator.py

  • Introduced tests for comparing pd.NA with pd.NA and None
  • Validated NA behavior in Series, DataFrames, and dicts
  • Extended edge case coverage for NA sparse arrays
  • +30/-0   

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • when trying to compare pandas NA values, which have ambiguous boolean behavior, codeflash crashed, fix
    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Condition Placement

    The NA check using pandas.isna is a good addition. However, ensure that its position in the branching logic does not inadvertently override other type-specific comparisons. It may be beneficial to explicitly document the intended priority of this condition.

    if HAS_PANDAS and pandas.isna(orig) and pandas.isna(new):
        return True
    Test Robustness

    Extensive tests for pd.NA edge cases are added. Consider including tests where pd.NA is mixed with other representations of missing data (e.g. numpy.nan) to fully validate comparator behavior.

    assert comparator(pd.NA, pd.NA)
    assert not comparator(pd.NA, None)
    assert not comparator(None, pd.NA)
    
    s1 = pd.Series([1, 2, pd.NA, 4])
    s2 = pd.Series([1, 2, pd.NA, 4])
    s3 = pd.Series([1, 2, None, 4])
    
    assert comparator(s1, s2)
    assert not comparator(s1, s3)
    
    df1 = pd.DataFrame({'a': [1, 2, pd.NA], 'b': [4, pd.NA, 6]})
    df2 = pd.DataFrame({'a': [1, 2, pd.NA], 'b': [4, pd.NA, 6]})
    df3 = pd.DataFrame({'a': [1, 2, None], 'b': [4, None, 6]})
    assert comparator(df1, df2)
    assert not comparator(df1, df3)
    
    d1 = {'a': pd.NA, 'b': [1, pd.NA, 3]}
    d2 = {'a': pd.NA, 'b': [1, pd.NA, 3]}
    d3 = {'a': None, 'b': [1, None, 3]}
    assert comparator(d1, d2)
    assert not comparator(d1, d3)
    
    s1 = pd.Series([1, 2, pd.NA, 4])
    s2 = pd.Series([1, 2, pd.NA, 4])
    
    filtered1 = s1[s1 > 1]
    filtered2 = s2[s2 > 1]
    assert comparator(filtered1, filtered2)

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Ensure matching types

    Add a type equality check to ensure that both values are of the same missing-value
    type to prevent pd.NA and None from being treated as equivalent.

    codeflash/verification/comparator.py [158-159]

    -if HAS_PANDAS and pandas.isna(orig) and pandas.isna(new):
    +if HAS_PANDAS and type(orig) is type(new) and pandas.isna(orig) and pandas.isna(new):
         return True
    Suggestion importance[1-10]: 8

    __

    Why: The suggestion addresses a potential bug by ensuring that only missing values of the same type are considered equal, which prevents pd.NA and None from being erroneously evaluated as equivalent. This added type check increases correctness while the score is capped due to it being a type checking improvement.

    Medium

    @misrasaurabh1 misrasaurabh1 self-requested a review March 15, 2025 04:05
    @KRRT7 KRRT7 enabled auto-merge March 15, 2025 04:05
    @KRRT7 KRRT7 merged commit 6725dce into main Mar 15, 2025
    16 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants