More ordinal comparison fixes #2025
Codecov Report
@@           Coverage Diff           @@
##             main    #2025   +/-   ##
=======================================
  Coverage   99.08%   99.09%
=======================================
  Files         143      143
  Lines       16448    16505    +57
=======================================
+ Hits        16298    16355    +57
  Misses        150      150
def get_function(self):
    return np.greater
    def greater_than(val1, val2):
        if pdtypes.is_categorical_dtype(val1) and pdtypes.is_categorical_dtype(
Should we have the top-level logic be:
    if val1 is categorical or val2 is categorical:
        ...
    return val1 > val2
so that the non-categorical case exits after evaluating one if statement instead of two?
I think we could, but if I'm thinking about this right, the logic might be a little harder to follow if we did that.
We have a few cases to handle:
- One input is categorical, the other is not -> nan
- Both inputs are categorical and categories are not equal -> nan
- Both inputs are categorical and categories are equal -> val1 > val2
- Inputs are both numeric or both datetime -> val1 > val2
I think if we moved the or statement to the top, we would have to do something like this and handle the case where both categories are equal inside that first conditional:
    if val1 is categorical or val2 is categorical:
        if val1 is categorical and val2 is categorical:
            if val1.categories == val2.categories:
                return val1 > val2
        return np.nan
    return val1 > val2
I guess it's debatable whether that is more or less clear, but I find it a little harder to follow. The "special case" that should work but doesn't is a little more hidden in the updated flow, since it's combined with the case where both are categorical and the categories don't match.
If you prefer that approach I can update - don't have a strong preference. Or is there something better yet that I'm not seeing?
What if we keep the if / elif structure but store the is_categorical calls in variables so we aren't potentially making them multiple times?
Sure, we can do that. I was thinking those calls would be pretty fast so didn't worry about it, but maybe that's not the case. Storing in variables certainly won't hurt. I'll update.
I think that change will also make the code a little easier to read.
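For readers following the thread, the structure agreed on above (the if / elif layout with the categorical checks stored in variables) might look roughly like the sketch below. It is not the code from the PR. It assumes both inputs are pandas Series, uses an `isinstance` check against `pd.CategoricalDtype` as a stand-in for the `pdtypes.is_categorical_dtype` call in the diff, compares categories with `Index.equals`, and returns a scalar `np.nan` as in the cases listed earlier.

```python
import numpy as np
import pandas as pd


def greater_than(val1, val2):
    # Evaluate each dtype check once and reuse the results below.
    # Assumption: isinstance against pd.CategoricalDtype stands in for
    # the pdtypes.is_categorical_dtype call shown in the diff.
    val1_is_categorical = isinstance(val1.dtype, pd.CategoricalDtype)
    val2_is_categorical = isinstance(val2.dtype, pd.CategoricalDtype)

    if val1_is_categorical and val2_is_categorical:
        # Both categorical: only comparable when the categories match,
        # including their order.
        if not val1.cat.categories.equals(val2.cat.categories):
            return np.nan
    elif val1_is_categorical or val2_is_categorical:
        # Exactly one input is categorical: not comparable.
        return np.nan

    # Both numeric, both datetime, or both categorical with equal categories.
    return val1 > val2
```

With this layout, each of the four cases from the list maps to one visible branch, and the non-categorical path evaluates two already-computed booleans before falling through to `val1 > val2`.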
Updated: 3b26676
More ordinal comparison fixes
Implements additional changes to these primitives to enable comparison between ordinal columns that have different order values:
- GreaterThan
- GreaterThanEqualTo
- LessThan
- LessThanEqualTo
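As background, the sketch below shows the plain pandas behavior that a change like this likely has to work around: ordering comparisons between two ordered categorical Series raise a `TypeError` when their categories differ in order. This is an illustration only, not code from the PR; `try_greater_than` is a hypothetical helper introduced here to capture the error.

```python
import pandas as pd

# Two ordinal columns with the same labels but a different category order.
ascending = pd.Series(
    pd.Categorical(["low", "high"], categories=["low", "mid", "high"], ordered=True)
)
descending = pd.Series(
    pd.Categorical(["low", "high"], categories=["high", "mid", "low"], ordered=True)
)


def try_greater_than(left, right):
    # Hypothetical helper: return the comparison result, or a message
    # describing the TypeError pandas raises for mismatched categories.
    try:
        return left > right
    except TypeError as err:
        return f"TypeError: {err}"


same_order = try_greater_than(ascending, ascending)
different_order = try_greater_than(ascending, descending)
```

Here `same_order` is a boolean Series, while `different_order` holds the error message, which is the situation the nan-returning branches in the review discussion are meant to handle.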