Skip to content

Conversation

@thehomebrewnerd
Copy link
Contributor

@thehomebrewnerd thehomebrewnerd commented Apr 21, 2022

More ordinal comparison fixes

Implements additional changes to these primitives to enable comparison between ordinal columns that have different order values:

  • GreaterThan
  • GreaterThanEqualTo
  • LessThan
  • LessThanEqualTo

@thehomebrewnerd thehomebrewnerd marked this pull request as draft April 21, 2022 15:00
@codecov
Copy link

codecov bot commented Apr 21, 2022

Codecov Report

Merging #2025 (5e6bdc9) into main (54c2ba8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2025   +/-   ##
=======================================
  Coverage   99.08%   99.09%           
=======================================
  Files         143      143           
  Lines       16448    16505   +57     
=======================================
+ Hits        16298    16355   +57     
  Misses        150      150           
Impacted Files Coverage Δ
...aturetools/primitives/standard/binary_transform.py 100.00% <100.00%> (ø)
...s/tests/primitive_tests/test_transform_features.py 99.85% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 54c2ba8...5e6bdc9. Read the comment docs.

@thehomebrewnerd thehomebrewnerd marked this pull request as ready for review April 21, 2022 17:25
def get_function(self):
return np.greater
def greater_than(val1, val2):
if pdtypes.is_categorical_dtype(val1) and pdtypes.is_categorical_dtype(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have the top level logic be

if val_1 is categorical or val_2 is categorical:
    ....
return val_1 > val_2

so that for the non-categorical case we exit after examining one if statement instead of 2?

Copy link
Contributor Author

@thehomebrewnerd thehomebrewnerd Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could, but if I'm thinking about this right if we did that the logic might a little harder to follow.

We have a few cases to handle:

  1. One input is categorical, the other is not -> nan
  2. Both inputs are categorical and categories are not equal -> nan
  3. Both inputs are categorical and categories are equal -> val1 > val2
  4. Inputs are both numeric or both datetime -> val1 > val2

I think if we moved the or statement to the top we would have to do something like this and handle the case where they are both equal inside that first conditional:

if val1 is categorical or val2 is categorical:
    if val1 is categorical and val2 is categorical:
        if val1.categories == val2.categories:
            return val1 > val2
    return np.nan
return val1 > val2

I guess it's debatable whether that is more clear or less clear, but I find it a little harder to follow. I think the "special case" that should work but doesn't is a little more hidden in the updated flow since it's combined with the case where both are categorical and the categories don't match.

If you prefer that approach I can update - don't have a strong preference. Or is there something better yet that I'm not seeing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we keep the if / elif structure but store the is_categorical calls in variables so we aren't potentially making them multiple times

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can do that. I was thinking those calls would be pretty fast so didn't worry about it, but maybe that's not the case. Storing in variables certainly won't hurt. I'll update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that change will also make the code a little easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated: 3b26676

@thehomebrewnerd thehomebrewnerd requested a review from rwedge April 21, 2022 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants