Tml 2022 backend 1st column id mark as primary key #3683
Codecov Report
@@ Coverage Diff @@
## main #3683 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 337 337
Lines 34067 34077 +10
=======================================
+ Hits 33936 33946 +10
Misses 131 131
…github.com:alteryx/evalml into TML-2022-backend-1st-column-id-mark-as-primary-key
Left some comments with suggestions to clean up and improve the code! Looking good otherwise.
check_all_unique = X.nunique() == len(X)
# Temporary solution for baton logical types mapping integers to doubles in woodwork logical types.
# Will be removed when resolved.
check_all_unique = X_double.nunique() == len(X_double)
To reduce the repeated code, I wonder if we could either use a for loop:
for dtypes in [['Double'], ['Integer', 'Categorical']]:
    # logic here
or separate this out into a helper method that takes the Double and Integer/Categorical arguments. I know this is just a stopgap fix while we wait, but it would be nice to avoid repetitive code and reuse parts where we can.
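A minimal sketch of the helper-method option, for illustration only. It assumes plain pandas dtypes (`float64`, `int64`, `category`) as stand-ins for the woodwork `Double`, `Integer`, and `Categorical` logical types, and `all_unique_columns` is a hypothetical name, not something in evalml:

```python
import pandas as pd


def all_unique_columns(X, dtypes):
    """Hypothetical helper: names of the columns of the given dtypes whose values are all unique."""
    # pandas dtypes stand in for woodwork logical types here (an assumption).
    subset = X.select_dtypes(include=dtypes)
    check_all_unique = subset.nunique() == len(subset)
    return check_all_unique[check_all_unique].index.tolist()


X = pd.DataFrame(
    {
        "id": [0.0, 1.0, 2.0],      # unique, stored as doubles
        "score": [0.5, 0.5, 0.7],   # doubles, not unique
        "code": [10, 11, 12],       # unique integers
        "label": pd.Categorical(["a", "b", "a"]),  # categorical, not unique
    }
)

# The same helper covers both dtype groups, so the uniqueness logic is written once.
unique_by_group = [
    all_unique_columns(X, dtypes)
    for dtypes in [["float64"], ["int64", "category"]]
]
```

Either form (the loop or the helper) keeps the uniqueness check in one place, so removing the stopgap later would only touch the list of dtype groups.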
check_all_unique
].index.tolist() # columns whose values are all unique and doubles
cols_with_all_unique_integers = [
col for col in cols_with_all_unique if all(X_double[col].mod(1).eq(0))
I wonder if we can use
col for col in cols_with_all_unique if all(X_double[col].is_integer())
to capture this logic
Unfortunately this doesn't work, because .is_integer is not available on Series types.
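To illustrate the point with a small sketch: `is_integer` is a method of Python floats rather than of a pandas Series, so the check has to be applied element-wise. The function name `is_whole_valued` and the sample data are made up for this example:

```python
import pandas as pd


def is_whole_valued(col):
    """Hypothetical helper: True when every value in a float Series has no fractional part."""
    # value % 1 == 0 holds exactly for whole numbers; NaN compares as False.
    return bool(col.mod(1).eq(0).all())


ids = pd.Series([0.0, 1.0, 2.0, 3.0])
prices = pd.Series([0.5, 1.0, 2.25])

# A Series does not expose float.is_integer directly.
series_has_is_integer = hasattr(ids, "is_integer")

# Per-element float.is_integer gives the same answer as the mod-based check.
ids_whole_by_map = bool(ids.map(lambda v: float(v).is_integer()).all())
```

The `mod(1).eq(0)` form used in the diff stays vectorized, while the `map` version calls a Python function per element.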
@@ -1,4 +1,6 @@
"""Data check that checks if any of the features are likely to be ID columns."""
from xml.etree.ElementInclude import include
What is this line for?
Not sure where that came from, but I removed it!
Thanks for making the changes! This looks good to me
Looks good to me!
Pull Request Description
IDColumnCheck now handles primary key columns containing "integer" values that are typed as doubles