Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tml 2022 backend 1st column id mark as primary key #3683

Merged
merged 37 commits into from
Sep 1, 2022

Conversation

simha104
Copy link
Contributor

@simha104 simha104 commented Aug 24, 2022

Pull Request Description

IDColumnCheck now handles primary key columns containing "integer" values that are typed as doubles

simha104 and others added 30 commits August 4, 2022 10:17
@codecov
Copy link

codecov bot commented Aug 24, 2022

Codecov Report

Merging #3683 (b30cc6f) into main (6f82c82) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3683     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        337     337             
  Lines      34067   34077     +10     
=======================================
+ Hits       33936   33946     +10     
  Misses       131     131             
Impacted Files Coverage Δ
evalml/data_checks/id_columns_data_check.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 98.0% <100.0%> (+0.1%) ⬆️
...ts/data_checks_tests/test_id_columns_data_check.py 100.0% <100.0%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

…github.com:alteryx/evalml into TML-2022-backend-1st-column-id-mark-as-primary-key
@simha104 simha104 marked this pull request as ready for review August 24, 2022 15:27
Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LEft some comments on some suggestions to clean/improve code! looking good otherwise

check_all_unique = X.nunique() == len(X)
# Temporary solution for baton logical types mapping integers to doubles in woodwork logical types.
# Will be removed when resolved.
check_all_unique = X_double.nunique() == len(X_double)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce the re-used code, I wonder if we can do either a for loop:

for dtypes in [['Double'], [Integer', 'Categorical']]:
   # logic here

or to separate out this into a helper method, where we can pass in the Double and Integer/Categorical arguments. I know this is just a stopgap fix for now as we wait, but might be nice to not have repetitive code and reuse parts when we can.

check_all_unique
].index.tolist() # columns whose values are all unique and doubles
cols_with_all_unique_integers = [
col for col in cols_with_all_unique if all(X_double[col].mod(1).eq(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can use

col for col in cols_with_all_unique if all(X_double[col].is_integer())

to capture this logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work unfortunately because .is_integer doesn't work for series types

@@ -1,4 +1,6 @@
"""Data check that checks if any of the features are likely to be ID columns."""
from xml.etree.ElementInclude import include
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure where that came from but I removed it!

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes! This looks good to me

Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@simha104 simha104 merged commit 368106d into main Sep 1, 2022
@simha104 simha104 deleted the TML-2022-backend-1st-column-id-mark-as-primary-key branch September 1, 2022 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants