-
Notifications
You must be signed in to change notification settings - Fork 879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove null index values on normalized dataframe #1897
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1897 +/- ##
=======================================
Coverage 98.76% 98.77%
=======================================
Files 147 148 +1
Lines 16307 16355 +48
=======================================
+ Hits 16106 16154 +48
Misses 201 201
Continue to review full report at Codecov.
|
featuretools/entityset/entityset.py
Outdated
@@ -803,6 +803,7 @@ def normalize_dataframe(self, base_dataframe_name, new_dataframe_name, index, | |||
|
|||
new_dataframe2 = new_dataframe. \ | |||
drop_duplicates(index, keep='first')[selected_columns] | |||
new_dataframe2 = new_dataframe2.dropna(subset=[index]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing the drop here might cause a scenario where make_secondary_time_index
logic reintroduces null index values but I'm not positive
any ideas on why upgrading to WW 0.12.0 would impact this test? featuretools/featuretools/tests/entityset_tests/test_es.py Lines 2175 to 2180 in 79eea5f
Edit: appears to be an issue comparing the null tuples of LatLongs, introduced by alteryx/woodwork#1255 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm assuming tests pass!
Fixes #1874
This PR doesn't address #1680 as with the NA index value removed from the dataframe at normalization, there's no NA-indexed data to group on, even if we enabled the NA GROUP