Remove null index values on normalized dataframe #1897

Merged: 8 commits merged on Feb 11, 2022

rwedge (Contributor) commented Feb 10, 2022

Fixes #1874

This PR doesn't address #1680: with the NA index value removed from the dataframe during normalization, there's no NA-indexed data to group on, even if we enabled the NA group.
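A minimal pandas sketch of the behavior described above, assuming a toy base table in which one row has a null `customer_id`. `normalize_sketch`, the column names, and the data are hypothetical; this is not featuretools' `normalize_dataframe`, only the `drop_duplicates` + `dropna(subset=[index])` pattern from this PR's diff. The last lines illustrate the #1680 point with pandas' `groupby(dropna=False)` (pandas ≥ 1.1): the child data still has a NaN group, but the normalized table has no NaN row for it to attach to.

```python
import numpy as np
import pandas as pd


def normalize_sketch(base, index, additional_columns):
    """Hypothetical helper: build a parent table keyed by `index` from `base`."""
    selected_columns = [index] + additional_columns
    # Keep the first row per distinct index value (all NaNs count as one value here).
    parent = base.drop_duplicates(index, keep="first")[selected_columns]
    # The fix: drop the row whose new index value is null.
    return parent.dropna(subset=[index])


base = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "customer_id": [1.0, 1.0, np.nan, 2.0],
    "region": ["east", "east", "west", "north"],
})

customers = normalize_sketch(base, "customer_id", ["region"])
print(customers["customer_id"].tolist())  # [1.0, 2.0]

# Even with the NA group enabled, the NaN group has no parent row to land on.
counts = base.groupby("customer_id", dropna=False).size()
per_customer = counts.reindex(customers["customer_id"])
print(per_customer.tolist())  # [2, 1]
```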

tamargrey (Contributor) commented:

We can probably close #1873 now @dvreed77

@gsheni gsheni requested a review from a team February 10, 2022 21:25

codecov bot commented Feb 10, 2022

Codecov Report

Merging #1897 (e72379b) into main (a8604d5) will increase coverage by 0.00%.
The diff coverage is 100.00%.

```diff
@@           Coverage Diff           @@
##             main    #1897   +/-   ##
=======================================
  Coverage   98.76%   98.77%
=======================================
  Files         147      148    +1
  Lines       16307    16355   +48
=======================================
+ Hits        16106    16154   +48
  Misses        201      201
```

| Impacted Files | Coverage Δ |
|---|---|
| `featuretools/entityset/entityset.py` | 99.21% <100.00%> (+<0.01%) ⬆️ |
| `featuretools/primitives/__init__.py` | 100.00% <100.00%> (ø) |
| `...ools/tests/entityset_tests/test_last_time_index.py` | 100.00% <100.00%> (ø) |
| `...eaturetools/tests/entry_point_tests/test_plugin.py` | 100.00% <100.00%> (ø) |
| `...retools/tests/entry_point_tests/test_primitives.py` | 100.00% <100.00%> (ø) |
| `featuretools/tests/entry_point_tests/utils.py` | 100.00% <100.00%> (ø) |

Legend: Δ = absolute `<relative>` (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5f0251...e72379b.
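The percentages in the Coverage Diff follow directly from the hit and line counts. A minimal sketch, assuming Codecov reports two decimal places rounded down (its default settings, and consistent with the 98.76% shown for main, which would round to 98.77% under round-to-nearest); `coverage_pct` is a hypothetical helper, not a Codecov API. Integer and `Fraction` arithmetic avoid float rounding surprises.

```python
import math
from fractions import Fraction


def coverage_pct(hits, lines):
    """Coverage as a string with two decimals, rounded down."""
    basis_points = hits * 10000 // lines  # hundredths of a percent, floored
    return f"{basis_points // 100}.{basis_points % 100:02d}"


print(coverage_pct(16106, 16307))  # 98.76 (main)
print(coverage_pct(16154, 16355))  # 98.77 (#1897)

# Misses are lines minus hits: 201 on both sides.
print(16307 - 16106, 16355 - 16154)  # 201 201

# The exact change is about 0.0036 percentage points, which rounds down to 0.00.
delta_pct = (Fraction(16154, 16355) - Fraction(16106, 16307)) * 100
delta_bp = math.floor(delta_pct * 100)  # hundredths of a percent, floored
print(f"{delta_bp // 100}.{delta_bp % 100:02d}")  # 0.00
```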

```diff
@@ -803,6 +803,7 @@ def normalize_dataframe(self, base_dataframe_name, new_dataframe_name, index,

         new_dataframe2 = new_dataframe. \
             drop_duplicates(index, keep='first')[selected_columns]
+        new_dataframe2 = new_dataframe2.dropna(subset=[index])
```

rwedge (Contributor Author) commented on this line:

Doing the drop here might cause a scenario where the `make_secondary_time_index` logic reintroduces null index values, but I'm not positive.
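A hypothetical pandas sketch of the concern above. It does not reproduce featuretools' `make_secondary_time_index`; it only assumes that some later step derives per-key values from the base dataframe while keeping null keys (here `groupby(..., dropna=False)`, pandas ≥ 1.1) and joins them back with an outer merge. In that case the null key removed by `dropna` comes back. The table, column names, and `last_time` are made up for illustration.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "customer_id": [1.0, 1.0, np.nan, 2.0],
    "time": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"]),
})

# Normalized table after the fix: one row per customer, null key dropped.
customers = base.drop_duplicates("customer_id", keep="first")[["customer_id", "time"]]
customers = customers.dropna(subset=["customer_id"])

# Hypothetical later step: latest time per key, keeping the NaN group.
last_time = (
    base.groupby("customer_id", dropna=False)["time"]
    .max()
    .rename("last_time")
    .reset_index()
)

# An outer join brings the null key back into the normalized table.
rejoined = customers.merge(last_time, on="customer_id", how="outer")
print(rejoined["customer_id"].isna().sum())  # 1

# A left join keyed on the normalized table keeps the index column free of nulls.
safe = customers.merge(last_time, on="customer_id", how="left")
print(safe["customer_id"].isna().sum())  # 0
```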

rwedge (Contributor Author) commented Feb 10, 2022

Any ideas why upgrading to Woodwork (WW) 0.12.0 would affect this test?

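```python
import pickle

# pd_es is a pytest fixture; WW_SCHEMA_KEY is defined elsewhere in the test suite (not shown)
def test_pd_es_pickling(pd_es):
    pkl = pickle.dumps(pd_es)
    unpickled = pickle.loads(pkl)
    assert pd_es.__eq__(unpickled, deep=True)
    assert not hasattr(unpickled, WW_SCHEMA_KEY)
```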

Edit: this appears to be an issue with comparing the null tuples of LatLongs, introduced by alteryx/woodwork#1255.
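A plain-Python sketch of how a null LatLong tuple can break an equality check after a pickle round trip, assuming the null value is a tuple of NaN floats such as `(nan, nan)`. Tuple equality compares items with an identity shortcut, so a tuple holding the very same NaN object equals itself; pickle does not memoize floats, so unpickling creates new NaN objects, and NaN never equals NaN. `latlong_equal` is a hypothetical NaN-aware comparison, not a Woodwork or featuretools API.

```python
import math
import pickle

nan = float("nan")
null_latlong = (nan, nan)  # assumed null LatLong representation

# Same objects: the identity shortcut makes the comparison succeed.
print(null_latlong == null_latlong)  # True

# After a round trip the NaN floats are new objects, and NaN != NaN.
roundtrip = pickle.loads(pickle.dumps(null_latlong))
print(roundtrip == null_latlong)  # False


def latlong_equal(a, b):
    """Hypothetical helper: compare two (lat, long) tuples, treating NaN == NaN."""
    if len(a) != len(b):
        return False
    return all(
        x == y or (math.isnan(x) and math.isnan(y))
        for x, y in zip(a, b)
    )


print(latlong_equal(roundtrip, null_latlong))  # True
print(latlong_equal((1.0, 2.0), (1.0, 3.0)))  # False
```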

tamargrey (Contributor) left a comment:

lgtm assuming tests pass!

@rwedge rwedge enabled auto-merge (squash) February 11, 2022 18:57
@rwedge rwedge merged commit ad4b7d1 into main Feb 11, 2022
@rwedge rwedge deleted the issue-1874 branch February 11, 2022 19:17
@dvreed77 dvreed77 mentioned this pull request Feb 11, 2022
Development: successfully merging this pull request may close these issues:

normalize_dataframe can introduce a null value into the new DataFrame's index, causing Woodwork error
2 participants