Set index after adding ancestor relationship variables #668

kmax12 · 2019-07-14T22:35:57Z

This fixes a bug where we essentially reset the dataframe index after adding ancestor relationship variables. That breaks merging later, when creating aggregation features, because we merge on the index:

https://github.com/Featuretools/featuretools/blob/master/featuretools/computational_backends/feature_set_calculator.py#L611
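A minimal pandas sketch of the failure mode. It uses hypothetical `sessions`, `customers`, and `transactions` frames and only mimics the pattern, not the actual Featuretools code. A column-on-column `pd.merge` drops the left frame's index, so a later index-aligned join of aggregated values finds no matching labels. Restoring the index with `set_index(..., drop=False)`, as this PR does, lets the join line up again. If unmatched values are then filled with 0 (done here only for illustration), every value in the broken case becomes 0, which resembles the symptom reported in #643.

```python
import pandas as pd

# Hypothetical child frame, indexed by its string id (id also kept as a column).
sessions = pd.DataFrame(
    {"session_id": ["s1", "s2", "s3"], "customer_id": ["c1", "c2", "c1"]}
).set_index("session_id", drop=False)

# Hypothetical parent frame holding an ancestor relationship variable.
customers = pd.DataFrame({"customer_id": ["c1", "c2"], "region_id": ["r1", "r2"]})

# Adding the ancestor variable via a column-on-column merge:
# pandas ignores the input indexes and returns a default 0..n-1 index.
merged = pd.merge(sessions, customers, how="left", on="customer_id")

# Aggregate a hypothetical grandchild frame; the result is indexed by session_id.
transactions = pd.DataFrame(
    {"session_id": ["s1", "s1", "s3"], "amount": [10, 20, 5]}
)
counts = transactions.groupby("session_id").size().rename("COUNT(transactions)")

# Index-aligned join onto the merged frame: labels 0..2 never match "s1"/"s3".
broken = merged.join(counts)

# The fix: restore the id column as the index (keeping the column) before joining.
fixed = merged.set_index("session_id", drop=False).join(counts)

# Missing matches filled with 0 purely for illustration.
print(broken["COUNT(transactions)"].fillna(0).tolist())  # [0.0, 0.0, 0.0]
print(fixed["COUNT(transactions)"].fillna(0).tolist())   # [2.0, 0.0, 1.0]
```

Keeping `drop=False` matters here because later steps may still read the id as an ordinary column.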

This only occurs when you stack to a sufficient depth (e.g. the depth 3 features in #643), because you need to be creating features for an entity whose dataframe has had ancestor relationship variables added to it.

The test case uses a string index because, with a default integer index, the reset index can be identical to the existing one, which would mask the bug.
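A small sketch of why a default integer index can hide the reset. The helper `index_looks_preserved` and the `parent_id`/`grandparent_id` columns are hypothetical, chosen only for illustration. After a column-on-column merge, a child frame with the default `0..n-1` index compares equal to the merged result's fresh index, so the reset goes unnoticed. A string index makes the difference visible.

```python
import pandas as pd


def index_looks_preserved(child: pd.DataFrame) -> bool:
    """Merge in a hypothetical parent column and report whether the
    result's index still equals the child's original index."""
    parent = pd.DataFrame({"parent_id": [100, 200], "grandparent_id": [1, 2]})
    merged = child.merge(parent, how="left", on="parent_id")
    return merged.index.equals(child.index)


# Default integer index 0..n-1: identical to the index the merge produces.
int_child = pd.DataFrame({"parent_id": [100, 200, 100]})

# String index: any reset is immediately visible.
str_child = pd.DataFrame({"parent_id": [100, 200, 100]}, index=["a", "b", "c"])

print(index_looks_preserved(int_child))  # True  (bug masked)
print(index_looks_preserved(str_child))  # False (bug exposed)
```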

Fixes #643

codecov · 2019-07-14T22:41:42Z

Codecov Report

Merging #668 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #668      +/-   ##
==========================================
+ Coverage   97.44%   97.44%   +<.01%     
==========================================
  Files         118      118              
  Lines        9618     9634      +16     
==========================================
+ Hits         9372     9388      +16     
  Misses        246      246

Impacted Files	Coverage Δ
...mputational_backend/test_feature_set_calculator.py	`100% <100%> (ø)`	⬆️
...s/computational_backends/feature_set_calculator.py	`98.1% <100%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d191c64...e452c62.

CJStadler

LGTM!

CJStadler · 2019-07-18T20:39:54Z

featuretools/computational_backends/feature_set_calculator.py

@@ -337,6 +337,9 @@ def _add_ancestor_relationship_variables(self, child_df, parent_df,
                            left_on=relationship.child_variable.id,
                            right_on=relationship.child_variable.id)

+        # ensure index is maintained
+        df = df.set_index(relationship.child_entity.index, drop=False)


Probably not a big deal, but inplace looks like it might be faster.

In [1]: import pandas as pd
In [2]: df10k = pd.DataFrame({'a': range(10000)}, index=range(10000))
In [3]: df100k = pd.DataFrame({'a': range(100000)}, index=range(100000))
In [4]: %timeit df10k.set_index('a', drop=False)
312 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: %timeit df100k.set_index('a', drop=False)
912 µs ± 50.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit df10k.set_index('a', drop=False, inplace=True)
101 µs ± 5.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit df100k.set_index('a', drop=False, inplace=True)
113 µs ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
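The two spellings differ in more than speed. A small sketch with a hypothetical frame `df` showing that both produce the same indexed frame, but only the default form returns it, while the `inplace=True` form mutates the frame and returns `None`:

```python
import pandas as pd

df = pd.DataFrame({"id": ["x", "y"], "v": [1, 2]})

# Form from the original diff: returns a new frame and leaves df untouched.
reindexed = df.set_index("id", drop=False)
assert list(df.index) == [0, 1]

# Form suggested in review: modifies df itself and returns None.
ret = df.set_index("id", drop=False, inplace=True)

print(ret)                   # None
print(df.equals(reindexed))  # True
```

This means the in-place call must not be assigned back: `df = df.set_index(col, drop=False, inplace=True)` would leave `df` bound to `None`.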

kmax12 added 2 commits July 14, 2019 18:32

set index after adding ancestor relationship variables and add test

3bc1ab5

Update changelog.rst

757d9b5

kmax12 mentioned this pull request Jul 14, 2019

Depth 3 feature always equals to 0 #643

Closed

CJStadler previously approved these changes Jul 18, 2019

use inplace

a63eca8

kmax12 dismissed CJStadler’s stale review via a63eca8 July 18, 2019 21:02

kmax12 added 2 commits July 18, 2019 17:03

Merge branch 'master' into set-index-featureset-calculator

3c209a7

Update changelog.rst

e452c62

kmax12 merged commit 278c0c4 into master Jul 19, 2019

rwedge mentioned this pull request Aug 19, 2019

v0.10.0 #709

Merged

kmax12 deleted the set-index-featureset-calculator branch September 11, 2019 15:21
