Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uses full entity update for dependencies of uses_full_entity features #110

Merged
merged 5 commits into from Mar 14, 2018

Conversation

bschreck
Copy link
Contributor

This PR fixes two issues related to dependencies of uses_full_entity features.

Issue 1:

If:

a feature X does not have the uses_full_entity attribute
a dependent feature Y does have the uses_full_entity attribute
another dependent feature Z does not have the uses_full_entity attribute, and is one of the requested output features
Then the output frame as a result of computing this feature X should be placed in both entity_frames, and large_entity_frames in pandas_backend.py. However, we previously only check if the feature X itself is a requested output feature, not if it has a dependent that is an output feature (and not a uses_full_entity feature). This means the output of X was only placed in large_entity_frames. The computation of feature Z then failed because it depended on X being in entity_frames.

Issue 2:
To make sure we don't have overlapping columns when we concatenate the output frame as a result of computing a feature with the existing entity_frames, we drop duplicated columns in feature computation output frame (called result_frame in the code). However, we removed these columns in-place, such that if we have to do 2 concats (placing the output frame in both entity_frames and large_entity_frames, it's possible that we remove some computed features that wouldn't get placed in the second output frame.

It is much cleaner to explicitly label each feature with the input frame is should be given, and which output frames its result should be placed in. This fixes issue 2 nicely

I wrote a test case that checks for both of these conditions.

@bschreck bschreck requested a review from kmax12 March 13, 2018 21:50
@codecov-io
Copy link

Codecov Report

Merging #110 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #110      +/-   ##
==========================================
+ Coverage   88.21%   88.24%   +0.03%     
==========================================
  Files          73       73              
  Lines        7393     7412      +19     
==========================================
+ Hits         6522     6541      +19     
  Misses        871      871
Impacted Files Coverage Δ
...turetools/computational_backends/pandas_backend.py 89.27% <100%> (+0.03%) ⬆️
...eaturetools/computational_backends/feature_tree.py 99.31% <100%> (+0.02%) ⬆️
.../feature_function_tests/test_transform_features.py 87.84% <100%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b16155...b1b1107. Read the comment docs.

@kmax12
Copy link
Contributor

kmax12 commented Mar 14, 2018

Looks great! Merging

@kmax12 kmax12 merged commit 6c9a011 into master Mar 14, 2018
@rwedge rwedge mentioned this pull request Mar 21, 2018
@rwedge rwedge deleted the uses-full-entity-update branch June 3, 2019 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants