Uses full entity update for dependencies of uses_full_entity features #110
This PR fixes two issues related to dependencies of uses_full_entity features.
Issue 1:
If:
a feature X does not have the uses_full_entity attribute
a dependent feature Y does have the uses_full_entity attribute
another dependent feature Z does not have the uses_full_entity attribute, and is one of the requested output features
Then the output frame produced by computing X should be placed in both entity_frames and large_entity_frames in pandas_backend.py. However, we previously only checked whether X itself was a requested output feature, not whether it had a dependent that is an output feature (and not a uses_full_entity feature). As a result, the output of X was placed only in large_entity_frames, and the computation of Z then failed because it depended on X being in entity_frames.
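To make the condition in Issue 1 concrete, here is a minimal sketch of the corrected check. The `Feature` dataclass and the `output_frame_types` helper are hypothetical stand-ins for illustration, not the actual code in pandas_backend.py, and the sketch only looks at direct dependents.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Feature:
    # Hypothetical, simplified stand-in for a real feature object.
    name: str
    uses_full_entity: bool = False
    dependents: List["Feature"] = field(default_factory=list)


def output_frame_types(feature: Feature, requested_names: Set[str]) -> Set[str]:
    """Return the names of the frame collections that should receive the feature's output."""
    targets = set()
    # A uses_full_entity dependent needs the result in large_entity_frames.
    if any(d.uses_full_entity for d in feature.dependents):
        targets.add("large_entity_frames")
    # The old check stopped at `feature.name in requested_names`; the fix also
    # considers dependents that are requested and are not uses_full_entity.
    feeds_requested = any(
        (not d.uses_full_entity) and d.name in requested_names
        for d in feature.dependents
    )
    if feature.name in requested_names or feeds_requested:
        targets.add("entity_frames")
    return targets


# The scenario from Issue 1: X feeds Y (uses_full_entity) and Z (requested output).
y = Feature("Y", uses_full_entity=True)
z = Feature("Z")
x = Feature("X", dependents=[y, z])
targets = output_frame_types(x, requested_names={"Z"})
```

With only the old check, `targets` would contain just `large_entity_frames`, which is the failure described above.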
Issue 2:
To avoid overlapping columns when we concatenate a feature's output frame (called result_frame in the code) with the existing entity_frames, we drop the duplicated columns from the output frame. However, we removed these columns in place, so if we had to do two concats (placing the output frame in both entity_frames and large_entity_frames), the first concat could remove computed features that then never got placed in the second output frame.
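The in-place problem from Issue 2 can be reproduced with a small pandas sketch. The frames, column names, and both helper functions below are made up for illustration and are not taken from pandas_backend.py; only the name result_frame comes from the code.

```python
import pandas as pd


def concat_in_place(existing: pd.DataFrame, result_frame: pd.DataFrame) -> pd.DataFrame:
    # Buggy pattern: dropping the overlap mutates result_frame for later callers.
    overlap = [c for c in result_frame.columns if c in existing.columns]
    result_frame.drop(columns=overlap, inplace=True)
    return pd.concat([existing, result_frame], axis=1)


def concat_without_mutation(existing: pd.DataFrame, result_frame: pd.DataFrame) -> pd.DataFrame:
    # Safer pattern: select the new columns and leave result_frame untouched.
    new_cols = [c for c in result_frame.columns if c not in existing.columns]
    return pd.concat([existing, result_frame[new_cols]], axis=1)


def make_frames():
    index = pd.Index([1, 2], name="id")
    entity_frame = pd.DataFrame({"a": [10, 20]}, index=index)
    large_entity_frame = pd.DataFrame({"b": [5, 6]}, index=index)
    result_frame = pd.DataFrame(
        {"a": [10, 20], "b": [5, 6], "x": [0.1, 0.2]}, index=index
    )
    return entity_frame, large_entity_frame, result_frame


# Buggy: the first concat strips "a" from result_frame, so the second loses it.
entity_frame, large_entity_frame, result_frame = make_frames()
concat_in_place(entity_frame, result_frame)
buggy_large = concat_in_place(large_entity_frame, result_frame)

# Fixed: both concats see the full result_frame.
entity_frame, large_entity_frame, result_frame = make_frames()
concat_without_mutation(entity_frame, result_frame)
fixed_large = concat_without_mutation(large_entity_frame, result_frame)
```

In the buggy run the second frame ends up without column `a`, while the non-mutating version keeps it.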
It is much cleaner to explicitly label each feature with the input frame it should be given and the output frames its result should be placed in. This fixes issue 2 nicely.
I wrote a test case that checks for both of these conditions.