Skip to content

Implement New Single Table DFS Algorithm#2516

Merged
dvreed77 merged 84 commits into
mainfrom
dfs_rewrite
Apr 27, 2023
Merged

Implement New Single Table DFS Algorithm#2516
dvreed77 merged 84 commits into
mainfrom
dfs_rewrite

Conversation

@dvreed77

@dvreed77 dvreed77 commented Mar 13, 2023

Copy link
Copy Markdown
Contributor

Pull Request Description

Implement New Single Table DFS Algorithm only using Schema as input

Fixes #2487

@dvreed77 dvreed77 self-assigned this Mar 13, 2023
@codecov

codecov Bot commented Mar 13, 2023

Copy link
Copy Markdown

Codecov Report

Merging #2516 (255207d) into main (ffaad87) will increase coverage by 0.00%.
The diff coverage is 99.69%.

@@           Coverage Diff            @@
##             main    #2516    +/-   ##
========================================
  Coverage   99.46%   99.47%            
========================================
  Files         392      403    +11     
  Lines       23211    24197   +986     
========================================
+ Hits        23087    24070   +983     
- Misses        124      127     +3     
Impacted Files Coverage Δ
...ols/tests/testing_utils/generate_fake_dataframe.py 98.00% <98.00%> (ø)
featuretools/feature_discovery/LiteFeature.py 98.95% <98.95%> (ø)
...eaturetools/feature_discovery/FeatureCollection.py 100.00% <100.00%> (ø)
featuretools/feature_discovery/convertors.py 100.00% <100.00%> (ø)
...eaturetools/feature_discovery/feature_discovery.py 100.00% <100.00%> (ø)
featuretools/feature_discovery/type_defs.py 100.00% <100.00%> (ø)
featuretools/feature_discovery/utils.py 100.00% <100.00%> (ø)
featuretools/primitives/utils.py 99.00% <100.00%> (+0.05%) ⬆️
...retools/tests/feature_discovery/test_convertors.py 100.00% <100.00%> (ø)
...tests/feature_discovery/test_feature_collection.py 100.00% <100.00%> (ø)
... and 2 more

Comment thread featuretools/feature_discovery/LiteFeature.py Outdated
Comment thread featuretools/feature_discovery/FeatureCollection.py Outdated
Comment thread featuretools/feature_discovery/feature_discovery.py Outdated
Comment thread featuretools/feature_discovery/LiteFeature.py Outdated
Comment thread featuretools/feature_discovery/LiteFeature.py
Comment thread featuretools/feature_discovery/LiteFeature.py Outdated
Comment thread featuretools/feature_discovery/LiteFeature.py Outdated
Comment thread featuretools/feature_discovery/LiteFeature.py Outdated
Comment thread featuretools/feature_discovery/LiteFeature.py
Comment thread featuretools/feature_discovery/LiteFeature.py
Comment thread featuretools/feature_discovery/LiteFeature.py
Comment thread featuretools/feature_discovery/LiteFeature.py Outdated
Comment thread featuretools/feature_discovery/LiteFeature.py Outdated
Comment thread featuretools/feature_discovery/convertors.py
Comment thread featuretools/tests/feature_discovery/test_feature_discovery.py
Comment thread featuretools/tests/feature_discovery/test_type_defs.py Outdated
Comment thread featuretools/feature_discovery/FeatureCollection.py Outdated
Comment thread featuretools/feature_discovery/FeatureCollection.py
Comment thread featuretools/feature_discovery/FeatureCollection.py Outdated
Comment thread featuretools/feature_discovery/FeatureCollection.py Outdated
Comment thread featuretools/feature_discovery/convertors.py Outdated
Comment thread featuretools/feature_discovery/feature_discovery.py
Comment thread featuretools/feature_discovery/feature_discovery.py
Comment thread featuretools/feature_discovery/feature_discovery.py
Comment thread featuretools/feature_discovery/feature_discovery.py Outdated
assert fb.base_features[0].get_name() == "f_1"

f_2.name = "f_2"
fb = convert_feature_to_featurebase(f_2, df, {})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the right place for this question, but what happens if we create the f_2 feature which depends on f_1, and then after that f_1 gets renamed (a new alias, I think), does everything still work as expected when it comes time to calculate feature values for f_2? What would happen to the default name of the f_2 feature in this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the alias has the biggest impact when dealing with origin features and should be avoided as much as possible, but should be fine:

Here is what would happen:

  • You create an origin Feature based on column named "Column A"
  • You create an engineered Feature using the Absolute primitive to create the name "ABSOLUTE(Column A)"
  • You now have 2 features, and if you want to create a feature matrix off of this, you need to convert these back to FeatureBase features first using the conversion function. This will create 2 FeatureBase features with names above and everything should work as expected

Now if you want to change the alias AFTER feature discovery, here is what would happen.

  • you set an alias on the origin LiteFeature, to give it a name of "f1", and alter the corresponding column on the dataframe
  • you now need to convert the 2 LiteFeatures to FeatureBase features. The Origin Feature would create an IdentityFeature pointing to the "f1" column on the dataframe. Then it would create a TransformFeature, with the name of "ABSOLUTE(Column A)", which probably isn't what you want, but it will still be pointing to the correct base_feature.
  • Essentially everything works as expected, its just the names won't be right.

@thehomebrewnerd thehomebrewnerd changed the title Rewrite DFS Algorithm Implement New Single Table DFS Algorithm Apr 26, 2023
@dvreed77 dvreed77 merged commit a686a0f into main Apr 27, 2023
@dvreed77 dvreed77 deleted the dfs_rewrite branch April 27, 2023 13:31
@dvreed77 dvreed77 mentioned this pull request Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Single Table DFS Algorithm

8 participants