Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement New Single Table DFS Algorithm #2516

Merged
merged 84 commits into from
Apr 27, 2023
Merged

Implement New Single Table DFS Algorithm #2516

merged 84 commits into from
Apr 27, 2023

Conversation

dvreed77
Copy link
Contributor

@dvreed77 dvreed77 commented Mar 13, 2023

Pull Request Description

Implement New Single Table DFS Algorithm only using Schema as input

Fixes #2487

@dvreed77 dvreed77 self-assigned this Mar 13, 2023
@codecov
Copy link

codecov bot commented Mar 13, 2023

Codecov Report

Merging #2516 (255207d) into main (ffaad87) will increase coverage by 0.00%.
The diff coverage is 99.69%.

@@           Coverage Diff            @@
##             main    #2516    +/-   ##
========================================
  Coverage   99.46%   99.47%            
========================================
  Files         392      403    +11     
  Lines       23211    24197   +986     
========================================
+ Hits        23087    24070   +983     
- Misses        124      127     +3     
Impacted Files Coverage Δ
...ols/tests/testing_utils/generate_fake_dataframe.py 98.00% <98.00%> (ø)
featuretools/feature_discovery/LiteFeature.py 98.95% <98.95%> (ø)
...eaturetools/feature_discovery/FeatureCollection.py 100.00% <100.00%> (ø)
featuretools/feature_discovery/convertors.py 100.00% <100.00%> (ø)
...eaturetools/feature_discovery/feature_discovery.py 100.00% <100.00%> (ø)
featuretools/feature_discovery/type_defs.py 100.00% <100.00%> (ø)
featuretools/feature_discovery/utils.py 100.00% <100.00%> (ø)
featuretools/primitives/utils.py 99.00% <100.00%> (+0.05%) ⬆️
...retools/tests/feature_discovery/test_convertors.py 100.00% <100.00%> (ø)
...tests/feature_discovery/test_feature_collection.py 100.00% <100.00%> (ø)
... and 2 more

assert fb.base_features[0].get_name() == "f_1"

f_2.name = "f_2"
fb = convert_feature_to_featurebase(f_2, df, {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the right place for this question, but what happens if we create the f_2 feature which depends on f_1, and then after that f_1 gets renamed (a new alias, I think), does everything still work as expected when it comes time to calculate feature values for f_2? What would happen to the default name of the f_2 feature in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the alias has the biggest impact when dealing with origin features and should be avoided as much as possible, but should be fine:

Here is what would happen:

  • You create an origin Feature based on column named "Column A"
  • You create an engineered Feature using the Absolute primitive to create the name "ABSOLUTE(Column A)"
  • You now have 2 features, and if you want to create a feature matrix off of this, you need to convert these back to FeatureBase features first using the conversion function. This will create 2 FeatureBase features with names above and everything should work as expected

Now if you want to change the alias AFTER feature discovery, here is what would happen.

  • you set an alias on the origin LiteFeature, to give it a name of "f1", and alter the corresponding column on the dataframe
  • you now need to convert the 2 LiteFeatures to FeatureBase features. The Origin Feature would create an IdentityFeature pointing to the "f1" column on the dataframe. Then it would create a TransformFeature, with the name of "ABSOLUTE(Column A)", which probably isn't what you want, but it will still be pointing to the correct base_feature.
  • Essentially everything works as expected, its just the names won't be right.

@thehomebrewnerd thehomebrewnerd changed the title Rewrite DFS Algorithm Implement New Single Table DFS Algorithm Apr 26, 2023
@dvreed77 dvreed77 merged commit a686a0f into main Apr 27, 2023
@dvreed77 dvreed77 deleted the dfs_rewrite branch April 27, 2023 13:31
@dvreed77 dvreed77 mentioned this pull request Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Single Table DFS Algorithm
8 participants