Implement New Single Table DFS Algorithm #2516
Codecov Report
@@ Coverage Diff @@
## main #2516 +/- ##
========================================
Coverage 99.46% 99.47%
========================================
Files 392 403 +11
Lines 23211 24197 +986
========================================
+ Hits 23087 24070 +983
- Misses 124 127 +3
assert fb.base_features[0].get_name() == "f_1"

f_2.name = "f_2"
fb = convert_feature_to_featurebase(f_2, df, {})
I'm not sure if this is the right place for this question, but what happens if we create the f_2 feature, which depends on f_1, and f_1 is renamed afterward (given a new alias, I think)? Does everything still work as expected when it comes time to calculate feature values for f_2? And what would happen to the default name of f_2 in this case?
Changing the alias has the biggest impact when dealing with origin features and should be avoided as much as possible, but it should be fine.
Here is what would happen:
- You create an origin Feature based on a column named "Column A"
- You create an engineered Feature using the Absolute primitive, which gets the name "ABSOLUTE(Column A)"
- You now have 2 features. If you want to create a feature matrix from them, you first need to convert them back to FeatureBase features using the conversion function. This creates 2 FeatureBase features with the names above, and everything should work as expected
Now, if you want to change the alias AFTER feature discovery, here is what would happen:
- You set an alias on the origin LiteFeature to give it the name "f1", and rename the corresponding column on the dataframe
- You now need to convert the 2 LiteFeatures to FeatureBase features. The origin Feature would create an IdentityFeature pointing to the "f1" column on the dataframe. The engineered Feature would then create a TransformFeature with the name "ABSOLUTE(Column A)", which probably isn't what you want, but it will still point to the correct base_feature.
- Essentially everything works as expected; it's just that the names won't be right.
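To make the naming behavior above concrete, here is a minimal toy sketch. It does not use the featuretools API: `ToyFeature` and `absolute` are hypothetical stand-ins, and the assumption that an engineered feature's default name is computed once at creation time is mine, chosen to reproduce the outcome described in the steps above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyFeature:
    """Hypothetical stand-in for a lite feature (NOT the featuretools class)."""

    default_name: str
    base_features: List["ToyFeature"] = field(default_factory=list)
    alias: Optional[str] = None

    @property
    def name(self) -> str:
        # An alias, when set, overrides the default name.
        return self.alias if self.alias is not None else self.default_name


def absolute(base: ToyFeature) -> ToyFeature:
    # Assumption: the default name is built once, from the base feature's
    # name at creation time, and is not recomputed later.
    return ToyFeature(default_name=f"ABSOLUTE({base.name})", base_features=[base])


# Feature discovery: an origin feature and one engineered feature.
origin = ToyFeature("Column A")
engineered = absolute(origin)
names_before = (origin.name, engineered.name)

# Alias set AFTER discovery: only the origin feature's name changes.
origin.alias = "f1"
names_after = (origin.name, engineered.name)

# The engineered feature still references the same base feature object.
still_linked = engineered.base_features[0] is origin
```

In this sketch the dependency between the two features survives the rename because it is held by object reference, while the engineered feature's name is stale because it was derived from the old name as a string.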
Pull Request Description
Implement New Single Table DFS Algorithm only using Schema as input
Fixes #2487