Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle features with dataframe names that aren't 'X' #3873

Merged
merged 7 commits into from
Dec 6, 2022

Conversation

tamargrey
Copy link
Contributor

closes #3872

If features are passed in, uses the dataframe_name from the Feature object to set the dastaframe_name in _make_entity_set instead of always calling it "X".

This allows us to run calculate feature matrix from the DFS transformer on Features with a dataframe name that is not "X".

@@ -28,6 +28,8 @@ def __init__(self, index="index", features=None, random_seed=0, **kwargs):
self.index = index
self.features = features
self._passed_in_features = True if features else None
# If features are passed in, they'll have a dataframe_name we should utilize
self._dataframe_name = self.features[0].dataframe_name if self.features else "X"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of self.features[0] here assumes that all features were created from the same dataframe.

I didn't add a comment here explaining that, since it seems that assumption is true for the whole component (for example, in how we create the entityset in _make_entity_set), but I'd be happy to note that here if it would be useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add the note!

@codecov
Copy link

codecov bot commented Dec 5, 2022

Codecov Report

Merging #3873 (f45370d) into main (d8f9c68) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3873     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        344     344             
  Lines      36191   36211     +20     
=======================================
+ Hits       36054   36074     +20     
  Misses       137     137             
Impacted Files Coverage Δ
...ponents/transformers/preprocessing/featuretools.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_featuretools.py 100.0% <100.0%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@@ -36,16 +38,21 @@ def _make_entity_set(self, X):
ft_es = EntitySet()
# TODO: This delete was introduced for compatibility with Featuretools 1.0.0. This should
# be removed after Featuretools handles unnamed dataframes being passed to this function.
# But even then, we should still use any dataframe name that might be available from input features.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it, if a user has passed Feature objects in, then we do have a dataframe_name that we can use for X instead of X.ww.name (even if they're the same!), so this TODO explaining why we delete the woodwork schema does still apply.

I wanted to make sure that was clear, since I think it's a little less obvious now that we aren't fully hardcoding "X".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyliweishih added it 4c37408 - let me know if that makes sense to you!

Copy link
Collaborator

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing!

@@ -28,6 +28,8 @@ def __init__(self, index="index", features=None, random_seed=0, **kwargs):
self.index = index
self.features = features
self._passed_in_features = True if features else None
# If features are passed in, they'll have a dataframe_name we should utilize
self._dataframe_name = self.features[0].dataframe_name if self.features else "X"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add the note!

@@ -36,16 +38,21 @@ def _make_entity_set(self, X):
ft_es = EntitySet()
# TODO: This delete was introduced for compatibility with Featuretools 1.0.0. This should
# be removed after Featuretools handles unnamed dataframes being passed to this function.
# But even then, we should still use any dataframe name that might be available from input features.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it

Copy link
Contributor

@thehomebrewnerd thehomebrewnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One non-blocking suggestion, but looks good to me overall.

Comment on lines 45 to 57
if self.index not in X.columns:
es = ft_es.add_dataframe(
dataframe=X,
dataframe_name="X",
dataframe_name=self._dataframe_name,
index=self.index,
make_index=True,
)
else:
es = ft_es.add_dataframe(dataframe=X, dataframe_name="X", index=self.index)
es = ft_es.add_dataframe(
dataframe=X,
dataframe_name=self._dataframe_name,
index=self.index,
)
Copy link
Contributor

@thehomebrewnerd thehomebrewnerd Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of an unrelated nitpick, but I think this entire code block can be cleaned up to remove the if/else construct:

should_make_index = self.index not in X.columns
es = ft_es.add_dataframe(
    dataframe=X,
    dataframe_name=self._dataframe_name,
    index=self.index,
    make_index=should_make_index,
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, changed

@tamargrey tamargrey merged commit 53a4912 into main Dec 6, 2022
@tamargrey tamargrey deleted the fix-dfs-feature-df-names branch December 6, 2022 16:49
@christopherbunn christopherbunn mentioned this pull request Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DFS Transformer will error if features passed in have DataFrame name that is not "X"
5 participants