-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle features with dataframe names that aren't 'X' #3873
Conversation
@@ -28,6 +28,8 @@ def __init__(self, index="index", features=None, random_seed=0, **kwargs): | |||
self.index = index | |||
self.features = features | |||
self._passed_in_features = True if features else None | |||
# If features are passed in, they'll have a dataframe_name we should utilize | |||
self._dataframe_name = self.features[0].dataframe_name if self.features else "X" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of self.features[0]
here assumes that all features were created from the same dataframe.
I didn't add a comment here explaining that, since it seems that assumption is true for the whole component (for example, in how we create the entityset in _make_entity_set
), but I'd be happy to note that here if it would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add the note!
Codecov Report
@@ Coverage Diff @@
## main #3873 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 344 344
Lines 36191 36211 +20
=======================================
+ Hits 36054 36074 +20
Misses 137 137
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
@@ -36,16 +38,21 @@ def _make_entity_set(self, X): | |||
ft_es = EntitySet() | |||
# TODO: This delete was introduced for compatibility with Featuretools 1.0.0. This should | |||
# be removed after Featuretools handles unnamed dataframes being passed to this function. | |||
# But even then, we should still use any dataframe name that might be available from input features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I see it, if a user has passed Feature
objects in, then we do have a dataframe_name that we can use for X
instead of X.ww.name
(even if they're the same!), so this TODO explaining why we delete the woodwork schema does still apply.
I wanted to make sure that was clear, since I think it's a little less obvious now that we aren't fully hardcoding "X"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeremyliweishih added it 4c37408 - let me know if that makes sense to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amazing!
@@ -28,6 +28,8 @@ def __init__(self, index="index", features=None, random_seed=0, **kwargs): | |||
self.index = index | |||
self.features = features | |||
self._passed_in_features = True if features else None | |||
# If features are passed in, they'll have a dataframe_name we should utilize | |||
self._dataframe_name = self.features[0].dataframe_name if self.features else "X" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add the note!
@@ -36,16 +38,21 @@ def _make_entity_set(self, X): | |||
ft_es = EntitySet() | |||
# TODO: This delete was introduced for compatibility with Featuretools 1.0.0. This should | |||
# be removed after Featuretools handles unnamed dataframes being passed to this function. | |||
# But even then, we should still use any dataframe name that might be available from input features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One non-blocking suggestion, but looks good to me overall.
if self.index not in X.columns: | ||
es = ft_es.add_dataframe( | ||
dataframe=X, | ||
dataframe_name="X", | ||
dataframe_name=self._dataframe_name, | ||
index=self.index, | ||
make_index=True, | ||
) | ||
else: | ||
es = ft_es.add_dataframe(dataframe=X, dataframe_name="X", index=self.index) | ||
es = ft_es.add_dataframe( | ||
dataframe=X, | ||
dataframe_name=self._dataframe_name, | ||
index=self.index, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of an unrelated nitpick, but I think this entire code block can be cleaned up to remove the if/else construct:
should_make_index = self.index not in X.columns
es = ft_es.add_dataframe(
dataframe=X,
dataframe_name=self._dataframe_name,
index=self.index,
make_index=should_make_index,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, changed
closes #3872
If features are passed in, uses the
dataframe_name
from the Feature object to set thedastaframe_name
in_make_entity_set
instead of always calling it"X"
.This allows us to run calculate feature matrix from the DFS transformer on Features with a dataframe name that is not
"X"
.