-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix pipeline example erroring out on DFS #4059
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4059 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 349 349
Lines 37514 37559 +45
=======================================
+ Hits 37396 37441 +45
Misses 118 118
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
evalml/pipelines/pipeline_base.py
Outdated
@@ -709,6 +709,8 @@ def repr_component(parameters): | |||
parameters_repr = ", ".join( | |||
[ | |||
f"'{component}':{{{repr_component(parameters)}}}" | |||
if component != "DFS Transformer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to exclude the DFS Transformer
since the __repr__
of a feature is not executable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this ever be problematic if there's separate Numeric Pipeline
vs Categorical Pipeline
? I forget if dfs transformer comes before we split pipelines, though, so I definitely might not have all the info here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question goes for has_dfs = "DFS Transformer" in element.component_graph.compute_order
below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the DFS Transformer
always comes first so I don't believe it should be a problem (and it'll always be named DFS Transformer
df = pd.read_csv(PATH_TO_TRAIN) | ||
y_train = df[TARGET] | ||
X_train = df.drop(TARGET, axis=1) | ||
df = ww.deserialize.from_disk(PATH_TO_TRAIN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to use woodwork serialization to keep feature origin info.
|
||
X_train = pd.DataFrame(X_train) | ||
X_train.columns = X_train.columns.astype(str) | ||
es = ft.EntitySet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can split the DFS/non-DFS case as well but I thought that the DFS case covers all the edge cases already so I kept it like this for now. LMK what you all think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tihnk it's fine to just retain the DFS case only for now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit + question about warning but once addressed can approve!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we just change the check for DFS Transformer to be more robust based on the name attribute, we're good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with Karsten about removing reliance on the DFS transformer's name, and I'm worried about the implications of removing necessary repro information from __repr__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Fixes #4056.
This PR removes the features parameters in the pipeline example when DFS is present and adds in the features as an optional argument.