Fix pipeline example erroring out on DFS#4059
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4059 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 349 349
Lines 37514 37559 +45
=======================================
+ Hits 37396 37441 +45
Misses 118 118
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
| parameters_repr = ", ".join( | ||
| [ | ||
| f"'{component}':{{{repr_component(parameters)}}}" | ||
| if component != "DFS Transformer" |
There was a problem hiding this comment.
Need to exclude the DFS Transformer since the __repr__ of a feature is not executable.
There was a problem hiding this comment.
could this ever be problematic if there's separate Numeric Pipeline vs Categorical Pipeline? I forget if dfs transformer comes before we split pipelines, though, so I definitely might not have all the info here.
There was a problem hiding this comment.
Same question goes for has_dfs = "DFS Transformer" in element.component_graph.compute_order below
There was a problem hiding this comment.
the DFS Transformer always comes first so I don't believe it should be a problem (and it'll always be named DFS Transformer
| df = pd.read_csv(PATH_TO_TRAIN) | ||
| y_train = df[TARGET] | ||
| X_train = df.drop(TARGET, axis=1) | ||
| df = ww.deserialize.from_disk(PATH_TO_TRAIN) |
There was a problem hiding this comment.
moved to use woodwork serialization to keep feature origin info.
|
|
||
| X_train = pd.DataFrame(X_train) | ||
| X_train.columns = X_train.columns.astype(str) | ||
| es = ft.EntitySet() |
There was a problem hiding this comment.
I can split the DFS/non-DFS case as well but I thought that the DFS case covers all the edge cases already so I kept it like this for now. LMK what you all think.
There was a problem hiding this comment.
I tihnk it's fine to just retain the DFS case only for now!
christopherbunn
left a comment
There was a problem hiding this comment.
Nit + question about warning but once addressed can approve!
chukarsten
left a comment
There was a problem hiding this comment.
I think if we just change the check for DFS Transformer to be more robust based on the name attribute, we're good!
eccabay
left a comment
There was a problem hiding this comment.
Agreed with Karsten about removing reliance on the DFS transformer's name, and I'm worried about the implications of removing necessary repro information from __repr__
Fixes #4056.
This PR removes the features parameters in the pipeline example when DFS is present and adds in the features as an optional argument.