Fix pipeline example erroring out on DFS #4059

Merged
merged 19 commits into from
Mar 14, 2023

Conversation

jeremyliweishih
Collaborator

Fixes #4056.

This PR removes the features parameter from the pipeline example when DFS is present and adds the features as an optional argument instead.

codecov bot commented Mar 8, 2023

Codecov Report

Merging #4059 (f09ff18) into main (1412fc3) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #4059     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        349     349             
  Lines      37514   37559     +45     
=======================================
+ Hits       37396   37441     +45     
  Misses       118     118             

Impacted Files | Coverage Δ
...nents/transformers/preprocessing/stl_decomposer.py | 100.0% <ø> (ø)
evalml/pipelines/pipeline_base.py | 98.5% <ø> (ø)
evalml/pipelines/utils.py | 99.6% <100.0%> (+0.1%) ⬆️
evalml/tests/pipeline_tests/test_pipeline_utils.py | 99.8% <100.0%> (+0.1%) ⬆️

@jeremyliweishih jeremyliweishih marked this pull request as ready for review March 8, 2023 20:54
@@ -709,6 +709,8 @@ def repr_component(parameters):
parameters_repr = ", ".join(
[
f"'{component}':{{{repr_component(parameters)}}}"
if component != "DFS Transformer"

Collaborator Author

Need to exclude the DFS Transformer since the __repr__ of a feature is not executable.
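
To illustrate the problem, here is a minimal sketch. The `Feature` class and the `build_parameters_repr` helper are hypothetical stand-ins written for this example, not the Featuretools or evalml implementations, and rendering the skipped component as an empty dict is an assumption. It shows that a repr such as `<Feature: age>` cannot be compiled as Python, so the generated example has to leave the DFS Transformer's parameters out.

```python
class Feature:
    """Hypothetical stand-in for a feature object whose repr is not valid Python."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<Feature: {self.name}>"


def is_executable(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<generated>", "eval")
        return True
    except SyntaxError:
        return False


def build_parameters_repr(parameters, skip=("DFS Transformer",)):
    """Render parameters as source code, leaving skipped components empty.

    Emitting an empty dict for skipped components is an assumption of this sketch.
    """
    parts = []
    for component, params in parameters.items():
        if component in skip:
            body = ""
        else:
            body = ", ".join(f"'{key}': {value!r}" for key, value in params.items())
        parts.append(f"'{component}': {{{body}}}")
    return "{" + ", ".join(parts) + "}"


parameters = {
    "DFS Transformer": {"features": [Feature("age")]},
    "Imputer": {"numeric_impute_strategy": "mean"},
}

naive = repr(parameters)                  # contains "<Feature: age>"
safe = build_parameters_repr(parameters)  # DFS Transformer parameters omitted
```

With this sketch, `is_executable(naive)` is false while `is_executable(safe)` is true, which is why the features need to be supplied to the example some other way.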

Contributor

Could this ever be problematic if there are separate Numeric Pipeline and Categorical Pipeline branches? I forget whether the DFS Transformer comes before we split pipelines, though, so I might not have all the info here.

Contributor

Same question goes for has_dfs = "DFS Transformer" in element.component_graph.compute_order below

Collaborator Author

The DFS Transformer always comes first, so I don't believe it should be a problem (and it'll always be named DFS Transformer).

df = pd.read_csv(PATH_TO_TRAIN)
y_train = df[TARGET]
X_train = df.drop(TARGET, axis=1)
df = ww.deserialize.from_disk(PATH_TO_TRAIN)

Collaborator Author

Moved to Woodwork serialization to keep feature origin info.


X_train = pd.DataFrame(X_train)
X_train.columns = X_train.columns.astype(str)
es = ft.EntitySet()

Collaborator Author

I can split the DFS/non-DFS case as well but I thought that the DFS case covers all the edge cases already so I kept it like this for now. LMK what you all think.

Contributor

I think it's fine to just retain the DFS case only for now!

@jeremyliweishih jeremyliweishih requested review from eccabay, chukarsten, christopherbunn and fjlanasa and removed request for eccabay March 8, 2023 21:00

Contributor

@christopherbunn christopherbunn left a comment

Nit + question about warning but once addressed can approve!

Contributor

@chukarsten chukarsten left a comment

I think if we just change the check for DFS Transformer to be more robust based on the name attribute, we're good!
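
One way to read this suggestion, sketched with hypothetical stand-in classes rather than evalml's actual ones: instead of testing for the literal key "DFS Transformer" in the graph, compare each component's own `name` attribute against the transformer class's `name`. The component graph is modeled here as a plain dict from key to component instance, which is an assumption made for the example.

```python
class DFSTransformer:
    """Hypothetical stand-in; only the class-level `name` matters here."""

    name = "DFS Transformer"


class Imputer:
    """Hypothetical stand-in for any other component."""

    name = "Imputer"


def has_dfs_by_key(component_graph):
    """Fragile check: relies on the key chosen when the graph was built."""
    return "DFS Transformer" in component_graph


def has_dfs_by_name(component_graph):
    """More robust check: inspects each component's own `name` attribute."""
    return any(
        component.name == DFSTransformer.name
        for component in component_graph.values()
    )


# The same transformer registered under a custom key.
graph = {"My DFS Step": DFSTransformer(), "Imputer": Imputer()}
```

Here the key-based check misses the transformer because it was registered as "My DFS Step", while the attribute-based check still finds it.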

Contributor

@eccabay eccabay left a comment

Agreed with Karsten about removing reliance on the DFS transformer's name, and I'm worried about the implications of removing necessary repro information from __repr__

Contributor

@eccabay eccabay left a comment

Looks great!

@jeremyliweishih jeremyliweishih enabled auto-merge (squash) March 14, 2023 23:00
@jeremyliweishih jeremyliweishih merged commit 20d5b2c into main Mar 14, 2023
@jeremyliweishih jeremyliweishih deleted the js_4056_fix_dfs_pipeline_example branch March 14, 2023 23:19
@chukarsten chukarsten mentioned this pull request Mar 15, 2023
Successfully merging this pull request may close these issues.

generate_pipeline_example does not work with DFSTransformer
5 participants