Fix pipeline example erroring out on DFS #4059

Merged
jeremyliweishih merged 19 commits into main from js_4056_fix_dfs_pipeline_example on Mar 14, 2023

@jeremyliweishih

Collaborator

Fixes #4056.

This PR removes the features parameters in the pipeline example when DFS is present and adds in the features as an optional argument.

@codecov

codecov Bot commented Mar 8, 2023


Codecov Report

Merging #4059 (f09ff18) into main (1412fc3) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #4059     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        349     349             
  Lines      37514   37559     +45     
=======================================
+ Hits       37396   37441     +45     
  Misses       118     118             
| Impacted Files | Coverage Δ |
|---|---|
| ...nents/transformers/preprocessing/stl_decomposer.py | 100.0% <ø> (ø) |
| evalml/pipelines/pipeline_base.py | 98.5% <ø> (ø) |
| evalml/pipelines/utils.py | 99.6% <100.0%> (+0.1%) ⬆️ |
| evalml/tests/pipeline_tests/test_pipeline_utils.py | 99.8% <100.0%> (+0.1%) ⬆️ |


@jeremyliweishih jeremyliweishih marked this pull request as ready for review March 8, 2023 20:54
Comment thread evalml/pipelines/pipeline_base.py Outdated
parameters_repr = ", ".join(
[
f"'{component}':{{{repr_component(parameters)}}}"
if component != "DFS Transformer"

Collaborator Author

Need to exclude the DFS Transformer since the __repr__ of a feature is not executable.

Contributor

Could this ever be problematic if there's a separate Numeric Pipeline vs. Categorical Pipeline? I forget whether the DFS Transformer comes before we split pipelines, though, so I might not have all the info here.

Contributor

Same question goes for `has_dfs = "DFS Transformer" in element.component_graph.compute_order` below.

Collaborator Author

The DFS Transformer always comes first, so I don't believe it should be a problem (and it'll always be named DFS Transformer).
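
For readers following this thread, here is a minimal sketch of the underlying problem, not the actual evalml code. `Feature` is a hypothetical stand-in for a Featuretools feature object, and `parameters_repr` is an illustrative helper whose name and signature are assumptions. It shows why a parameters dictionary containing such an object cannot be pasted back into a script, and how skipping the DFS Transformer entry leaves something that can be evaluated again.

```python
import ast


class Feature:
    """Hypothetical stand-in for a feature object (assumption, not the real class)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Angle-bracket reprs like this one are not valid Python source.
        return f"<Feature: {self.name}>"


def parameters_repr(parameters, skip=("DFS Transformer",)):
    """Build a dict literal as text, leaving out components listed in `skip`."""
    parts = [
        f"'{component}': {params!r}"
        for component, params in parameters.items()
        if component not in skip
    ]
    return "{" + ", ".join(parts) + "}"


params = {
    "DFS Transformer": {"features": [Feature("ABSOLUTE(a)")]},
    "Imputer": {"numeric_impute_strategy": "mean"},
}

# The unfiltered repr embeds "<Feature: ...>", so it cannot be parsed back.
try:
    ast.literal_eval(repr(params))
    full_is_executable = True
except (SyntaxError, ValueError):
    full_is_executable = False

# With the DFS Transformer entry skipped, the text is a valid literal again.
filtered = parameters_repr(params)
round_tripped = ast.literal_eval(filtered)
```

The trade-off raised later in the review still applies: whatever is skipped here has to be supplied to the generated example some other way, which is why the features become a separate optional argument.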

Comment thread evalml/pipelines/utils.py
df = pd.read_csv(PATH_TO_TRAIN)
y_train = df[TARGET]
X_train = df.drop(TARGET, axis=1)
df = ww.deserialize.from_disk(PATH_TO_TRAIN)

Collaborator Author

Moved to Woodwork serialization to keep feature origin info.


X_train = pd.DataFrame(X_train)
X_train.columns = X_train.columns.astype(str)
es = ft.EntitySet()

Collaborator Author

I can split the DFS/non-DFS case as well but I thought that the DFS case covers all the edge cases already so I kept it like this for now. LMK what you all think.

Contributor

I think it's fine to just retain the DFS case only for now!

@jeremyliweishih jeremyliweishih requested review from christopherbunn, chukarsten, eccabay and fjlanasa and removed request for eccabay March 8, 2023 21:00

@christopherbunn christopherbunn left a comment

Contributor

Nit + question about warning but once addressed can approve!

Comment thread evalml/pipelines/utils.py
Comment thread evalml/pipelines/utils.py Outdated
@gsheni gsheni requested a review from thehomebrewnerd March 9, 2023 19:48
Comment thread docs/source/release_notes.rst Outdated

@chukarsten chukarsten left a comment

Contributor

I think if we just change the check for DFS Transformer to be more robust based on the name attribute, we're good!
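
A rough sketch of what this suggestion could look like, under stated assumptions: the component graph is modeled here as a plain dict mapping step names to component instances, which is a simplification and not evalml's actual `ComponentGraph`. The classes and the two helper functions are hypothetical. The point is only that matching on the component's `name` attribute keeps working when the step key in the graph is something other than "DFS Transformer".

```python
class DFSTransformer:
    """Hypothetical stand-in for the real component class."""

    name = "DFS Transformer"


class Imputer:
    """Hypothetical stand-in for another component."""

    name = "Imputer"


def has_dfs_by_key(component_graph):
    # Fragile: only works if the step key is exactly "DFS Transformer".
    return "DFS Transformer" in component_graph


def has_dfs_by_name(component_graph):
    # More robust: compare each component's own name attribute.
    return any(
        getattr(component, "name", None) == DFSTransformer.name
        for component in component_graph.values()
    )


# A graph where the DFS step was given a custom key.
graph = {"My Renamed DFS Step": DFSTransformer(), "Imputer": Imputer()}

found_by_key = has_dfs_by_key(graph)
found_by_name = has_dfs_by_name(graph)
```

In this toy graph the key-based check misses the renamed step while the attribute-based check finds it.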

Comment thread evalml/pipelines/utils.py Outdated

@eccabay eccabay left a comment

Contributor

Agreed with Karsten about removing reliance on the DFS transformer's name, and I'm worried about the implications of removing necessary repro information from `__repr__`.

Comment thread evalml/pipelines/utils.py
Comment thread evalml/pipelines/utils.py Outdated
Comment thread evalml/pipelines/pipeline_base.py

@eccabay eccabay left a comment

Contributor

Looks great!

Comment thread evalml/pipelines/utils.py
Comment thread evalml/tests/pipeline_tests/test_pipeline_utils.py
@jeremyliweishih jeremyliweishih enabled auto-merge (squash) March 14, 2023 23:00
@jeremyliweishih jeremyliweishih merged commit 20d5b2c into main Mar 14, 2023
@jeremyliweishih jeremyliweishih deleted the js_4056_fix_dfs_pipeline_example branch March 14, 2023 23:19
@chukarsten chukarsten mentioned this pull request Mar 15, 2023
Development

Successfully merging this pull request may close these issues.

generate_pipeline_example does not work with DFSTransformer

5 participants