Changing Elastic Net Classifier params #2269

Merged: bchen1116 merged 23 commits into main from bc_testing_elasticnet on May 18, 2021
Conversation

bchen1116 (Contributor) commented on May 13, 2021:

Partial fix for #2145.
Docs here.
Perf tests here.

While this PR does result in better performance in the docs, the main issue still standing is that the scoring methods we used (lead_scoring and fraud) allowed a pipeline that always guessed the positive (minority) class to have the best score. To actually solve the issue, it would be useful to tweak the objective functions so that this case scores poorly; otherwise, the issue can arise again in the future.

I think that can be a separate PR, following some additional discussion. Things we'd need to discuss:

  • How can we create objectives that score better, even when we are using various sampling methods?
  • Can these new objectives be made robust to models that only guess one class? (A rough sketch of that idea follows below.)
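As a rough illustration of that second point (this is not evalml's objective API; `robust_binary_score` is a hypothetical helper name), a score can simply refuse to reward a pipeline whose predictions collapse to a single class:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef


def robust_binary_score(y_true, y_pred):
    """Score binary predictions without rewarding a degenerate one-class predictor."""
    y_pred = np.asarray(y_pred)
    if np.unique(y_pred).size < 2:
        # A pipeline that always guesses the positive (or negative) class
        # gets the worst possible score instead of topping the rankings.
        return -1.0
    # Matthews correlation is bounded in [-1, 1], sits near 0 for
    # chance-level predictors, and is fairly robust to class imbalance.
    return matthews_corrcoef(y_true, y_pred)
```

A real objective would still need the domain-specific payoff logic of lead scoring or fraud cost, but a guard clause along these lines keeps the degenerate pipeline from winning.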

bchen1116 self-assigned this on May 13, 2021
codecov bot commented May 17, 2021

Codecov Report

Merging #2269 (8e71b4a) into main (077ec74) will not change coverage.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##             main    #2269   +/-   ##
=======================================
  Coverage   100.0%   100.0%           
=======================================
  Files         280      280           
  Lines       24336    24336           
=======================================
  Hits        24314    24314           
  Misses         22       22           
Impacted Files Coverage Δ
...derstanding/prediction_explanations/_algorithms.py 98.9% <ø> (ø)
evalml/tests/component_tests/test_en_classifier.py 100.0% <ø> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 100.0% <ø> (ø)
...ts/estimators/classifiers/elasticnet_classifier.py 100.0% <100.0%> (ø)
...ents/estimators/regressors/elasticnet_regressor.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_components.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_en_regressor.py 100.0% <100.0%> (ø)
...s/prediction_explanations_tests/test_algorithms.py 98.3% <100.0%> (+0.1%) ⬆️
...lml/tests/model_understanding_tests/test_graphs.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 077ec74...8e71b4a.

bchen1116 (Contributor, Author) commented:

Filed an issue to track the SHAP failure here

bchen1116 changed the title from "[TEST] Changing Elastic Net Classifier params" to "Changing Elastic Net Classifier params" on May 17, 2021
bchen1116 marked this pull request as ready for review on May 17, 2021 20:28
freddyaboulton (Contributor) left a comment:

@bchen1116 I think this looks good! Thanks for chasing down the original issue and recommending a very targeted fix. 🎉

I agree that we can tackle the issues with custom objectives you've identified in a separate ticket.

@@ -109,8 +109,12 @@ def test_shap(estimator, problem_type, n_points_to_explain, X_y_binary, X_y_mult
        is_binary = False
    else:
        training_data, y = X_y_regression
    if "Elastic Net" in estimator.name:
        parameters = {"Elastic Net Classifier": {"alpha": 0.5, "l1_ratio": 0.5, 'n_jobs': 1}}
Tracked by #2281 - @bchen1116 can you add a TODO?
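For context on what the parameters pinned above control, here is a minimal sketch using scikit-learn's SGDClassifier; it is only an illustration of alpha and l1_ratio on an elastic-net-penalized linear model, not necessarily the exact estimator the evalml component wraps:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# alpha sets the overall regularization strength; l1_ratio mixes the
# L1 (sparsity) and L2 (shrinkage) penalties. The values mirror the
# parameters pinned in the test above.
clf = SGDClassifier(
    loss="log_loss",   # logistic loss; older scikit-learn releases call this "log"
    penalty="elasticnet",
    alpha=0.5,
    l1_ratio=0.5,
    n_jobs=1,
    random_state=0,
)

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf.fit(X, y)
print(clf.coef_)  # a heavy L1 mix pushes many coefficients toward zero
```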

@@ -68,7 +68,7 @@
     "metadata": {},
     "outputs": [],
     "source": [
-        "X, y = evalml.demos.load_fraud(n_rows=1000)"
+        "X, y = evalml.demos.load_fraud(n_rows=5000)"
bchen1116 (Contributor, Author):
Added n_rows=5000, as stated in the perf test doc here, for the performance improvement.
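As a quick sanity check on why the larger slice matters, here is a hedged sketch for inspecting the class balance of the demo data, assuming a recent evalml where the demo loaders return pandas objects (older releases returned woodwork structures):

```python
import evalml

# Load the larger demo slice now used in the docs notebook.
X, y = evalml.demos.load_fraud(n_rows=5000)

# The fraud target is heavily imbalanced, which is what let an
# always-positive pipeline look deceptively good under a cost-based objective.
print(y.value_counts(normalize=True))
```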

chukarsten (Collaborator) left a comment:

LGTM!

ParthivNaresh (Contributor) left a comment:

Nice work with the perf tests!

     assert (output_.index == ['Intercept', 'Second', 'Fourth', 'First', 'Third']).all()
 elif estimator.name == 'Elastic Net Regressor':
-    assert (output_.index == ['Intercept', 'Second', 'Third', 'Fourth', 'First']).all()
+    assert (output_.index == ['Intercept', 'Second', 'Fourth', 'First', 'Third']).all()
Noice

bchen1116 merged commit 59e9dd0 into main on May 18, 2021
chukarsten mentioned this pull request on May 24, 2021
chukarsten mentioned this pull request on Jun 2, 2021
freddyaboulton deleted the bc_testing_elasticnet branch on May 13, 2022 15:02