Changing Elastic Net Classifier params #2269
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2269 +/- ##
=======================================
Coverage 100.0% 100.0%
=======================================
Files 280 280
Lines 24336 24336
=======================================
Hits 24314 24314
Misses 22 22
Continue to review full report at Codecov.
Filed an issue to track the SHAP failure here.
freddyaboulton left a comment
@bchen1116 I think this looks good! Thanks for chasing down the original issue and recommending a very targeted fix. 🎉
I agree that we can tackle the issues with custom objectives you've identified in a separate ticket.
else:
    training_data, y = X_y_regression
if "Elastic Net" in estimator.name:
    parameters = {"Elastic Net Classifier": {"alpha": 0.5, "l1_ratio": 0.5, 'n_jobs': 1}}
Tracked by #2281 - @bchen1116 can you add a TODO?
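For anyone following along, here is a minimal sketch of how a parameters dict keyed by component name (as in the test diff above) is passed to an evalml pipeline. The component graph, demo dataset, and objective name below are assumptions based on the evalml docs of this era, not code from this PR.

```python
from evalml.demos import load_breast_cancer
from evalml.pipelines import BinaryClassificationPipeline

X, y = load_breast_cancer()

# Parameters are keyed by component name; the "Elastic Net Classifier" entry
# mirrors the dict used in the test above. The component graph is a
# hypothetical minimal pipeline, not the one under test.
pipeline = BinaryClassificationPipeline(
    component_graph=["Imputer", "Standard Scaler", "Elastic Net Classifier"],
    parameters={
        "Elastic Net Classifier": {"alpha": 0.5, "l1_ratio": 0.5, "n_jobs": 1},
    },
)
pipeline.fit(X, y)
print(pipeline.score(X, y, objectives=["Log Loss Binary"]))
```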
"outputs": [],
"source": [
    "X, y = evalml.demos.load_fraud(n_rows=1000)"
    "X, y = evalml.demos.load_fraud(n_rows=5000)"
Added n_rows=5000, as stated in the perf test doc here, to improve performance.
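For context, the docs cell in question roughly does the following. The AutoMLSearch call below is a sketch based on the fraud tutorial in the evalml docs; the objective parameters and kwargs are assumptions, not code from this PR or the perf tests.

```python
import evalml
from evalml.automl import AutoMLSearch
from evalml.objectives import FraudCost

# Larger demo sample, matching the change in the notebook diff above.
X, y = evalml.demos.load_fraud(n_rows=5000)
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(
    X, y, problem_type="binary", test_size=0.2
)

# Cost-based objective from the fraud tutorial; these are illustrative
# defaults, not the values used in the perf tests.
fraud_objective = FraudCost(amount_col="amount")

automl = AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    problem_type="binary",
    objective=fraud_objective,
    max_batches=1,
)
automl.search()
print(automl.rankings.head())
```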
chukarsten left a comment
LGTM!
ParthivNaresh left a comment
Nice work with the perf tests!
assert (output_.index == ['Intercept', 'Second', 'Fourth', 'First', 'Third']).all()
elif estimator.name == 'Elastic Net Regressor':
    assert (output_.index == ['Intercept', 'Second', 'Third', 'Fourth', 'First']).all()
    assert (output_.index == ['Intercept', 'Second', 'Fourth', 'First', 'Third']).all()
Noice
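For context, the index asserted in the diff above looks like the intercept followed by the features ranked by the size of their fitted coefficients. Here is a rough sketch of how such an ordering can be computed with plain scikit-learn; the feature names and data are made up, and this is not the evalml helper under test.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
feature_names = ["First", "Second", "Third", "Fourth"]  # hypothetical names
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=feature_names)
# Build a target so the features get clearly different coefficient sizes.
y = (5 * X["Second"] + 3 * X["Fourth"] + 2 * X["First"] + 1 * X["Third"]
     + rng.normal(scale=0.1, size=200))

model = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

# Intercept first, then features sorted by coefficient magnitude, largest first.
coefs = pd.Series(np.abs(model.coef_), index=feature_names)
order = ["Intercept"] + list(coefs.sort_values(ascending=False).index)
print(order)  # e.g. ['Intercept', 'Second', 'Fourth', 'First', 'Third']
```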
Partial fix for #2145.
Docs here
Perf tests here
While this PR does improve performance in the docs, the main issue still stands: the objectives we used (lead_scoring and fraud) allowed a pipeline that always guesses the positive (minority) class to achieve the best score. To actually solve the issue, we should tweak the objective functions so that this case scores poorly; otherwise the problem can resurface in the future.
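As a rough illustration of that failure mode (made-up data and a made-up cost weighting, not the actual lead_scoring or fraud objectives): on an imbalanced dataset, a pipeline that flags everything as fraud catches 100% of the fraud, so an objective that heavily penalizes missed fraud but only lightly penalizes false alarms can rank it ahead of more useful pipelines.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)
y_true = (rng.random(10_000) < 0.05).astype(int)  # ~5% positive (fraud) class

always_positive = np.ones_like(y_true)            # pipeline that always guesses fraud

def naive_cost(y_true, y_pred, missed_fraud_cost=100.0, false_alarm_cost=1.0):
    """Hypothetical cost objective: missed fraud is expensive, false alarms are cheap."""
    missed = np.sum((y_true == 1) & (y_pred == 0)) * missed_fraud_cost
    false_alarms = np.sum((y_true == 0) & (y_pred == 1)) * false_alarm_cost
    return missed + false_alarms

print(recall_score(y_true, always_positive))      # 1.0 -- every fraud is "caught"
print(precision_score(y_true, always_positive))   # ~0.05 -- almost all alarms are false
print(naive_cost(y_true, always_positive))        # ~9.5k: only cheap false alarms
print(naive_cost(y_true, np.zeros_like(y_true)))  # ~50k: every fraud is missed
```

Under a weighting like this, the always-positive pipeline can also beat real models that miss even a modest share of the fraud, which is why the objectives would need to penalize this degenerate strategy for the issue to stay fixed.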
I think that objective work can be a separate PR, following some additional discussion. Things we'd need to discuss are: