Improve SVM hyperparameters #2651
Conversation
Codecov Report

```
@@          Coverage Diff          @@
##            main   #2651   +/-  ##
====================================
  Coverage   99.9%   99.9%
====================================
  Files        298     298
  Lines      27305   27305
====================================
  Hits       27261   27261
  Misses        44      44
```

Continue to review full report at Codecov.
Looks good! Just two hyper-minor nits about the docstrings!
evalml/pipelines/components/estimators/classifiers/svm_classifier.py
evalml/pipelines/components/estimators/regressors/svm_regressor.py
@eccabay Thank you for this! I agree with removing the linear kernel for both regression and classification, but I'm not sure we should tweak the other hyperparameters for regression, since you only ran the perf tests on binary classification.
I also left a comment on your results about how much better SVM is than the next best estimator. The fit time is only slightly slower for most datasets, but for some it's about 4x slower. If SVM is more than 4x better, there's an argument for including it in AutoMLSearch.
I'd like to continue the discussion on your perf test doc before approving!
```diff
@@ -42,7 +42,7 @@ class SVMRegressor(Estimator):
         ProblemTypes.TIME_SERIES_REGRESSION,
     ]"""

-    def __init__(self, C=1.0, kernel="rbf", gamma="scale", random_seed=0, **kwargs):
+    def __init__(self, C=1.0, kernel="rbf", gamma="auto", random_seed=0, **kwargs):
```
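For context on what this one-word change does: in scikit-learn (which evalml's SVM components wrap), `gamma="scale"` is data-dependent while `gamma="auto"` is not. A minimal pure-Python sketch of the documented formulas, with an illustrative toy matrix (not from the PR):

```python
# Sketch of the two RBF-kernel gamma defaults discussed in this PR,
# mirroring scikit-learn's documented formulas:
#   gamma="scale" -> 1 / (n_features * Var(X))   (depends on the data)
#   gamma="auto"  -> 1 / n_features              (ignores the data)

def effective_gamma(X, gamma):
    """Return the gamma value scikit-learn would use for feature matrix X."""
    values = [v for row in X for v in row]
    n_features = len(X[0])
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    if gamma == "scale":
        return 1.0 / (n_features * var)
    if gamma == "auto":
        return 1.0 / n_features
    raise ValueError(f"unknown gamma setting: {gamma!r}")

# Illustrative 3-sample, 2-feature matrix (hypothetical data).
X = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0]]
print(effective_gamma(X, "auto"))   # 0.5, fixed by the feature count
print(effective_gamma(X, "scale"))  # shrinks as feature variance grows
```

So the switch to `"auto"` trades the variance-normalized kernel width for a simpler one that depends only on the number of features, which is what the perf tests compared.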
Should we make this change? The perf tests only considered binary classification problems.
This is a fair point! It's hard to say since we don't have very many regression datasets in looking glass. I'll run a few tests with what we have and see if the results are consistent or not!
@freddyaboulton Results from the regression testing are now in the performance test doc! On the very small number of datasets we have, "auto" performs better significantly more often than "scale", so I think this change should happen.
One thing I forgot to implement/mention before publishing this PR is that
LGTM! Also curious about the change of `gamma` from `"scale"` to `"auto"`, but will wait for the results on that!
Thank you @eccabay !
Closes #2615 by removing "linear" as a kernel option from `SVMClassifier` and `SVMRegressor`, and swapping "auto" in as SVM's default gamma parameter for increased first-guess performance (as discussed in the performance test results here).
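The kernel-option removal described above can be sketched as a change to a tunable hyperparameter-range mapping. The dictionary below is purely illustrative: the attribute name and the exact value lists in evalml's components may differ; it only shows the shape of dropping `"linear"` from the searched kernels.

```python
# Hedged sketch of the hyperparameter-range change this PR describes.
# Names and values here are hypothetical, not copied from evalml source.
old_ranges = {
    "C": [1, 10],
    "kernel": ["linear", "rbf"],   # before: linear was a tunable option
    "gamma": ["scale", "auto"],
}

# After: same ranges, with "linear" removed from the kernel choices.
new_ranges = {**old_ranges, "kernel": [k for k in old_ranges["kernel"] if k != "linear"]}

print(new_ranges["kernel"])  # ['rbf']
```

AutoMLSearch would then never propose a linear-kernel SVM, while existing pipelines that explicitly pass `kernel="linear"` are unaffected only if the parameter itself (rather than just the tuning range) is kept.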