
Add Vowpal Wabbit estimators #2846

Merged · angela97lin merged 42 commits into main from wabbit_integration on Oct 25, 2021
Conversation

@angela97lin (Contributor) commented Sep 27, 2021

Reviving #770 as part of inno days experimentation.

(Closes #770)

Perf test results here on adding to AutoML: https://alteryx.atlassian.net/wiki/spaces/PS/pages/1059620252/Adding+Vowpal+Wabbit+to+AutoML

I don't think we should add this to AutoML quite yet, but these estimators are still worth having as components!

IIRC, we probably need to update the feedstock requirements to get the conda package tests to pass. I'll do that, but I wanted to put this up for review first 😁

Edit: it looks like vowpal wabbit 8.8.1 (the latest conda version) has a very different API from the latest release (8.11.0), which I based my code on. I filed VowpalWabbit/vowpal_wabbit#3406; otherwise, we may consider using the older API or holding off entirely.
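For readers unfamiliar with what these components wrap, here is a minimal sketch of the kind of wrapper the PR adds, assuming the 8.11-era vowpalwabbit.sklearn_vw scikit-learn interface and the constructor defaults shown later in this review (loss_function, learning_rate, decay_learning_rate). The module path, class name, and keyword arguments below are assumptions for illustration, not the PR's actual code:

    # A minimal sketch (not the PR's code) of wrapping vowpal wabbit's
    # scikit-learn interface. The module path `vowpalwabbit.sklearn_vw` and
    # the VWClassifier keyword arguments are assumptions based on the
    # 8.11-era API.
    import numpy as np
    import pandas as pd
    from vowpalwabbit.sklearn_vw import VWClassifier


    class VowpalWabbitBinaryClassifierSketch:
        """Illustrative wrapper; evalml's real component does much more."""

        def __init__(self, loss_function="logistic", learning_rate=0.5, decay_learning_rate=1.0):
            self.parameters = {
                "loss_function": loss_function,
                "learning_rate": learning_rate,
                "decay_learning_rate": decay_learning_rate,
            }
            self._vw = VWClassifier(**self.parameters)

        def fit(self, X, y):
            # VW expects numeric features; label encoding (e.g. to {-1, 1})
            # is assumed to happen elsewhere in the component.
            self._vw.fit(np.asarray(X, dtype=float), np.asarray(y))
            return self

        def predict(self, X):
            return pd.Series(self._vw.predict(np.asarray(X, dtype=float)))

Since the conda-available 8.8.1 release reportedly exposes a different Python API, an import like the one above would presumably fail there, which is the versioning issue the edit describes.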

@angela97lin angela97lin self-assigned this Sep 27, 2021

codecov bot commented Sep 30, 2021

Codecov Report

Merging #2846 (cf3195d) into main (8bc7823) will increase coverage by 0.1%.
The diff coverage is 98.4%.


@@           Coverage Diff           @@
##            main   #2846     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        302     307      +5     
  Lines      28872   29049    +177     
=======================================
+ Hits       28781   28958    +177     
  Misses        91      91             
Impacted Files Coverage Δ
evalml/pipelines/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/estimators/__init__.py 100.0% <ø> (ø)
evalml/tests/conftest.py 98.3% <ø> (ø)
...alml/tests/model_family_tests/test_model_family.py 100.0% <ø> (ø)
...s/prediction_explanations_tests/test_explainers.py 100.0% <ø> (ø)
evalml/tests/utils_tests/test_dependencies.py 85.2% <ø> (ø)
evalml/utils/gen_utils.py 98.1% <ø> (ø)
evalml/tests/component_tests/test_utils.py 95.3% <25.0%> (ø)
evalml/model_family/model_family.py 100.0% <100.0%> (ø)
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@chukarsten (Contributor) left a comment
This all looks well and good to me! I have to say, I'm a little out of the loop on this classifier/regressor; mostly I just thought "VowpalWabbit" looked odd sitting on the PR list. But this looks like a textbook new component addition. I think it would be very interesting to use this one as a guinea pig for @christopherbunn's modifications to the perf test report.

Comment on lines +63 to +65
raise NotImplementedError(
"Feature importance is not implemented for the Vowpal Wabbit classifiers."
)
Contributor:

I think what we normally do is return an array of 0s. I think this is fine for now, and ultimately the right thing, but we might have to revisit it when we get to AutoML integration.
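For illustration only (not code from the PR): a minimal sketch of the zeros-based fallback described above. The class and attribute names here are hypothetical.

    import numpy as np


    class _VWEstimatorSketch:
        """Hypothetical stand-in for the PR's VW components (illustration only)."""

        def __init__(self, num_features):
            # Assumed to be recorded during fit(); the name is made up for this sketch.
            self._num_features = num_features

        @property
        def feature_importance(self):
            # Alternative to raising NotImplementedError: return one zero per
            # input feature, matching the convention described above.
            return np.zeros(self._num_features)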

@angela97lin (Contributor, Author) commented:

Update: it looks like vowpal wabbit 8.8.1 (the latest conda version) has a very different API from the latest release (8.11.0), which I based my code on. I filed VowpalWabbit/vowpal_wabbit#3406; otherwise, we may consider using the older API or holding off entirely.

@angela97lin (Contributor, Author) commented:

Given the discrepancy above, we could either:

  • Ask and hope that VW releases a newer version on conda (seems unlikely), or
  • Make VW a pip-only package.

Prophet is already a pip-only package, so this isn't an entirely foreign concept. I like this idea until we find it important to add VW to AutoML or better integrate it with EvalML.
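For illustration only: one common way to keep a dependency pip-only is an optional extra, so the core conda package is unaffected. The setup.py snippet below is a hypothetical sketch, not EvalML's actual packaging:

    # setup.py (hypothetical sketch, not EvalML's real configuration)
    from setuptools import find_packages, setup

    setup(
        name="evalml",
        packages=find_packages(),
        # Core requirements remain conda-installable; vowpal wabbit is opt-in:
        #   pip install "evalml[vowpalwabbit]"
        extras_require={
            "vowpalwabbit": ["vowpalwabbit>=8.11.0"],
        },
    )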

@freddyaboulton (Contributor) left a comment

@angela97lin Thank you for reviving this! I think it's OK to keep it pip-only. I'm curious what the perf tests will look like for this estimator. I just went down a documentation rabbit hole, and there are a lot of options we can configure here.

self,
loss_function="logistic",
learning_rate=0.5,
decay_learning_rate=1.0,
Contributor:

Great question. It's hard to tell from the docs how this parameter is used, but my guess is that if the decay learning rate is 1, the learning rate won't decay. That seems fine as a default, especially if we let AutoML tune it.

However, since the number of passes defaults to 1, I don't think this parameter will take effect. @angela97lin Maybe we should also expose/tune the number-of-passes parameter?
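For illustration (not the PR's code): a sketch of exposing passes alongside the defaults shown in the snippet above, with tuning ranges declared via a hyperparameter_ranges class attribute as evalml components typically do. The range values are made-up placeholders:

    from skopt.space import Integer, Real


    class VowpalWabbitClassifierSketch:
        """Hypothetical component; the ranges below are illustrative guesses."""

        hyperparameter_ranges = {
            "learning_rate": Real(1e-7, 10.0),
            "decay_learning_rate": Real(1e-7, 1.0),
            "passes": Integer(1, 10),
        }

        def __init__(
            self,
            loss_function="logistic",
            learning_rate=0.5,
            decay_learning_rate=1.0,
            passes=1,
        ):
            self.parameters = {
                "loss_function": loss_function,
                "learning_rate": learning_rate,
                "decay_learning_rate": decay_learning_rate,
                # Exposing passes lets decay_learning_rate actually matter,
                # since decay only applies across multiple passes.
                "passes": passes,
            }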

@angela97lin (Contributor, Author) commented:

@freddyaboulton I had done really preliminary perf testing a while back here: https://alteryx.atlassian.net/wiki/spaces/PS/pages/1059620252/Adding+Vowpal+Wabbit+to+AutoML

I ran into a lot of weird behavior with the tuner and vowpal wabbit; that will need to be resolved and will be pretty interesting to dig into once we want to add VW to AutoML. But you're totally right: there are so many knobs to turn with these estimators 😂

@angela97lin angela97lin merged commit c11809c into main Oct 25, 2021
@angela97lin angela97lin deleted the wabbit_integration branch October 25, 2021 05:59
@chukarsten chukarsten mentioned this pull request Oct 27, 2021
Linked issue: Add vowpal wabbit component (#770) · 6 participants