Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Third Party Estimator Wrapper #1103

Merged
merged 4 commits into from Oct 5, 2020
Merged

Third Party Estimator Wrapper #1103

merged 4 commits into from Oct 5, 2020

Conversation

bbengfort
Copy link
Member

@bbengfort bbengfort commented Oct 3, 2020

Adds a wrapper for estimators that implement the scikit-learn API but do not extend BaseEstimator. If the estimator is missing required properties (generally the learned attributes) then a sensible error is raised. Includes documentation about how to use non-sklearn estimators with Yellowbrick.

Fixes #1098
Fixes #1099
Fixes #397
Related to #1066

I have made the following changes:

  1. Added a ContribEstimator wrapper and helper functions to wrap cuML, catboost, neuraxle etc estimators
  2. Wrapper allows access to estimator functions, raising sensible error if required attr is not available
  3. Wrapper allows model type checking e.g. is_classifier
  4. Adapted our is_estimator and get_model_name helpers to use the ContribEstimator
  5. Removed recursive imports from yellowbrick/contrib/__init__.py
  6. Wrote tests for wrapper library
  7. Tested wrapper with cuML and catboost
  8. Wrote documentation about how to use third party estimators with Yellowbrick

Still to do:

  • Official tests with cuML and catboost
  • Finish documentation

CHECKLIST

  • Is the commit message formatted correctly?
  • Have you noted the new functionality/bugfix in the release notes of the next release?
  • Included a sample plot to visually illustrate your changes?
  • Do all of your functions and methods have docstrings?
  • Have you added/updated unit tests where appropriate?
  • Have you updated the baseline images if necessary?
  • Have you run the unit tests using pytest?
  • Is your code style correct (are you using PEP8, pyflakes)?
  • Have you documented your new feature/functionality in the docs?
  • Have you built the docs using make html?

Adds a wrapper for estimators that implement the scikit-learn API but do
not extend BaseEstimator. If the estimator is missing required
properties (generally the learned attributes) then a sensible error is
raised. Includes documentation about how to use non-sklearn estimators
with Yellowbrick.

Fixes #1098
Fixes #1099
Related to #397
Related to #1066
@bbengfort bbengfort changed the title [WIP] Third Party Estimator Wrapper Third Party Estimator Wrapper Oct 3, 2020
Copy link
Member

@rebeccabilbro rebeccabilbro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @bbengfort; thank you for taking this on (especially given it's been on our wishlist for >2 years now!) I've noted just a handful of minor comments inline.

One other thought — I think that this PR should close #397 as well as #1098 and #1099; if there are specific libraries that folks want to add support for, I propose opening those as separate issues (e.g. #1066), given the potential for variations in external API behaviors, which may be easier to tackle individually (that odd catboost thing you noticed, for instance).

Once we've merged this, I might do a small follow-on PR to update our FAQ page, since this would be a great new addition!

docs/api/contrib/wrapper.rst Outdated Show resolved Hide resolved
docs/api/contrib/wrapper.rst Outdated Show resolved Hide resolved
tests/test_contrib/test_wrapper.py Show resolved Hide resolved
tests/test_contrib/test_wrapper.py Show resolved Hide resolved
yellowbrick/contrib/wrapper.py Show resolved Hide resolved
Comment on lines +141 to +146
if isinstance(model, Pipeline):
return get_model_name(model.steps[-1][-1])
elif isinstance(model, ContribEstimator):
return model.estimator.__class__.__name__
else:
if isinstance(model, Pipeline):
return get_model_name(model.steps[-1][-1])
else:
return model.__class__.__name__
return model.__class__.__name__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think this change needs a new test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Model and Pipeline are tested in tests/test_utils/test_helpers.py lines 51-86 and ContribEstimator in tests/test_contrib/test_wrapper.py line 96; so the function is completely covered - but would you prefer that all of these tests were in one place?

@bbengfort
Copy link
Member Author

@rebeccabilbro adding this to the FAQ would be great, thanks! Closing #397 works for me- I'll add a couple of issues for libraries that I think we should cover; possibly as spike issues so that interested folks can take them on one by one. Let me know about the tests and I can merge it in; or feel free to merge when you're ready!

@bbengfort bbengfort mentioned this pull request Oct 5, 2020
3 tasks
@rebeccabilbro rebeccabilbro merged commit 1329281 into DistrictDataLabs:develop Oct 5, 2020
@bbengfort bbengfort mentioned this pull request Oct 5, 2020
3 tasks
@bbengfort bbengfort deleted the contrib-wrapper branch October 5, 2020 19:22
@bbengfort
Copy link
Member Author

@rebeccabilbro Created #1107 #1106 and #1105 in response to your suggestion about closing #397

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Not supporting model of Catboost Add Support for Other ML Libraries
2 participants