Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom component docs #993

Merged
merged 36 commits into from
Jul 31, 2020
Merged

Add custom component docs #993

merged 36 commits into from
Jul 31, 2020

Conversation

jeremyliweishih
Copy link
Collaborator

@jeremyliweishih jeremyliweishih commented Jul 29, 2020

Fixes #964.
View here

  • Add bit on hyperparameters
  • Add class reference component_graph to pipelines page and talk about custom components
  • Add info on automl about allowed_pipelines

@codecov
Copy link

codecov bot commented Jul 29, 2020

Codecov Report

Merging #993 into main will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #993   +/-   ##
=======================================
  Coverage   99.86%   99.86%           
=======================================
  Files         181      181           
  Lines        9584     9584           
=======================================
  Hits         9571     9571           
  Misses         13       13           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a92cf5...a00a7ae. Read the comment docs.

@jeremyliweishih jeremyliweishih marked this pull request as ready for review July 29, 2020 21:43
@jeremyliweishih jeremyliweishih requested review from dsherry, freddyaboulton, angela97lin and eccabay and removed request for dsherry July 29, 2020 22:01
Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Left some feedback for rewording, mostly non-blocking and stylistic :d

docs/source/user_guide.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/automl.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/automl.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/components.ipynb Show resolved Hide resolved
docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
"cell_type": "markdown",
"metadata": {},
"source": [
"However, if your component does not require a `component_obj`, you will need to override all methods you intend on using. This can be seen in [DropNullColumns](../generated/evalml.pipelines.components.DropNullColumns.ipynb)."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I feel like this could be emphasized more / the idea that you could provide component_obj to provide or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote it this way because I wanted to emphasize that EvalML's implementation works off of component_obj you need to override everything. I'll try to make it more clear in the text above.

docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
"\n",
"#### Estimators\n",
"- target expected as int from 0 to n-1\n",
"- `predict_proba` column order must match the class integers and column names are set to integer values\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm not quiteeee sure what this is supposed to say 🤔

predict_proba column order for custom estimators must match the order of classes...?

What do you mean by column names are set to integer values :o

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angela97lin I took these assumptions from here. I tried to follow the PR and from my understanding it was just that targets must be of type int from 0 to n-1, column order(?) from predict_proba must match the class integers (I guess predict_proba must return columns like [0, 1, 2]), and column names in predict_proba must be of type int as well. I'll reword what I have but let me know if my understanding is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the link is broken :thonk: I'm guessing this was my str target PR but let me know and I'll take a look? :))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry its this

"metadata": {},
"source": [
"### Implementation\n",
"EvalML components by default work off of a `component_object` that provides the implementation for methods such as `fit()`, `predict()`, and `transform()`. This can be seen in [ComponentBase](../generated/evalml.pipelines.components.ComponentBase.ipynb) where the base class calls the `fit()` method of the `component_obj`. This applies to both transformers and estimators. You can provide this `component_obj` in the `__init__()` method of your new component."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be confusing for newer users - it may be worth it to clarify here that these component_objects are third party classes that implement the necessary methods, since that's not implied here.

It might also be helpful to move the section about implementing a component without a component object up here, to make it clear that it's possible to implement a new component without a third party tool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on mentioning the fact that the component_obj is a third-party class, and that we provide that interface to make it easy to use third-party code.

I agree with @eccabay's suggestion to shuffle the section order a bit. I have a suggestion, will leave in another comment.

" pass\n",
"\n",
"class NewEstimator(Estimator):\n",
" pass"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you delete this example. People won't want to define empty components. Also won't this error? If it doesn't today it should eventually haha

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyliweishih nice going on this! This is going to make it a lot easier for people to understand how to do custom definitions. This has been a big pain point recently. I expect this'll save us some bug reports. Also I love that you've added the API ref / user guide links!

I think all the content is here, and now its just a question of reorganizing and deleting for the highest impact. What do you think of this (I'm skipping a bunch of details which you already have writing for, but the top-level organization is what I'm really getting at):

Defining Custom Components

  • Custom Transformers
    • Transformer example, with init/fit/transform
    • Talk about required fields, name. Required methods, init fit transform. Mention fit_transform as an optional optimization.
  • Custom Estimators
    • Estimator example, with init/fit/predict
    • Same as estimator, except predict instead of transform
    • Mention classification, if estimator supports that it needs a predict_proba
  • Components Wrapping Third-party Objects
    • Example of a transformer or estimator which uses component_obj. Just need to define init, unless third-party naming doesn't match our expectations.
  • Hyperparameter Ranges for AutoML
    • Transformers and estimators can have hyperparameter ranges, not required by default. Example of how to set that for an estimator.
  • Expectations for Custom Classification Components
    • Target format, predict_proba for estimators

With the code examples, let's try to include fewer, but where the example could be pasted into a terminal and used without errors. Hopefully we can add code which raises an error when you try to define a component incorrectly, but until then we need our examples to be spotless.

docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
"metadata": {},
"source": [
"### Implementation\n",
"EvalML components by default work off of a `component_object` that provides the implementation for methods such as `fit()`, `predict()`, and `transform()`. This can be seen in [ComponentBase](../generated/evalml.pipelines.components.ComponentBase.ipynb) where the base class calls the `fit()` method of the `component_obj`. This applies to both transformers and estimators. You can provide this `component_obj` in the `__init__()` method of your new component."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on mentioning the fact that the component_obj is a third-party class, and that we provide that interface to make it easy to use third-party code.

I agree with @eccabay's suggestion to shuffle the section order a bit. I have a suggestion, will leave in another comment.

docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/components.ipynb Outdated Show resolved Hide resolved
docs/source/user_guide/pipelines.ipynb Show resolved Hide resolved
docs/source/user_guide/automl.ipynb Outdated Show resolved Hide resolved
@jeremyliweishih
Copy link
Collaborator Author

@dsherry FYI latest ver hasn't build on RTD yet - we're hitting the concurrency limit with all the jobs 😄

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyliweishih great!!! Thanks for updating this. I'm glad we're getting this in!

"cell_type": "markdown",
"metadata": {},
"source": [
"### Components Wrapping Third-party Objects\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third-Party

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a "Defining Custom Components" page to the user guide
4 participants