-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add custom component docs #993
Conversation
Codecov Report
@@ Coverage Diff @@
## main #993 +/- ##
=======================================
Coverage 99.86% 99.86%
=======================================
Files 181 181
Lines 9584 9584
=======================================
Hits 9571 9571
Misses 13 13 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Left some feedback for rewording, mostly non-blocking and stylistic :d
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"However, if your component does not require a `component_obj`, you will need to override all methods you intend on using. This can be seen in [DropNullColumns](../generated/evalml.pipelines.components.DropNullColumns.ipynb)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I feel like this could be emphasized more / the idea that you could provide component_obj
to provide or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote it this way because I wanted to emphasize that EvalML's implementation works off of component_obj
you need to override everything. I'll try to make it more clear in the text above.
"\n", | ||
"#### Estimators\n", | ||
"- target expected as int from 0 to n-1\n", | ||
"- `predict_proba` column order must match the class integers and column names are set to integer values\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I'm not quiteeee sure what this is supposed to say 🤔
predict_proba
column order for custom estimators must match the order of classes...?
What do you mean by column names are set to integer values :o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin I took these assumptions from here. I tried to follow the PR and from my understanding it was just that targets must be of type int from 0 to n-1, column order(?) from predict_proba
must match the class integers (I guess predict_proba must return columns like [0, 1, 2]), and column names in predict_proba
must be of type int as well. I'll reword what I have but let me know if my understanding is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the link is broken :thonk: I'm guessing this was my str target PR but let me know and I'll take a look? :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry its this
"metadata": {}, | ||
"source": [ | ||
"### Implementation\n", | ||
"EvalML components by default work off of a `component_object` that provides the implementation for methods such as `fit()`, `predict()`, and `transform()`. This can be seen in [ComponentBase](../generated/evalml.pipelines.components.ComponentBase.ipynb) where the base class calls the `fit()` method of the `component_obj`. This applies to both transformers and estimators. You can provide this `component_obj` in the `__init__()` method of your new component." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be confusing for newer users - it may be worth it to clarify here that these component_object
s are third party classes that implement the necessary methods, since that's not implied here.
It might also be helpful to move the section about implementing a component without a component object up here, to make it clear that it's possible to implement a new component without a third party tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point on mentioning the fact that the component_obj is a third-party class, and that we provide that interface to make it easy to use third-party code.
I agree with @eccabay's suggestion to shuffle the section order a bit. I have a suggestion, will leave in another comment.
" pass\n", | ||
"\n", | ||
"class NewEstimator(Estimator):\n", | ||
" pass" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest you delete this example. People won't want to define empty components. Also won't this error? If it doesn't today it should eventually haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeremyliweishih nice going on this! This is going to make it a lot easier for people to understand how to do custom definitions. This has been a big pain point recently. I expect this'll save us some bug reports. Also I love that you've added the API ref / user guide links!
I think all the content is here, and now its just a question of reorganizing and deleting for the highest impact. What do you think of this (I'm skipping a bunch of details which you already have writing for, but the top-level organization is what I'm really getting at):
Defining Custom Components
- Custom Transformers
- Transformer example, with init/fit/transform
- Talk about required fields,
name
. Required methods,init
fit
transform
. Mentionfit_transform
as an optional optimization.
- Custom Estimators
- Estimator example, with init/fit/predict
- Same as estimator, except
predict
instead oftransform
- Mention classification, if estimator supports that it needs a
predict_proba
- Components Wrapping Third-party Objects
- Example of a transformer or estimator which uses
component_obj
. Just need to define init, unless third-party naming doesn't match our expectations.
- Example of a transformer or estimator which uses
- Hyperparameter Ranges for AutoML
- Transformers and estimators can have hyperparameter ranges, not required by default. Example of how to set that for an estimator.
- Expectations for Custom Classification Components
- Target format,
predict_proba
for estimators
- Target format,
With the code examples, let's try to include fewer, but where the example could be pasted into a terminal and used without errors. Hopefully we can add code which raises an error when you try to define a component incorrectly, but until then we need our examples to be spotless.
"metadata": {}, | ||
"source": [ | ||
"### Implementation\n", | ||
"EvalML components by default work off of a `component_object` that provides the implementation for methods such as `fit()`, `predict()`, and `transform()`. This can be seen in [ComponentBase](../generated/evalml.pipelines.components.ComponentBase.ipynb) where the base class calls the `fit()` method of the `component_obj`. This applies to both transformers and estimators. You can provide this `component_obj` in the `__init__()` method of your new component." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point on mentioning the fact that the component_obj is a third-party class, and that we provide that interface to make it easy to use third-party code.
I agree with @eccabay's suggestion to shuffle the section order a bit. I have a suggestion, will leave in another comment.
@dsherry FYI latest ver hasn't build on RTD yet - we're hitting the concurrency limit with all the jobs 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeremyliweishih great!!! Thanks for updating this. I'm glad we're getting this in!
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Components Wrapping Third-party Objects\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Third-Party
Fixes #964.
View here