-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update documentation / demos to use Woodwork #1466
Conversation
…/evalml into 1291_ww_pipeline_components_docs
Codecov Report
@@ Coverage Diff @@
## main #1466 +/- ##
=========================================
- Coverage 100.0% 100.0% -0.0%
=========================================
Files 227 227
Lines 15483 15592 +109
=========================================
+ Hits 15476 15584 +108
- Misses 7 8 +1
Continue to review full report at Codecov.
|
…/evalml into 1291_ww_pipeline_components_docs
.shape[0]
@@ -26,7 +26,7 @@ | |||
"source": [ | |||
"## Configure \"Cost of Fraud\" \n", | |||
"\n", | |||
"To optimize the pipelines toward the specific business needs of this model, you can set your own assumptions for the cost of fraud. These parameters are\n", | |||
"To optimize the pipelines toward the specific business needs of this model, we can set our own assumptions for the cost of fraud. These parameters are\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, updated second person --> first person
from sklearn.datasets import load_breast_cancer as load_breast_cancer_sk | ||
|
||
|
||
def load_breast_cancer(): | ||
def load_breast_cancer(return_pandas=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to go with adding a parameter because it'd still be nice to give users that flexibility (and then by default, use Woodwork), but lmk any thoughts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin sure that seems fine to me.
Could you please update the Returns:
docstring? Same for the others. I googled and couldn't find a clear answer on what to say for Returns:
but I did find this:
Returns:
Union[ww.DataTable, pd.Dataframe], Union[ww.DataColumn, pd.Series]: X and y
Same comment for the other demo methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsherry Thanks! I looked at the link and am going to update this to
Union[(ww.DataTable, ww.DataColumn), (pd.Dataframe, pd.Series)]: X and y
I think this makes a bit more sense because it's more clear that it either returns two Woodwork or two pandas, but lmk if you think otherwise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin yep that looks good! Our sphinx API ref gen doesn't do any sort of parsing of return type currently, so I'm on board with whatever is communicative and more or less in-line with the spec :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin this is huge! Left some suggestions but pretty much ready to 🚢
from sklearn.datasets import load_breast_cancer as load_breast_cancer_sk | ||
|
||
|
||
def load_breast_cancer(): | ||
def load_breast_cancer(return_pandas=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin sure that seems fine to me.
Could you please update the Returns:
docstring? Same for the others. I googled and couldn't find a clear answer on what to say for Returns:
but I did find this:
Returns:
Union[ww.DataTable, pd.Dataframe], Union[ww.DataColumn, pd.Series]: X and y
Same comment for the other demo methods.
@@ -240,7 +239,7 @@ | |||
"source": [ | |||
"## Why encode text this way?\n", | |||
"\n", | |||
"To demonstrate the importance of text-specific modeling, let's train a model with the same dataset, without letting `AutoMLSearch` detect the text column. We can change this by explicitly setting the data type of the `'Message'` column in Woodwork." | |||
"To demonstrate the importance of text-specific modeling, let's train a model with the same dataset, without letting `AutoMLSearch` detect the text column. We can change this by explicitly setting the data type of the `'Message'` column in Woodwork to `Categorical`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit pick but should "categorical" be lowercase? Maybe case doesn't matter for woodwork type matching, idk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I've always just used "Categorical" since that's what the actual name is but you're right, woodwork type matching doesn't matter 🤷
…/evalml into 1291_ww_pipeline_components_docs
(Verified docs locally since RtD is still broken) |
Closes #1291
Note: more work on updating graphing utils to support WW will be done in #1292; this PR only tracks updating what is necessary for our docs.