
Update documentation / demos to use Woodwork #1466

Merged
angela97lin merged 43 commits into main from 1291_ww_pipeline_components_docs
Dec 7, 2020

Contributor

@angela97lin angela97lin commented Nov 24, 2020

Closes #1291

Note: more work on updating graphing utils to support WW will be done in #1292; this PR only tracks updating what is necessary for our docs.

@angela97lin angela97lin self-assigned this Nov 24, 2020
@angela97lin angela97lin marked this pull request as draft November 26, 2020 02:57
@angela97lin angela97lin added this to the December 2020 milestone Nov 26, 2020

codecov bot commented Nov 30, 2020

Codecov Report

Merging #1466 (56e25da) into main (ed909a0) will decrease coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1466     +/-   ##
=========================================
- Coverage   100.0%   100.0%   -0.0%     
=========================================
  Files         227      227             
  Lines       15483    15592    +109     
=========================================
+ Hits        15476    15584    +108     
- Misses          7        8      +1     
| Impacted Files | Coverage Δ |
|---|---|
| ...tanding/prediction_explanations/_user_interface.py | 100.0% <ø> (ø) |
| evalml/objectives/objective_base.py | 100.0% <ø> (ø) |
| ...tive_tests/test_binary_classification_objective.py | 100.0% <ø> (ø) |
| evalml/demos/breast_cancer.py | 100.0% <100.0%> (ø) |
| evalml/demos/churn.py | 100.0% <100.0%> (ø) |
| evalml/demos/diabetes.py | 100.0% <100.0%> (ø) |
| evalml/demos/fraud.py | 100.0% <100.0%> (ø) |
| evalml/demos/wine.py | 100.0% <100.0%> (ø) |
| evalml/model_understanding/graphs.py | 99.7% <100.0%> (-0.3%) ⬇️ |
| ...nderstanding/prediction_explanations/explainers.py | 100.0% <100.0%> (ø) |

... and 12 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ed909a0...56e25da.

@angela97lin angela97lin changed the title from "Update pipeline and component documentation / demos to use Woodwork" to "Update documentation / demos to use Woodwork" Dec 3, 2020
  "## Configure \"Cost of Fraud\" \n",
  "\n",
- "To optimize the pipelines toward the specific business needs of this model, you can set your own assumptions for the cost of fraud. These parameters are\n",
+ "To optimize the pipelines toward the specific business needs of this model, we can set our own assumptions for the cost of fraud. These parameters are\n",
Contributor Author

For consistency, updated second person --> first person



-def load_breast_cancer():
+def load_breast_cancer(return_pandas=False):
Contributor Author

@angela97lin angela97lin Dec 3, 2020

I decided to go with adding a parameter because it'd still be nice to give users that flexibility (and then by default, use Woodwork), but lmk any thoughts!
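
The `return_pandas` flag described in this comment can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `load_demo` and the four `Fake*` classes are hypothetical stand-ins for the demo loaders, `pd.DataFrame` / `pd.Series`, and `ww.DataTable` / `ww.DataColumn`, so that the example runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import List


# Stand-ins only (assumptions for this sketch). In the PR these roles are
# played by pd.DataFrame / pd.Series and ww.DataTable / ww.DataColumn.
@dataclass
class FakeFrame:
    rows: List[List[float]]


@dataclass
class FakeSeries:
    values: List[int]


@dataclass
class FakeDataTable:
    frame: FakeFrame


@dataclass
class FakeDataColumn:
    series: FakeSeries


def load_demo(return_pandas: bool = False):
    """Load a tiny hard-coded dataset (hypothetical loader).

    Arguments:
        return_pandas (bool): If True, return the plain pandas-like objects.
            If False, return the wrapped Woodwork-like objects. Defaults to False.

    Returns:
        Union[(FakeDataTable, FakeDataColumn), (FakeFrame, FakeSeries)]: X and y
    """
    X = FakeFrame(rows=[[1.0, 2.0], [3.0, 4.0]])
    y = FakeSeries(values=[0, 1])
    if return_pandas:
        # Opt-in path: hand back the underlying objects unchanged.
        return X, y
    # Default path: wrap both objects so callers get the typed containers.
    return FakeDataTable(X), FakeDataColumn(y)
```

Keeping the wrapped types as the default while exposing a boolean opt-out means existing pandas-based callers only need to pass one extra argument.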

Contributor

@angela97lin sure that seems fine to me.

Could you please update the Returns: docstring? Same for the others. I googled and couldn't find a clear answer on what to say for Returns: but I did find this:

Returns:
    Union[ww.DataTable, pd.DataFrame], Union[ww.DataColumn, pd.Series]: X and y

Same comment for the other demo methods.

Contributor Author

@dsherry Thanks! I looked at the link and am going to update this to

Union[(ww.DataTable, ww.DataColumn), (pd.DataFrame, pd.Series)]: X and y

I think this makes a bit more sense because it's clearer that it returns either two Woodwork objects or two pandas objects, but lmk if you think otherwise!

Contributor

@angela97lin yep that looks good! Our sphinx API ref gen doesn't do any sort of parsing of return type currently, so I'm on board with whatever is communicative and more or less in-line with the spec :)
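
The distinction discussed in this thread, a pair of unions versus a union of pairs, can be checked with Python's `typing` module. The sketch below is only an illustration: `Table`, `Column`, `Frame`, and `Series` are empty placeholder classes standing in for the Woodwork and pandas types, and `allowed_pairs` is a hypothetical helper written for this example.

```python
from itertools import product
from typing import Tuple, Union, get_args, get_origin


# Placeholder classes (assumptions), standing in for the four real types.
class Table: ...
class Column: ...
class Frame: ...
class Series: ...


# Each position is a union on its own, so mixed pairs are admitted too.
PerPosition = Tuple[Union[Table, Frame], Union[Column, Series]]

# A union of whole pairs: either both Woodwork-like or both pandas-like.
PairWise = Union[Tuple[Table, Column], Tuple[Frame, Series]]


def _alternatives(tp):
    """Expand a Union into its members; any other type is one alternative."""
    return get_args(tp) if get_origin(tp) is Union else (tp,)


def allowed_pairs(tp):
    """List the concrete (X, y) type pairs that a return annotation admits."""
    pairs = set()
    for alt in _alternatives(tp):  # each alternative is a Tuple[...]
        x_tp, y_tp = get_args(alt)
        pairs.update(product(_alternatives(x_tp), _alternatives(y_tp)))
    return pairs
```

With these definitions, `allowed_pairs(PerPosition)` contains four combinations, including the mixed `(Table, Series)`, while `allowed_pairs(PairWise)` contains only the two matching pairs.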

Contributor

@dsherry dsherry left a comment

@angela97lin this is huge! Left some suggestions but pretty much ready to 🚢




Comment thread evalml/objectives/objective_base.py
Comment thread evalml/preprocessing/utils.py
Comment thread evalml/tests/demo_tests/test_datasets.py
Comment thread docs/source/demos/fraud.ipynb
Comment thread docs/source/demos/fraud.ipynb
Comment thread docs/source/demos/text_input.ipynb
Comment thread docs/source/demos/text_input.ipynb
  "## Why encode text this way?\n",
  "\n",
- "To demonstrate the importance of text-specific modeling, let's train a model with the same dataset, without letting `AutoMLSearch` detect the text column. We can change this by explicitly setting the data type of the `'Message'` column in Woodwork."
+ "To demonstrate the importance of text-specific modeling, let's train a model with the same dataset, without letting `AutoMLSearch` detect the text column. We can change this by explicitly setting the data type of the `'Message'` column in Woodwork to `Categorical`."
Contributor

Nit pick but should "categorical" be lowercase? Maybe case doesn't matter for woodwork type matching, idk

Contributor Author

Ah I've always just used "Categorical" since that's what the actual name is, but you're right, case doesn't matter for Woodwork type matching 🤷
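
For readers wondering what case-insensitive type-name matching could look like, here is a minimal sketch. It is not Woodwork's implementation or API: `KNOWN_TYPES` is an illustrative list and `resolve_type_name` is a hypothetical helper that only shows the general idea of normalizing case before lookup.

```python
# Illustrative list of type names (assumption); not Woodwork's real registry.
KNOWN_TYPES = ("Categorical", "NaturalLanguage", "Integer", "Double")

# Index the canonical names by their lowercase form for case-insensitive lookup.
_BY_LOWER = {name.lower(): name for name in KNOWN_TYPES}


def resolve_type_name(name: str) -> str:
    """Return the canonical spelling of a type name, ignoring case."""
    try:
        return _BY_LOWER[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown logical type: {name!r}") from None
```

Under this scheme `"categorical"` and `"Categorical"` resolve to the same canonical name, which is why the capitalization used in the notebook text would make no practical difference.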

@angela97lin angela97lin merged commit b09ea39 into main Dec 7, 2020
@angela97lin angela97lin deleted the 1291_ww_pipeline_components_docs branch December 7, 2020 23:50
@angela97lin
Contributor Author

(Verified docs locally since RtD is still broken)

Development

Successfully merging this pull request may close these issues.

Update pipeline/component documentation to use DataTables in all examples

2 participants