Update documentation / demos to use Woodwork #1466

Merged: angela97lin merged 43 commits into main from 1291_ww_pipeline_components_docs on Dec 7, 2020

Conversation

Contributor

@angela97lin angela97lin commented Nov 24, 2020

Closes #1291

Note: more work on updating graphing utils to support WW will be done in #1292; this PR only tracks updating what is necessary for our docs.

@angela97lin angela97lin self-assigned this Nov 24, 2020
@angela97lin angela97lin marked this pull request as draft November 26, 2020 02:57
@angela97lin angela97lin added this to the December 2020 milestone Nov 26, 2020

codecov bot commented Nov 30, 2020

Codecov Report

Merging #1466 (56e25da) into main (ed909a0) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1466     +/-   ##
=========================================
- Coverage   100.0%   100.0%   -0.0%     
=========================================
  Files         227      227             
  Lines       15483    15592    +109     
=========================================
+ Hits        15476    15584    +108     
- Misses          7        8      +1     
| Impacted Files | Coverage Δ |
|---|---|
| ...tanding/prediction_explanations/_user_interface.py | 100.0% <ø> (ø) |
| evalml/objectives/objective_base.py | 100.0% <ø> (ø) |
| ...tive_tests/test_binary_classification_objective.py | 100.0% <ø> (ø) |
| evalml/demos/breast_cancer.py | 100.0% <100.0%> (ø) |
| evalml/demos/churn.py | 100.0% <100.0%> (ø) |
| evalml/demos/diabetes.py | 100.0% <100.0%> (ø) |
| evalml/demos/fraud.py | 100.0% <100.0%> (ø) |
| evalml/demos/wine.py | 100.0% <100.0%> (ø) |
| evalml/model_understanding/graphs.py | 99.7% <100.0%> (-0.3%) ⬇️ |
| ...nderstanding/prediction_explanations/explainers.py | 100.0% <100.0%> (ø) |

... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@angela97lin angela97lin changed the title from "Update pipeline and component documentation / demos to use Woodwork" to "Update documentation / demos to use Woodwork" on Dec 3, 2020
@@ -26,7 +26,7 @@
 "source": [
 "## Configure \"Cost of Fraud\" \n",
 "\n",
-"To optimize the pipelines toward the specific business needs of this model, you can set your own assumptions for the cost of fraud. These parameters are\n",
+"To optimize the pipelines toward the specific business needs of this model, we can set our own assumptions for the cost of fraud. These parameters are\n",
Contributor Author

For consistency, updated second person --> first person

 from sklearn.datasets import load_breast_cancer as load_breast_cancer_sk


-def load_breast_cancer():
+def load_breast_cancer(return_pandas=False):
Contributor Author

@angela97lin angela97lin Dec 3, 2020

I decided to go with adding a parameter because it'd still be nice to give users that flexibility (and then by default, use Woodwork), but lmk any thoughts!

Contributor

@angela97lin sure that seems fine to me.

Could you please update the Returns: docstring? Same for the others. I googled and couldn't find a clear answer on what to say for Returns: but I did find this:

Returns:
    Union[ww.DataTable, pd.DataFrame], Union[ww.DataColumn, pd.Series]: X and y

Same comment for the other demo methods.

Contributor Author

@dsherry Thanks! I looked at the link and am going to update this to

Union[(ww.DataTable, ww.DataColumn), (pd.DataFrame, pd.Series)]: X and y

I think this makes a bit more sense because it's clearer that it returns either two Woodwork objects or two pandas objects, but lmk if you think otherwise!

Contributor

@angela97lin yep that looks good! Our Sphinx API ref gen doesn't currently do any sort of parsing of the return type, so I'm on board with whatever is communicative and more or less in line with the spec :)
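
A minimal sketch of the pattern discussed in this thread: a demo loader that returns Woodwork structures by default and pandas objects when `return_pandas=True`, documented with the `Returns:` wording agreed above. To keep it runnable without any dependencies, `WoodworkTable`, `WoodworkColumn`, `PandasFrame`, and `PandasSeries` are hypothetical stand-in classes (not the real `ww.DataTable`, `ww.DataColumn`, `pd.DataFrame`, or `pd.Series`), and `load_demo` is an illustrative name, not evalml's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical stand-ins so the sketch runs without pandas or Woodwork installed.
@dataclass
class PandasFrame:  # stands in for pd.DataFrame
    rows: List[List[float]]


@dataclass
class PandasSeries:  # stands in for pd.Series
    values: List[int]


@dataclass
class WoodworkTable:  # stands in for ww.DataTable
    frame: PandasFrame


@dataclass
class WoodworkColumn:  # stands in for ww.DataColumn
    series: PandasSeries


def load_demo(return_pandas: bool = False) -> Union[
    Tuple[WoodworkTable, WoodworkColumn], Tuple[PandasFrame, PandasSeries]
]:
    """Load a tiny illustrative dataset.

    Arguments:
        return_pandas (bool): If True, return the pandas-style pair.
            Defaults to False, which returns the Woodwork-style pair.

    Returns:
        Union[(ww.DataTable, ww.DataColumn), (pd.DataFrame, pd.Series)]: X and y
    """
    X = PandasFrame(rows=[[1.0, 2.0], [3.0, 4.0]])
    y = PandasSeries(values=[0, 1])
    if return_pandas:
        return X, y
    # Default path: wrap both objects so the caller always gets a matching pair.
    return WoodworkTable(X), WoodworkColumn(y)
```

Either call returns a matched pair, which is the point of the tuple-of-unions wording: `load_demo()` never mixes a Woodwork `X` with a pandas `y`.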

Contributor

@dsherry dsherry left a comment

@angela97lin this is huge! Left some suggestions but pretty much ready to 🚢

@@ -240,7 +239,7 @@
 "source": [
 "## Why encode text this way?\n",
 "\n",
-"To demonstrate the importance of text-specific modeling, let's train a model with the same dataset, without letting `AutoMLSearch` detect the text column. We can change this by explicitly setting the data type of the `'Message'` column in Woodwork."
+"To demonstrate the importance of text-specific modeling, let's train a model with the same dataset, without letting `AutoMLSearch` detect the text column. We can change this by explicitly setting the data type of the `'Message'` column in Woodwork to `Categorical`."
Contributor

Nitpick, but should "categorical" be lowercase? Maybe case doesn't matter for Woodwork type matching, idk

Contributor Author

Ah, I've always just used "Categorical" since that's the actual type name, but you're right, case doesn't matter for Woodwork type matching 🤷
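
To illustrate why the casing is only cosmetic here, a small sketch of case-insensitive type-name resolution. This is not Woodwork's actual implementation; `LOGICAL_TYPES` and `resolve_logical_type` are hypothetical names chosen for the example.

```python
# Hypothetical registry keyed by lowercased name (not Woodwork's real registry).
LOGICAL_TYPES = {
    "categorical": "Categorical",
    "naturallanguage": "NaturalLanguage",
}


def resolve_logical_type(name: str) -> str:
    """Return the canonical type name for `name`, ignoring case."""
    key = name.strip().lower()
    try:
        return LOGICAL_TYPES[key]
    except KeyError:
        raise ValueError(f"Unknown logical type: {name!r}") from None
```

Under this scheme `"Categorical"` and `"categorical"` resolve to the same type, so the docs can use either spelling.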

@angela97lin angela97lin merged commit b09ea39 into main Dec 7, 2020
@angela97lin angela97lin deleted the 1291_ww_pipeline_components_docs branch December 7, 2020 23:50
Contributor Author

(Verified docs locally since RtD is still broken)

Linked issue: Update pipeline/component documentation to use DataTables in all examples