Update and overhaul documentation #937
Conversation
Codecov Report
@@ Coverage Diff @@
## main #937 +/- ##
==========================================
+ Coverage 99.65% 99.86% +0.20%
==========================================
Files 170 170
Lines 8593 8593
==========================================
+ Hits 8563 8581 +18
+ Misses 30 12 -18
Continue to review full report at Codecov.
docs/source/install.ipynb
Outdated
@@ -9,7 +9,7 @@
"EvalML is available for Python 3.6+. It can be installed by running the following command:\n",
"\n",
"```bash\n",
" pip install evaml --extra-index-url https://install.featurelabs.com/<license>/\n",
"pip install evaml\n",
🎉
side note: has our release documents/process been updated to reflect pushing to pip?
also should be `evalml`!
Not yet! Good point. I'd like to do that once we've done our first pip release. I'll file something for that now.
Also, this is a more general comment not specifically related to this PR, but when we begin open sourcing the repo, are we considering adding a "Thanks to the following people for contributing to this release: ..." section to the changelog, similar to the featuretools changelog?
@ctduffy that's a great idea! Yes, once we open up the repo to outside contributors, adding that sort of info to the release notes would be good. GitHub shows who's contributed and when, so that information will already be available in another form at least.
Overall looking awesome and I'm super excited for this! I left comments below that are either typo/grammatical fixes or style comments. Feel free to ignore the style comments based on your own judgement.
There are a couple more issues:
- favicon: we need to update the favicon to the new evalml logo (or at least the old version still shows up for me). If we need to fix this, it would be great to include it in this PR.
- inheritance graph: since we changed the templating of our API reference, the inheritance graph looks a little short now. We should put up a new issue to fix this.
This is a lot of comments at once so let me know if you want to hop on a call to go through them!
docs/source/user_guide/automl.ipynb
Outdated
"\n", | ||
"There are [a variety](https://en.wikipedia.org/wiki/Machine_learning#Approaches) of ML problem types. Supervised learning describes the case where the collected data contains an output quantity to be modeled and a set of inputs with which to train the model. EvalML focuses on training models for the supervised learning case.\n", | ||
"\n", | ||
"EvalML supports three common supervised ML problem types. The first is regression, where the target quantity to model is a continuous numeric value. Next are binary and multiclass classification, where the target quantity to model consists of two or more discrete values or categories. The choice of which supervised ML problem type is most appropriate depends on domain expertise and on how the model will be evaluated and used.\n", |
Although implied, I think it might be nice to explain that binary = 2 classes and multiclass = 2+.
Not sure if this is helpful, but I'm more used to reading supervised ML problem types as split between regression and classification (with classification then split between binary and multiclass). I like how it says 3 problem types since it coincides with the evalml problem types, though. The former might be easier to understand for an ML newbie, but I don't think this comment is important to consider if the target audience is ML engineers or data scientists etc.
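For readers of this thread, here is a rough sketch of how the problem type described in the quoted paragraph ends up being specified in a search. This is illustrative only: the demo loader and the exact `AutoMLSearch` call signature are assumptions and may differ between EvalML versions.

```python
from evalml.automl import AutoMLSearch
from evalml.demos import load_breast_cancer

# Load a demo dataset; any feature matrix X and target y will do.
X, y = load_breast_cancer()

# problem_type selects which pipelines and objectives apply:
# "regression" for continuous targets, "binary" for two classes,
# "multiclass" for three or more classes.
automl = AutoMLSearch(problem_type="binary")
automl.search(X, y)  # call signature may vary between EvalML versions
```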
"\n", | ||
"Estimator classes each define a `model_family` attribute indicating what type of model is used.\n", | ||
"\n", | ||
"Here's an example of using the [`LogisticRegressionClassifier`](../generated/evalml.pipelines.components.LogisticRegressionClassifier.ipynb) estimator to fit and predict on a simple dataset:" |
This link is broken on the actual website.
Thanks. I think I fixed it. Needed to be ref to rst I think?
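For reference, the fit/predict example the quoted cell refers to looks roughly like the following. A minimal sketch, assuming the component exposes `model_family`, `fit`, and `predict` as described in the cell above; the toy dataset is made up for illustration.

```python
import pandas as pd
from evalml.pipelines.components import LogisticRegressionClassifier

# A tiny made-up dataset: numeric features and a binary target.
X = pd.DataFrame({"feature_1": [0, 1, 2, 3], "feature_2": [1.0, 0.5, 0.1, 0.9]})
y = pd.Series([0, 0, 1, 1])

clf = LogisticRegressionClassifier()
print(clf.model_family)  # which family of models this estimator belongs to

clf.fit(X, y)
predictions = clf.predict(X)
print(predictions)
```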
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Pipeline\n",
"We can get the object of any pipeline via their `id` as well:"
"## Feature Importance\n",
We can add how we calculate such importances (maybe in a new issue).
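A hypothetical sketch of the two cells under discussion: fetching a pipeline by its `id` and reading its feature importance after fitting. The `get_pipeline` and `feature_importance` names, and the demo loader, are assumptions and may differ by EvalML version.

```python
from evalml.automl import AutoMLSearch
from evalml.demos import load_breast_cancer

X, y = load_breast_cancer()
automl = AutoMLSearch(problem_type="binary")
automl.search(X, y)

# Both names below are assumptions and may differ between versions.
pipeline = automl.get_pipeline(0)   # retrieve a pipeline by its id in the rankings
pipeline.fit(X, y)
print(pipeline.feature_importance)  # per-feature importance of the fitted pipeline
```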
@@ -137,7 +103,7 @@
"source": [
"## Precision-Recall Curve\n",
This particular dataset makes the curves not very useful. We could put up an issue to use another dataset for the below curves.
A page on how to get help would be good to add, something like https://docs.featuretools.com/en/stable/help.html. Not blocking though.
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"First, we load in the features and outcomes we want to use to train our model." |
I think it might be helpful here to describe what form this might take, like specifying whether this should be two dataframes, two arrays, or something else. I think this is a good addition, as it might not be specifically mentioned elsewhere in the documentation as X and y together.
Also might be useful to change the wording from outcomes to target? Not sure exactly, but I feel like target is more generally used to refer to the y value in ML?
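To the point about specifying the form of the data, a minimal sketch of what that cell could show, assuming the demo loader returns the features as a pandas DataFrame and the target (the "outcomes") as a pandas Series; the loader name and return types are assumptions.

```python
from evalml.demos import load_breast_cancer

# X holds the input features, y holds the target ("outcomes") to predict.
X, y = load_breast_cancer()
print(type(X), X.shape)  # e.g. a pandas DataFrame of features
print(type(y), y.shape)  # e.g. a pandas Series with one target value per row
```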
@dsherry The new theme looks great! These changes look good to me, but the links to classes, such as `LogisticRegressionClassifier`, are still broken for me (even after you changed the link to point to the `.rst` file). I think it'd be nice to fix that before merging.
I also left some comments that can be addressed in other issues. One thing I have in mind is making sure all classes wrapped in backticks link to their source. My instinct is to expect a link and go to the source when a new class or concept is introduced.
"[What is EvalML](self)\n", | ||
"\n", | ||
"[Installation](install)" | ||
"[Start](start)" |
Not blocking: I prefer the old "Getting Started" title
Got it. Let's circle back on this. I liked having both install and start as top-level items, but could be persuaded otherwise
"\n", | ||
"Estimator classes each define a `model_family` attribute indicating what type of model is used.\n", | ||
"\n", | ||
"Here's an example of using the [`LogisticRegressionClassifier`](../generated/evalml.pipelines.components.LogisticRegressionClassifier.rst) estimator to fit and predict on a simple dataset:" |
This link is still broken for me. Also, should we make all the classes we put in backticks links? It might be a little confusing that some class name links are broken and some aren't.
🤦 dang, IDK why it's broken. Will try one more push but probably going to punt.
maybe needs to be html
@@ -20,7 +20,7 @@
"source": [
"## Missing Data\n",
"\n",
"Missing data or rows with `NaN` values provide many challenges for machine learning pipelines. In the worst case, many algorithms simply will not run with missing data! EvalML pipelines contain imputation [components](../pipelines/components.ipynb) to ensure that doesn't happen. Imputation works by approximating missing values with existing values. However, if a column contains a high number of missing values, a large percentage of the column would be approximated by a small percentage. This could potentially create a column without useful information for machine learning pipelines. By using the `HighlyNullDataCheck()` data check, EvalML will alert you to this potential problem by returning the columns that pass the missing values threshold."
"Missing data or rows with `NaN` values provide many challenges for machine learning pipelines. In the worst case, many algorithms simply will not run with missing data! EvalML pipelines contain imputation [components](../user_guide/components.ipynb) to ensure that doesn't happen. Imputation works by approximating missing values with existing values. However, if a column contains a high number of missing values, a large percentage of the column would be approximated by a small percentage. This could potentially create a column without useful information for machine learning pipelines. By using the `HighlyNullDataCheck()` data check, EvalML will alert you to this potential problem by returning the columns that pass the missing values threshold."
Not-blocking: Make `HighlyNullDataCheck` a link to the source?
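For readers unfamiliar with the data check mentioned in the quoted paragraph, a minimal sketch of how it might be used; the `pct_null_threshold` parameter name and the shape of the return value are assumptions and may differ between versions.

```python
import numpy as np
import pandas as pd
from evalml.data_checks import HighlyNullDataCheck

# One column is almost entirely NaN and should be flagged by the check.
X = pd.DataFrame({
    "mostly_null": [np.nan, np.nan, np.nan, np.nan, 1.0],
    "no_nulls": [1, 2, 3, 4, 5],
})

# Flag columns that are more than 80% null (assumed parameter name).
check = HighlyNullDataCheck(pct_null_threshold=0.8)
print(check.validate(X))  # reports the columns that exceed the threshold
```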
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Importance\n",
"## Permutation Importance\n",
Not-blocking: I think it'd be nice to add a few sentences interpreting the plots for users who are not familiar with these techniques.
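To the point about explaining the technique for unfamiliar readers: permutation importance measures how much a model's score drops when a single feature's values are randomly shuffled. A sketch of the underlying idea using scikit-learn's `permutation_importance` (not EvalML's own API, just the general technique):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

# Shuffle each feature several times and record the drop in score;
# a larger drop means the model relied more heavily on that feature.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)
```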
@@ -1,7 +1,7 @@
.. _changelog:
let's change the file name and page name to "release notes" so the URL updates. Currently it's https://evalml.featurelabs.com/en/ds_update_theme_pandas/changelog.html
🚢 lets goo
docs/source/install.ipynb
Outdated
@@ -23,7 +23,7 @@
"EvalML includes several dependencies in `requirements.txt` by default: `xgboost` and `catboost` support pipelines built around those modeling libraries, and `plotly` and `ipywidgets` support plotting functionality in automl searches. These dependencies are recommended but are not required in order to install and use EvalML. To install these additional dependencies run `pip install -r requirements.txt`.\n",
"\n",
"### Core Dependencies\n",
"If you wish to install EvalML with only the core required dependencies, include `--no-dependencies` in your EvalML pip install command, and then install all core dependencies with `pip install -r core-requirements.txt`."
"If you wish to install EvalML with only the core required dependencies, include `--no-dependencies` in your EvalML pip install command, and then install all core dependencies with `pip install -r core-requirements.txt`."
Not necessarily related to your change, but I'm curious as to how a user actually installs evalml. Do they not just call `pip install evalml` and call it a day? Just thinking that with other pip packages, I don't call `pip install -r core-requirements.txt` or similar commands.
@angela97lin and I just chatted about this, and I made some improvements to the install page as a result. Long-term, it's unclear if we need to document our `core-requirements.txt` vs `requirements.txt` on a user-facing install page. Maybe we just keep that context for developers / contributors.
This is some really awesome stuff!! Thank you for doing this and adding in so much more documentation 👏 All my comments are minor / not blocking :)
Will merge when tests are green!
Fix #861; also: fix #713, fix #630
View the docs changes here.
Changes:
What's missing: