Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve TextFeaturizer Documentation #2568

Merged
merged 8 commits into from Jul 29, 2021
Merged

Conversation

eccabay
Copy link
Contributor

@eccabay eccabay commented Jul 28, 2021

Closes #2474

  • Adds an extra paragraph to the TextFeaturizer docstring
  • Adds an extra section to the "Using Text Data with EvalML" tutorial

@codecov
Copy link

codecov bot commented Jul 28, 2021

Codecov Report

Merging #2568 (382d1a7) into main (aae7b55) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #2568   +/-   ##
=====================================
  Coverage   99.9%   99.9%           
=====================================
  Files        287     287           
  Lines      26338   26338           
=====================================
  Hits       26304   26304           
  Misses        34      34           
Impacted Files Coverage Δ
...ents/transformers/preprocessing/text_featurizer.py 100.0% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aae7b55...382d1a7. Read the comment docs.

@eccabay eccabay marked this pull request as ready for review July 29, 2021 13:41
@eccabay eccabay requested a review from a team July 29, 2021 13:42
Copy link
Contributor

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just some suggestions but not blocking.

docs/source/demos/text_input.ipynb Outdated Show resolved Hide resolved
Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One small nit, take it or leave it.

docs/source/demos/text_input.ipynb Outdated Show resolved Hide resolved
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @eccabay !

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, coming back full circle to the work you previously added, how awesome! Just left some nit-picky comments but LGTM, I love these additions to our docs!

"cell_type": "markdown",
"metadata": {},
"source": [
"Here, the Text Featurization component takes in a single \"Message\" column, but then the next component in the pipeline, the Imputer, recieves five columns of input, the result of featurizing the text-type \"Message\" column. Most importantly, these featurized columns are what ends up passed in to the estimator.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recieves --> receives

Just a suggestion but maybe split this up into something like:

Here, the Text Featurization component takes in a single "Message" column, but then the next component in the pipeline, the Imputer, receives five columns of input. These five columns are the result of featurizing the text-type "Message" column.

Comment on lines 248 to 252
"**Diversity Score** is the ratio of unique words to total words\n",
"\n",
"**Mean Characters per Word** is the average number of letters in each word\n",
"\n",
"**Polarity Score** is a prediction of how \"polarized\" the text is, on a scale from -1 (extremely negative) to 1 (extremely positive)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omega nitpick: Let's add periods to the end of these!

@eccabay eccabay merged commit 9e8a42d into main Jul 29, 2021
@chukarsten chukarsten mentioned this pull request Aug 3, 2021
@eccabay eccabay deleted the 2474_improve_text_documentation branch March 10, 2022 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add more information about our TextFeaturization component in our docs
5 participants