Improve TextFeaturizer Documentation #2568
Codecov Report
@@ Coverage Diff @@
## main #2568 +/- ##
=====================================
Coverage 99.9% 99.9%
=====================================
Files 287 287
Lines 26338 26338
=====================================
Hits 26304 26304
Misses 34 34
Looks good! Just some suggestions but not blocking.
LGTM! One small nit, take it or leave it.
Looks good to me @eccabay !
Wow, coming back full circle to the work you previously added, how awesome! Just left some nit-picky comments but LGTM, I love these additions to our docs!
docs/source/demos/text_input.ipynb
Outdated
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, the Text Featurization component takes in a single \"Message\" column, but then the next component in the pipeline, the Imputer, recieves five columns of input, the result of featurizing the text-type \"Message\" column. Most importantly, these featurized columns are what ends up passed in to the estimator.\n",
recieves --> receives
Just a suggestion but maybe split this up into something like:
Here, the Text Featurization component takes in a single "Message" column, but then the next component in the pipeline, the Imputer, receives five columns of input. These five columns are the result of featurizing the text-type "Message" column.
docs/source/demos/text_input.ipynb
Outdated
"**Diversity Score** is the ratio of unique words to total words\n",
"\n",
"**Mean Characters per Word** is the average number of letters in each word\n",
"\n",
"**Polarity Score** is a prediction of how \"polarized\" the text is, on a scale from -1 (extremely negative) to 1 (extremely positive)\n",
Omega nitpick: Let's add periods to the end of these!
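For readers following the definitions in the diff above, here is a rough sketch of how the first two features could be computed in plain Python. It is an illustration only, not the TextFeaturizer's actual implementation: the function names and the letters-only tokenization (`[A-Za-z]+`) are assumptions made for this example, and the Polarity Score is left out because it depends on a sentiment model.

```python
import re


def _words(text: str) -> list[str]:
    # Assumed tokenization for this sketch: runs of ASCII letters, lowercased.
    return re.findall(r"[A-Za-z]+", text.lower())


def diversity_score(text: str) -> float:
    """Ratio of unique words to total words (0.0 for text with no words)."""
    words = _words(text)
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def mean_characters_per_word(text: str) -> float:
    """Average number of letters per word (0.0 for text with no words)."""
    words = _words(text)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)
```

With this tokenization, "the cat saw the dog" has four unique words out of five, so its diversity score is 4/5, and every word has three letters, so its mean characters per word is 3.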
Closes #2474
TextFeaturizer
docstring