docs: fix simple typo, puncutation -> punctuation
There is a small typo in docs/source/formatting_data.rst.

Should read `punctuation` rather than `puncutation`.
timgates42 committed Dec 29, 2020
1 parent 4b7e0e7 commit 748bdbd
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/formatting_data.rst
@@ -24,7 +24,7 @@ NOTE: We assume each column is a numerical column, unless you specify otherwise

#. ``attribute_name: 'output'`` The ``column_descriptions`` dictionary must specify one of your attributes as the output column. This is what the ``auto_ml`` predictor will try to predict. Importantly, the data you pass into ``.train()`` should have the correct values for this column, so we can teach the algorithms what is right and what is wrong.
#. ``attribute_name: 'categorical'`` All attribute names that hold a string in any of the rows after the header row will be encoded as categorical data. If, however, you have any numerical columns that you want encoded as categorical data, you can specify that here.
-#. ``attribute_name: 'nlp'`` If any of your data is a text field that you'd like to run some Natural Language Processing on, specify that in the header row. Data stored in this attribute will be encoded using TF-IDF, along with some other feature engineering (count of some aggregations like total capital letters, puncutation characters, smiley faces, etc., as well as a sentiment prediction of that text).
+#. ``attribute_name: 'nlp'`` If any of your data is a text field that you'd like to run some Natural Language Processing on, specify that in the header row. Data stored in this attribute will be encoded using TF-IDF, along with some other feature engineering (count of some aggregations like total capital letters, punctuation characters, smiley faces, etc., as well as a sentiment prediction of that text).
#. ``attribute_name: 'ignore'`` This column of data will be ignored.
#. ``attribute_name: 'date'`` Since ML algorithms don't know how to handle a Python datetime object, we will perform feature engineering on this object, creating new features like day_of_week, or minutes_into_day, etc. Then the original date field will be removed from the training data so the algorithsm don't throw a TypeError.
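
For readers skimming this hunk, here is a minimal sketch of what a ``column_descriptions`` dictionary covering the types listed above might look like. The column names, the ``find_output_column`` helper, and the ``date_features`` helper are hypothetical and are not part of ``auto_ml``; the date helper only illustrates the kind of features the docs mention (``day_of_week``, ``minutes_into_day``), and the library's actual encoding may differ.

```python
from datetime import datetime

# Hypothetical column names, chosen only for illustration.
column_descriptions = {
    'price': 'output',           # the value the predictor should learn
    'zip_code': 'categorical',   # numeric column forced to categorical
    'listing_text': 'nlp',       # free text to be encoded
    'internal_id': 'ignore',     # dropped before training
    'listed_at': 'date',         # expanded into derived features
}


def find_output_column(descriptions):
    """Return the single column marked 'output'.

    Assumption: exactly one output column is expected; the docs only
    say that one attribute must be specified as the output.
    """
    outputs = [name for name, kind in descriptions.items() if kind == 'output']
    if len(outputs) != 1:
        raise ValueError('expected exactly one output column, got %d' % len(outputs))
    return outputs[0]


def date_features(value):
    """Illustrative date expansion (not auto_ml's own implementation)."""
    return {
        'day_of_week': value.weekday(),  # Monday == 0 in Python's convention
        'minutes_into_day': value.hour * 60 + value.minute,
    }


output_column = find_output_column(column_descriptions)
example = date_features(datetime(2020, 12, 29, 14, 30))
```

Columns not listed in the dictionary would, per the NOTE in the hunk header, be treated as numerical.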
