Is your feature request related to a problem? Please describe.
The meta-purpose of this project is to learn some NLP. Word2Vec is a really nice low-barrier-to-entry way of doing that, and vectors for sentences would be a nice place to start.
Describe the solution you'd like
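Currently, the blog parser sanitizes posts by removing punctuation and then tokenizing the words in the post with NLTK. We should do something similar for sentences but, instead of splitting on spaces, we should split on periods. The period split has to happen before the punctuation is removed, otherwise the sentence boundaries are gone by the time we look for them.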
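A rough sketch of what that could look like, using only the standard library. The function name `split_into_sentences` is made up for illustration and is not part of the existing blog parser; the sketch also assumes that a plain whitespace split is an acceptable stand-in for the NLTK word tokenizing step.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def split_into_sentences(post_text):
    """Split a post into sentences on periods, then clean each sentence.

    Hypothetical helper: returns a list of sentences, where each
    sentence is a list of words with punctuation removed.
    """
    sentences = []
    # Split on periods first, while the sentence boundaries still exist.
    for raw_sentence in post_text.split("."):
        # Strip the remaining punctuation, then split on whitespace
        # (assumption: stands in for the parser's NLTK word step).
        words = raw_sentence.translate(_PUNCT_TABLE).split()
        if words:  # skip empty fragments, e.g. from "..." or a trailing period
            sentences.append(words)
    return sentences
```

Splitting on a bare period will mis-split abbreviations like "e.g." and decimals like "3.14"; NLTK's `sent_tokenize` is one option if that turns out to matter. The list-of-word-lists shape is also what gensim's `Word2Vec` accepts for its `sentences` argument, so the output could be fed in directly.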
Describe alternatives you've considered
N/A
Additional context
N/A
Oooh; also, one nice thing about the word tokenizer I already wrote is that we can throw out words that don't mean much. Glancing at the word_details table, we can probably toss words with the following part_of_speech: