Next Steps: Notes from 2018 10 26

spaCy.io

Entity recognition
The NLTK alternative that people actually use
The underlying internals are also more interesting
Text classification
Sentiment analysis

Misc

Bag of words: a vectorized representation of whatever you're analyzing (word counts, ignoring word order)
One-hot encoding
Dimensionality reduction: look at one sentence, apply statistics to find the features that actually matter, then scale up (principal component analysis)
In lieu of bag of words: word2vec
"I wanna train a simple classifier"
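A minimal pure-Python sketch of bag of words and one-hot encoding, assuming a naive whitespace tokenizer (a real pipeline would use spaCy or NLTK). The helpers `tokenize`, `build_vocab`, `one_hot`, and `bag_of_words` are hypothetical names for illustration. It shows that a sentence's bag-of-words vector is just the sum of the one-hot vectors of its words, with word order discarded.

```python
from collections import Counter

PUNCT = ".,!?;:\"'()"


def tokenize(text):
    # Naive whitespace tokenizer: split, strip punctuation, lowercase.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def build_vocab(docs):
    # Give every distinct word a fixed column index (sorted for stability).
    words = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(words)}


def one_hot(word, vocab):
    # All zeros except a single 1 at the word's column.
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def bag_of_words(doc, vocab):
    # Count of each vocabulary word in the doc; word order is thrown away.
    # Words outside the vocabulary are ignored.
    vec = [0] * len(vocab)
    for word, count in Counter(doc).items():
        if word in vocab:
            vec[vocab[word]] = count
    return vec


docs = [tokenize("I wanna train a simple classifier"),
        tokenize("a simple model, a simple test")]
vocab = build_vocab(docs)
print(one_hot("simple", vocab))       # [0, 0, 0, 0, 1, 0, 0, 0]
print(bag_of_words(docs[1], vocab))   # [2, 0, 0, 1, 2, 1, 0, 0]
```

Every distinct word adds a dimension, which is why dimensionality reduction matters once you scale past a handful of sentences.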

Google Dataset Search

Possible workflow

Tokenize
Reduce dimensionality
Take the reduced representation and train a classifier on it
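The tokenize and reduce steps of the workflow can be sketched in plain Python, assuming bag-of-words vectors as the representation and PCA as the reduction. Here PCA is computed by power iteration on the covariance matrix, with deflation between components. `vectorize` and `pca` are hypothetical helpers, and this is only reasonable for toy-sized vocabularies; a real pipeline would more likely reach for NumPy or scikit-learn. The sketch simply returns the reduced vectors.

```python
import math
import random

PUNCT = ".,!?;:\"'()"


def tokenize(text):
    # Step 1: naive whitespace tokenizer (assumption; spaCy/NLTK would do better).
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def vectorize(docs):
    # Turn tokenized docs into a bag-of-words matrix: one row per doc.
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in d:
            row[index[w]] += 1.0
        rows.append(row)
    return rows


def pca(rows, k, iters=200, seed=0):
    # Step 2: project rows onto their top-k principal components.
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - mean[j] for j in range(m)] for r in rows]  # center columns
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(m)]
           for a in range(m)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Power iteration converges to the eigenvector with the largest eigenvalue.
        v = [rng.random() for _ in range(m)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m))
        components.append(v)
        # Deflate: subtract this component so the next pass finds the runner-up.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    return [[sum(x[j] * c[j] for j in range(m)) for c in components] for x in X]


docs = [tokenize(s) for s in [
    "the cat sat on the mat",
    "the cat ate the fish",
    "stocks fell sharply today",
    "stocks rose sharply today",
]]
reduced = pca(vectorize(docs), k=2)  # 12-dim bag of words -> 2 dims per sentence
for vec in reduced:
    print([round(x, 3) for x in vec])
```

On these four toy sentences, most of the variance is the cat-vs-stocks split, so the first component should put the two cat sentences on one side and the two stock sentences on the other.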

To actually detect my writing vs. someone else's:

Crawl other blogs. Like, a lot.
Grab a bunch of your sentences, grab other people's sentences, and go forth and sample
The model: give it a word vector, and it spits out yes/no (Dave/not Dave)
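A rough sketch of the Dave/not-Dave classifier. The notes call for word vectors as input; this sketch swaps in plain bag-of-words counts with multinomial naive Bayes (add-one smoothing), because that needs no pretrained embeddings and trains in a few lines. `balance` and `NaiveBayes` are hypothetical names, and the example sentences are made up for illustration (a few borrowed from these notes), not crawled data.

```python
import math
import random
from collections import Counter

PUNCT = ".,!?;:\"'()"


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would use spaCy.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def balance(mine, others, seed=0):
    # "Go forth and sample": randomly downsample the larger set so both
    # classes contribute the same number of sentences.
    k = min(len(mine), len(others))
    rng = random.Random(seed)
    return rng.sample(mine, k), rng.sample(others, k)


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, sentences, labels):
        self.labels = sorted(set(labels))
        self.counts = {label: Counter() for label in self.labels}
        self.priors = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.counts.values())
        return self

    def log_score(self, sentence, label):
        # log P(label) + sum of log P(word | label) over known words.
        denom = sum(self.counts[label].values()) + len(self.vocab)
        score = math.log(self.priors[label] / sum(self.priors.values()))
        for w in tokenize(sentence):
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.counts[label][w] + 1) / denom)
        return score

    def predict(self, sentence):
        return max(self.labels, key=lambda label: self.log_score(sentence, label))


mine = ["I wanna train a simple classifier",
        "spaCy is the nltk that people actually use",
        "Crawl other blogs. Like, a lot."]
others = ["The quarterly earnings exceeded analyst expectations",
          "Preheat the oven before baking the bread",
          "The committee will reconvene next week",
          "Stocks rose sharply in early trading"]
mine, others = balance(mine, others)
model = NaiveBayes().fit(mine + others,
                         ["dave"] * len(mine) + ["not dave"] * len(others))
print(model.predict("I wanna crawl a lot of blogs"))  # prints "dave"
```

Balancing the classes keeps the prior from simply favoring whichever side has more crawled sentences.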
