Next Steps: Notes from 2018 10 26
dvfeinblum edited this page Oct 29, 2018 · 1 revision
- entity recognition
- the parts of NLTK that people actually use
- underlying stuff is also more interesting
- classification stuff
- sentiment recognition
- Bag of words: a vectorized representation of whatever you're analyzing
- One-hot encoding
- Dimensionality reduction: look at one sentence, apply statistics to find the features that actually matter, then scale up (principal component analysis)
- In lieu of bag of words: word2vec
- "I wanna train a simple classifier"
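A minimal sketch of what bag of words and one-hot encoding look like in practice, using only the Python standard library. The function names (`tokenize`, `build_vocab`, `bag_of_words`, `one_hot`) and the toy sentences are made up for illustration; a real pipeline would probably use a proper tokenizer.

```python
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace,
    # strip surrounding punctuation, drop empty tokens.
    tokens = (w.strip(PUNCT) for w in text.lower().split())
    return [t for t in tokens if t]

def build_vocab(docs):
    # Sorted vocabulary so every word gets a stable column index.
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def bag_of_words(text, vocab):
    # Count vector: one entry per vocabulary word, word order is discarded.
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]

def one_hot(word, vocab):
    # One-hot vector: 1 at the word's index, 0 everywhere else.
    return [1 if w == word else 0 for w in vocab]

# Toy documents, purely illustrative.
docs = ["The cat sat.", "The cat saw the dog."]
vocab = build_vocab(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
dog_vector = one_hot("dog", vocab)
```

Each document becomes a vector as long as the vocabulary, which is why dimensionality reduction comes up as soon as the corpus grows.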
Google dataset search
Possible workflow
- Tokenize
- Reduce dimension
- Take that reduced
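The first two steps of this workflow (tokenize, then reduce dimension) might look roughly like the sketch below. It computes only the first principal component, using power iteration in plain Python so the example stays self-contained. That choice is an assumption made for illustration, not something from the notes; in practice a library PCA would be the more likely tool. All names and the toy sentences are hypothetical.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()

def count_matrix(docs):
    # Bag-of-words count matrix: one row per document, one column per word.
    vocab = sorted({t for d in docs for t in tokenize(d)})
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([float(counts[w]) for w in vocab])
    return vocab, rows

def center(rows):
    # Subtract each column's mean, as PCA requires.
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def first_principal_component(centered, iters=200, seed=0):
    # Power iteration on X^T X, computed as X^T (X v) so the
    # covariance matrix is never formed explicitly.
    dim = len(centered[0])
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(dim)]
    for _ in range(iters):
        scores = [dot(row, v) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Toy corpus, purely illustrative: two sentences per "topic".
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell sharply today",
    "markets fell again today",
]
vocab, rows = count_matrix(docs)
centered = center(rows)
component = first_principal_component(centered)

# Each document is reduced from len(vocab) numbers to a single score.
scores = [dot(row, component) for row in centered]
```

On this toy corpus the first two sentences land on one side of zero and the last two on the other, so a single number already captures most of the difference between them. The sign of the component is arbitrary.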
To actually detect my writing v. not:
- Crawl other blogs. Like, a lot.
- Grab a bunch of your sentences, grab other sentences, go forth and sample
- The model would be, give me a word vector, spit out yes/no (dave/not dave)
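A rough sketch of the sample-then-classify idea, assuming the "word vector" is a bag-of-words count vector and the "simple classifier" is a perceptron. Both are assumptions: the notes don't pick a model, and word2vec embeddings would be another option. The sentences, function names, and labels (1 = dave, 0 = not dave) are hypothetical stand-ins for the crawled data.

```python
import random
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()

def vectorize(text, vocab):
    # Bag-of-words count vector; words outside the vocabulary are ignored.
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]

def balanced_sample(mine, others, n, seed=0):
    # Draw n sentences from each pool; label 1 = "dave", 0 = "not dave".
    rng = random.Random(seed)
    data = [(s, 1) for s in rng.sample(mine, n)]
    data += [(s, 0) for s in rng.sample(others, n)]
    rng.shuffle(data)
    return data

def score(x, weights, bias):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def train_perceptron(examples, vocab, epochs=20):
    # Classic perceptron rule: adjust the weights only on misclassified examples.
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = vectorize(text, vocab)
            pred = 1 if score(x, weights, bias) > 0 else 0
            err = label - pred
            if err:
                weights = [w + err * xi for w, xi in zip(weights, x)]
                bias += err
    return weights, bias

def is_dave(text, vocab, weights, bias):
    # Yes/no answer: True means "dave", False means "not dave".
    return score(vectorize(text, vocab), weights, bias) > 0

# Hypothetical stand-ins for "my sentences" and "sentences from other blogs".
mine = [
    "i wanna train a simple classifier",
    "i wanna crawl a lot of blogs",
    "i wanna sample my sentences",
]
others = [
    "the quarterly report is attached",
    "the meeting is moved to friday",
    "the weather is lovely today",
]

data = balanced_sample(mine, others, n=3)
vocab = sorted({t for text, _ in data for t in tokenize(text)})
weights, bias = train_perceptron(data, vocab)
guess = is_dave("i wanna detect my writing", vocab, weights, bias)
```

With this little data the model only memorizes a few telltale words, which is the reason for crawling a lot of other blogs before trusting any result.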