Lynn Cherny, @arnicas / arnicas@gmail
Intro to some NLP concepts and libraries in Python for a class at CMU, Feb 2015.
Lots of libraries are required - see here for install info.
Notebook viewer links:
0. Reading in Files: How I made the data files, mostly Gutenberg operations. Add your own URLs!
1. Tokenizing, Stemming, POS - the very basics. POS is "parts of speech" not "piece of %#@t".
2. Wordclouds - entirely optional, but shows off interactive widgets that live-filter stopwords for visual effect.
3. TF-IDF, Clustering, Pattern - getting to the meat! Hierarchical clustering here too.
Bonus: Some TF-IDF in Node's "natural" package (but caveats apply): see here
4. Naive Bayes Classification - the infamous 50 Shades Sex Scene Detection, because spam is boring.
5. Naive Bayes in Scikit-Learn - very quick intro to the main ML package in Python, for comparison purposes; same sex scene data.
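
For a taste of what notebook 3 is about, here is a minimal TF-IDF sketch in plain Python. It is not the code from the notebooks (those lean on the libraries mentioned above); the `tokenize` and `tf_idf` helpers and the toy sentences are made up for illustration, and it assumes the simplest weighting: term frequency times `log(N / document frequency)`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and pull out runs of letters; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of term in doc / number of tokens in doc
    idf = log(number of docs / number of docs containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on an empty doc
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, not the Gutenberg data used in the notebooks.
docs = [
    "the whale swam past the ship",
    "the ship sailed at dawn",
    "the captain watched the whale",
]
weights = tf_idf(docs)

# Print each document's terms, most distinctive first.
for doc, w in zip(docs, weights):
    ranked = sorted(w, key=w.get, reverse=True)
    print(doc, "->", ranked[:3])
```

A word that shows up in every document (like "the" here) gets a weight of zero, which is the whole point: TF-IDF pushes down the words everything shares and pulls up the ones that set a document apart.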
There are some links to libraries and books in the [Intro NLP Links.md](Intro NLP Links.md).