Intro NLP


Lynn Cherny, @arnicas / arnicas@gmail

Intro to some NLP concepts and libraries in Python for a class at CMU, Feb 2015.

Lots of libraries are required - see here for install info.

Notebook viewer links:

0. Reading in Files: How I made the data files, mostly Gutenberg operations. Add your own URLs!

1. Tokenizing, Stemming, POS - the very basics. POS is "parts of speech" not "piece of %#@t".

2. Wordclouds - entirely optional, but shows off interactive widgets to live-filter stopwords for visual effect

3. TF-IDF, Clustering, Pattern - getting to the meat! Hierarchical clustering here too.

Bonus: Doing some TF-IDF NLP in Node's "natural" package (but caveats apply): see here

4. Naive Bayes Classification - the infamous 50 Shades Sex Scene Detection because spam is boring

5. Naive Bayes in Scikit-Learn - very quick intro to the main ML package in Python, for comparison purposes; same sex scene data.
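
For a quick feel for the TF-IDF idea in notebook 3 before opening it, here is a minimal standard-library sketch. It is not the notebook's code: the `tokenize` and `tf_idf` helpers and the three-sentence corpus are made up for illustration, and the weighting (`tf = count / doc_length`, `idf = log(N / df)`) is just one common variant. Libraries such as scikit-learn apply smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / doc_length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, not the Gutenberg data used in the notebooks.
docs = [
    "The whale swam in the sea.",
    "The ship sailed on the sea.",
    "The whale sank the ship.",
]
weights = tf_idf(docs)

# Show the highest-weighted terms for each document.
for i, w in enumerate(weights):
    top = sorted(w, key=w.get, reverse=True)[:3]
    print(i, top)
```

A word that appears in every document (here "the") gets weight zero, while a word unique to one document scores highest there. That is the whole point of the IDF term.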

There are some links to libraries and books in the [Intro NLP](Intro NLP