topic-model-tutorial

This repository contains notebooks, slides, and data for the short tutorial "Topic modelling with Scikit-learn", presented at PyData Dublin in September 2017.

Contents

Text Preprocessing: Provides a basic introduction to preprocessing documents with scikit-learn.
NMF Topic Models: Covers the application and interpretation of topic models via the NMF implementation provided by scikit-learn.
Parameter Selection for NMF: More advanced material on selecting the number of topics for NMF, using topic coherence.

To demonstrate the topic modelling techniques, a sample dataset is provided here. This consists of 4,551 news articles collected from the Guardian News API in 2016, stored in a single text file (25MB), with one article per line.
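
Because the dataset stores one article per line, it can be read into a list of strings with a few lines of Python. The sketch below is an assumption about how one might do this, not the notebooks' own loading code. The function name load_articles and the file name articles.txt are hypothetical, and a small temporary file stands in for the real dataset.

```python
import tempfile
from pathlib import Path


def load_articles(path):
    """Read a text file holding one article per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demonstration on a tiny temporary file standing in for the real dataset
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "articles.txt"  # hypothetical file name
    sample_path.write_text(
        "First article text.\n\nSecond article text.\n", encoding="utf-8"
    )
    articles = load_articles(sample_path)

print(f"Read {len(articles)} articles")
```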

Dependencies

This code has been tested with Python 3.6-3.8. The core package requirements are:

scikit-learn
numpy
matplotlib

The model selection code also relies on the gensim package to build a Word2Vec model (tested with v4.1.2). A sample pre-built Word2Vec model for the sample dataset is also provided here for download (71MB).
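
One way a Word2Vec model can support the choice of the number of topics is to score each topic by the mean pairwise cosine similarity of its top terms in the embedding space, then average over topics and compare the score across candidate values of k. The sketch below illustrates that idea in plain Python. It is an assumption about the general approach rather than the notebook's exact code, and the function names and toy 2-dimensional vectors are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topic_coherence(top_terms, vectors):
    """Mean cosine similarity over all pairs of a topic's top terms."""
    pairs = list(combinations(top_terms, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def model_coherence(topics, vectors):
    """Mean topic coherence over all topics of one model."""
    return sum(topic_coherence(t, vectors) for t in topics) / len(topics)


# Toy 2-dimensional vectors invented for illustration. With gensim, the
# lookup could instead be a trained model's `wv` attribute, since
# `model.wv[term]` returns the vector for a term.
toy_vectors = {
    "bank": [1.0, 0.1],
    "loan": [0.9, 0.2],
    "goal": [0.1, 1.0],
    "match": [0.2, 0.9],
}

coherent_topics = [["bank", "loan"], ["goal", "match"]]
mixed_topics = [["bank", "goal"], ["loan", "match"]]

print("coherent:", round(model_coherence(coherent_topics, toy_vectors), 3))
print("mixed:", round(model_coherence(mixed_topics, toy_vectors), 3))
```

Under this measure, the topic descriptors that group related terms score higher than the ones that mix them, which is the property used to prefer one value of k over another.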

Links and References

Scikit-learn home
NMF documentation for scikit-learn
Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature. [Article]
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4). [Article]
O’Callaghan, D., Greene, D., Carthy, J., & Cunningham, P. (2015). An analysis of the coherence of descriptors in topic modeling. Expert Systems with Applications. [PDF]

