derekgreene/topic-model-tutorial
topic-model-tutorial

This repository contains notebooks, slides, and data for the short tutorial "Topic modelling with Scikit-learn", presented at PyData Dublin in September 2017.

Contents

The summary tutorial is covered in these slides. There are three associated IPython notebooks:

  1. Text Preprocessing: Provides a basic introduction to preprocessing documents with scikit-learn.
  2. NMF Topic Models: Covers the application and interpretation of topic models via the NMF implementation provided by scikit-learn.
  3. Parameter Selection for NMF: More advanced material on selecting the number of topics for NMF, using topic coherence.
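
As a rough illustration of the workflow covered by the first two notebooks, the sketch below builds a TF-IDF document-term matrix and factorises it with scikit-learn's NMF. It is a minimal sketch rather than the notebooks' own code: the toy corpus, the helper name `fit_nmf_topics`, and the parameter values (`k=2`, `top_n=3`, `init="nndsvd"`) are assumptions chosen for illustration.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the news articles (hypothetical data).
docs = [
    "the team won the football match after a late goal",
    "the striker scored a goal in the football final",
    "the bank raised interest rates to curb inflation",
    "inflation and interest rates worry the central bank",
]


def fit_nmf_topics(documents, k=2, top_n=3):
    """Fit a k-topic NMF model and return (W, H, descriptors).

    Hypothetical helper: W holds document-topic weights, H holds
    topic-term weights, and descriptors lists the top_n highest-weighted
    terms for each topic.
    """
    # Preprocessing: tokenise, drop English stop words, weight by TF-IDF.
    vectorizer = TfidfVectorizer(stop_words="english")
    A = vectorizer.fit_transform(documents)

    # Recover the terms in column order from the fitted vocabulary.
    terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

    # Factorise A into two non-negative factors, A ~ W x H.
    model = NMF(n_components=k, init="nndsvd", random_state=0)
    W = model.fit_transform(A)
    H = model.components_

    # Rank the terms of each topic by weight, highest first.
    descriptors = [
        [terms[i] for i in row.argsort()[::-1][:top_n]] for row in H
    ]
    return W, H, descriptors


W, H, descriptors = fit_nmf_topics(docs)
for index, descriptor in enumerate(descriptors):
    print("Topic %d: %s" % (index + 1, ", ".join(descriptor)))
```

Each row of `H` is read as a topic and each row of `W` as the topic memberships of one document. On a corpus this small the descriptors are only indicative.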

To demonstrate the topic modelling techniques, a sample dataset is provided here. This consists of 4,551 news articles collected from the Guardian News API in 2016, stored in a single text file (25MB), with one article per line.
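
Because the dataset stores one article per line, it can be read into a list of strings with the standard library alone. The sketch below assumes UTF-8 encoding and uses a hypothetical file name, `articles.txt`, written to a temporary directory for the demonstration; substitute the path of the downloaded file.

```python
import tempfile
from pathlib import Path


def load_articles(path):
    """Read a text file with one article per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration on a tiny temporary file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "articles.txt"
    sample.write_text(
        "First article text.\n\nSecond article text.\n", encoding="utf-8"
    )
    articles = load_articles(sample)

print("Loaded %d articles" % len(articles))
```

The resulting list can be passed directly to a scikit-learn vectorizer.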

Dependencies

This code has been tested with Python 3.6-3.8. The core package requirements are:

  • scikit-learn
  • numpy
  • matplotlib

The model selection code also relies on the gensim package to build a Word2Vec model (tested with v4.1.2). A sample pre-built Word2Vec model for the sample dataset is also provided here for download (71MB).
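
One common way to use word embeddings for model selection is to score each topic by the mean pairwise similarity of its top terms, then compare the average score across different numbers of topics. The sketch below illustrates that idea in pure Python and is not taken from the notebooks: `descriptor_coherence`, `toy_vectors`, and `toy_similarity` are hypothetical names, and the two-dimensional vectors are made up. With a trained gensim model, the `similarity` argument could plausibly be `model.wv.similarity`.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def descriptor_coherence(terms, similarity):
    """Mean pairwise similarity over all term pairs in a topic descriptor."""
    pairs = list(combinations(terms, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Made-up embeddings standing in for a trained Word2Vec model.
toy_vectors = {
    "goal": (1.0, 0.0),
    "match": (1.0, 0.0),
    "bank": (0.0, 1.0),
}


def toy_similarity(a, b):
    """Look up two terms in the toy embeddings and compare them."""
    return cosine(toy_vectors[a], toy_vectors[b])


tight = descriptor_coherence(["goal", "match"], toy_similarity)
mixed = descriptor_coherence(["goal", "match", "bank"], toy_similarity)
print("tight=%.3f mixed=%.3f" % (tight, mixed))
```

A descriptor whose terms sit close together in the embedding space scores higher than one that mixes unrelated terms, which is the signal used to compare candidate numbers of topics.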

Links and References

  • Scikit-learn home
  • NMF documentation for scikit-learn
  • Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature. [Article]
  • Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4). [Article]
  • O’Callaghan, D., Greene, D., Carthy, J., & Cunningham, P. (2015). An analysis of the coherence of descriptors in topic modeling. Expert Systems with Applications. [PDF]
