An example topic model for debates from the 18th German Bundestag

A topic model for the debates of the 18th German Bundestag as a showcase example

Markus Konrad, May 2018

Important note: If you want to git clone this project, you need to install git lfs first.


For a workshop on practical topic modeling, I created this topic model as a showcase example. It demonstrates the steps needed to arrive at a usable, informative model:

  1. Preprocessing the raw data
  2. Generating the document-term matrix from the data
  3. Evaluating topic models for a set of hyperparameters
  4. Generating the final model using the best combination of hyperparameters
  5. Visualizing, interpreting and analysing the model (report1.ipynb and report2.ipynb, among others); note that this was not the focus of the workshop, so only exemplary analyses are given
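
To make steps 2 and 3 more concrete, here is a minimal sketch that uses only the Python standard library. It is not the code from this repository: the documents, the `build_dtm` helper and the candidate hyperparameter values are all made up for illustration. It assumes the texts have already been tokenized and lemmatized in step 1.

```python
from collections import Counter
from itertools import product

# Hypothetical output of step 1: already tokenized and lemmatized documents.
docs = {
    "speech_1": ["haushalt", "steuer", "haushalt"],
    "speech_2": ["klima", "energie", "steuer"],
    "speech_3": ["energie", "klima", "klima"],
}


def build_dtm(tokenized_docs):
    """Return (doc_labels, vocab, rows); rows[i][j] counts vocab[j] in document i."""
    doc_labels = sorted(tokenized_docs)
    vocab = sorted({tok for toks in tokenized_docs.values() for tok in toks})
    column = {term: j for j, term in enumerate(vocab)}
    rows = []
    for label in doc_labels:
        row = [0] * len(vocab)
        for term, n in Counter(tokenized_docs[label]).items():
            row[column[term]] = n
        rows.append(row)
    return doc_labels, vocab, rows


# Step 2: the document-term matrix as a list of rows (one row per document).
doc_labels, vocab, dtm = build_dtm(docs)

# Step 3: enumerate every combination of candidate hyperparameters.
# The values are illustrative only, not the ones evaluated in this project.
param_grid = [
    {"n_topics": k, "alpha": a}
    for k, a in product([20, 50, 100], [0.1, 0.01])
]
```

Each entry of `param_grid` would then be used to fit one candidate model on the matrix, and the candidates would be compared with an evaluation metric before the final model is generated in step 4. For a real corpus, a sparse matrix is a better fit than nested lists.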

Used software packages

This example uses Python 2.7 because of a dependency issue: the pattern package, which is used for better lemmatization of German texts, does not support Python 3.

These are the main software packages in use:

  • tmtoolkit for evaluating models in parallel, calculating some model statistics and visualizations
  • lda for topic modeling with LDA using Gibbs sampling
  • pyLDAvis and Jupyter Notebooks for interactive visualizations

All software dependencies can be installed via pip install -r requirements.txt.


The data for the debates comes from


Licensed under Apache License 2.0 (except for the debate data, which has its own license). See LICENSE file.