Topics is a Python library for Topic Modeling. In addition, this repository provides a convenient, modular workflow that can be controlled entirely from within a well-documented Jupyter notebook. Users not yet familiar with programming in Python can try basic Topic Modeling in a Flask-based GUI demonstrator. For a standalone application that does not require a Python interpreter or any extra installations, have a look at the release section.
At the moment, this library supports three LDA implementations:
- lda, which is lightweight and provides basic LDA.
- MALLET, which is known to be very robust.
- Gensim, which is attractive because of its multi-core support.
Further resources:
- Topics website
- Topics API documentation
- Topics paper
- Standalone Demonstrator releases
- An introduction to Topic Modeling using lda
- An introduction to Topic Modeling using MALLET
- An introduction to Topic Modeling using Gensim
To install the latest stable version of the library dariah_topics:
$ pip install git+https://github.com/DARIAH-DE/Topics.git
To install the latest development version:
$ pip install --upgrade git+https://github.com/DARIAH-DE/Topics.git@testing
If you wish to work through the tutorials, you can clone the repository using Git:
$ git clone https://github.com/DARIAH-DE/Topics.git
or download the ZIP archive (don't forget to unzip it) and install dariah_topics from its source code:
$ python setup.py install
As a server-client application, Jupyter allows you to edit and run Python code interactively from within so-called notebooks via a web browser.
To install Jupyter:
$ pip install jupyter
Python distributions like Anaconda come with Jupyter by default.
You can run Jupyter via:
$ jupyter notebook
MALLET is a Java-based package for statistical natural language processing. The MALLET Topic Model package includes an extremely fast and highly scalable implementation of Gibbs sampling and tools for inferring topics for new documents given trained models.
To call MALLET from within the Python environment, dariah_topics provides a convenient wrapper.
You can download MALLET here. For more detailed instructions, have a look at this.
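To give a rough idea of what calling MALLET from Python involves, here is a minimal sketch that builds and runs the two MALLET command-line steps (import-dir and train-topics) with the standard library. It is not the dariah_topics wrapper; the function names, default values, and output file names (corpus.mallet, topic_keys.txt, doc_topics.txt) are assumptions made for illustration, and it assumes the mallet executable is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_mallet_commands(corpus_dir, output_dir, num_topics=10, mallet_binary="mallet"):
    """Build the argument lists for the MALLET import and training steps.

    Hypothetical helper: the output file names below are illustrative only.
    """
    output_dir = Path(output_dir)
    corpus_file = output_dir / "corpus.mallet"

    # Step 1: convert a directory of plain-text files into MALLET's binary format.
    import_cmd = [
        mallet_binary, "import-dir",
        "--input", str(corpus_dir),
        "--output", str(corpus_file),
        "--keep-sequence",
        "--remove-stopwords",
    ]

    # Step 2: train an LDA topic model on the imported corpus.
    train_cmd = [
        mallet_binary, "train-topics",
        "--input", str(corpus_file),
        "--num-topics", str(num_topics),
        "--output-topic-keys", str(output_dir / "topic_keys.txt"),
        "--output-doc-topics", str(output_dir / "doc_topics.txt"),
    ]
    return import_cmd, train_cmd


def run_mallet(corpus_dir, output_dir, num_topics=10, mallet_binary="mallet"):
    """Run both MALLET steps; raises if MALLET is missing or a step fails."""
    if shutil.which(mallet_binary) is None:
        raise FileNotFoundError("MALLET executable not found: " + mallet_binary)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for command in build_mallet_commands(corpus_dir, output_dir, num_topics, mallet_binary):
        subprocess.run(command, check=True)
```

Calling run_mallet("my_corpus", "mallet_output", num_topics=20) would, under these assumptions, write the topic keys and document-topic distributions into mallet_output. Keeping command construction separate from execution makes it possible to inspect the commands without having MALLET installed.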
If you are confronted with any issues regarding installation or usability, please use GitHub issues.
This library requires Python 3.6 or higher.
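If you are unsure which interpreter your environment actually uses, a quick check from within Python can help. The following is a small sketch using only the standard library; the helper name meets_minimum is made up for this example and is not part of dariah_topics.

```python
import sys

# Minimum version stated in this README.
MIN_VERSION = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the running interpreter's version and whether it is recent enough.
status = "OK" if meets_minimum() else "too old, please upgrade"
print("Python {}.{}: {}".format(sys.version_info[0], sys.version_info[1], status))
```

Run this with the same interpreter that pip installs into, since several Python versions can coexist on one machine.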
- You will have to install future-0.16.0-py3-none-any.whl from this resource. Download the appropriate file and run pip install future-0.16.0-py3-none-any.whl.
- In case of the error "Microsoft Visual C++ 10.0 is required", check whether you are using Python 3.6 or higher with python -V. If you are, you have to install Microsoft Windows SDK from this resource. If you are not, upgrade to Python 3.6 or higher and try installing the library again.
- In case of "PermissionError: [Errno 13] Permission denied", try pip install --user or python setup.py install --user, respectively.
- Due to several visualization dependencies, you might have to install the distribution packages libfreetype6-dev and libpng-dev (e.g. using sudo apt-get install).
- Make sure to install Python 3.6 correctly and adjust the selection of the Python interpreter in your editor accordingly. See also: https://docs.python.org/3/using/mac.html