Topics is a Python library for Text Mining and Topic Modeling. This repository also provides a convenient, modular workflow that can be controlled entirely from within a well-documented Jupyter notebook. Users not yet familiar with programming in Python can try basic Topic Modeling in a Flask-based GUI demonstrator. For a standalone application that requires neither a Python interpreter nor any extra installations, have a look at the release section.
At the moment, this library supports three LDA implementations:
- lda, which is lightweight and provides basic LDA.
- MALLET, which is known to be very robust.
- Gensim, which is attractive because of its multi-core support.
Further resources:
- Topics website
- Topics API documentation
- Topics paper
- Demonstrator releases
- An introduction to Topic Modeling using lda
- An introduction to Topic Modeling using MALLET
- An introduction to Topic Modeling using Gensim
To install the latest stable version:
$ pip install git+https://github.com/DARIAH-DE/Topics.git
To install the latest development version:
$ pip install --upgrade git+https://github.com/DARIAH-DE/Topics.git@testing
Alternatively, clone the repository:
$ git clone https://github.com/DARIAH-DE/Topics.git
or download the ZIP archive, and install Topics from source:
$ cd Topics
$ python setup.py install
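After installing, it can be useful to see which of the three LDA backends are actually reachable from the current environment. The following is a minimal sketch, not part of Topics itself: the helper name available_backends is hypothetical, and it assumes that the Python packages are importable as lda and gensim and that the MALLET launcher is on PATH under the name mallet.

```python
import importlib.util
import shutil


def available_backends():
    """Report which of the three LDA backends can be found.

    Assumptions (not taken from the Topics documentation): the Python
    packages are importable as `lda` and `gensim`, and MALLET's launcher
    is on PATH under the name `mallet`.
    """
    return {
        # find_spec looks a module up without importing it.
        "lda": importlib.util.find_spec("lda") is not None,
        "gensim": importlib.util.find_spec("gensim") is not None,
        # shutil.which searches the directories listed in PATH.
        "mallet": shutil.which("mallet") is not None,
    }


if __name__ == "__main__":
    for name, found in available_backends().items():
        print("{:<8}{}".format(name, "found" if found else "missing"))
```

A backend reported as missing is not necessarily a problem: MALLET, for instance, is a separate download and only needed for the MALLET workflow.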
On Windows:
- Download and install the latest version of WinPython.
- Download and install Git.
- Open the WinPython PowerShell Prompt.exe in your WinPython folder and type
  git clone https://github.com/DARIAH-DE/Topics.git
  to clone Topics into your WinPython folder.
- Type
  cd .\Topics
  in WinPython PowerShell to navigate to the Topics folder.
- Either: Type
  pip install .
  in WinPython PowerShell to install packages required by Topics.
- Or: Type
  pip install -r requirements.txt
  in WinPython PowerShell to install Topics with additional development packages.
- Type
  jupyter notebook
  in WinPython PowerShell to open Jupyter, select one of the files with the suffix .ipynb and follow the instructions.
Note: The development packages require the Python module future. Depending on your WinPython and Windows versions, you might have to install future manually:
- Download the latest future-x.xx.x-py3-none-any.whl.
- Open the WinPython Control Panel.exe in your WinPython folder.
- Install the future wheel via the WinPython Control Panel.exe.
Troubleshooting: If the installation fails with the error message "Microsoft Visual C++ 10.0 is required", please check whether you are using Python 3.6 (type python -V).
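The version check from the troubleshooting note can also be done from inside Python, for example in a notebook cell, where it reports the interpreter the notebook itself is running on. This is a small sketch; the helper names python_version and is_python_36 are hypothetical and not part of Topics.

```python
import sys


def python_version(version_info=sys.version_info):
    """Return the interpreter version as a 'major.minor.micro' string."""
    return "{}.{}.{}".format(version_info[0], version_info[1], version_info[2])


def is_python_36(version_info=sys.version_info):
    """Return True if the given version is some release of Python 3.6."""
    return tuple(version_info[:2]) == (3, 6)


if __name__ == "__main__":
    print("Running Python", python_version())
    print("Python 3.6:", is_python_36())
```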
On Linux:
- Download and install Git.
- Open the command-line interface, type
  git clone https://github.com/DARIAH-DE/Topics.git
  to clone Topics into your working directory.
- Note: The distribution packages libfreetype6-dev and libpng-dev and a compiler for C++, e.g. gcc, have to be installed.
- Open the command-line interface, navigate to the folder Topics and type
  pip install . --user
  to install the required packages.
- Install Jupyter and run it by typing
  jupyter notebook
  in the command-line.
- Access the folder Topics through Jupyter in your browser, select one of the files with the suffix .ipynb and follow the instructions.
To use MALLET:
- Download and unzip MALLET.
- Set the environment variable for MALLET.
For more detailed instructions, have a look at this.
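To confirm that the MALLET setup worked, one option is to look the launcher up from Python. The sketch below rests on assumptions that the text above does not state: that the environment variable is named MALLET_HOME, that it points at the unzipped MALLET folder, and that the launcher lives in its bin subfolder as mallet (or mallet.bat on Windows). The helper names candidate_paths and find_mallet are hypothetical.

```python
import os
import shutil


def candidate_paths(home):
    """List the launcher locations to try inside a MALLET folder.

    Assumption: the launcher sits in <home>/bin and is called `mallet`
    (or `mallet.bat` on Windows).
    """
    return [os.path.join(home, "bin", name) for name in ("mallet", "mallet.bat")]


def find_mallet(env=None):
    """Return the path to the MALLET launcher, or None if it is not found.

    Assumption: the environment variable is called MALLET_HOME.
    """
    env = os.environ if env is None else env
    home = env.get("MALLET_HOME")
    if home:
        for candidate in candidate_paths(home):
            if os.path.isfile(candidate):
                return candidate
    # Fall back to searching the directories listed in PATH.
    return shutil.which("mallet", path=env.get("PATH"))


if __name__ == "__main__":
    print("MALLET launcher:", find_mallet() or "not found")
```

If the script prints "not found", the variable is probably unset in the current session or points at the wrong folder; a new terminal session is often needed after changing environment variables.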