Topics – Easy Topic Modeling in Python

Topics is a Python library for text mining and topic modeling. This repository also provides a convenient, modular workflow that can be controlled entirely from within a well-documented Jupyter notebook. Users not yet familiar with programming in Python can try basic topic modeling in a Flask-based GUI demonstrator. For a standalone application that requires neither a Python interpreter nor any extra installations, have a look at the release section.

At the moment, this library supports three LDA implementations:

  • lda, which is lightweight and provides basic LDA.
  • MALLET, which is known to be very robust.
  • Gensim, which is attractive because of its multi-core support.
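Before choosing a backend, it can help to see which of the Python-based implementations are importable in the current environment. The following is a minimal sketch, assuming the import names `lda` and `gensim` (the names those packages are published under); the helper `available_backends` is hypothetical and not part of Topics. MALLET is a separate Java tool and is not covered here.

```python
import importlib.util

# Import names of the two Python-based backends (assumed: "lda" and "gensim").
# MALLET is an external Java tool and is deliberately left out of this check.
PYTHON_BACKENDS = ("lda", "gensim")


def available_backends(names=PYTHON_BACKENDS):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_backends().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

The check only looks up the module and does not import it, so it stays fast even when a backend is large.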

Resources

Installation

To install the latest stable version:

$ pip install git+https://github.com/DARIAH-DE/Topics.git

To install the latest development version:

$ pip install --upgrade git+https://github.com/DARIAH-DE/Topics.git@testing

Alternatively, clone the repository:

$ git clone https://github.com/DARIAH-DE/Topics.git

or download the ZIP archive, then install Topics from source:

$ cd Topics
$ python setup.py install

Working with notebooks

Windows

  1. Download and install the latest version of WinPython.

  2. Download and install Git.

  3. Open the WinPython PowerShell Prompt.exe in your WinPython folder and type git clone https://github.com/DARIAH-DE/Topics.git to clone Topics into your WinPython folder.

  4. Type cd .\Topics in WinPython PowerShell to navigate to the Topics folder.

  5. Either: Type pip install . in WinPython PowerShell to install the packages required by Topics.

  6. Or: Type pip install -r requirements.txt in WinPython PowerShell to install Topics with additional development packages.

  7. Type jupyter notebook in WinPython PowerShell to open Jupyter, select one of the files with suffix .ipynb and follow the instructions.

  8. Note: The development packages require the Python module future. Depending on your WinPython and Windows versions, you might have to install future manually, as described in the following steps.

  9. Download the latest future-x.xx.x-py3-none-any.whl.

  10. Open the WinPython Control Panel.exe in your WinPython folder.

  11. Install the future wheel via the WinPython Control Panel.exe.

  12. Troubleshooting: If the installation fails with the error message "Microsoft Visual C++ 10.0 is required", please check whether you are using Python 3.6 (type python -V).
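The version check and the future requirement from the notes above can also be inspected from within Python. This is a minimal sketch: the helper `describe_environment` is hypothetical, and it only reports what it finds without judging whether the setup is correct.

```python
import importlib.util
import sys


def describe_environment(version_info=sys.version_info):
    """Report the Python version and whether the 'future' module is importable."""
    major, minor = version_info[0], version_info[1]
    return {
        "python": f"{major}.{minor}",
        # The troubleshooting note asks whether Python 3.6 is in use.
        "is_python_3_6": (major, minor) == (3, 6),
        # The development packages need the 'future' module.
        "has_future": importlib.util.find_spec("future") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the script in WinPython PowerShell prints the same version information as python -V, plus whether future still has to be installed via the WinPython Control Panel.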

macOS and Linux

  1. Download and install Git.
  2. Open the command-line interface and type git clone https://github.com/DARIAH-DE/Topics.git to clone Topics into your working directory.
  3. Note: The distribution packages libfreetype6-dev and libpng-dev as well as a C++ compiler, e.g. gcc, have to be installed.
  4. In the command-line interface, navigate to the folder Topics and type pip install . --user to install the required packages.
  5. Install Jupyter and run it by typing jupyter notebook on the command line.
  6. Access the folder Topics through Jupyter in your browser, select one of the files with suffix .ipynb and follow the instructions.
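The command-line tools named in the steps above can be checked in one go before starting. The following is a minimal sketch, assuming the executables are called `git`, `gcc` and `jupyter` on your system; the helper `missing_tools` is hypothetical. It cannot detect the distribution packages libfreetype6-dev and libpng-dev, which have to be verified with your package manager.

```python
import shutil

# Executable names assumed from the steps above; adjust them if your
# system uses a different compiler or a differently named launcher.
REQUIRED_TOOLS = ("git", "gcc", "jupyter")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```

Note that packages installed with pip install --user may place their executables in a directory that is not on the PATH, in which case jupyter is reported as missing even though it is installed.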

Working with MALLET

  1. Download and unzip MALLET.
  2. Set the environment variable for MALLET.

For more detailed instructions, have a look at this.
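To confirm that MALLET can be found after the steps above, a small lookup like the following may help. This is a minimal sketch under stated assumptions: the environment variable is assumed to be named `MALLET_HOME` and to point at the unzipped MALLET folder (which contains `bin/mallet` and `bin/mallet.bat`); this README does not name the variable, and the helper `find_mallet` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def find_mallet(env=None):
    """Return the path to the MALLET launcher, or None if it cannot be found.

    Looks in $MALLET_HOME/bin first (assumed variable name), then on the PATH.
    """
    env = os.environ if env is None else env
    home = env.get("MALLET_HOME")
    if home:
        bin_dir = Path(home) / "bin"
        # The MALLET archive ships a shell launcher and a Windows batch file.
        for name in ("mallet", "mallet.bat"):
            candidate = bin_dir / name
            if candidate.is_file():
                return str(candidate)
    # Fall back to searching the PATH for an executable named "mallet".
    return shutil.which("mallet", path=env.get("PATH"))


if __name__ == "__main__":
    location = find_mallet()
    print(location if location else "MALLET was not found.")
```

If the script prints a path, that path is a reasonable candidate to pass wherever the workflow asks for the MALLET executable.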