TMELPipeline

This project is a Python implementation of Entity-based Topic Modeling, a new corpus exploration method. For each document, the tool identifies a series of descriptive labels in an ontology and generates a topic for each label. Each topic is easy to interpret because it is directly linked to a knowledge base.
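To make the idea of "one topic per label" concrete, here is a toy sketch. It is not the pipeline's code: the corpus, the labels, and the `label_word_counts` helper are made up for illustration, and plain word counting is used as a crude stand-in for the Labeled LDA step named in the paper cited below.

```python
from __future__ import print_function
from collections import Counter, defaultdict

# Hypothetical toy corpus: each document is paired with the labels an
# entity linker might have assigned to it (invented for this example).
docs = [
    ("the parliament voted on the budget", ["Politics"]),
    ("the team won the match", ["Sport"]),
    ("parliament debated funding for the team", ["Politics", "Sport"]),
]

STOPWORDS = {"the", "on", "for"}


def label_word_counts(labeled_docs):
    """Count words per label; a simple substitute for Labeled LDA."""
    counts = defaultdict(Counter)
    for text, labels in labeled_docs:
        words = [w for w in text.split() if w not in STOPWORDS]
        # Every label of a document receives all of that document's words.
        for label in labels:
            counts[label].update(words)
    return counts


topics = label_word_counts(docs)
for label in sorted(topics):
    print(label, topics[label].most_common(3))
```

Because every topic is tied to a named label, the word list for each one can be read against the corresponding knowledge base entry.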

The accompanying web-based evaluation platform, which gathers annotations and statistics on the results of the pipeline, is available on GitHub: https://github.com/anlausch/TMEvaluationPlatform.

Installation instructions:

1. Install Python 2.7.
2. Install Java 7 (required by the Stanford Topic Modeling Toolbox, TMT).
3. Install Scala.
4. Install Python NLTK 3.2.
5. Run the following commands in the Python interpreter:

       import nltk
       nltk.download('punkt')

6. Install MySQL.
7. Install mysql-connector for Python.
8. Adjust the configuration in the settings file to your needs.
9. Run the program.
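
Before running the program, the prerequisites listed above can be checked with a short script. This is a minimal sketch and not part of the project: the helper names are hypothetical, and it assumes that the executables are called `java` and `scala` and that mysql-connector is importable as `mysql.connector`. It does not verify versions, the `punkt` download, or that a MySQL server is running.

```python
from __future__ import print_function
import importlib
import os


def module_available(name):
    """Return True if the named Python module can be imported."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def executable_on_path(name):
    """Return True if an executable with this name is found on PATH."""
    extensions = ["", ".exe", ".bat", ".cmd"]
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return True
    return False


def check_prerequisites():
    """Map each prerequisite to whether it was found on this machine."""
    return {
        "nltk": module_available("nltk"),
        "mysql.connector": module_available("mysql.connector"),
        "java": executable_on_path("java"),
        "scala": executable_on_path("scala"),
    }


if __name__ == "__main__":
    for name, ok in sorted(check_prerequisites().items()):
        print("{:<16} {}".format(name, "found" if ok else "MISSING"))
```

Any entry reported as missing points to the installation step that still needs to be completed.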

This project was part of the research on Entity-based Topic Modeling conducted by the Data and Web Science Research Group at the University of Mannheim. More information about our work can be found here: http://dws.informatik.uni-mannheim.de/en/home/.

Please do not forget to cite our work when using it in your project:

Anne Lauscher, Federico Nanni, Pablo Ruiz Fabo and Simone Paolo Ponzetto (2016): Entities as Topic Labels: Combining Entity Linking and Labeled LDA to Improve Topic Interpretability and Evaluability. In: Italian Journal of Computational Linguistics 2(2), pp. 67-88.
