Skip to content

anlausch/TMELPipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TMELPipeline

This project is a Python implementation of Entity-based Topic Modeling, a new corpus exploration method. The tool identifies a series of descriptive labels for each document in an ontology and generates for each label a topic, which is easy to interpret as it is directly linked to a knowledge base.

The interconnected web-based evaluation platform for gathering annotations and statistics related to the results of the pipeline is available for download on Github via this link: https://github.com/anlausch/TMEvaluationPlatform.

Installation instructions:

  • install Python 2.7
  • install Java 7 (that is required by the Stanford TMT!)
  • install Scala
  • install Python NLTK 3.2
  • run the following commands in the Python interpreter:

    import nltk nltk.download('punkt')

  • install MySQL
  • install mysql-connector for Python
  • change configuration in settings file to your needs
  • run program

This project was part of the research that was done on Entity-based Topic Modeling by the Data and Web Science Research Group of the University of Mannheim. More information about our work can be found here: http://dws.informatik.uni-mannheim.de/en/home/.

Please do not forget to cite our work when using it in your project:

Anne Lauscher, Federico Nanni, Pablo Ruiz Fabo and Simone Paolo Ponzetto (2016): Entities as Topic Labels: Combining Entity Linking and Labeled LDA to Improve Topic Interpretability and Evaluability. In: Italian Journal of Computational Linguistics 2(2), pp. 67-88.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published