ToModAPI: Topic Modeling API

This API is built to dynamically perform training, inference, and evaluation for different topic modeling techniques. It provides a common interface and a common set of commands for accessing the different models, which makes them easier to compare.

A demo is available at


In this repository, we provide:

Each model exposes the following functions:

Training the model
m.train(data, num_topics, preprocessing) # => 'success'
Print the list of computed topics
for i, x in enumerate(m.topics):
    print(f'Topic {i}')
    for word, weight in zip(x['words'], x['weights']):
        print(f'- {word} => {weight}')
Access the info about a specific topic
x = m.topic(0)
words = x['words']
weights = x['weights']
Access the predictions computed on the training corpus
for i, p in enumerate(m.get_corpus_predictions(topn=3)): # predictions for each document
    print(f'Predictions on document {i}')
    for topic, confidence in p:
        print(f'- Topic {topic} with confidence {confidence}')
        # - Topic 21 with confidence 0.03927058187976461
Predict the topic of a new text
pred = m.predict(text, topn=3)
for topic, confidence in pred:
    print(f'- Topic {topic} with confidence {confidence}')
    # - Topic 21 with confidence 0.03927058187976461
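The (topic, confidence) pairs shown above can be post-processed with plain Python. The following is a minimal sketch, assuming the predictions form a list of pairs as in the loop above; `filter_predictions` and its threshold are hypothetical helpers for illustration, not part of tomodapi.

```python
def filter_predictions(pred, min_confidence=0.05):
    """Keep (topic, confidence) pairs at or above a threshold, best first.

    Hypothetical helper: not part of tomodapi.
    """
    kept = [(topic, conf) for topic, conf in pred if conf >= min_confidence]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Stand-in for the output of m.predict(text, topn=3); values are illustrative.
sample_pred = [(21, 0.039), (4, 0.31), (7, 0.12)]
strong = filter_predictions(sample_pred, min_confidence=0.1)
for topic, confidence in strong:
    print(f'- Topic {topic} with confidence {confidence}')
```

A threshold like this can help when the top predictions all carry very low confidence and should be treated as "no clear topic".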
Computing the coherence against a corpus
# coherence: Type of coherence to compute, among <c_v, c_npmi, c_uci, u_mass>. See
pred = m.coherence(mycorpus, metric='c_v')
#  "c_v": 0.5186710138972105,
#  "c_v_std": 0.1810477961008996,
#  "c_v_per_topic": [
#    0.5845048872767505,
#    0.30693460230781777,
#    0.2611738203246824,
#    ...
#  ]
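The per-topic scores are useful for spotting weak topics. The sketch below assumes the result is a dictionary shaped like the output above, with a `<metric>_per_topic` list; `summarize_coherence` is a hypothetical helper, and it is an assumption (not confirmed here) that the library derives its aggregate and `_std` values the same way.

```python
from statistics import mean, pstdev


def summarize_coherence(result, metric="c_v"):
    """Summarize per-topic coherence scores.

    Hypothetical helper: assumes `result` holds a '<metric>_per_topic' list,
    as in the example output above.
    """
    per_topic = result[f"{metric}_per_topic"]
    # Pair each score with its topic index and sort from worst to best.
    ranked = sorted(enumerate(per_topic), key=lambda pair: pair[1])
    return {
        "mean": mean(per_topic),
        "std": pstdev(per_topic),  # population standard deviation
        "worst": ranked[0],        # (topic index, score)
        "best": ranked[-1],
    }


# Illustrative scores, not real results.
sample = {"c_v_per_topic": [0.58, 0.31, 0.26]}
summary = summarize_coherence(sample)
print(summary["worst"], summary["best"])
```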
Evaluating against a ground truth
# metric: Metric for computing the evaluation, among <purity, homogeneity, completeness, v-measure, nmi>.
res = m.get_corpus_predictions(topn=1)
v = m.evaluate(res, ground_truth_labels, metric='purity')
# 0.7825333630516738
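For intuition about the purity metric, here is a minimal, self-contained sketch of its standard definition: each predicted topic is credited with its most frequent ground-truth label, and the credited documents are divided by the total. This is not the library's implementation; `purity` here is a hypothetical function that takes one flat topic id per document rather than the output of `get_corpus_predictions`.

```python
from collections import Counter, defaultdict


def purity(predicted_topics, ground_truth_labels):
    """Standard clustering purity (illustrative, not tomodapi's code)."""
    if not predicted_topics or len(predicted_topics) != len(ground_truth_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    # Count ground-truth labels inside each predicted topic.
    clusters = defaultdict(Counter)
    for topic, label in zip(predicted_topics, ground_truth_labels):
        clusters[topic][label] += 1
    # Each topic contributes the size of its majority label.
    majority_total = sum(c.most_common(1)[0][1] for c in clusters.values())
    return majority_total / len(predicted_topics)


# Illustrative data: 6 documents, 3 predicted topics.
predicted = [0, 0, 0, 1, 1, 2]
truth = ["a", "a", "b", "b", "b", "c"]
score = purity(predicted, truth)  # (2 + 2 + 1) / 6
print(score)
```

Purity rewards topics dominated by a single label, but it tends to increase with the number of topics, so it is usually read alongside the other metrics listed above.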

The possible parameters can differ depending on the model.

Use in a Python environment

Install this package

pip install tomodapi

Use it in a Python script

from tomodapi import LdaModel

# init the model
m = LdaModel(model_path=path_location)
# train on a corpus
m.train(my_corpus, preprocessing=False, num_topics=10)
# infer topic of a sentence
best_topics = m.predict("In the time since the industrial revolution the climate has increasingly been affected by human activities that are causing global warming and climate change")
topic, confidence = best_topics[0]
# get top words for a given topic
print(m.topic(topic))

If the model_path is not specified, the library will load/save the model from/under models/<model_name>.


A web API is provided for accessing the library as a service.

Install dependencies

You should install 2 dependencies:

Under UNIX, you can use the script.

Start the server


Alternatively, you can run a docker container with

docker-compose -f docker-compose.yml up

The container uses mounted volumes so that you can easily update and access the computed models and the data files.

Manual Docker installation

docker build -t hyperted/topic .
docker run -p 27020:5000 --env APP_BASE_PATH= -d -v /home/semantic/hyperted/tomodapi/models:/models -v /home/semantic/hyperted/tomodapi/data:/data --name hyperted_topic hyperted/topic

# Uninstall
docker stop hyperted_topic
docker rm hyperted_topic
docker rmi hyperted/topic


If you find this library or API useful in your research, please consider citing our papers:

  • Pasquale Lisena, Ismail Harrando, Oussama Kandakji and Raphaël Troncy. ToModAPI: A Topic Modeling API to Train, Use and Compare Topic Models. In 2nd Workshop for Natural Language Processing Open Source Software (NLP-OSS), November 19, 2020. - paper - BIB

  • Ismail Harrando, Pasquale Lisena and Raphaël Troncy. Apples to Apples: A Systematic Evaluation of Topic Models. In Recent Advances in Natural Language Processing (RANLP), September 2021. - BIB - appendix








