This approach applies BERTopic topic modelling to a corpus of country-based evaluation reports. The corpus is scraped and generated using the scripts found in the scrape-tool repo.


To run the Jupyter notebooks, first install the dependencies:

pip install -r requirements.txt

then run the Jupyter interface in the source directory:

jupyter lab


This notebook implements guided topic modeling using BERTopic, with a dictionary used to seed and guide the modeling.
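As a rough illustration of how a dictionary could seed the model, the sketch below converts a dictionary into the list of word lists that BERTopic accepts through its `seed_topic_list` parameter. The dictionary layout (topic label mapped to seed words), the example entries, and the helper names `to_seed_topic_list` and `fit_guided_model` are assumptions made for illustration; the notebook's actual dictionary format may differ.

```python
def to_seed_topic_list(dictionary):
    """Convert {label: [words]} into a list of seed-word lists.

    Words are lower-cased and stripped; blanks and duplicates are dropped.
    Labels are processed in sorted order so the result is deterministic.
    """
    seed_topics = []
    for label in sorted(dictionary):
        words = []
        for word in dictionary[label]:
            cleaned = word.strip().lower()
            if cleaned and cleaned not in words:
                words.append(cleaned)
        if words:
            seed_topics.append(words)
    return seed_topics


def fit_guided_model(sentences, seed_topic_list):
    """Fit a guided BERTopic model (requires the bertopic package)."""
    from bertopic import BERTopic  # imported lazily: heavy dependency

    topic_model = BERTopic(seed_topic_list=seed_topic_list)
    topics, _ = topic_model.fit_transform(sentences)
    return topic_model, topics


# Hypothetical dictionary, not taken from the democracy-dataset repo.
example_dictionary = {
    "judiciary": ["court", "judge", "Court "],
    "elections": ["vote", "ballot", ""],
}
seed_topic_list = to_seed_topic_list(example_dictionary)
```

Calling `fit_guided_model(sentences, seed_topic_list)` on a list of sentences would then nudge the model towards topics around the seed words, although BERTopic is still free to find other topics.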


This notebook analyses the set of sentences labelled as ambiguous to find latent topics in the ambiguity. It applies BERTopic with SentenceTransformer embeddings, and uses KeyBERTInspired as the representation model to give the discovered topics more descriptive names.
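The combination described above could be wired together roughly as follows. This is a minimal sketch, not the notebook's code: the embedding model name `all-MiniLM-L6-v2`, the `label` field, the `ambiguous` value, and the helper names `select_ambiguous` and `build_topic_model` are all assumptions chosen for illustration.

```python
def select_ambiguous(rows, label_key="label", ambiguous_value="ambiguous"):
    """Return the sentences whose (hypothetical) label marks them as ambiguous."""
    return [row["sentence"] for row in rows if row.get(label_key) == ambiguous_value]


def build_topic_model(embedding_name="all-MiniLM-L6-v2"):
    """Build a BERTopic model with KeyBERTInspired topic representations.

    Requires the bertopic and sentence-transformers packages.
    """
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired
    from sentence_transformers import SentenceTransformer

    embedding_model = SentenceTransformer(embedding_name)
    representation_model = KeyBERTInspired()
    return BERTopic(
        embedding_model=embedding_model,
        representation_model=representation_model,
    )


# Hypothetical labelled rows, for illustration only.
example_rows = [
    {"sentence": "The law may or may not apply.", "label": "ambiguous"},
    {"sentence": "The election was held in May.", "label": "clear"},
    {"sentence": "Some say the court is independent.", "label": "ambiguous"},
]
ambiguous_sentences = select_ambiguous(example_rows)
```

With a realistically sized set of sentences, `build_topic_model().fit_transform(ambiguous_sentences)` would fit the model, and `get_topic_info()` on the fitted model would list the topics with their KeyBERT-inspired keywords.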

dataset files

The notebook will attempt to download the corpus file and the dictionary file from the democracy-dataset repo. The corpus is a CSV file with the columns sentence, country, year and source, where sentence is a sentence from a report, and country, year and source identify where the sentence was extracted from.
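To make the corpus layout concrete, here is a small sketch that parses a CSV with those four columns using only the standard library. It assumes the file has a header row naming the columns; the sample rows and the helper name `load_corpus` are made up for illustration and do not come from the real corpus.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded corpus file.
SAMPLE_CSV = """sentence,country,year,source
"Elections were held on schedule, with minor delays.",Country A,2020,report-a
"The judiciary remained independent.",Country B,2021,report-b
"""


def load_corpus(handle):
    """Read corpus rows into dicts, converting year to an integer."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "sentence": row["sentence"].strip(),
                "country": row["country"].strip(),
                "year": int(row["year"]),
                "source": row["source"].strip(),
            }
        )
    return rows


corpus = load_corpus(io.StringIO(SAMPLE_CSV))
# The sentences are what a topic model would be fitted on.
sentences = [row["sentence"] for row in corpus]
```

For a file on disk, the same function can be given a handle from `open(path, newline="", encoding="utf-8")`, which is the mode the `csv` module recommends so that quoted fields containing line breaks are read correctly.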