Binary Classification of Populist Speech

Project Overview

This project classifies speeches as populist or non-populist by fine-tuning pre-trained language models. We evaluated four models (BERT-tiny, BERT-large, GPT-2, and RoBERTa-large) on a dataset of 500 manually labeled speeches. RoBERTa-large performed best, reaching an accuracy of 88%.

Models Evaluated

BERT-tiny (Google)
BERT-large (Google)
GPT-2 (OpenAI)
RoBERTa-large (Facebook AI)

Each model was fine-tuned on the pre-processed speeches, which were tokenized to match that model's input format, and then evaluated on accuracy and loss.
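
As a rough illustration of the fine-tuning setup described above, the sketch below shows how the four models might be loaded for two-class classification with the transformers library. The Hub checkpoint names, the 512-token limit, and the helper names load_classifier and encode are assumptions made for this example; the README does not state which checkpoints or sequence length were actually used.

```python
# Hub checkpoint names are assumptions; the README only names the model families.
CHECKPOINTS = {
    "BERT-tiny": "google/bert_uncased_L-2_H-128_A-2",
    "BERT-large": "bert-large-uncased",
    "GPT-2": "gpt2",
    "RoBERTa-large": "roberta-large",
}


def load_classifier(model_name):
    """Return (tokenizer, model) set up for two-class sequence classification."""
    # Imported here so the mapping above can be inspected without the heavy dependency.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = CHECKPOINTS[model_name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    if tokenizer.pad_token is None:
        # GPT-2 ships without a padding token; reuse end-of-sequence for padding.
        tokenizer.pad_token = tokenizer.eos_token
        model.config.pad_token_id = tokenizer.pad_token_id
    return tokenizer, model


def encode(tokenizer, speeches, max_length=512):
    """Tokenize a list of speeches into padded, truncated PyTorch tensors."""
    # max_length=512 is an assumed limit, not a value taken from the project.
    return tokenizer(
        speeches,
        truncation=True,
        padding=True,
        max_length=max_length,
        return_tensors="pt",
    )
```

Calling load_classifier("RoBERTa-large") downloads the weights on first use, so it needs network access and enough memory for the large checkpoints.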

Dataset

The dataset contains 500 speeches, split evenly between the populist and non-populist classes (250 each) and labeled manually by the contributors. The speeches were collected by web scraping and include texts translated from several languages, which adds to the diversity of the data.
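
With a balanced dataset this small, it can help to keep the class balance identical in the training and test portions. The sketch below is one way to do that with a stratified split; the 80/20 ratio, the seed, the label encoding (1 for populist, 0 for non-populist), and the placeholder texts are all assumptions, since the README does not describe how the data was partitioned.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs so each label keeps the same share in both parts."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    # Shuffle again so the two classes are interleaved within each part.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Placeholder data with the README's 250/250 balance (assumed: 1 = populist, 0 = non-populist).
examples = [(f"speech {i}", i % 2) for i in range(500)]
train, test = stratified_split(examples)
print(len(train), len(test))  # 400 100
```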

Running the Code

The project code was primarily run on Google Colab using a high-RAM A100 GPU, which allowed for:

Batch size of 128 for BERT-tiny
Batch size of 20 for the larger models

All of the code can be run in one pass, provided a suitable GPU is available and the database path is set correctly. Replicating these results on Colab is straightforward, though smaller GPUs or CPUs will require a smaller batch size:

Batch size of 8 for single-model runs
Batch size of 2-4 for running all models simultaneously
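
Reducing the batch size changes how many examples contribute to each optimizer step. One common way to approximate the original A100 batch sizes on smaller hardware is gradient accumulation; this technique is not mentioned in the README and is offered here only as an option. The helper name accumulation_steps is hypothetical, and the sketch just works out the arithmetic.

```python
import math


def accumulation_steps(target_batch_size, per_step_batch_size):
    """Number of small batches to accumulate before one optimizer step."""
    if per_step_batch_size < 1:
        raise ValueError("per_step_batch_size must be at least 1")
    return math.ceil(target_batch_size / per_step_batch_size)


# Batch sizes reported for the A100 runs in this README.
A100_BATCH_SIZE = {"BERT-tiny": 128, "larger models": 20}

for per_step in (8, 4, 2):
    steps = accumulation_steps(A100_BATCH_SIZE["larger models"], per_step)
    print(f"batch size {per_step}: accumulate {steps} steps, effective {per_step * steps}")
```

When the target is not a multiple of the smaller batch size (for example 20 and 8), the effective batch size overshoots slightly, so results may not match the A100 runs exactly.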

Results

RoBERTa-large: Best model with an accuracy of 88%.
BERT-large: Accuracy of 71%.
GPT-2: Accuracy of 61%.
BERT-tiny: Accuracy of 59%.
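
For reference, accuracy is the fraction of speeches classified correctly, and the loss for a two-class classifier is typically the mean cross-entropy over the model's logits. The sketch below computes both in plain Python; the choice of cross-entropy is an assumption (the README only says "loss"), and the logits are made-up numbers for illustration.

```python
import math


def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equally long")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def cross_entropy(logits, labels):
    """Mean cross-entropy over examples, each given as [logit_class0, logit_class1]."""
    total = 0.0
    for row, y in zip(logits, labels):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[y]
    return total / len(labels)


# Toy logits for four speeches; these numbers are invented for illustration.
logits = [[2.0, -1.0], [0.5, 1.5], [-0.3, 0.2], [1.0, 0.0]]
labels = [0, 1, 0, 1]
predictions = [row.index(max(row)) for row in logits]
print(accuracy(predictions, labels))  # 0.5
print(cross_entropy(logits, labels))
```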

For more details on performance and methodology, see the Experiments and Results section of the report (REPORT.pdf).

How to Use the Code

Clone this repository.
Ensure that PyTorch, transformers, and the other libraries the code imports are installed.
Set up your environment with a high-memory GPU for best performance, or adjust the batch size as outlined above.
Run the code using the provided scripts, ensuring you specify the correct database path.
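
Before running anything, a quick check like the one below can confirm that the named dependencies are importable. It is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical, and the list covers only the two packages the README names explicitly.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# torch and transformers are named in the README; extend the list as needed.
required = ["torch", "transformers"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```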

For further details on implementation, refer to the documentation within the scripts.

Contributors

Alessandro Pala
Lorenzo Cino
Greta Grelli
Alberto Calabrese
Giacomo Filippin
