Code for our exam project for the Data Science course, part of the MSc in Cognitive Science at Aarhus University.
In this paper, the large version of BERT (Bidirectional Encoder Representations from Transformers, used via Hugging Face 🤗) is compared to its smaller, knowledge-distilled version, DistilBERT. Both models were fine-tuned on the SQuAD 1.1 task. The aim was to see how well these pre-made models perform on a dataset that was part of neither their pre-training nor their fine-tuning data: the TweetQA dataset by Xiong et al. (2019).
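For orientation, both model types are available as ready-made SQuAD-fine-tuned checkpoints on the Hugging Face Hub and can be loaded through the `transformers` question-answering pipeline. The checkpoint names in the sketch below are the standard Hub ones and are an assumption here, not necessarily the exact checkpoints used in this repository:

```python
# A minimal sketch of loading the two model types compared in the paper.
# The checkpoint names are the standard SQuAD-fine-tuned models on the
# Hugging Face Hub and are an assumption, not necessarily the exact
# checkpoints used in this repository.
from transformers import pipeline

bert_qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)
distilbert_qa = pipeline(
    "question-answering",
    model="distilbert-base-uncased-distilled-squad",
)

# TweetQA-style item: the tweet is the context, and the question asks about it.
tweet = "Just landed in Aarhus. Excited to present at the conference tomorrow!"
question = "Where did the author land?"

print(bert_qa(question=question, context=tweet))
print(distilbert_qa(question=question, context=tweet))
```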
Create a virtual environment and install the necessary Python modules from your terminal:
virtualenv bert --python /usr/bin/python3
source bert/bin/activate
pip install -r requirements.txt
In your R session, install the package manager pacman with:
install.packages("pacman")
- The script twitterQA_bert_battle.py applies both models to the train and dev splits of the TweetQA dataset. It outputs the model inferences and per-question processing times, as well as model loading times and tokenizer loading times (a timing sketch follows this list).
- The notebook Evaluation.ipynb applies the automated evaluation metrics GLEU and METEOR to the model inferences, and outputs GLEU scores for questions with short answers and METEOR scores for questions with longer answers (a metric-scoring sketch follows this list).
- The R Markdown file visualisations.Rmd uses the files in processed_data to visualise the results and to run simple t-tests comparing the two models (a Python analogue of such a t-test follows this list).
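As referenced in the list above, twitterQA_bert_battle.py records loading and processing times. A minimal sketch of how such timings can be collected with `transformers` is shown below; the variable names and the single example are illustrative, not the script's actual code:

```python
# Illustrative sketch of the kind of timing twitterQA_bert_battle.py reports:
# tokenizer loading time, model loading time, and per-question processing time.
# Names and the single example below are hypothetical, not the script's code.
import time

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

MODEL_NAME = "distilbert-base-uncased-distilled-squad"  # assumed checkpoint

start = time.time()
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer_load_time = time.time() - start

start = time.time()
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)
model_load_time = time.time() - start

qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

# The real script iterates over the TweetQA train and dev splits; one item here.
start = time.time()
prediction = qa(question="Who won the match?",
                context="What a game! Barcelona won the match 3-1 tonight.")
processing_time = time.time() - start

print(prediction["answer"])
print(f"tokenizer: {tokenizer_load_time:.2f}s, "
      f"model: {model_load_time:.2f}s, inference: {processing_time:.2f}s")
```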
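For the evaluation step, here is a minimal sketch of scoring one model inference with GLEU and METEOR using NLTK's implementations; that Evaluation.ipynb uses NLTK is an assumption, and the notebook may compute the metrics differently:

```python
# Sketch of scoring one model inference with GLEU and METEOR using NLTK's
# implementations; whether Evaluation.ipynb uses NLTK is an assumption.
import nltk
from nltk.translate.gleu_score import sentence_gleu
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet", quiet=True)   # METEOR relies on WordNet
nltk.download("omw-1.4", quiet=True)   # needed by WordNet in recent NLTK

reference = "new zealand".split()       # gold TweetQA answer, tokenized
hypothesis = "in new zealand".split()   # a model's prediction, tokenized

# Recent NLTK versions expect pre-tokenized input for both metrics.
gleu = sentence_gleu([reference], hypothesis)
meteor = meteor_score([reference], hypothesis)

print(f"GLEU: {gleu:.3f}, METEOR: {meteor:.3f}")
```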
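The t-tests themselves live in visualisations.Rmd; since the code sketches in this README are kept in Python, here is a rough Python analogue using SciPy. The score values are placeholders for illustration only, and an unpaired test is shown even though the R Markdown may use a paired one:

```python
# Rough Python analogue of the simple t-tests run in visualisations.Rmd.
# The per-question scores below are placeholder values for illustration only,
# and an unpaired test is shown; the R Markdown may use a paired one instead.
from scipy import stats

bert_scores = [0.62, 0.58, 0.71, 0.64, 0.69]         # e.g. per-question METEOR
distilbert_scores = [0.55, 0.60, 0.63, 0.57, 0.66]

t_stat, p_value = stats.ttest_ind(bert_scores, distilbert_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```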
Languages: Python, Jupyter Notebook, and R.
Dependencies: see requirements.txt.
Authors: Anita Kurm and Maris Sala.