Code for our exam project for the Data Science course, part of the MSc in Cognitive Science at Aarhus University.
In this paper, the large version of BERT (Bidirectional Encoder Representations from Transformers, used via Hugging Face 🤗) is compared to its smaller, knowledge-distilled version, DistilBERT. Both models were fine-tuned on the SQuAD 1.1 task. The aim was to see how well these pre-made models perform on a dataset that was part of neither their pre-training nor their fine-tuning data: the TweetQA dataset by Xiong et al. (2019).
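For orientation, both model types are available as ready-made SQuAD-fine-tuned checkpoints on the Hugging Face Hub and can be loaded through the `transformers` question-answering pipeline. The checkpoint names in the sketch below are the standard Hub ones and are an assumption here, not necessarily the exact checkpoints used in this repository:

```python
# A minimal sketch of loading the two model types compared in the paper.
# The checkpoint names are the standard SQuAD-fine-tuned models on the
# Hugging Face Hub and are an assumption, not necessarily the exact
# checkpoints used in this repository.
from transformers import pipeline

bert_qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)
distilbert_qa = pipeline(
    "question-answering",
    model="distilbert-base-uncased-distilled-squad",
)

# TweetQA-style item: the tweet is the context, and the question asks about it.
tweet = "Just landed in Aarhus. Excited to present at the conference tomorrow!"
question = "Where did the author land?"

print(bert_qa(question=question, context=tweet))
print(distilbert_qa(question=question, context=tweet))
```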
Create a virtual environment and install the necessary Python modules from your terminal:
virtualenv bert --python /usr/bin/python3
source bert/bin/activate
pip install -r requirements.txt
In your R session, install the package manager pacman with:
install.packages("pacman")
- The script twitterQA_bert_battle.py applies both models to the train and dev splits of the TweetQA dataset. It outputs the model inferences and per-question processing times, as well as model loading times and tokenizer loading times (a timing sketch follows this list).
- The notebook Evaluation.ipynb applies the automated evaluation metrics GLEU and METEOR to the model inferences, and outputs GLEU scores for questions with short answers and METEOR scores for questions with longer answers (a metric-scoring sketch follows this list).
- The R Markdown file visualisations.Rmd uses the files in processed_data to visualise the results and to run simple t-tests comparing the two models (a Python analogue of such a t-test follows this list).
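As referenced in the list above, twitterQA_bert_battle.py records loading and processing times. A minimal sketch of how such timings can be collected with `transformers` is shown below; the variable names and the single example are illustrative, not the script's actual code:

```python
# Illustrative sketch of the kind of timing twitterQA_bert_battle.py reports:
# tokenizer loading time, model loading time, and per-question processing time.
# Names and the single example below are hypothetical, not the script's code.
import time

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

MODEL_NAME = "distilbert-base-uncased-distilled-squad"  # assumed checkpoint

start = time.time()
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer_load_time = time.time() - start

start = time.time()
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)
model_load_time = time.time() - start

qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

# The real script iterates over the TweetQA train and dev splits; one item here.
start = time.time()
prediction = qa(question="Who won the match?",
                context="What a game! Barcelona won the match 3-1 tonight.")
processing_time = time.time() - start

print(prediction["answer"])
print(f"tokenizer: {tokenizer_load_time:.2f}s, "
      f"model: {model_load_time:.2f}s, inference: {processing_time:.2f}s")
```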
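For the evaluation step, here is a minimal sketch of scoring one model inference with GLEU and METEOR using NLTK's implementations; that Evaluation.ipynb uses NLTK is an assumption, and the notebook may compute the metrics differently:

```python
# Sketch of scoring one model inference with GLEU and METEOR using NLTK's
# implementations; whether Evaluation.ipynb uses NLTK is an assumption.
import nltk
from nltk.translate.gleu_score import sentence_gleu
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet", quiet=True)   # METEOR relies on WordNet
nltk.download("omw-1.4", quiet=True)   # needed by WordNet in recent NLTK

reference = "new zealand".split()       # gold TweetQA answer, tokenized
hypothesis = "in new zealand".split()   # a model's prediction, tokenized

# Recent NLTK versions expect pre-tokenized input for both metrics.
gleu = sentence_gleu([reference], hypothesis)
meteor = meteor_score([reference], hypothesis)

print(f"GLEU: {gleu:.3f}, METEOR: {meteor:.3f}")
```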
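The t-tests themselves live in visualisations.Rmd; since the code sketches in this README are kept in Python, here is a rough Python analogue using SciPy. The score values are placeholders for illustration only, and an unpaired test is shown even though the R Markdown may use a paired one:

```python
# Rough Python analogue of the simple t-tests run in visualisations.Rmd.
# The per-question scores below are placeholder values for illustration only,
# and an unpaired test is shown; the R Markdown may use a paired one instead.
from scipy import stats

bert_scores = [0.62, 0.58, 0.71, 0.64, 0.69]         # e.g. per-question METEOR
distilbert_scores = [0.55, 0.60, 0.63, 0.57, 0.66]

t_stat, p_value = stats.ttest_ind(bert_scores, distilbert_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```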
Languages: Python, Jupyter Notebook, and R.
Dependencies: see requirements.txt.
Authors: Anita Kurm and Maris Sala.