DistilBERT vs BERT on TweetQA data ⚗️

Code for our exam project for the Data Science course, part of the MSc in Cognitive Science at Aarhus University.

In this paper, the large version of BERT (Bidirectional Encoder Representations from Transformers, used via the Hugging Face 🤗 Transformers library) is compared to its smaller, knowledge-distilled counterpart, DistilBERT. Both models were fine-tuned on the SQuAD 1.1 task. The aim was to see how well these ready-made models perform on a dataset that was part of neither their pre-training nor their fine-tuning data: the TweetQA dataset by Xiong et al. (2019).
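To make the setup concrete, here is a minimal sketch (not the repository's actual code) that loads publicly available SQuAD-fine-tuned BERT-large and DistilBERT checkpoints through the Hugging Face Transformers question-answering pipeline and asks both models the same question about a tweet-like context. The checkpoint names and the toy example are assumptions; the project applies the same idea to TweetQA question/passage pairs.

# Minimal sketch, assuming the transformers library is installed (see requirements.txt).
from transformers import pipeline

# SQuAD-fine-tuned checkpoints from the Hugging Face model hub (checkpoint choice is an assumption).
bert_qa = pipeline("question-answering",
                   model="bert-large-uncased-whole-word-masking-finetuned-squad")
distilbert_qa = pipeline("question-answering",
                         model="distilbert-base-uncased-distilled-squad")

# Hypothetical tweet-style context and question, standing in for a TweetQA item.
tweet = "Just landed in Aarhus for the Data Science exam, wish us luck!"
question = "Where did the author land?"

for name, qa in [("BERT", bert_qa), ("DistilBERT", distilbert_qa)]:
    result = qa(question=question, context=tweet)
    print(name, result["answer"], round(result["score"], 3))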

Getting Started

Prerequisites & Installing

Create a virtual environment and install the necessary Python modules from your terminal:

virtualenv bert --python /usr/bin/python3
source bert/bin/activate
pip install -r requirements.txt

In your R session, install the package manager pacman with the command:

install.packages("pacman")

Description

Built With

Python, Jupyter Notebook and R

Versioning

See requirements.txt

Authors

Anita Kurm and Maris Sala

Acknowledgments
