
DistilBERT-SQuAD

What is DistilBERT?

DistilBERT is a small, fast, cheap and light Transformer model based on the BERT architecture. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving 97% of BERT's performance as measured on the GLUE language understanding benchmark. DistilBERT is trained using knowledge distillation, a technique for compressing a large model (the teacher) into a smaller model (the student). By distilling BERT, we obtain a smaller Transformer model that bears a lot of similarities with the original BERT model while being lighter, smaller and faster to run. DistilBERT is thus an interesting option for putting a large-scale pre-trained Transformer model into production.
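To make the distillation idea concrete, here is a minimal sketch of a distillation-style loss: a weighted mix of a soft-target term (matching the teacher's softened output distribution) and the usual hard-label cross-entropy. The function name, temperature, and weighting below are illustrative assumptions, not the exact recipe used by this repository or the paper.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft-target term: push the student's softened distribution toward the teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard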

Transformers - Hugging Face repository

The Stanford Question Answering Dataset (SQuAD)

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

https://rajpurkar.github.io/SQuAD-explorer/
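For orientation, a minimal sketch of how the SQuAD v1.1 JSON files are laid out; the file path here is illustrative.

import json

with open("train-v1.1.json") as f:
    squad = json.load(f)

article = squad["data"][0]               # one Wikipedia article
paragraph = article["paragraphs"][0]     # a passage (context) with its questions
qa = paragraph["qas"][0]                 # one crowd-sourced question
print(qa["question"])
print(qa["answers"][0]["text"])          # the answer span as text
print(qa["answers"][0]["answer_start"])  # character offset of the span in the context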

Installation

If you are testing this on your own machine, I would recommend you do the setup in a virtual environment, so as not to affect the rest of your files.

In Python 3 you can set up a virtual environment with

python3 -m venv /path/to/new/virtual/environment

Or install virtualenv with pip:

pip3 install virtualenv

then create the environment with

virtualenv venv

and finally activate it with

source venv/bin/activate

You must have Python 3 installed.

Install the requirements with:

pip3 install -r requirements.txt

SQuAD Fine-tuned model

The SQuAD fine-tuned model is available here. Download the model by following the Google Drive link and place the downloaded model in the model directory.

Alternatively, inside the model.py file you can specify the model you wish to use: either the one I have provided, or a Hugging Face model already fine-tuned on SQuAD, such as

distilbert-base-uncased-distilled-squad

You can do this with

model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad', config=config)
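For reference, a fuller loading sketch; the config and tokenizer lines are assumptions about how model.py is set up, while the from_pretrained calls themselves are the standard transformers API.

from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForQuestionAnswering

# Load the configuration, tokenizer, and fine-tuned QA weights from the Hugging Face Hub.
config = DistilBertConfig.from_pretrained('distilbert-base-uncased-distilled-squad')
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad', config=config)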

Making predictions

You can make predictions from the command line with the provided script:

test.py

You can also make predictions through the provided demo, which is deployed with Flask to handle the interactions with the UI:

script.py
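Under the hood, a prediction is an answer span extracted from the passage: the model scores every token as a possible start and as a possible end of the answer, and the highest-scoring span is decoded back to text. A minimal sketch of that step, assuming the model and tokenizer loaded above and a reasonably recent transformers version; the variable names are illustrative, not taken from test.py.

import torch

question = "How much faster is DistilBERT?"
context = "DistilBERT has 40% fewer parameters than bert-base-uncased and runs 60% faster while preserving 97% of BERT's performance."

# Encode the question and passage as a single input sequence.
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Most likely start and end token positions of the answer span.
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))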

How to train (Distil)BERT

The SQuAD data (train-v1.1.json and dev-v1.1.json) can be downloaded from the SQuAD website linked above and should be saved in a $SQUAD_DIR directory.

Training on one Tesla V100 16GB GPU, each epoch took around 9 minutes to complete. In comparison, on a single Quadro M4000 each epoch took over 2 hours, so don't be alarmed if your training isn't lightning fast.

export SQUAD_DIR=/path/to/SQUAD

python run_squad.py \
  --model_type distilbert \
  --model_name_or_path distilbert-base-uncased \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/
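Once training finishes, the fine-tuned weights saved in the output directory can be loaded the same way as the provided model; a minimal sketch, assuming the output directory from the command above.

from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering

model = DistilBertForQuestionAnswering.from_pretrained("/tmp/debug_squad/")
tokenizer = DistilBertTokenizer.from_pretrained("/tmp/debug_squad/")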

References

Sanh, V., Debut, L., Chaumond, J., Wolf, T. "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter."
