DistilBERT is a small, fast, cheap and light Transformer model based on the BERT architecture. It has 40% fewer parameters than bert-base-uncased and runs 60% faster while preserving 97% of BERT's performance as measured on the GLUE language understanding benchmark. DistilBERT is trained using knowledge distillation, a technique that compresses a large model, called the teacher, into a smaller model, called the student (see the sketch below). By distilling BERT, we obtain a smaller Transformer model that bears a lot of similarities with the original BERT model while being lighter, smaller and faster to run. DistilBERT is thus an interesting option for putting large-scale trained Transformer models into production.
Transformers - Hugging Face repository
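For intuition, the distillation objective pushes the student's output distribution toward the teacher's temperature-softened distribution. Below is a minimal, generic sketch of such a loss in PyTorch; it only illustrates the idea and is not the exact objective used to train DistilBERT (which also combines a masked language modeling loss and a cosine embedding loss).

```python
import torch.nn.functional as F

def soft_target_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic soft-target distillation loss (illustrative sketch only).

    Both distributions are softened with a temperature, then the student is pushed
    toward the teacher with a KL divergence, scaled by T^2 so gradient magnitudes
    stay comparable across temperatures.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (temperature ** 2)
```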
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
https://rajpurkar.github.io/SQuAD-explorer/
If you are testing this on your own machine, I would recommend doing the setup in a virtual environment, so as not to affect the rest of your files.
In Python 3 you can set up a virtual environment with
python3 -m venv /path/to/new/virtual/environment
Or by installing virtualenv with pip
pip3 install virtualenv
Then creating the environment with
virtualenv venv
and finally activating it with
source venv/bin/activate
You must have Python 3 installed.
Install the requirements with:
pip3 install -r requirements.txt
The SQuAD fine-tuned model is available here. Download the model by following the Google Drive link and place the downloaded model in the model directory.
Alternatively, inside the model.py file you can specify the type of model you wish to use: either the one I have provided, or a Hugging Face fine-tuned SQuAD model such as
distilbert-base-uncased-distilled-squad
You can do this with
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad', config=config)
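For context, here is a self-contained sketch of loading that Hugging Face model and running a single prediction (assuming a recent version of the transformers library; the example question and context are made up):

```python
import torch
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')
model.eval()

question = "What does the student model preserve?"
context = "DistilBERT preserves 97% of BERT's performance on the GLUE benchmark."

# Encode the question/context pair and predict start and end logits for the answer span.
inputs = tokenizer(question, context, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# Decode the most likely answer span from the context.
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1
answer = tokenizer.decode(inputs['input_ids'][0][start:end])
print(answer)
```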
You can also make predictions by using the provided demo, which uses Flask to handle the interactions between the model and the UI.
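As a rough illustration of how such a Flask wrapper can be structured (the route names and the answer_question helper below are hypothetical, not necessarily how the provided demo is written):

```python
from flask import Flask, jsonify, render_template, request

from model import answer_question  # hypothetical helper wrapping the tokenizer and model

app = Flask(__name__)

@app.route('/')
def index():
    # Serve the UI page.
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    # Expect JSON of the form {"question": ..., "context": ...} and return the predicted answer.
    data = request.get_json()
    answer = answer_question(data['question'], data['context'])
    return jsonify({'answer': answer})

if __name__ == '__main__':
    app.run(debug=True)
```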
The data for SQuAD can be downloaded with the following links and should be saved in a $SQUAD_DIR directory.
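For reference, the SQuAD v1.1 train and dev files are hosted on the SQuAD explorer site linked above; assuming those URLs are still live, something like the following will fetch them:

```bash
export SQUAD_DIR=/path/to/SQUAD
mkdir -p $SQUAD_DIR
wget -P $SQUAD_DIR https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget -P $SQUAD_DIR https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
```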
Training on one Tesla V100 (16 GB) GPU, each epoch took around 9 minutes to complete; in comparison, on a single Quadro M4000 each epoch took over 2 hours, so don't be alarmed if your training isn't lightning fast.
export SQUAD_DIR=/path/to/SQUAD
python run_squad.py \
--model_type distilbert \
--model_name_or_path distilbert-base-uncased \
--do_train \
--do_eval \
--do_lower_case \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_squad/