# CS 72 Accelerated Computational Linguistics
## Final Project 
### Improvement Model 1: DistilBERT

Credits:
* DistilBERT model implemented by: <https://huggingface.co/transformers/model_doc/distilbert.html>

* GPU provided by Google Colab: <https://colab.research.google.com/>

* Google Colab notebook written and model trained by: Yakoob Khan '21 (Yakoob.Khan.21@dartmouth.edu)

# Original BERT model architecture from the Google paper (DistilBERT is a smaller, distilled version of BERT)
### Paper: <https://arxiv.org/pdf/1810.04805.pdf>
### Image Credits: <https://arxiv.org/pdf/1810.04805.pdf>

<img src="../model-architectures/bert-model.png">

## First we import the code into Google Colab by cloning Hugging Face's transformers git repository.

In [1]:
!git clone https://github.com/huggingface/transformers

Cloning into 'transformers'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 27198 (delta 0), reused 2 (delta 0), pack-reused 27190
Receiving objects: 100% (27198/27198), 25.20 MiB | 10.77 MiB/s, done.
Resolving deltas: 100% (18898/18898), done.


## Now we change into the transformers directory.

In [2]:
%cd transformers

/content/transformers


## We use pip to install the dependencies required to use the transformers library.

In [3]:
!pip install .
!pip install -r ./examples/requirements.txt

Processing /content/transformers
Collecting tokenizers==0.7.0
  Downloading https://files.pythonhosted.org/packages/14/e5/a26eb4716523808bb0a799fcfdceb6ebf77a18169d9591b2f46a9adb87d9/tokenizers-0.7.0-cp36-cp36m-manylinux1_x86_64.whl (3.8MB)
     |████████████████████████████████| 3.8MB 39kB/s
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
     |████████████████████████████████| 1.1MB 24.4MB/s
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
     |████████████████████████████████| 890kB 28.0MB/s
Building wheels for collected packages: transformers, sacremoses
  Building wheel for transformers (setup.py) ... done
  Created wheel for transformers: filename=tr

## Now we change into the directory containing code for running the SQuAD model.

In [4]:
%cd '/content/transformers/examples/question-answering'

/content/transformers/examples/question-answering


In [5]:
%pwd

'/content/transformers/examples/question-answering'

### * Note: Ensure that the `train-v1.1.json`, `dev-v1.1.json` and `evaluate-v1.1.py` files (found in the data directory) are manually uploaded into this directory on Google Colab before training.
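A quick way to confirm the upload worked is to check for the three files before starting a long training run. The sketch below is only an illustration: the helper name `missing_files` is hypothetical and not part of the transformers repository.

```python
from pathlib import Path

# File names taken from the note above.
REQUIRED_FILES = ["train-v1.1.json", "dev-v1.1.json", "evaluate-v1.1.py"]


def missing_files(directory, names=REQUIRED_FILES):
    """Return the required files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


# Check the current working directory (the question-answering folder).
missing = missing_files(".")
if missing:
    print("Upload these files before training:", ", ".join(missing))
else:
    print("All SQuAD files are in place.")
```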

### We are ready to train the DistilBERT model! Training takes about 1 hour per epoch on a single Google Colab GPU. We trained the model for 5 epochs, for a total training time of about 5 hours.
#### * `f1` is the F1 score: the token-level overlap between the predicted answer and the reference answer.
#### * `exact` is the Exact Match (EM) score: the percentage of predictions that match a reference answer exactly.
#### * Scroll all the way to the bottom to see the final F1 and EM scores!
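To make the two metrics concrete, here is a minimal sketch of how EM and F1 can be computed for a single prediction. It is modeled on the answer normalization used for SQuAD v1.1 evaluation (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace), but it is an illustration rather than the evaluation script itself, and the example strings are made up.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(reference)


def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up example: the prediction contains the answer plus extra words.
print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(round(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"), 3))
```

In the second example the prediction has 4 tokens and the reference has 2, so precision is 0.5 and recall is 1.0, giving an F1 of about 0.667 even though the exact match fails. The reported scores average these per-question values over the whole dev set and express them as percentages.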

In [26]:
%env SQUAD_DIR=/content/transformers/examples/question-answering

!python run_squad.py \
  --model_type distilbert \
  --model_name_or_path distilbert-base-cased-distilled-squad \
  --do_train \
  --do_eval \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --per_gpu_train_batch_size 64 \
  --learning_rate 3e-5 \
  --num_train_epochs 5 \
  --max_seq_length 150 \
  --doc_stride 64 \
  --output_dir /tmp/debug_squad/ \
  --overwrite_output_dir

Streaming output truncated to the last 5000 lines.
Iteration:  16% 424/2660 [10:21<54:35,  1.47s/it]
Iteration:  16% 425/2660 [10:22<54:32,  1.46s/it]
Iteration:  16% 426/2660 [10:24<54:24,  1.46s/it]
Iteration:  16% 427/2660 [10:25<54:23,  1.46s/it]
Iteration:  16% 428/2660 [10:27<54:30,  1.47s/it]
Iteration:  16% 429/2660 [10:28<54:26,  1.46s/it]
Iteration:  16% 430/2660 [10:29<54:27,  1.47s/it]
Iteration:  16% 431/2660 [10:31<54:25,  1.46s/it]
Iteration:  16% 432/2660 [10:32<54:15,  1.46s/it]
Iteration:  16% 433/2660 [10:34<54:09,  1.46s/it]
Iteration:  16% 434/2660 [10:35<54:07,  1.46s/it]
Iteration:  16% 435/2660 [10:37<54:11,  1.46s/it]
Iteration:  16% 436/2660 [10:38<54:18,  1.47s/it]
Iteration:  16% 437/2660 [10:40<54:21,  1.47s/it]
Iteration:  16% 438/2660 [10:41<54:16,  1.47s/it]
Iteration:  17% 439/2660 [10:43<54:16,  1.47s/it]
Iteration:  17% 440/2660 [10:44<54:13,  1.47s/it]
Iteration:  17% 441/2660 [10:46<54
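
The progress bar above is enough to sanity-check the stated training time: 2660 iterations per epoch at roughly 1.46 seconds each. The short sketch below does that arithmetic; the function name is illustrative, and the per-iteration time is read off the log rather than measured separately.

```python
def estimate_training_hours(iterations_per_epoch, seconds_per_iteration, epochs):
    """Estimate wall-clock training time in hours from progress-bar figures."""
    return iterations_per_epoch * seconds_per_iteration * epochs / 3600


# Figures taken from the log above: 2660 iterations at about 1.46 s/it.
per_epoch = estimate_training_hours(2660, 1.46, epochs=1)
total = estimate_training_hours(2660, 1.46, epochs=5)
print(f"{per_epoch:.2f} h per epoch, {total:.2f} h total")
# prints "1.08 h per epoch, 5.39 h total"
```

This agrees with the rough figures of 1 hour per epoch and about 5 hours for 5 epochs; the estimate covers training iterations only, not evaluation.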

# Final Results of DistilBERT:
### * F1 Score: 84.85821083204054 
### * EM Score: 75.94134342478714

### vs.

# Baseline: 
### * F1 Score: 78.98
### * EM Score: 69.73

## DistilBERT gives us a clear improvement over the baseline: about 5.9 points of F1 and 6.2 points of EM!
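
The gap between the two sets of scores can be computed directly from the numbers reported above. This is a small arithmetic sketch; the helper name `improvement` is illustrative.

```python
def improvement(baseline, score):
    """Return (absolute gain in points, relative gain in percent)."""
    gain = score - baseline
    return gain, 100 * gain / baseline


# Scores copied from the results above: (baseline, DistilBERT).
results = {
    "F1": (78.98, 84.85821083204054),
    "EM": (69.73, 75.94134342478714),
}

for metric, (baseline, score) in results.items():
    gain, relative = improvement(baseline, score)
    print(f"{metric}: +{gain:.2f} points ({relative:.1f}% relative)")
```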