<img src="http://nlmatics.github.io/site_files/nick_post/squad_image.png" alt="squad" width="300"/>

# Dutch SQuAD 2.0 dataset
*Note:* this dataset is a machine-translated version of SQuAD v2.0 from Stanford University; it can be found [here](https://gitlab.com/niels.rouws/dutch-squad-v2.0). <br>

**How to use:** <br>
It is best to use Google Colab and run the notebook to get results. <br><br>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb)

### In this notebook:
- Three BERT-based models are tested and evaluated
  - mBERT
  - BERTje
  - RobBERT
- Hyperparameters are tuned with layer-wise learning rate decay (LLRD)

In [None]:
# if using Colab, install necessary libraries
!pip install transformers

In [None]:
# Import libraries
import requests
import pandas as pd
import numpy as np
from pandas import json_normalize  # pandas.io.json.json_normalize was removed in pandas 2.0
import json
import time
import matplotlib.pyplot as plt

### Load data

In [None]:
# Download the Dutch SQuAD2.0 dev set
!wget -P data/squad/ https://gitlab.com/niels.rouws/dutch-squad-v2.0/-/raw/main/nl_squad_dev_filtered.json

# Download the Dutch SQuAD2.0 train set
!wget -P data/squad/ https://gitlab.com/niels.rouws/dutch-squad-v2.0/-/raw/main/nl_squad_train_filtered.json
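
Once downloaded, the files can be flattened into one record per question for inspection. The sketch below assumes the Dutch files keep the standard SQuAD v2.0 JSON layout (`data` → `paragraphs` → `qas` → `answers`). The helper name `flatten_squad` and the tiny inline sample are illustrative only and are not part of the original notebook.

```python
import json


def flatten_squad(squad):
    """Flatten a SQuAD v2.0-style dict into one record per question.

    Assumes the standard layout: data -> paragraphs -> qas -> answers.
    """
    rows = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                rows.append({
                    "id": qa["id"],
                    "title": article.get("title", ""),
                    "question": qa["question"],
                    "context": paragraph["context"],
                    # SQuAD v2.0 marks unanswerable questions explicitly
                    "is_impossible": qa.get("is_impossible", False),
                    # keep only the first reference answer, if any
                    "answer_text": answers[0]["text"] if answers else None,
                    "answer_start": answers[0]["answer_start"] if answers else None,
                })
    return rows


# Tiny made-up sample in SQuAD v2.0 layout, used here instead of the real file
sample = json.loads("""
{"version": "v2.0", "data": [{"title": "Amsterdam", "paragraphs": [{
  "context": "Amsterdam is de hoofdstad van Nederland.",
  "qas": [
    {"id": "q1", "question": "Wat is de hoofdstad van Nederland?",
     "is_impossible": false,
     "answers": [{"text": "Amsterdam", "answer_start": 0}]},
    {"id": "q2", "question": "Hoeveel inwoners heeft Utrecht?",
     "is_impossible": true, "answers": []}
  ]}]}]}
""")

rows = flatten_squad(sample)
print(len(rows), rows[0]["answer_text"], rows[1]["is_impossible"])
```

For the real data, load `data/squad/nl_squad_dev_filtered.json` with `json.load` and pass the result to `flatten_squad`; the returned list can be wrapped in `pd.DataFrame(rows)` for further analysis.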

### Modeling

In [None]:
# Load the run_squad.py question-answering script from Hugging Face
%load models/run_squad.py

#### mBERT

In [None]:
##########
# model: bert-base-multilingual-cased
# without training
# without domain adaptation
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path bert-base-multilingual-cased  \
    --output_dir models/bert/bert-base-multilingual-cased \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_eval   \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: bert-base-multilingual-cased
# with training
# without domain adaptation
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path bert-base-multilingual-cased  \
    --output_dir models/bert/bert-base-multilingual-cased \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_train   \
    --do_eval   \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: bert-base-multilingual-cased
# without training
# with domain adaptation / hyperparameter tuning
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path bert-base-multilingual-cased  \
    --output_dir models/bert/bert-base-multilingual-cased \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_eval   \
    --do_lower_case  \
    --weight_decay 0.01 \
    --learning_rate 3.6e-06 \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: bert-base-multilingual-cased
# with training
# with domain adaptation / hyperparameter tuning
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path bert-base-multilingual-cased  \
    --output_dir models/bert/bert-base-multilingual-cased \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_train   \
    --do_eval   \
    --do_lower_case  \
    --weight_decay 0.01 \
    --learning_rate 3.6e-06 \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

##### mBERT, fine-tuned on Dutch SQuAD

In [None]:
##########
# model: bert-base-multilingual-cased-finetuned-dutch-squad2
# without training
# without domain adaptation
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path henryk/bert-base-multilingual-cased-finetuned-dutch-squad2  \
    --output_dir models/bert/henryk/bert-base-multilingual-cased-finetuned-dutch-squad2 \
    --data_dir  data \
    --predict_file policyqa-test.json   \
    --do_eval   \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: bert-base-multilingual-cased-finetuned-dutch-squad2
# with training
# without domain adaptation
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path henryk/bert-base-multilingual-cased-finetuned-dutch-squad2  \
    --output_dir models/bert/henryk/bert-base-multilingual-cased-finetuned-dutch-squad2 \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_train   \
    --do_eval   \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: bert-base-multilingual-cased-finetuned-dutch-squad2
# without training
# with domain adaptation / hyperparameter tuning
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path henryk/bert-base-multilingual-cased-finetuned-dutch-squad2  \
    --output_dir models/bert/henryk/bert-base-multilingual-cased-finetuned-dutch-squad2 \
    --data_dir  data \
    --predict_file policyqa-test.json   \
    --do_eval   \
    --do_lower_case  \
    --weight_decay 0.01 \
    --learning_rate 3.6e-06 \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: bert-base-multilingual-cased-finetuned-dutch-squad2
# with training
# with domain adaptation / hyperparameter tuning
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path henryk/bert-base-multilingual-cased-finetuned-dutch-squad2  \
    --output_dir models/bert/henryk/bert-base-multilingual-cased-finetuned-dutch-squad2 \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_train   \
    --do_eval   \
    --do_lower_case  \
    --weight_decay 0.01 \
    --learning_rate 3.6e-06 \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

#### BERTje

In [None]:
##########
# model: GroNLP/bert-base-dutch-cased
# without training
# without domain adaptation
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path GroNLP/bert-base-dutch-cased  \
    --output_dir models/bert/GroNLP/bert-base-dutch-cased \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_eval   \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: GroNLP/bert-base-dutch-cased
# with training
# without domain adaptation
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path GroNLP/bert-base-dutch-cased  \
    --output_dir models/bert/GroNLP/bert-base-dutch-cased \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_train   \
    --do_eval   \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: GroNLP/bert-base-dutch-cased
# without training
# with domain adaptation / hyperparameter tuning
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path GroNLP/bert-base-dutch-cased  \
    --output_dir models/bert/GroNLP/bert-base-dutch-cased \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_eval   \
    --do_lower_case  \
    --weight_decay 0.01 \
    --learning_rate 3.6e-06 \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: GroNLP/bert-base-dutch-cased
# with training
# with domain adaptation / hyperparameter tuning
##########
!python /content/run_squad.py  \
    --model_type bert   \
    --model_name_or_path GroNLP/bert-base-dutch-cased  \
    --output_dir models/bert/GroNLP/bert-base-dutch-cased \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_train   \
    --do_eval   \
    --do_lower_case  \
    --weight_decay 0.01 \
    --learning_rate 3.6e-06 \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

#### RobBERT

In [None]:
##########
# model: pdelobelle/robbert-v2-dutch-base
# without training
# without domain adaptation
##########
!python /content/run_squad.py  \
    --model_type roberta   \
    --model_name_or_path pdelobelle/robbert-v2-dutch-base  \
    --output_dir models/bert/pdelobelle/robbert-v2-dutch-base \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_eval   \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: pdelobelle/robbert-v2-dutch-base
# with training
# without domain adaptation
##########
!python /content/run_squad.py  \
    --model_type roberta   \
    --model_name_or_path pdelobelle/robbert-v2-dutch-base  \
    --output_dir models/bert/pdelobelle/robbert-v2-dutch-base \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_train   \
    --do_eval   \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: pdelobelle/robbert-v2-dutch-base
# without training
# with domain adaptation / hyperparameter tuning
##########
!python /content/run_squad.py  \
    --model_type roberta   \
    --model_name_or_path pdelobelle/robbert-v2-dutch-base  \
    --output_dir models/bert/pdelobelle/robbert-v2-dutch-base \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_eval   \
    --do_lower_case  \
    --weight_decay 0.01 \
    --learning_rate 3.6e-06 \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

In [None]:
##########
# model: pdelobelle/robbert-v2-dutch-base
# with training
# with domain adaptation / hyperparameter tuning
##########
!python /content/run_squad.py  \
    --model_type roberta   \
    --model_name_or_path pdelobelle/robbert-v2-dutch-base  \
    --output_dir models/bert/pdelobelle/robbert-v2-dutch-base \
    --data_dir  data \
    --train_file policyqa-train.json   \
    --predict_file policyqa-test.json   \
    --do_train   \
    --do_eval   \
    --do_lower_case  \
    --weight_decay 0.01 \
    --learning_rate 3.6e-06 \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128
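
The cells above differ only in the model name, whether `--do_train` is set, and whether the tuned `--weight_decay` and `--learning_rate` values are passed. A minimal sketch of generating those combinations programmatically follows. The helper `build_squad_command` is hypothetical, and its flags are copied from the cells above, so it inherits their paths and file names.

```python
import itertools
import shlex


def build_squad_command(model_type, model_name, train, tuned):
    """Build the run_squad.py argument list for one experiment.

    train -- add --do_train (fine-tune before evaluating)
    tuned -- pass the tuned weight decay and learning rate
    """
    cmd = [
        "python", "/content/run_squad.py",
        "--model_type", model_type,
        "--model_name_or_path", model_name,
        "--output_dir", f"models/bert/{model_name}",
        "--data_dir", "data",
        "--train_file", "policyqa-train.json",
        "--predict_file", "policyqa-test.json",
    ]
    if train:
        cmd.append("--do_train")
    cmd += ["--do_eval", "--do_lower_case"]
    if tuned:
        cmd += ["--weight_decay", "0.01", "--learning_rate", "3.6e-06"]
    cmd += [
        "--per_gpu_eval_batch_size", "12",
        "--max_seq_length", "384",
        "--doc_stride", "128",
    ]
    return cmd


# The four configurations used for every model, in the same order as the cells:
# (untuned, eval only), (untuned, train), (tuned, eval only), (tuned, train)
commands = [
    build_squad_command("bert", "bert-base-multilingual-cased", train, tuned)
    for tuned, train in itertools.product([False, True], repeat=2)
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list can be handed to `subprocess.run(cmd, check=True)` to launch the corresponding run.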

### Hyperparameter tuning


In [None]:
# import model
from transformers import AutoModel

model = AutoModel.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")

In [None]:
# Build an AdamW optimizer with layer-wise learning rate decay (LLRD)
from torch.optim import AdamW  # PyTorch implementation; transformers.AdamW is deprecated


def AdamW_LLRD(model):
    """Return an AdamW optimizer whose learning rate shrinks by a factor of 0.9
    per layer, from the top encoder layer down to the embeddings."""

    opt_parameters = []    # Parameter groups to be passed to the optimizer.
    named_parameters = list(model.named_parameters())

    # According to the AAAMLP book by A. Thakur, we generally do not use any decay
    # for bias and LayerNorm.weight layers.
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    init_lr = 3.5e-6
    head_lr = 3.6e-6
    lr = init_lr

    # === Pooler and regressor ======================================================

    params_0 = [p for n, p in named_parameters if ("pooler" in n or "regressor" in n)
                and any(nd in n for nd in no_decay)]
    params_1 = [p for n, p in named_parameters if ("pooler" in n or "regressor" in n)
                and not any(nd in n for nd in no_decay)]

    head_params = {"params": params_0, "lr": head_lr, "weight_decay": 0.0}
    opt_parameters.append(head_params)

    head_params = {"params": params_1, "lr": head_lr, "weight_decay": 0.01}
    opt_parameters.append(head_params)

    # === 12 Hidden layers ==========================================================

    # Walk from the top layer (11) down to the bottom layer (0).
    for layer in range(11, -1, -1):
        params_0 = [p for n, p in named_parameters if f"encoder.layer.{layer}." in n
                    and any(nd in n for nd in no_decay)]
        params_1 = [p for n, p in named_parameters if f"encoder.layer.{layer}." in n
                    and not any(nd in n for nd in no_decay)]

        layer_params = {"params": params_0, "lr": lr, "weight_decay": 0.0}
        opt_parameters.append(layer_params)

        layer_params = {"params": params_1, "lr": lr, "weight_decay": 0.01}
        opt_parameters.append(layer_params)

        lr *= 0.9    # Each lower layer gets 90% of the rate of the layer above it.

    # === Embeddings layer ==========================================================

    params_0 = [p for n, p in named_parameters if "embeddings" in n
                and any(nd in n for nd in no_decay)]
    params_1 = [p for n, p in named_parameters if "embeddings" in n
                and not any(nd in n for nd in no_decay)]

    embed_params = {"params": params_0, "lr": lr, "weight_decay": 0.0}
    opt_parameters.append(embed_params)

    embed_params = {"params": params_1, "lr": lr, "weight_decay": 0.01}
    opt_parameters.append(embed_params)

    return AdamW(opt_parameters, lr=init_lr)

In [None]:
AdamW_LLRD(model)
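
To make the effect of the decay concrete, the sketch below reproduces only the learning-rate bookkeeping of `AdamW_LLRD`, without loading a model. The function name `llrd_schedule` and the group labels are illustrative assumptions; the constants (3.5e-6, 3.6e-6, a factor of 0.9, 12 layers) are taken from the function above.

```python
def llrd_schedule(init_lr=3.5e-6, head_lr=3.6e-6, decay=0.9, num_layers=12):
    """Return the learning rate each parameter group receives under LLRD."""
    rates = {"head": head_lr}
    lr = init_lr
    # Walk from the top encoder layer down to the bottom one
    for layer in range(num_layers - 1, -1, -1):
        rates[f"encoder.layer.{layer}"] = lr
        lr *= decay
    # Embeddings sit below layer 0 and get the smallest rate
    rates["embeddings"] = lr
    return rates


rates = llrd_schedule()
for name, lr in rates.items():
    print(f"{name:<18} {lr:.3e}")
```

Layer 11 trains at the full `init_lr`, layer 0 at `init_lr * 0.9**11`, and the embeddings at `init_lr * 0.9**12`, so the lower, more general layers change more slowly than the task-specific top layers.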

Using this approach, we searched for the optimal hyperparameters for each model. <br>
Due to limited time and computational power, not all options were tested. <br>
Testing the remaining hyperparameter groups is left as future work.