# Reducing number of parameters of BERT/SQuAD Model
This notebook shows how to use Low-Rank Decomposition to reduce the number of parameters of a BERT/SQuAD model. It assumes that a trained model already exists in the ```Models``` directory. Please refer to the notebook [Question Answering (BERT/SQuAD)](BertSquad.ipynb) for more information about training and using a BERT/SQuAD model.

## Load and evaluate the trained model

In [1]:
from fireball import Model, myPrint
from fireball.datasets.squad import SquadDSet
import time, os

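gpus = "0,1,2,3"    # IDs of the GPUs to use; adjust for your machine

testDs = SquadDSet.makeDatasets("Test", batchSize=128, version=1)

# Load the trained model, print its layers, and evaluate it on the test dataset
model = Model.makeFromFile("Models/BertSquad.fbm", testDs=testDs, gpus=gpus)
model.printLayersInfo()
model.initSession()
results = model.evaluate()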

Initializing tokenizer from "/data/SQuAD/vocab.txt" ... Done. (Vocab Size: 30522)

Reading from "Models/BertSquad.fbm" ... Done.
Creating the fireball model "Bert-SQuAD" ... Done.

Scope            InShape       Comments                 OutShape      Activ.   Post Act.        # of Params
---------------  ------------  -----------------------  ------------  -------  ---------------  -----------
IN_EMB           ≤512 2                                 ≤512 768      None                      23,835,648 
S1_L1_LN         ≤512 768                               ≤512 768      None     DO:0.1           1,536      
S2_L1_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      7,087,872  
S2_L2_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      7,087,872  
S2_L3_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      7,087,872  
S2_L4_BERT       ≤512 768      768/3072, 12 heads       ≤512 76

## Reducing number of parameters
Here we apply Low-Rank Decomposition to different layers of the model to reduce the number of parameters. We first create a list of the layers we want to decompose and use the same MSE value, 0.0002, for all of them. We then pass this information to the ```createLrModel``` method, which creates a new fireball model and saves it to the file ```Models/BertSquadR.fbm```.
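To give a feel for what a low-rank decomposition does, here is a minimal NumPy sketch based on truncated SVD. It is an illustration only and not fireball's implementation: the function names, the toy matrix, and the choice of searching ranks in multiples of 8 are all assumptions made for this example.

```python
import numpy as np

def low_rank_factors(w, rank):
    """Best rank-`rank` approximation of w as a product a @ b (truncated SVD)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (rows, rank): left vectors scaled by singular values
    b = vt[:rank, :]             # (rank, cols)
    return a, b

def smallest_rank_for_mse(w, target_mse, step=8):
    """Hypothetical helper: smallest multiple of `step` whose truncation MSE <= target_mse."""
    s = np.linalg.svd(w, compute_uv=False)
    energy = s ** 2
    # tail[r] = sum of squared singular values discarded when keeping the first r
    tail = np.concatenate([np.cumsum(energy[::-1])[::-1], [0.0]])
    for rank in range(step, len(s) + 1, step):
        if tail[rank] / w.size <= target_mse:
            return rank
    return len(s)  # fall back to full rank if no candidate meets the target

# Toy matrix: rank-8 signal plus small noise (made-up data, not BERT weights)
rng = np.random.default_rng(0)
w = rng.normal(size=(96, 8)) @ rng.normal(size=(8, 64)) + 0.01 * rng.normal(size=(96, 64))

rank = smallest_rank_for_mse(w, target_mse=0.0002)
a, b = low_rank_factors(w, rank)
mse = np.mean((w - a @ b) ** 2)
print(f"LR({rank}), MSE={mse:.6f}, Params: {w.size}->{a.size + b.size}")
```

The two factors together hold `rank * (rows + cols)` values instead of `rows * cols`, so the decomposition only saves parameters when the rank is small enough.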

In [2]:
layers = ['IN_EMB',
          'S2_L1_BERT', 'S2_L2_BERT', 'S2_L3_BERT',
          'S2_L4_BERT', 'S2_L5_BERT', 'S2_L6_BERT',
          'S2_L7_BERT', 'S2_L8_BERT', 'S2_L9_BERT',
          'S2_L10_BERT', 'S2_L11_BERT', 'S2_L12_BERT',
          'S3_L1_FC']
mse = 0.0002
layerParams = [ (layer, mse) for layer in layers]

myPrint('Now reducing number of network parameters ... ')
t0 = time.time()
model.createLrModel("Models/BertSquadR.fbm", layerParams)
myPrint('Done. (%.2f Seconds)'%(time.time()-t0)) 

Now reducing number of network parameters ... 
  IN_EMB => LR(512), MSE=0.000201, Params: 23440896->16020480 (Reduction: 31.7%)
  S2_L1_BERT
    SelfQuery => LR(296), MSE=0.000196, Params: 589824->454656 (Reduction: 22.9%)
    SelfKey => LR(296), MSE=0.000197, Params: 589824->454656 (Reduction: 22.9%)
    SelfValue => LR(208), MSE=0.000202, Params: 589824->319488 (Reduction: 45.8%)
    SelfOut => LR(192), MSE=0.000194, Params: 589824->294912 (Reduction: 50.0%)
    Intermediate => LR(464), MSE=0.000198, Params: 2359296->1781760 (Reduction: 24.5%)
    Out => LR(464), MSE=0.000204, Params: 2359296->1781760 (Reduction: 24.5%)
  S2_L2_BERT
    SelfQuery => LR(304), MSE=0.000199, Params: 589824->466944 (Reduction: 20.8%)
    SelfKey => LR(288), MSE=0.000205, Params: 589824->442368 (Reduction: 25.0%)
    SelfValue => LR(200), MSE=0.000200, Params: 589824->307200 (Reduction: 47.9%)
    SelfOut => LR(192), MSE=0.000196, Params: 589824->294912 (Reduction: 50.0%)
    Intermediate => LR(480), MSE=

Compare the new number of parameters with the original 108,893,186. 
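The parameter counts in the log can be reproduced with simple arithmetic, assuming each decomposed weight matrix of shape rows x cols is replaced by two factors of shapes rows x rank and rank x cols, and that biases are left unchanged. The matrix shapes below are inferred from the logged numbers (for example, 30522 x 768 for the word embedding, using the vocabulary size printed earlier), and the helper names are made up for this sketch.

```python
def lr_params(rows, cols, rank):
    """Parameters held by two factors of shape (rows, rank) and (rank, cols)."""
    return rank * (rows + cols)

def reduction_percent(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before

# (name, rows, cols, rank): ranks taken from the LR(...) values in the log
examples = [
    ("IN_EMB", 30522, 768, 512),
    ("SelfQuery", 768, 768, 296),
    ("SelfValue", 768, 768, 208),
    ("Intermediate", 768, 3072, 464),
]
for name, rows, cols, rank in examples:
    before = rows * cols
    after = lr_params(rows, cols, rank)
    print(f"{name}: {before}->{after} (Reduction: {reduction_percent(before, after):.1f}%)")

# Whole model: the original total versus the total that the network
# configuration reports for the reduced model in the re-training step.
total_before, total_after = 108_893_186, 81_965_570
total_reduction = reduction_percent(total_before, total_after)
print(f"Total: {total_before}->{total_after} (Reduction: {total_reduction:.1f}%)")
```

Under the same assumption, a 768 x 768 matrix breaks even at rank 384 (768 * 768 / (768 + 768)), which is why all the ranks chosen for the attention matrices in the log are below that value.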

## Evaluating the new model
Let's see the impact of this reduction on the performance of the model.

In [3]:
model = Model.makeFromFile("Models/BertSquadR.fbm", testDs=testDs, gpus=gpus)   
model.printLayersInfo()
model.initSession()
results = model.evaluate()


Reading from "Models/BertSquadR.fbm" ... Done.
Creating the fireball model "Bert-SQuAD" ... Done.

Scope            InShape       Comments                 OutShape      Activ.   Post Act.        # of Params
---------------  ------------  -----------------------  ------------  -------  ---------------  -----------
IN_EMB           ≤512 2        LR512                    ≤512 768      None                      16,415,232 
S1_L1_LN         ≤512 768                               ≤512 768      None     DO:0.1           1,536      
S2_L1_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      5,097,216  
S2_L2_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      5,177,088  
S2_L3_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      5,207,808  
S2_L4_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      5,398,272  
S2_L5_BERT       ≤512 768      768/3

## Re-train and evaluate
Here we make a new model from ```BertSquadR.fbm``` for re-training. We then call the ```train``` method of the model to start the re-training.

If the re-trained model ```BertSquadRR.fbm``` is already available in the ```Models``` directory, this cell shows the results of the last training. To force the training to run again, uncomment the line at the beginning of the cell to delete the existing file. Note that the re-training can take up to 1 hour on a 4-GPU machine.

In [4]:
# Uncomment the following line to delete the existing re-trained model and force re-training.
# if os.path.exists( "Models/BertSquadRR.fbm" ): os.remove( "Models/BertSquadRR.fbm" )

trainDs = SquadDSet.makeDatasets("Train", batchSize=128, version=1 )

model = Model.makeFromFile("Models/BertSquadR.fbm", trainDs=trainDs, testDs=testDs,
                           batchSize=32, numEpochs=2,
                           regFactor=0.0001,
                           learningRate=(2e-5,4e-6), optimizer='Adam',
                           saveModelFileName="Models/BertSquadRR.fbm",  # Save the re-training ...
                           savePeriod=1,                                # ... every epoch
                           saveBest=False,
                           gpus=gpus)
model.printNetConfig()
model.initSession()
model.train()
results = model.evaluate()

Initializing tokenizer from "/data/SQuAD/vocab.txt" ... Done. (Vocab Size: 30522)

Reading from "Models/BertSquadR.fbm" ... Done.
Creating the fireball model "Bert-SQuAD" ... Done.

Network configuration:
  Input:                     A tuple of TokenIds and TokenTypes.
  Output:                    2 logit vectors (with length ≤ 512) for start and end indexes of the answer.
  Network Layers:            16
  Tower Devices:             GPU0, GPU1, GPU2, GPU3
  Total Network Parameters:  81,965,570
  Total Parameter Tensors:   272
  Trainable Tensors:         272
  Training Samples:          87,844
  Test Samples:              10,833
  Num Epochs:                2
  Batch Size:                32
  L2 Reg. Factor:            0.0001
  Global Drop Rate:          0   
  Learning Rate: (Exponential Decay)
    Initial Value:           0.00002      
    Final Value:             0.000004     
  Optimizer:                 Adam
  Save model information to: Models/BertSquadRR.fbm

+--------+---------
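The network configuration above reports an exponentially decaying learning rate that starts at 0.00002 and ends at 0.000004. The sketch below shows one common way to compute such a schedule: a geometric interpolation applied per training step. The exact formula fireball uses, and whether the last partial batch of each epoch counts as a step, are assumptions here, and `exp_decay_lr` is a made-up name.

```python
import math

def exp_decay_lr(step, total_steps, lr_start=2e-5, lr_end=4e-6):
    """Geometric interpolation from lr_start (first step) to lr_end (last step)."""
    if total_steps <= 1:
        return lr_start
    frac = step / (total_steps - 1)
    return lr_start * (lr_end / lr_start) ** frac

# 87,844 training samples, batch size 32, 2 epochs (from the configuration above);
# assumes the final partial batch of each epoch is also used.
batches_per_epoch = math.ceil(87_844 / 32)
total_steps = 2 * batches_per_epoch

for step in (0, total_steps // 2, total_steps - 1):
    print(f"step {step}: lr = {exp_decay_lr(step, total_steps):.3e}")
```

With an exponential schedule the learning rate drops by the same factor at every step, so at the midpoint of training it is the geometric mean of the initial and final values rather than their average.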

## Where do I go from here?

[Pruning BERT/SQuAD Model](BertSquad-Prune.ipynb)

[Quantizing BERT/SQuAD Model](BertSquad-Quantize.ipynb)

[Exporting BERT/SQuAD Model to ONNX](BertSquad-ONNX.ipynb)

[Exporting BERT/SQuAD Model to TensorFlow](BertSquad-TF.ipynb)

[Exporting BERT/SQuAD Model to CoreML](BertSquad-CoreML.ipynb)

________________

[Fireball Playgrounds](../Contents.ipynb)

[Question Answering (BERT/SQuAD)](BertSquad.ipynb)
