# Quantizing the SSD Model
This notebook shows how to quantize a pretrained Fireball model using Codebook Quantization. It assumes 
that a trained ```SSD``` model already exists in the ```Models``` directory. Please refer to the notebook [Object Detection with SSD](SSD.ipynb) for more information about using a pretrained SSD model.

If you want to quantize a Low-Rank model, first use the [Reducing number of parameters of SSD Model](SSD-Reduce.ipynb) notebook to reduce the number of parameters in ```SSD```.

Model quantization reduces the size of the model by using fewer bits for each floating point parameter. Fireball uses a codebook quantization method based on the K-Means clustering algorithm.
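
As a rough illustration of the idea (not Fireball's actual implementation), the sketch below builds a codebook for a single tensor with a plain 1-D K-Means loop in NumPy. The helper names ```kmeans_codebook``` and ```dequantize```, the evenly spaced initialization, and the iteration count are assumptions made for this example.

```python
import numpy as np

def kmeans_codebook(tensor, bits, num_iters=20):
    """Quantize `tensor` to 2**bits shared values using 1-D K-Means (Lloyd's algorithm)."""
    values = tensor.ravel().astype(np.float64)
    k = 2 ** bits
    # Assumption: start from evenly spaced centroids spanning the value range.
    codebook = np.linspace(values.min(), values.max(), k)
    for _ in range(num_iters):
        # Assignment step: nearest centroid for every value.
        indices = np.abs(values[:, None] - codebook[None, :]).argmin(axis=1)
        # Update step: move each non-empty centroid to the mean of its members.
        sums = np.bincount(indices, weights=values, minlength=k)
        counts = np.bincount(indices, minlength=k)
        nonempty = counts > 0
        codebook[nonempty] = sums[nonempty] / counts[nonempty]
    # Final assignment against the updated codebook.
    indices = np.abs(values[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook.astype(np.float32), indices.reshape(tensor.shape).astype(np.uint16)

def dequantize(codebook, indices):
    """Rebuild an approximate tensor by looking up each index in the codebook."""
    return codebook[indices]

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(64, 32)).astype(np.float32)  # toy "weight" tensor
codebook, indices = kmeans_codebook(weights, bits=4)
mse = float(np.mean((weights - dequantize(codebook, indices)) ** 2))
print(f"codebook size: {codebook.size}, MSE: {mse:.3e}")
```

In this toy example each weight is stored as an index into a 16-entry codebook instead of a 32-bit float, which is where the size reduction comes from.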

[quantizeModel](https://interdigitalinc.github.io/Fireball/html/source/model.html#fireball.model.Model.quantizeModel) is a class method that takes the input and output file names for the 
quantization process. It also takes quantization parameters such as ```minBits```, ```maxBits```, 
and ```mseUb```.

Fireball can create models with 2-bit to 12-bit quantization (codebook sizes 4 to 4096). For the quantized
model to be compatible with [CoreML](https://developer.apple.com/documentation/coreml), the codebook size must be a power of 2 no larger than 256, and only "weight" parameters may be quantized (not biases).
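
The constraints above can be expressed as a small check. This is only a sketch of the rule as stated in this notebook; the function name ```is_coreml_compatible``` and its arguments are hypothetical and are not part of Fireball or CoreML.

```python
def is_coreml_compatible(codebook_size, weights_only=True):
    """Check the rule stated above: power-of-2 codebook, at most 256 entries, weights only."""
    is_power_of_two = codebook_size > 0 and (codebook_size & (codebook_size - 1)) == 0
    return is_power_of_two and codebook_size <= 256 and weights_only

# A b-bit quantization uses a codebook with 2**b entries.
for bits in (2, 8, 9, 12):
    size = 2 ** bits
    print(f"{bits:2d} bits -> codebook size {size:4d}, compatible: {is_coreml_compatible(size)}")
```

Under this rule, any setting with ```maxBits``` of 8 or less keeps the codebook within the 256-entry limit.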

## Quantizing a pretrained model
The code in the following cell quantizes the model specified by ```orgFileName``` and creates a new quantized model.

For each parameter tensor of the model, we try quantization bits 2 to 8 and find the best quantization that satisfies the specified MSE value.

For stronger compression (a smaller model), increase ```mseUb```; for better accuracy (a larger model),
use a smaller ```mseUb```.
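
The per-tensor search described above can be sketched as follows. This is a simplified illustration, not Fireball's code: it uses an evenly spaced codebook as a stand-in for K-Means, it interprets "best" as the fewest bits that meet the bound, and it falls back to ```max_bits``` when the bound is never met. All three are assumptions, as are the function names.

```python
import numpy as np

def nearest_codebook_quantize(values, bits):
    """Stand-in quantizer (assumption): 2**bits evenly spaced entries, nearest-entry lookup."""
    codebook = np.linspace(values.min(), values.max(), 2 ** bits)
    indices = np.abs(values[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, indices

def choose_bits(tensor, min_bits=2, max_bits=8, mse_ub=3.2e-6):
    """Return the smallest bit width whose quantization MSE is within `mse_ub`."""
    values = tensor.ravel().astype(np.float64)
    for bits in range(min_bits, max_bits + 1):
        codebook, indices = nearest_codebook_quantize(values, bits)
        mse = float(np.mean((values - codebook[indices]) ** 2))
        if mse <= mse_ub:
            return bits, mse
    # Assumption: if no bit width meets the bound, keep the largest one tried.
    return max_bits, mse

rng = np.random.default_rng(0)
tensor = rng.normal(0.0, 0.02, size=(64, 64))  # toy tensor standing in for one layer's weights

tight_bits, tight_mse = choose_bits(tensor, mse_ub=3.2e-6)
loose_bits, loose_mse = choose_bits(tensor, mse_ub=3.2e-5)
print(f"mse_ub=3.2e-6 -> {tight_bits} bits (MSE {tight_mse:.2e})")
print(f"mse_ub=3.2e-5 -> {loose_bits} bits (MSE {loose_mse:.2e})")
```

A looser bound can never require more bits than a tighter one in this search, which matches the size/accuracy trade-off described above.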

In [1]:
from fireball import Model
gpus='upto4'

orgFileName = "Models/SSD512RRPR.fbm"    # Reduced - Retrained - Pruned - Retrained

quantizedFileName = orgFileName.replace('.fbm', 'Q.fbm')  # Append 'Q' to the filename for "Quantized"

qResults = Model.quantizeModel(orgFileName, quantizedFileName,
                               minBits=2, maxBits=8, mseUb=3.2e-6, reuseEmptyClusters=True)


Reading model parameters from "Models/SSD512RRPR.fbm" ... Done.
Quantizing 92 tensors using 36 workers ... 
   Quantization Parameters:
        mseUb .............. 3.2e-06
        pdfFactor .......... 0.1
        reuseEmptyClusters . True
        weightsOnly ........ True
        minBits ............ 2
        maxBits ............ 8
Quantization complete (6.35 Sec.)
Now saving to "Models/SSD512RRPRQ.fbm" ... Done.

Size of Data: 66,089,335 -> 18,922,286 bytes
Model File Size: 66,097,263 -> 18,931,811 bytes


Compare the data size before and after quantization. 
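
To put numbers on the comparison, the byte counts printed above can be turned into a compression ratio. The snippet below only does arithmetic on those reported values; the helper name ```summarize``` is made up for this example.

```python
# Byte counts copied from the quantization log above.
data_before, data_after = 66_089_335, 18_922_286
file_before, file_after = 66_097_263, 18_931_811

def summarize(before, after):
    """Return (compression ratio, percent of bytes saved)."""
    ratio = before / after
    saved_percent = 100.0 * (1.0 - after / before)
    return ratio, saved_percent

data_ratio, data_saved = summarize(data_before, data_after)
file_ratio, file_saved = summarize(file_before, file_after)
print(f"Data: {data_ratio:.2f}x smaller ({data_saved:.1f}% saved)")
print(f"File: {file_ratio:.2f}x smaller ({file_saved:.1f}% saved)")
```

Both the data and the model file come out roughly 3.5 times smaller after quantization.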

## Evaluate the quantized model
Let's see the impact on model performance.

In [2]:
from fireball.datasets.coco import CocoDSet

trainDs, testDs = CocoDSet.makeDatasets('Train,Test', batchSize=64, resolution=512, keepAr=False, numWorkers=4)

model = Model.makeFromFile(quantizedFileName, testDs=testDs, gpus=gpus)
model.initSession()
results = model.evaluate()


Reading from "Models/SSD512RRPRQ.fbm" ... Done.
Creating the fireball model "SSD512" ... Done.
  Processed 5000 Sample. (Time: 54.84 Sec.)                              

Evaluating inference results for 5000 images ... 
  Calculating IoUs - Done (7.3 Seconds)                       
  Finding matches - Done (116.1 Seconds)                     
  Processing the matches - Done (3.6 Seconds)                    
Done (127.0 Seconds)

Average Precision (AP):
    IoU=0.50:0.95   Area: All      MaxDet: 100  = 0.238
    IoU=0.50        Area: All      MaxDet: 100  = 0.456
    IoU=0.75        Area: All      MaxDet: 100  = 0.226
    IoU=0.50:0.95   Area: Small    MaxDet: 100  = 0.092
    IoU=0.50:0.95   Area: Medium   MaxDet: 100  = 0.278
    IoU=0.50:0.95   Area: Large    MaxDet: 100  = 0.347
Average Recall (AR):
    IoU=0.50:0.95   Area: All      MaxDet: 1    = 0.221
    IoU=0.50:0.95   Area: All      MaxDet: 10   = 0.340
    IoU=0.50:0.95   Area: All      MaxDet: 100  = 0.365
    IoU=0.50:0.95

## Re-train and evaluate
Fireball can retrain quantized models by modifying the quantization codebooks. Here we first create a new model from the quantized file for training and then call the model's [train](https://interdigitalinc.github.io/Fireball/html/source/model.html#fireball.model.Model.train) method to start the training. Note that the retraining can take up to 2 hours on a 4-GPU machine.
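
This notebook does not show how Fireball updates the codebooks internally. As a hedged illustration, one common weight-sharing scheme keeps the index assignments fixed and updates each codebook entry with the sum of the gradients of all weights that point to it. The sketch below applies that scheme to a toy least-squares loss in NumPy; every name in it is hypothetical.

```python
import numpy as np

def codebook_gradient(indices, grad_weights, codebook_size):
    """Sum the per-weight gradients over all weights that share a codebook entry."""
    return np.bincount(indices.ravel(), weights=grad_weights.ravel(), minlength=codebook_size)

rng = np.random.default_rng(0)
codebook = np.array([-0.10, -0.03, 0.03, 0.10])   # 2-bit codebook (4 entries)
indices = rng.integers(0, 4, size=(8, 8))         # fixed assignments, never changed below
# Toy target: the quantized weights plus noise, so the codebook is slightly off.
target = codebook[indices] + rng.normal(0.0, 0.02, size=(8, 8))

def loss_and_grad(cb):
    """Squared-error loss on the dequantized weights and its gradient w.r.t. those weights."""
    diff = cb[indices] - target
    return 0.5 * float(np.sum(diff ** 2)), diff

loss_before, _ = loss_and_grad(codebook)
learning_rate = 0.01
for _ in range(100):
    _, grad_weights = loss_and_grad(codebook)
    # Only the codebook entries move; the model stays quantized.
    codebook = codebook - learning_rate * codebook_gradient(indices, grad_weights, codebook.size)
loss_after, _ = loss_and_grad(codebook)
print(f"loss before: {loss_before:.5f}, after: {loss_after:.5f}")
```

Because only the shared values change, the retrained model keeps the same size as the quantized one in this scheme.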

In [3]:
model = Model.makeFromFile(quantizedFileName, trainDs=trainDs, testDs=testDs,
                           numEpochs=5,
                           learningRate=(1e-9,1e-11),
                           optimizer="Momentum",
                           gpus=gpus)
model.printNetConfig()
model.initSession()
model.train()
results = model.evaluate()

retrainedFileName = quantizedFileName.replace('.fbm', 'R.fbm')  # Append 'R' to the filename for "Re-trained"
model.save(retrainedFileName)



Reading from "Models/SSD512RRPRQ.fbm" ... Done.
Creating the fireball model "SSD512" ... Done.

Network configuration:
  Input:                     Color images of size 512x512
  Output:                    A tuple of class labels, boxes, class probabilities, and number of detections.
  Network Layers:            28
  Tower Devices:             GPU0, GPU1, GPU2, GPU3
  Total Network Parameters:  15,799,194
  Total Parameter Tensors:   92
  Trainable Tensors:         92
  Training Samples:          82,783
  Test Samples:              5,000
  Num Epochs:                5
  Batch Size:                64
  L2 Reg. Factor:            0     
  Global Drop Rate:          0   
  Learning Rate: (Exponential Decay)
    Initial Value:           0.000000001  
    Final Value:             0.00000000001
  Optimizer:                 Momentum

+--------+---------+---------------+-----------+-------------------+
| Epoch  | Batch   | Learning Rate | Loss      | Valid/Test mAP    |
+--------+---------+--
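
The configuration printout above reports an exponentially decaying learning rate from 1e-9 to 1e-11. The sketch below shows one way such a schedule can be computed, assuming a per-step geometric interpolation between the two values and counting only full batches. Fireball's exact schedule and step counting are not shown in this notebook, so treat both as assumptions.

```python
def exponential_decay(step, total_steps, initial=1e-9, final=1e-11):
    """Geometric interpolation from `initial` (first step) to `final` (last step)."""
    if total_steps <= 1:
        return initial
    fraction = step / (total_steps - 1)
    return initial * (final / initial) ** fraction

# Assumption: 82,783 training samples, batch size 64, 5 epochs, full batches only.
batches_per_epoch = 82_783 // 64
total_steps = 5 * batches_per_epoch

for step in (0, total_steps // 2, total_steps - 1):
    print(f"step {step:5d}: learning rate {exponential_decay(step, total_steps):.3e}")
```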

## Also look at

[Exporting SSD Model to ONNX](SSD-ONNX.ipynb)

[Exporting SSD Model to TensorFlow](SSD-TF.ipynb)

[Exporting SSD Model to CoreML](SSD-CoreML.ipynb)

---

[Fireball Playgrounds](../Contents.ipynb)

[Object Detection with SSD](SSD.ipynb)

[Reducing number of parameters of SSD Model](SSD-Reduce.ipynb)

[Pruning SSD Model](SSD-Prune.ipynb)
