# Quantizing the MobileNetV2 Model
This notebook shows how to quantize a pre-trained Fireball model using Codebook Quantization. It assumes 
that a trained ```MobileNetV2``` model already exists in the ```Models``` directory. Please refer to the notebook
[Image Classification with MobileNetV2](MobileNetV2.ipynb) for more info about using a pretrained MobileNetV2 model.

To quantize a low-rank model, first use [this](MobileNetV2-Reduce.ipynb) notebook
to reduce the number of parameters in ```MobileNetV2```.

Model quantization reduces the size of the model by using fewer bits for each floating-point 
parameter. Fireball uses a codebook quantization method based on the K-Means clustering algorithm.
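To make the idea concrete, here is a minimal NumPy sketch of codebook quantization with 1-D K-Means (Lloyd's algorithm). It is not Fireball's implementation. The function name `codebook_quantize`, the evenly spaced initial centroids, and the fixed iteration count are assumptions made for illustration. Fireball options such as `pdfFactor` and `reuseEmptyClusters` are not modeled. Each weight is replaced by the index of its nearest codebook entry, so a tensor can be stored as `bits`-bit indices plus a small table of floats.

```python
import numpy as np

# Hypothetical helper for illustration; not part of the Fireball API.
def codebook_quantize(weights, bits, iters=20):
    """Quantize `weights` to a codebook of 2**bits values using 1-D K-Means.

    Returns (codebook, indices) such that codebook[indices] approximates `weights`.
    Builds an (n x 2**bits) distance matrix, so it is meant for small tensors only.
    """
    flat = weights.astype(np.float64).ravel()
    k = 2 ** bits
    # Start with centroids evenly spaced between the smallest and largest weight.
    codebook = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assignment step: nearest centroid for every weight.
        indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = flat[indices == c]
            if members.size:
                codebook[c] = members.mean()
    # Final assignment against the updated codebook.
    indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, indices.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32)).astype(np.float32)
codebook, idx = codebook_quantize(w, bits=4)
w_hat = codebook[idx]                      # de-quantized weights
print("codebook entries:", codebook.size)  # 16 values for 4 bits
print("MSE:", np.mean((w - w_hat) ** 2))
```

Fewer bits mean a smaller codebook and shorter indices, but also a larger reconstruction error. The MSE bound used later in this notebook controls that trade-off.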

```quantizeModel``` is a class method that takes the file names of the input and output models of the 
quantization process. It also takes quantization parameters such as ```minBits```, ```maxBits```, 
```mseUb```, and ```pdfFactor```.

Fireball can create models with 2-bit to 12-bit quantization (codebook sizes 4 to 4096). For the quantized
model to be compatible with ```CoreML```, the codebook size must be a power of 2 no larger than 256 (that is, at most 8 bits), and only "weight" parameters may be quantized (not biases).
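The CoreML constraints above can be written as a small check. The helper names below are hypothetical and only restate the rules in this notebook. They are not a Fireball or CoreML API.

```python
# Hypothetical helpers restating the CoreML rules above; not a Fireball or CoreML API.
def codebook_size(bits):
    """A b-bit index can address 2**b codebook entries."""
    return 2 ** bits

def is_coreml_compatible(codebook_len, is_bias=False):
    """True if a quantized tensor with this codebook size meets the CoreML rules above."""
    power_of_two = codebook_len > 0 and (codebook_len & (codebook_len - 1)) == 0
    # Biases must stay unquantized, so a quantized bias tensor is never compatible.
    return power_of_two and codebook_len <= 256 and not is_bias

for bits in range(2, 13):  # Fireball's supported range: 2 to 12 bits
    size = codebook_size(bits)
    print(f"{bits:2d} bits -> {size:4d} entries, CoreML compatible: {is_coreml_compatible(size)}")
```

Only 2 to 8 bits pass the check, which is consistent with ```maxBits=8``` in the quantization cell of this notebook.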

## Quantizing a pretrained model
The code in the following cell quantizes the model specified by ```orgFileName``` and saves the quantized model as ```quantizedFileName``` (the original file name with a 'Q' appended).

For each parameter tensor of the model, the quantizer tries bit widths from ```minBits``` to ```maxBits``` (2 to 8 in this example) and picks the best quantization whose mean squared error (MSE) does not exceed the upper bound ```mseUb```.

To get a smaller model, increase ```mseUb```; to preserve more accuracy (at the cost of a larger model),
use a smaller ```mseUb```.
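The per-tensor search can be sketched as a loop over bit widths that stops at the first width whose quantization MSE is within the bound. This is an assumption about how the search proceeds, not Fireball's code. The names `kmeans_1d` and `choose_bits` are hypothetical, and centroids are initialized at evenly spaced quantiles. Falling back to `max_bits` when no width meets the bound is this sketch's own choice, not documented Fireball behavior.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of the Fireball API.
def kmeans_1d(flat, k, iters=20):
    """Minimal 1-D K-Means (Lloyd's algorithm); centroids start at evenly spaced quantiles."""
    codebook = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                codebook[c] = members.mean()
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, idx

def choose_bits(weights, mse_ub, min_bits=2, max_bits=8):
    """Return (bits, mse) for the smallest bit width in [min_bits, max_bits] with MSE <= mse_ub.

    If no width meets the bound, this sketch falls back to max_bits.
    """
    flat = weights.astype(np.float64).ravel()
    for bits in range(min_bits, max_bits + 1):
        codebook, idx = kmeans_1d(flat, 2 ** bits)
        mse = np.mean((flat - codebook[idx]) ** 2)
        if mse <= mse_ub:
            return bits, mse
    return max_bits, mse

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(128, 16))
for ub in (1e-4, 1.6e-5):
    bits, mse = choose_bits(w, ub)
    print(f"mseUb={ub:g}: {bits} bits (MSE {mse:.2e})")
```

A tighter bound can only push the chosen width up, which is why a smaller ```mseUb``` gives a larger but more accurate model.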

```python
from fireball import Model
gpus = '0,1,2,3'  # GPUs used by the evaluation and re-training cells below

# Select the input model by uncommenting exactly one of these lines:
# orgFileName = "Models/MobileNetV2.fbm"        # Original model
# orgFileName = "Models/MobileNetV2P.fbm"       # Pruned
# orgFileName = "Models/MobileNetV2PR.fbm"      # Pruned - Retrained
# orgFileName = "Models/MobileNetV2R.fbm"       # Reduced
# orgFileName = "Models/MobileNetV2RP.fbm"      # Reduced - Pruned
# orgFileName = "Models/MobileNetV2RPR.fbm"     # Reduced - Pruned - Retrained
# orgFileName = "Models/MobileNetV2RR.fbm"      # Reduced - Retrained
# orgFileName = "Models/MobileNetV2RRP.fbm"     # Reduced - Retrained - Pruned
orgFileName = "Models/MobileNetV2RRPR.fbm"      # Reduced - Retrained - Pruned - Retrained

quantizedFileName = orgFileName.replace('.fbm', 'Q.fbm')  # Append 'Q' to the filename for "Quantized"

# Quantize each tensor with 2 to 8 bits, subject to the MSE upper bound mseUb.
qResults = Model.quantizeModel(orgFileName, quantizedFileName,
                               mseUb=0.000016, minBits=2, maxBits=8, reuseEmptyClusters=True)
```


```
Reading model parameters from "Models/MobileNetV2RRPR.fbm" ... Done.
Quantizing 264 tensors using 20 workers ... 
   Quantization Parameters:
        mseUb .............. 1.6e-05
        pdfFactor .......... 0.1
        reuseEmptyClusters . True
        weightsOnly ........ True
        minBits ............ 2
        maxBits ............ 8
Quantization complete (1.01 Sec.)).
Now saving to "Models/MobileNetV2RRPRQ.fbm" ... Done.

Size of Data: 7,680,163 -> 2,581,447 bytes
Model File Size: 7,726,371 -> 2,628,765 bytes
```


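Compare the sizes before and after quantization: the parameter data shrinks from 7,680,163 to 2,581,447 bytes, roughly a third of its original size (a reduction of about 66%).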

## Evaluate the quantized model
Let's measure the impact of quantization on the model's accuracy using the test dataset.

```python
from fireball import Model
from fireball.datasets.imagenet import ImageNetDSet

# Create the test dataset for evaluation.
testDs = ImageNetDSet.makeDatasets('Test', batchSize=256, preProcessing='Crop256Tf', numWorkers=8)

# Load the quantized model (quantizedFileName and gpus are defined in the previous cell).
model = Model.makeFromFile(quantizedFileName, testDs=testDs, gpus=gpus)
model.initSession()
results = model.evaluate(topK=5)   # Also report top-5 accuracy
```


```
Reading from "Models/MobileNetV2RRPRQ.fbm" ... Done.
Creating the fireball model "MobileNetV2" ... Done.
  Processed 50000 Sample. (Time: 52.89 Sec.)

Observed Accuracy: 0.589240
Top-5 Accuracy:   0.818640
```
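Under top-5 accuracy, a prediction counts as correct when the true label is among the five highest-scoring classes. The following generic NumPy sketch computes top-k accuracy from a matrix of class scores. It is not Fireball's evaluation code, and the function name is hypothetical.

```python
import numpy as np

# Hypothetical helper for illustration; not Fireball's evaluation code.
def top_k_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # argsort sorts ascending, so the last k columns hold the indices of the k largest scores.
    top_k = np.argsort(logits, axis=1)[:, -k:]
    return float(np.mean(np.any(top_k == labels[:, None], axis=1)))

logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])
print(top_k_accuracy(logits, labels, k=1))  # → 0.3333333333333333
print(top_k_accuracy(logits, labels, k=2))  # → 0.6666666666666666
```

In this toy example, the second sample is wrong at k=1 but right at k=2. The same effect is why the top-5 figure above is well above the top-1 figure.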


## Re-train and evaluate
Fireball can re-train a quantized model by modifying its quantization codebooks. The following cell creates a "tune" dataset by sampling from the training dataset and uses it to fine-tune the quantized model for one epoch.
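This notebook does not show exactly how Fireball updates the codebooks. One common approach, described in the Deep Compression work of Han et al., keeps each weight's cluster assignment fixed. It then updates each shared codebook value using the sum of the gradients of all weights that use it. The following NumPy sketch shows one such SGD step. It assumes this approach and is not necessarily what Fireball does. The function name is hypothetical.

```python
import numpy as np

# Hypothetical helper illustrating codebook fine-tuning; not part of the Fireball API.
def codebook_sgd_step(codebook, indices, weight_grads, lr):
    """One SGD step on the codebook entries, keeping the index assignments fixed."""
    # Sum the gradients of all weights that share each codebook entry.
    centroid_grads = np.bincount(indices.ravel(), weights=weight_grads.ravel(),
                                 minlength=codebook.size)
    return codebook - lr * centroid_grads

codebook = np.array([-0.5, 0.0, 0.5])
indices = np.array([[0, 2], [2, 1]])        # each weight's codebook entry
grads = np.array([[0.1, -0.2], [0.4, 0.3]])  # loss gradient w.r.t. each weight
new_codebook = codebook_sgd_step(codebook, indices, grads, lr=0.1)
print(new_codebook)
```

Only the small codebook changes, so the fine-tuned model keeps the same bit widths and index tensors as the quantized one.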

The cell saves the fine-tuned model as ```retrainedFileName```, which is ```quantizedFileName``` with an 'R' appended. As written, the cell always runs the fine-tuning, even if that file already exists. Because it uses the "tune" dataset (64,000 samples) instead of the full training dataset, fine-tuning is much faster (under 10 minutes).
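To avoid repeating the fine-tuning when ```retrainedFileName``` already exists, you can guard the cell with a small helper such as the one below. The helper name `needs_training` and its `force` flag are hypothetical and are not part of Fireball.

```python
import os

# Hypothetical helper; not part of Fireball.
def needs_training(path, force=False):
    """Return True if the model file at `path` still has to be produced.

    With force=True an existing file is deleted first, so training always runs.
    """
    if force and os.path.exists(path):
        os.remove(path)
    return not os.path.exists(path)

# Possible use in the re-training cell (define retrainedFileName before this check):
# if needs_training(retrainedFileName):
#     ...  # build, train, evaluate, and save the model as in the re-training cell
# else:
#     model = Model.makeFromFile(retrainedFileName, testDs=testDs, gpus=gpus)
#     model.initSession()
#     results = model.evaluate(topK=5)
```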

```python
# Create the "tune" dataset, sampled from the training data (testDs comes from the previous cell).
tuneDs = ImageNetDSet.makeDatasets('tune', batchSize=256, preProcessing='Crop256Tf', numWorkers=8)
print(tuneDs)

# Fine-tune the quantized model for one epoch on the tune dataset.
model = Model.makeFromFile(quantizedFileName, trainDs=tuneDs, testDs=testDs,
                           numEpochs=1,
                           learningRate=1e-13,
                           optimizer="Momentum",
                           gpus=gpus)
model.initSession()
model.train()
results = model.evaluate(topK=5)

retrainedFileName = quantizedFileName.replace('.fbm', 'R.fbm')  # Append 'R' to the filename for "Re-trained"
model.save(retrainedFileName)
```

```
ImageNetDSet Dataset Info:
    Dataset Name ................................... tune
    Dataset Location ............................... /home/shahab/data/ImageNet/
    Number of Classes .............................. 1000
    Number of Samples .............................. 64000
    Sample Shape ................................... (224, 224, 3)
    Preprocessing .................................. Crop256Tf
    Number of Workers .............................. 8


Reading from "Models/MobileNetV2RRPRQ.fbm" ... Done.
Creating the fireball model "MobileNetV2" ... Done.
+--------+---------+---------------+-----------+-------------------+
| Epoch  | Batch   | Learning Rate | Loss      | Valid/Test Error  |
+--------+---------+---------------+-----------+-------------------+
| 1      | 249     | 9.9999998e-14 | 0.0487485 | N/A        36.71% |
+--------+---------+---------------+-----------+-------------------+
Total Training Time: 240.47 Seconds
  Processed 50000 Sample. (Time: 42.62 Sec.)
```

## Where do I go from here?

[Exporting MobileNetV2 Model to ONNX](MobileNetV2-ONNX.ipynb)

[Exporting MobileNetV2 Model to TensorFlow](MobileNetV2-TF.ipynb)

[Exporting MobileNetV2 Model to CoreML](MobileNetV2-CoreML.ipynb)

---

[Fireball Playgrounds](../Contents.ipynb)

[Image Classification with MobileNetV2](MobileNetV2.ipynb)

[Reducing number of parameters of MobileNetV2 Model](MobileNetV2-Reduce.ipynb)

[Pruning MobileNetV2 Model](MobileNetV2-Prune.ipynb)
