# Quantizing the ResNet50 Model
This notebook shows how to quantize a pre-trained Fireball model using Codebook Quantization. It assumes 
that a trained ```ResNet50``` model already exists in the ```Models``` directory. Please refer to the notebook
[Image Classification with ResNet50](ResNet50.ipynb) for more info about using a pretrained ResNet50 model.

If you want to quantize a low-rank model, first use [this](ResNet50-Reduce.ipynb) notebook
to reduce the number of parameters in ```ResNet50```.

Model quantization reduces the size of the model by using fewer bits for each floating 
point parameter. Fireball uses a codebook quantization method based on the K-Means clustering algorithm.
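
To make the idea concrete, here is a minimal sketch of codebook quantization with plain K-Means (Lloyd's algorithm) on a list of scalar weights. It is not Fireball's implementation: the function name `kmeans_codebook`, the evenly spaced initialization, and the fixed iteration count are all assumptions made for illustration, and options that appear in Fireball's log such as `pdfFactor` and `reuseEmptyClusters` are not modeled. It uses only the standard library so it runs anywhere.

```python
import random

def kmeans_codebook(values, bits, num_iters=20):
    """Cluster scalar values into 2**bits shared values (illustrative helper)."""
    k = 2 ** bits
    lo, hi = min(values), max(values)
    # Assumed initialization: evenly spaced centers across the value range.
    codebook = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(v):
        # Index of the codebook entry closest to v.
        return min(range(k), key=lambda c: abs(v - codebook[c]))

    for _ in range(num_iters):
        # Assignment step: map every value to its nearest center.
        indexes = [nearest(v) for v in values]
        sums, counts = [0.0] * k, [0] * k
        for v, c in zip(values, indexes):
            sums[c] += v
            counts[c] += 1
        # Update step: move each non-empty center to the mean of its members.
        for c in range(k):
            if counts[c]:
                codebook[c] = sums[c] / counts[c]

    return codebook, [nearest(v) for v in values]

random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(2000)]   # synthetic "tensor"
codebook, indexes = kmeans_codebook(weights, bits=4)

# Each weight is now stored as a 4-bit index into a 16-entry codebook.
restored = [codebook[i] for i in indexes]
mse = sum((w - r) ** 2 for w, r in zip(weights, restored)) / len(weights)
print(len(codebook), f"{mse:.2e}")
```

The storage saving comes from replacing each floating point value with a short index; the codebook itself is a small fixed overhead per tensor.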

[quantizeModel](https://interdigitalinc.github.io/Fireball/html/source/model.html#fireball.model.Model.quantizeModel) is a class method that takes the names of the input and output files for the
quantization process. It also takes quantization parameters such as ```minBits```, ```maxBits```,
and ```mseUb```.

Fireball can create models with 2-bit to 12-bit quantization (codebook sizes 4 to 4096). For the quantized
model to be compatible with [CoreML](https://developer.apple.com/documentation/coreml), we need to make sure that the codebook size is a power of 2, that it is less than or equal to 256, and that only "weight" parameters are quantized (not biases).
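
The relationship between bit width and codebook size, and the CoreML constraints listed above, can be written as a small check. The helper names `codebook_size` and `is_coreml_compatible` are hypothetical and are not part of Fireball or CoreML; the sketch only encodes the three conditions stated in the text.

```python
def codebook_size(bits):
    """An n-bit index can address 2**n codebook entries."""
    return 2 ** bits

def is_coreml_compatible(size, weights_only=True):
    """Hypothetical check of the three conditions named in the text."""
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and size <= 256 and weights_only

for bits in (2, 8, 12):
    size = codebook_size(bits)
    print(bits, size, is_coreml_compatible(size))
```

Under these conditions, any bit width up to 8 passes as long as biases are left unquantized, which is consistent with the ```maxBits=8``` and ```weightsOnly``` values that appear in this notebook's quantization log.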

## Quantizing a pretrained model
The code in the following cell quantizes the model specified by ```orgFileName``` and creates a new quantized model.

For each parameter tensor of the model, we try quantization bits from ```minBits``` to ```maxBits``` (3 to 8 in the cell below) and find the best quantization that satisfies the specified MSE upper bound (```mseUb```).

To get better quantization (smaller model), increase ```mseUb```; to get higher accuracy (larger model),
use a smaller ```mseUb```.
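
The trade-off can be illustrated with a short sketch of a per-tensor bit search. This is an assumption about the general shape of such a search, not Fireball's code: it takes "best" to mean the smallest bit width that meets the bound, it falls back to the maximum bit width when nothing meets it, and it swaps in simple uniform quantization for the K-Means codebook to stay short. All function names are hypothetical.

```python
import random

def uniform_quantize(values, bits):
    """Stand-in quantizer: snap values to 2**bits evenly spaced levels."""
    levels = 2 ** bits
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1)
    return [lo + round((v - lo) / step) * step for v in values]

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pick_bits(values, mse_ub, min_bits, max_bits):
    """Smallest bit width whose MSE is within mse_ub (assumed selection rule)."""
    for bits in range(min_bits, max_bits + 1):
        mse = mean_squared_error(values, uniform_quantize(values, bits))
        if mse <= mse_ub:
            return bits, mse
    # Nothing met the bound: fall back to max_bits (assumption).
    return max_bits, mse

random.seed(0)
tensor = [random.gauss(0.0, 0.05) for _ in range(2000)]   # synthetic tensor

loose_bits, _ = pick_bits(tensor, mse_ub=1e-4, min_bits=3, max_bits=8)
tight_bits, _ = pick_bits(tensor, mse_ub=5e-6, min_bits=3, max_bits=8)
print(loose_bits, tight_bits)
```

A looser bound never selects more bits than a tighter one, which is why raising the bound shrinks the model and lowering it preserves more precision.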

In [1]:
from fireball import Model
gpus='0,1,2,3'

# orgFileName = "Models/ResNet50.fbm"        # Original model
# orgFileName = "Models/ResNet50P.fbm"       # Pruned
# orgFileName = "Models/ResNet50PR.fbm"      # Pruned - Retrained
# orgFileName = "Models/ResNet50R.fbm"       # Reduced
# orgFileName = "Models/ResNet50RP.fbm"      # Reduced - Pruned
# orgFileName = "Models/ResNet50RPR.fbm"     # Reduced - Pruned - Retrained
# orgFileName = "Models/ResNet50RR.fbm"      # Reduced - Retrained
# orgFileName = "Models/ResNet50RRP.fbm"     # Reduced - Retrained - Pruned
orgFileName = "Models/ResNet50RRPR.fbm"    # Reduced - Retrained - Pruned - Retrained

quantizedFileName = orgFileName.replace('.fbm', 'Q.fbm')  # Append 'Q' to the filename for "Quantized"

# quantizing the model
qResults = Model.quantizeModel(orgFileName, quantizedFileName, 
                               mseUb=0.000005, minBits=3, maxBits=8, reuseEmptyClusters=True)


Reading model parameters from "Models/ResNet50RRPR.fbm" ... Done.
Quantizing 363 tensors using 20 workers ... 
   Quantization Parameters:
        mseUb .............. 5e-06
        pdfFactor .......... 0.1
        reuseEmptyClusters . True
        weightsOnly ........ True
        minBits ............ 3
        maxBits ............ 8
Quantization complete (4.67 Sec.)).
Now saving to "Models/ResNet50RRPRQ.fbm" ... Done.

Size of Data: 33,070,285 -> 9,970,210 bytes
Model File Size: 33,103,189 -> 10,005,727 bytes


Compare the data size before and after quantization: the parameter data shrinks from 33,070,285 to 9,970,210 bytes, which is about 3.3 times smaller.
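
As a quick check of the numbers in the log above, the following snippet computes the compression ratio and the percentage saved for both the parameter data and the model file. The byte counts are copied from the output; the helper name `summarize` is just for illustration.

```python
# Byte counts copied from the quantization log above.
data_before, data_after = 33_070_285, 9_970_210
file_before, file_after = 33_103_189, 10_005_727

def summarize(before, after):
    """Return (compression ratio, percent of bytes saved)."""
    ratio = before / after
    saved_pct = 100.0 * (1.0 - after / before)
    return ratio, saved_pct

for name, before, after in (("Data", data_before, data_after),
                            ("File", file_before, file_after)):
    ratio, saved_pct = summarize(before, after)
    print(f"{name}: {ratio:.2f}x smaller, {saved_pct:.1f}% saved")
```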

## Evaluate the quantized model
Let's see the impact on model performance.

In [2]:
from fireball import Model
from fireball.datasets.imagenet import ImageNetDSet

# Create the test dataset for evaluation.
testDs = ImageNetDSet.makeDatasets('Test', batchSize=256, preProcessing='Crop256Cafe', numWorkers=8)

model = Model.makeFromFile(quantizedFileName, testDs=testDs, gpus=gpus)
model.initSession()
results = model.evaluate(topK=5)


Reading from "Models/ResNet50RRPRQ.fbm" ... Done.
Creating the fireball model "ResNet50" ... Done.
  Processed 50000 Sample. (Time: 71.15 Sec.)                              

Observed Accuracy: 0.680320
Top-5 Accuracy:   0.886840


## Re-train and evaluate
Fireball can re-train the quantized models by modifying the quantization codebooks during the re-training. The following cell creates a "tune" dataset by sampling from the training dataset and uses it to "fine-tune" the quantized model for one epoch.

In [3]:
tuneDs = ImageNetDSet.makeDatasets('tune', batchSize=256, preProcessing='Crop256Cafe', numWorkers=8)
print(tuneDs)

model = Model.makeFromFile(quantizedFileName, trainDs=tuneDs, testDs=testDs,
                           numEpochs=1,
                           learningRate=1e-13,
                           optimizer="Adam",
                           gpus=gpus)
model.initSession()
model.train()
results = model.evaluate(topK=5)

retrainedFileName = quantizedFileName.replace('.fbm', 'R.fbm')  # Append 'R' to the filename for "Re-trained"
model.save(retrainedFileName)

ImageNetDSet Dataset Info:
    Dataset Name ................................... tune
    Dataset Location ............................... /home/shahab/data/ImageNet/
    Number of Classes .............................. 1000
    Number of Samples .............................. 64000
    Sample Shape ................................... (224, 224, 3)
    Preprocessing .................................. Crop256Cafe
    Number of Workers .............................. 8


Reading from "Models/ResNet50RRPRQ.fbm" ... Done.
Creating the fireball model "ResNet50" ... Done.
+--------+---------+---------------+-----------+-------------------+
| Epoch  | Batch   | Learning Rate | Loss      | Valid/Test Error  |
+--------+---------+---------------+-----------+-------------------+
| 1      | 249     | 9.9999998e-14 | 0.0799867 | N/A        31.75% |
+--------+---------+---------------+-----------+-------------------+
Total Training Time: 290.36 Seconds
  Processed 50000 Sample. (Time: 55.74 Sec.)    
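
To compare the two runs, the accuracy reported right after quantization can be converted to an error rate and set against the test error in the training table. This sketch assumes the table's "Test Error" is the top-1 error on the same test dataset; the values are copied from the outputs above.

```python
accuracy_before = 0.680320   # "Observed Accuracy" right after quantization
error_after_pct = 31.75      # "Valid/Test Error" after one tuning epoch

# Convert accuracy to an error percentage so both numbers use the same unit.
error_before_pct = (1.0 - accuracy_before) * 100.0
gain_points = error_before_pct - error_after_pct

print(f"Top-1 error: {error_before_pct:.2f}% -> {error_after_pct:.2f}% "
      f"({gain_points:.2f} points lower)")
```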

## Where do I go from here?

[Exporting ResNet50 Model to ONNX](ResNet50-ONNX.ipynb)

[Exporting ResNet50 Model to TensorFlow](ResNet50-TF.ipynb)

[Exporting ResNet50 Model to CoreML](ResNet50-CoreML.ipynb)

---

[Fireball Playgrounds](../Contents.ipynb)

[Image Classification with ResNet50](ResNet50.ipynb)

[Reducing number of parameters of ResNet50 Model](ResNet50-Reduce.ipynb)

[Pruning ResNet50 Model](ResNet50-Prune.ipynb)
