# Quantization of pretrained models, using FiftyOne to download datasets

https://voxel51.com/docs/fiftyone/tutorials/evaluate_detections.html

## Setup

If you haven't already, install FiftyOne:

In [1]:
!pip install fiftyone


In this tutorial, we'll use an off-the-shelf SSD300 VGG16 detection model provided by torchvision, in place of the [Faster R-CNN detection model](https://pytorch.org/docs/stable/torchvision/models.html#faster-r-cnn) that the code was originally written for. To use it, you'll need to install `torch` and `torchvision`, if necessary.

In [2]:
!pip install torch torchvision



The following snippet will download the pretrained model from the web and load it:

In [3]:
import torch
import torchvision

# Run the model on GPU if it is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load a pre-trained SSD300 VGG16 model (COCO weights);
# the Faster R-CNN alternative is left commented out
#model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model = torchvision.models.detection.ssd300_vgg16(weights="SSD300_VGG16_Weights.COCO_V1")
model.to(device)
model.eval()

print("Model ready")

Downloading: "https://download.pytorch.org/models/ssd300_vgg16_coco-b556d3b4.pth" to /root/.cache/torch/hub/checkpoints/ssd300_vgg16_coco-b556d3b4.pth


  0%|          | 0.00/136M [00:00<?, ?B/s]

Model ready


In [4]:
print("cuda:0" if torch.cuda.is_available() else "cpu")

cuda:0


We'll perform our analysis on the validation split of the [COCO dataset](https://cocodataset.org/#home), which is conveniently available for download via the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html#coco-2017).

The snippet below will download the validation split and load it into FiftyOne.

In [5]:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    dataset_name="evaluate-detections-tutorial",
)
dataset.persistent = True

Migrating database to v0.18.0
Downloading split 'validation' to '/root/fiftyone/coco-2017/validation' if necessary
Downloading annotations to '/root/fiftyone/coco-2017/tmp-download/annotations_trainval2017.zip'
 100% |██████|    1.9Gb/1.9Gb [8.3s elapsed, 0s remaining, 252.2Mb/s]
Extracting annotations to '/root/fiftyone/coco-2017/raw/instances_val2017.json'
Downloading images to '/root/fiftyone/coco-2017/tmp-download/val2017.zip'
 100% |██████|    6.1Gb/6.1Gb [24.8s elapsed, 0s remaining, 258.5Mb/s]
Extracting images to '/root/fiftyone/coco-2017/validation/data'
Writing annotations to '/root/fiftyone/coco-2017/validation/labels.json'
Dataset info written to '/root/fiftyone/coco-2017/info.json'
Loading 'coco-2017' split 'validation'
 100% |███████████████| 5000/5000 [40.9s elapsed, 0s remaining, 139.9 samples/s]
Dataset 'evaluate-detections-tutorial' created


Let's inspect the dataset to see what we downloaded:

In [6]:
# Print some information about the dataset
print(dataset)

Name:        evaluate-detections-tutorial
Media type:  image
Num samples: 5000
Persistent:  True
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)


In [7]:
# Print a ground truth detection
sample = dataset.first()
print(sample.ground_truth.detections[0])

<Detection: {
    'id': '637cb2e1429314bebada4cdd',
    'attributes': {},
    'tags': [],
    'label': 'potted plant',
    'bounding_box': [
        0.37028125,
        0.3345305164319249,
        0.038593749999999996,
        0.16314553990610328,
    ],
    'mask': None,
    'confidence': None,
    'index': None,
    'supercategory': 'furniture',
    'iscrowd': 0,
}>


Note that the ground truth detections are stored in the `ground_truth` field of the samples.

Before we go further, let's launch the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) and use the GUI to explore the dataset visually:

In [8]:
session = fo.launch_app(dataset)

## Add predictions to dataset

Now let's generate some predictions to analyze.

The code below performs inference with the SSD300 VGG16 model on a randomly chosen subset of 1000 samples from the dataset and stores the resulting predictions in a `faster_rcnn` field of the samples. The field name is left over from the Faster R-CNN model; the predictions themselves come from SSD.

In [9]:
# Choose a random subset of 1000 samples to add predictions to
predictions_view = dataset.take(1000, seed=51)

In [10]:
print(predictions_view.first())

<SampleView: {
    'id': '637cb2ed429314bebada76a2',
    'media_type': 'image',
    'filepath': '/root/fiftyone/coco-2017/validation/data/000000151051.jpg',
    'tags': ['validation'],
    'metadata': <ImageMetadata: {
        'size_bytes': None,
        'mime_type': None,
        'width': 640,
        'height': 478,
        'num_channels': None,
    }>,
    'ground_truth': <Detections: {
        'detections': [
            <Detection: {
                'id': '637cb2ed429314bebada767e',
                'attributes': {},
                'tags': [],
                'label': 'person',
                'bounding_box': [
                    0.20631249999999998,
                    0.42043933054393307,
                    0.2070625,
                    0.2446234309623431,
                ],
                'mask': None,
                'confidence': None,
                'index': None,
                'supercategory': 'person',
                'iscrowd': 0,
            }>,
            <Detect

In [11]:
from PIL import Image
from torchvision.transforms import functional as func

import fiftyone as fo

In [12]:
# Get class list
classes = dataset.default_classes

# Add predictions to samples
with fo.ProgressBar() as pb:
    for sample in pb(predictions_view):
        # Load image
        image = Image.open(sample.filepath)
        image = func.to_tensor(image).to(device)
        c, h, w = image.shape
        
        # Perform inference
        preds = model([image])[0]
        labels = preds["labels"].cpu().detach().numpy()
        scores = preds["scores"].cpu().detach().numpy()
        boxes = preds["boxes"].cpu().detach().numpy()
        
        # Convert detections to FiftyOne format
        detections = []
        for label, score, box in zip(labels, scores, boxes):
            # Convert to [top-left-x, top-left-y, width, height]
            # in relative coordinates in [0, 1] x [0, 1]
            x1, y1, x2, y2 = box
            rel_box = [x1 / w, y1 / h, (x2 - x1) / w, (y2 - y1) / h]

            detections.append(
                fo.Detection(
                    label=classes[label],
                    bounding_box=rel_box,
                    confidence=score
                )
            )
        
        # Save predictions to dataset
        sample["faster_rcnn"] = fo.Detections(detections=detections)
        sample.save()

print("Finished adding predictions")

 100% |███████████████| 1000/1000 [3.1m elapsed, 0s remaining, 5.5 samples/s]


Finished adding predictions
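The coordinate conversion inside the loop above can be isolated into a small helper, which makes it easy to check by hand. The sketch below is plain Python; the function name `to_relative_xywh` is hypothetical and not part of FiftyOne or torchvision.

```python
def to_relative_xywh(box, width, height):
    """Convert an absolute [x1, y1, x2, y2] box to relative
    [top-left-x, top-left-y, width, height] in [0, 1] x [0, 1]."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, (x2 - x1) / width, (y2 - y1) / height]


# A 640 x 480 image with a box spanning (160, 120) to (480, 360)
rel_box = to_relative_xywh([160, 120, 480, 360], width=640, height=480)
print(rel_box)  # [0.25, 0.25, 0.5, 0.5]
```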


Let's load `predictions_view` in the App to visualize the predictions that we added:

### Confidence thresholding in Python

FiftyOne also provides the ability to [write expressions](https://voxel51.com/docs/fiftyone/user_guide/using_views.html#filtering) that match, filter, and sort detections based on their attributes. See [using DatasetViews](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) for full details.

For example, we can programmatically generate a view that contains only detections whose `confidence` is greater than `0.75` as follows:

In [13]:
from fiftyone import ViewField as F

# Only contains detections with confidence > 0.75
high_conf_view = predictions_view.filter_labels("faster_rcnn", F("confidence") > 0.75, only_matches=False)

Note the `only_matches=False` argument. When filtering labels, any samples that no longer contain labels would normally be removed from the view. However, this is not desired when performing evaluations since it can skew your results between views. We set `only_matches=False` so that all samples will be retained, even if some no longer contain labels.
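To make the effect of `only_matches` concrete, here is a plain-Python sketch of the filtering semantics described above. It does not call FiftyOne; the function `filter_labels_sketch` and the toy `samples` data are hypothetical stand-ins for illustration only.

```python
def filter_labels_sketch(samples, threshold, only_matches=True):
    """Keep detections with confidence > threshold.

    With only_matches=True, samples left without detections are dropped;
    with only_matches=False, they are kept with an empty detection list.
    """
    filtered = []
    for sample in samples:
        kept = [d for d in sample["detections"] if d["confidence"] > threshold]
        if kept or not only_matches:
            filtered.append({"filepath": sample["filepath"], "detections": kept})
    return filtered


# Toy data (made up for illustration)
samples = [
    {"filepath": "a.jpg", "detections": [{"label": "person", "confidence": 0.94}]},
    {"filepath": "b.jpg", "detections": [{"label": "car", "confidence": 0.40}]},
]

print(len(filter_labels_sketch(samples, 0.75, only_matches=True)))   # 1
print(len(filter_labels_sketch(samples, 0.75, only_matches=False)))  # 2
```

With `only_matches=False`, `b.jpg` stays in the result with no detections, so its ground truth objects would still count as misses during evaluation.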

In [14]:
# Print some information about the view
print(high_conf_view)

Dataset:     evaluate-detections-tutorial
Media type:  image
Num samples: 1000
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    faster_rcnn:  fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
View stages:
    1. Take(size=1000, seed=51)
    2. FilterLabels(field='faster_rcnn', filter={'$gt': ['$$this.confidence', 0.75]}, only_matches=False, trajectories=False)


In [15]:
# Print a prediction from the view to verify that its confidence is > 0.75
sample = high_conf_view.first()
print(sample.faster_rcnn.detections[0])

<Detection: {
    'id': '637cb318429314bebadaf012',
    'attributes': {},
    'tags': [],
    'label': 'person',
    'bounding_box': [
        0.20957474708557128,
        0.4165088541836918,
        0.207650089263916,
        0.27272866859595646,
    ],
    'mask': None,
    'confidence': 0.9438897967338562,
    'index': None,
}>


Now let's load our view in the App to view the predictions that we programmatically selected:

## Evaluate detections

Now that we have samples with ground truth and predicted objects, let's use FiftyOne to evaluate the quality of the detections.

FiftyOne provides a powerful [evaluation API](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html) that contains a collection of methods for performing evaluation of model predictions. Since we're working with object detections here, we'll use [detection evaluation](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html#detections).

### Running evaluation

We can run evaluation on our samples via [evaluate_detections()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.evaluate_detections). Note that this method is available on both the `Dataset` and `DatasetView` classes, which means that we can run evaluation on our `high_conf_view` to assess the quality of only the high confidence predictions in our dataset.

By default, this method will use the [COCO evaluation protocol](https://cocodataset.org/#detection-eval), plus some extra goodies that we will use later.
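COCO-style evaluation matches predictions to ground truth objects by intersection over union (IoU). As a rough illustration, and not FiftyOne's actual implementation, IoU for two boxes in the relative `[x, y, width, height]` format could be computed as follows; the function name `iou_xywh` is hypothetical.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


# Two equally sized boxes, the second shifted right by half a box width
print(iou_xywh([0.0, 0.0, 0.5, 0.5], [0.25, 0.0, 0.5, 0.5]))  # one third
```

A prediction typically counts as a true positive only if its IoU with a ground truth object of the same class reaches the chosen threshold.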

In [16]:
# Evaluate the predictions in the `faster_rcnn` field of our `high_conf_view`
# with respect to the objects in the `ground_truth` field
results = high_conf_view.evaluate_detections(
    "faster_rcnn",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)

Evaluating detections...
 100% |███████████████| 1000/1000 [27.3s elapsed, 0s remaining, 37.9 samples/s]
Performing IoU sweep...
 100% |███████████████| 1000/1000 [14.4s elapsed, 0s remaining, 66.6 samples/s]


### Aggregate results

The `results` object returned by the evaluation routine provides a number of convenient methods for analyzing our predictions.

For example, let's print a classification report for the top-10 most common classes in the dataset:

In [17]:
# Get the 10 most common classes in the dataset
counts = dataset.count_values("ground_truth.detections.label")
classes_top10 = sorted(counts, key=counts.get, reverse=True)[:10]

# Print a classification report for the top-10 classes
results.print_report(classes=classes_top10)

               precision    recall  f1-score   support

       person       0.98      0.28      0.43      2176
          car       0.96      0.12      0.21       457
        chair       0.87      0.09      0.16       304
         book       1.00      0.00      0.01       283
       bottle       0.71      0.03      0.05       187
          cup       0.95      0.09      0.16       222
 dining table       0.71      0.12      0.20       127
traffic light       1.00      0.03      0.05       116
         bowl       0.70      0.06      0.11       119
      handbag       0.00      0.00      0.00        95

    micro avg       0.96      0.18      0.30      4086
    macro avg       0.79      0.08      0.14      4086
 weighted avg       0.92      0.18      0.29      4086
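Each row of the report follows from per-class counts of true positives (TP), false positives (FP), and false negatives (FN). A small sketch of these standard formulas, using made-up counts rather than numbers from this run:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical counts: few false positives, many missed objects
precision, recall, f1 = precision_recall_f1(tp=3, fp=1, fn=9)
print(precision, recall, f1)  # 0.75 0.25 0.375
```

High precision with low recall, as in the report above, is what one would expect after keeping only predictions with confidence above 0.75: the detections that remain are mostly correct, but many ground truth objects are left unmatched.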



We can also compute the mean average-precision (mAP) of our detector:

In [18]:
print(results.mAP())

0.14640761435595998


Since [evaluate_detections()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.evaluate_detections) uses the official [COCO evaluation protocol](https://cocodataset.org/#detection-eval), this mAP value will match what `pycocotools` would report.
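For intuition about what this number measures, the sketch below computes a COCO-style 101-point interpolated average precision for a single class at a single IoU threshold. It is a simplified illustration and not the `pycocotools` or FiftyOne implementation: it ignores crowd annotations, area ranges, and detection limits, and the function name and input format are hypothetical. The reported mAP additionally averages such values over classes and over IoU thresholds from 0.50 to 0.95.

```python
def average_precision(matches, num_gt):
    """101-point interpolated AP for one class at one IoU threshold.

    matches: list of (confidence, is_true_positive) pairs, one per prediction
    num_gt:  number of ground truth objects for the class
    """
    if num_gt == 0:
        return 0.0

    # Walk predictions from most to least confident, tracking precision/recall
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ordered:
        tp += int(is_tp)
        fp += int(not is_tp)
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Make precision non-increasing as recall grows
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sample precision at recall = 0.00, 0.01, ..., 1.00
    total = 0.0
    for step in range(101):
        threshold = step / 100
        sampled = 0.0  # precision is taken as 0 if this recall is never reached
        for p, r in zip(precisions, recalls):
            if r >= threshold:
                sampled = p
                break
        total += sampled
    return total / 101


# One correct and one incorrect prediction, two ground truth objects
print(average_precision([(0.9, True), (0.8, False)], num_gt=2))  # 51/101, about 0.505
```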

We can also view some precision-recall (PR) curves for specific classes of our model:

In [19]:
plot = results.plot_pr_curves(classes=["person", "car"])
plot.show()

In [20]:
print(model)

SSD(
  (backbone): SSDFeatureExtractorVGG(
    (features): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU(inplace=True)
      (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (3): ReLU(inplace=True)
      (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (6): ReLU(inplace=True)
      (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (8): ReLU(inplace=True)
      (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (11): ReLU(inplace=True)
      (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (13): ReLU(inplace=True)
      (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (15): ReLU(inplace=

In [21]:
import os

def print_size_of_model(model_):
    """ Prints the real size of the model """
    torch.save(model_.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model)

Size (MB): 142.592912


One problem with quantization in PyTorch is that it does not know how to change the dtype of `Conv2d` layers, which make up most of this model.

In [22]:
import copy

orig_model = copy.deepcopy(model)

## FP16


In [23]:
import torch.quantization
from torch.quantization import QuantStub, DeQuantStub

print(torch.quantization.default_qconfig)

QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, quant_min=0, quant_max=127){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})


In [24]:
"""
model = copy.deepcopy(orig_model)
model.eval()

# Fuse Conv, bn and relu
#myModel.fuse_model()

#torch.quantization.fuse_modules(model, inplace=True)

#model_int8 = torch.nn.Sequential(torch.quantization.QuantStub(), model, torch.quantization.DeQuantStub())
model_fp16 = torch.quantization.QuantWrapper(model)

# Specify quantization configuration
# Start with simple min/max range estimation and per-tensor quantization of weights
#model_fp16.qconfig = torch.ao.quantization.default_qconfig

model_fp16.qconfig = torch.ao.quantization.QConfig(
    activation=torch.ao.quantization.observer.MinMaxObserver.with_args(dtype=torch.qint8),
    weight=torch.ao.quantization.observer.default_observer.with_args(dtype=torch.qint8))

print(model_fp16.qconfig)
torch.ao.quantization.prepare(model_fp16, inplace=True)

with torch.inference_mode():
  for _ in range(10):
    x = torch.rand(1,3, 640, 478).to(device)
    model_fp16(x)

# Convert to quantized model
torch.ao.quantization.convert(model_fp16, inplace=True)
"""
print("")




In [25]:
"""
model_fp16 = torch.quantization.quantize_dynamic(
    model,  # the original model
    {torch.nn.Linear, torch.nn.Conv2d},  # a set of layers to dynamically quantize
    dtype=torch.float16
)  # the target dtype for quantized weights
"""

model_fp16 = copy.deepcopy(orig_model).type(torch.float16)

print_size_of_model(model_fp16)

Size (MB): 71.309264
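The size drops to roughly half because each FP16 value occupies two bytes instead of four, at the cost of precision. This can be illustrated without PyTorch using the standard library's `struct` module, whose `"e"` format is IEEE 754 half precision; the helper name `to_half` is hypothetical.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


print(struct.calcsize("e"), struct.calcsize("f"))  # 2 4
print(to_half(0.1))     # 0.0999755859375
print(to_half(2049.0))  # 2048.0
```

Half precision keeps only about three significant decimal digits, so small rounding differences in weights and activations are to be expected.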


In [26]:
#predictions_view = dataset.take(1000, seed=50)

In [27]:
# Get class list
classes = dataset.default_classes

# Add predictions to samples
with fo.ProgressBar() as pb:
    for sample in pb(predictions_view):
        # Load image
        image = Image.open(sample.filepath)
        image = func.to_tensor(image).to(device).to(torch.float16)
        c, h, w = image.shape
        
        # Perform inference
        preds = model_fp16([image])[0]
        labels = preds["labels"].cpu().detach().numpy()
        scores = preds["scores"].cpu().detach().numpy()
        boxes = preds["boxes"].cpu().detach().numpy()
        
        # Convert detections to FiftyOne format
        detections = []
        for label, score, box in zip(labels, scores, boxes):
            # Convert to [top-left-x, top-left-y, width, height]
            # in relative coordinates in [0, 1] x [0, 1]
            x1, y1, x2, y2 = box
            rel_box = [x1 / w, y1 / h, (x2 - x1) / w, (y2 - y1) / h]

            detections.append(
                fo.Detection(
                    label=classes[label],
                    bounding_box=rel_box,
                    confidence=score
                )
            )
        
        # Save predictions to dataset
        sample["faster_rcnn"] = fo.Detections(detections=detections)
        sample.save()

print("Finished adding predictions")

 100% |███████████████| 1000/1000 [4.1m elapsed, 0s remaining, 4.2 samples/s]


Finished adding predictions


In [28]:
high_conf_view_fp16 = predictions_view.filter_labels("faster_rcnn", F("confidence") > 0.75, only_matches=False)
print(high_conf_view_fp16)

Dataset:     evaluate-detections-tutorial
Media type:  image
Num samples: 1000
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    faster_rcnn:  fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    eval_tp:      fiftyone.core.fields.IntField
    eval_fp:      fiftyone.core.fields.IntField
    eval_fn:      fiftyone.core.fields.IntField
View stages:
    1. Take(size=1000, seed=51)
    2. FilterLabels(field='faster_rcnn', filter={'$gt': ['$$this.confidence', 0.75]}, only_matches=False, trajectories=False)


In [29]:
results_fp16 = high_conf_view_fp16.evaluate_detections(
    "faster_rcnn",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)

Evaluating detections...
 100% |███████████████| 1000/1000 [24.6s elapsed, 0s remaining, 37.7 samples/s]
Performing IoU sweep...
 100% |███████████████| 1000/1000 [14.3s elapsed, 0s remaining, 67.9 samples/s]


In [30]:
# Get the 10 most common classes in the dataset
counts = dataset.count_values("ground_truth.detections.label")
classes_top10 = sorted(counts, key=counts.get, reverse=True)[:10]

# Print a classification report for the top-10 classes
results_fp16.print_report(classes=classes_top10)

               precision    recall  f1-score   support

       person       0.98      0.28      0.43      2175
          car       0.96      0.12      0.21       457
        chair       0.87      0.09      0.16       304
         book       1.00      0.00      0.01       283
       bottle       0.71      0.03      0.05       187
          cup       0.95      0.09      0.16       222
 dining table       0.75      0.12      0.20       127
traffic light       1.00      0.03      0.05       116
         bowl       0.70      0.06      0.11       119
      handbag       0.00      0.00      0.00        95

    micro avg       0.96      0.18      0.30      4085
    macro avg       0.79      0.08      0.14      4085
 weighted avg       0.92      0.18      0.29      4085



In [31]:
print(results_fp16.mAP())

0.14625477027883566


## INT8

In [32]:
"""
per_channel_quantized_model = copy.deepcopy(orig_model)
per_channel_quantized_model.eval()
#per_channel_quantized_model.fuse_model()
per_channel_quantized_model.qconfig = torch.ao.quantization.get_default_qconfig('fbgemm')
print(per_channel_quantized_model.qconfig)

torch.ao.quantization.prepare(per_channel_quantized_model, inplace=True)

torch.ao.quantization.convert(per_channel_quantized_model, inplace=True)

print("")
"""


In [33]:
"""
model_int8 = copy.deepcopy(orig_model)

model_int8.eval().to('cpu')
#model_int8.fuse_model()
model_int8.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model_int8, inplace=True)
torch.quantization.convert(model_int8, inplace=True)
"""

In [34]:

model = copy.deepcopy(orig_model)
model.eval()

# Fuse Conv, bn and relu
#myModel.fuse_model()

#torch.quantization.fuse_modules(model, inplace=True)

#model_int8 = torch.nn.Sequential(torch.quantization.QuantStub(), model, torch.quantization.DeQuantStub())
model_int8 = torch.quantization.QuantWrapper(model)

# Specify quantization configuration
# Start with simple min/max range estimation and per-tensor quantization of weights
#model_int8.qconfig = torch.ao.quantization.default_qconfig

model_int8.qconfig = torch.ao.quantization.QConfig(
    activation=torch.ao.quantization.observer.MinMaxObserver.with_args(dtype=torch.qint8),
    weight=torch.ao.quantization.observer.default_observer.with_args(dtype=torch.qint8))

print(model_int8.qconfig)
torch.ao.quantization.prepare(model_int8, inplace=True)

with torch.inference_mode():
  for _ in range(10):
    x = torch.rand(1,3, 640, 478).to(device)
    model_int8(x)

# Convert to quantized model
torch.ao.quantization.convert(model_int8, inplace=True)

print("")

QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8){}, weight=functools.partial(functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, quant_min=0, quant_max=127){}, dtype=torch.qint8){})
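The `MinMaxObserver` in the printed `QConfig` records the minimum and maximum values seen during calibration, from which a scale and a zero point are derived. The following plain-Python sketch shows the general idea of per-tensor affine quantization to 8-bit integers. It is a simplification under assumed conventions (signed range [-128, 127], observed range extended to include zero), not PyTorch's exact implementation, and all function names are hypothetical.

```python
def choose_qparams(values, qmin=-128, qmax=127):
    """Derive (scale, zero_point) from the observed min/max of `values`."""
    lo = min(min(values), 0.0)  # extend the range so that 0.0 is representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all values are zero; any positive scale works
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in qvalues]


weights = [-0.5, 0.0, 0.3, 1.0]
scale, zero_point = choose_qparams(weights)
qweights = quantize(weights, scale, zero_point)
restored = dequantize(qweights, scale, zero_point)
print(scale, zero_point)
print(qweights)
print(restored)
```

Each restored value differs from the original by at most about half a quantization step, which is the rounding error that INT8 quantization trades for a fourfold reduction in storage per value.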



In [35]:
"""
x = torch.rand(1,3, 640, 478).to("cpu")
print(x)
model_int8 = model_int8.type(torch.int8)
model_int8(x)
"""


In [36]:
"""
import functools

conf = torch.ao.quantization.default_qconfig
print(conf)

#conf.activation = functools.partial(torch.ao.quantization.observer.MinMaxObserver, quant_min=0, quant_max=255)
print(torch.ao.quantization.QConfig(
    activation=torch.ao.quantization.observer.MinMaxObserver.with_args(dtype=torch.qint8),
    weight=torch.ao.quantization.observer.default_observer.with_args(dtype=torch.qint8)))
"""


In [37]:
"""
from torch.quantization import quantize_fx
m = copy.deepcopy(orig_model)
m.eval()
qconfig_dict = {"": torch.quantization.get_default_qconfig("fbgemm")}
# Prepare
model_prepared = quantize_fx.prepare_fx(m, qconfig_dict)
# Calibrate - Use representative (validation) data.

model_int8 = quantize_fx.convert_fx(model_prepared)
"""


In [38]:
model_int8 = torch.quantization.quantize_dynamic(
    orig_model,  # the original model
    {torch.nn.Linear, torch.nn.Conv2d},  # a set of layers to dynamically quantize
    dtype=torch.qint8)

In [39]:
print_size_of_model(model_int8)

Size (MB): 142.592912


The model is quantized in its code representation (see the `QuantizedConv2d` layers printed below); however, it still doesn't work: the saved size is identical to that of the FP32 model.

The size should be around 37 MB
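As a back-of-the-envelope check, the expected checkpoint sizes can be estimated from the FP32 size printed earlier, assuming the file is dominated by weights stored at 4 bytes each. This ignores serialization overhead, per-layer scales and zero points, and any layers left in FP32, so it is only a rough estimate.

```python
FP32_SIZE_MB = 142.592912  # size of the original model, as printed earlier

BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}

# Approximate number of stored values, assuming 4 bytes per FP32 value
num_values = FP32_SIZE_MB * 1e6 / BYTES_PER_VALUE["fp32"]

# Estimated checkpoint size for each storage dtype
expected_mb = {
    dtype: num_values * nbytes / 1e6 for dtype, nbytes in BYTES_PER_VALUE.items()
}

for dtype, size_mb in expected_mb.items():
    print(f"{dtype}: ~{size_mb:.1f} MB")
```

The FP16 estimate is close to the 71.3 MB measured earlier, and the INT8 estimate is in the same range as the figure quoted above. The 142.6 MB actually measured for `model_int8` equals the FP32 size, which suggests its weights were not converted.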

In [40]:
print(model)

SSD(
  (backbone): SSDFeatureExtractorVGG(
    (features): Sequential(
      (0): QuantizedConv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), scale=3.4091553688049316, zero_point=-3, padding=(1, 1))
      (1): ReLU(inplace=True)
      (2): QuantizedConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), scale=11.399210929870605, zero_point=-12, padding=(1, 1))
      (3): ReLU(inplace=True)
      (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (5): QuantizedConv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), scale=27.139928817749023, zero_point=32, padding=(1, 1))
      (6): ReLU(inplace=True)
      (7): QuantizedConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), scale=31.4085693359375, zero_point=6, padding=(1, 1))
      (8): ReLU(inplace=True)
      (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (10): QuantizedConv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), scale=29.787199020385742, zero_point=31, padding=(1, 1))

In [41]:
#predictions_view = dataset.take(1000, seed=52)

In [42]:
import numpy as np

# Get class list
classes = dataset.default_classes


# Add predictions to samples
with fo.ProgressBar() as pb:
    for sample in pb(predictions_view):
        # Load image
        image = Image.open(sample.filepath)
        #image = (func.to_tensor(image).to(device)*255).to(torch.int8)
        image = func.to_tensor(image).to(device)
        
        #print(image)
        #print(type(image))
        #print(image.shape)

        c, h, w = image.shape

        images = torch.stack([image])
        #print(images)
        
        # Perform inference
        preds = model_int8(images)[0]
        labels = preds["labels"].cpu().detach().numpy()
        scores = preds["scores"].cpu().detach().numpy()
        boxes = preds["boxes"].cpu().detach().numpy()
        
        # Convert detections to FiftyOne format
        detections = []
        for label, score, box in zip(labels, scores, boxes):
            # Convert to [top-left-x, top-left-y, width, height]
            # in relative coordinates in [0, 1] x [0, 1]
            x1, y1, x2, y2 = box
            rel_box = [x1 / w, y1 / h, (x2 - x1) / w, (y2 - y1) / h]

            detections.append(
                fo.Detection(
                    label=classes[label],
                    bounding_box=rel_box,
                    confidence=score
                )
            )
        
        # Save predictions to dataset
        sample["faster_rcnn"] = fo.Detections(detections=detections)
        sample.save()

print("Finished adding predictions")

 100% |███████████████| 1000/1000 [4.7m elapsed, 0s remaining, 4.1 samples/s]


Finished adding predictions


We expected the runtime to decrease to around 1.3 minutes; however, due to some bugs in PyTorch, we were unable to achieve this.

In [43]:
high_conf_view_int8 = predictions_view.filter_labels("faster_rcnn", F("confidence") > 0.75, only_matches=False)
print(high_conf_view_int8)

Dataset:     evaluate-detections-tutorial
Media type:  image
Num samples: 1000
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    faster_rcnn:  fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    eval_tp:      fiftyone.core.fields.IntField
    eval_fp:      fiftyone.core.fields.IntField
    eval_fn:      fiftyone.core.fields.IntField
View stages:
    1. Take(size=1000, seed=51)
    2. FilterLabels(field='faster_rcnn', filter={'$gt': ['$$this.confidence', 0.75]}, only_matches=False, trajectories=False)


In [44]:
results_int8 = high_conf_view_int8.evaluate_detections(
    "faster_rcnn",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)

Evaluating detections...
 100% |███████████████| 1000/1000 [27.0s elapsed, 0s remaining, 38.4 samples/s]
Performing IoU sweep...
 100% |███████████████| 1000/1000 [14.4s elapsed, 0s remaining, 66.2 samples/s]


In [45]:
# Get the 10 most common classes in the dataset
counts = dataset.count_values("ground_truth.detections.label")
classes_top10 = sorted(counts, key=counts.get, reverse=True)[:10]

# Print a classification report for the top-10 classes
results_int8.print_report(classes=classes_top10)

               precision    recall  f1-score   support

       person       0.98      0.28      0.43      2176
          car       0.96      0.12      0.21       457
        chair       0.87      0.09      0.16       304
         book       1.00      0.00      0.01       283
       bottle       0.71      0.03      0.05       187
          cup       0.95      0.09      0.16       222
 dining table       0.71      0.12      0.20       127
traffic light       1.00      0.03      0.05       116
         bowl       0.70      0.06      0.11       119
      handbag       0.00      0.00      0.00        95

    micro avg       0.96      0.18      0.30      4086
    macro avg       0.79      0.08      0.14      4086
 weighted avg       0.92      0.18      0.29      4086



In [46]:
print(results_int8.mAP())

0.14640761435595998
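Collecting the numbers reported in this notebook side by side makes the comparison easier to read. The values below are copied from the outputs above (checkpoint size, mAP on the high-confidence view, and inference time for 1000 samples); the table layout is just one possible presentation.

```python
# (name, size in MB, mAP, inference time in minutes), copied from the outputs above
runs = [
    ("FP32", 142.592912, 0.14640761435595998, 3.1),
    ("FP16", 71.309264, 0.14625477027883566, 4.1),
    ("INT8", 142.592912, 0.14640761435595998, 4.7),
]

base_name, base_size, base_map, base_time = runs[0]

print(
    f"{'model':<6}{'size (MB)':>11}{'size ratio':>12}"
    f"{'mAP':>9}{'mAP delta':>11}{'time (min)':>12}"
)
for name, size, m_ap, minutes in runs:
    print(
        f"{name:<6}{size:>11.1f}{size / base_size:>12.2f}"
        f"{m_ap:>9.4f}{m_ap - base_map:>+11.4f}{minutes:>12.1f}"
    )
```

The INT8 row is identical to the FP32 row in both size and mAP, which is consistent with the quantization not having been applied to the model that was evaluated. The FP16 model halves the size with a change in mAP of less than 0.001.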
