
## Backend Overview

ExecuTorch backends provide hardware acceleration for a specific hardware target.
To achieve maximum performance on the target hardware, ExecuTorch optimizes the model for a specific backend during the export and lowering process.
This means that the resulting .pte file is specialized for that hardware.
To deploy to multiple backends, such as Core ML on iOS and Arm CPU on Android, generate a dedicated .pte file for each backend.

## XNNPACK Backend

The XNNPACK delegate is the ExecuTorch solution for CPU execution on mobile devices.

### Features
* Wide operator support on Arm and x86 CPUs, available on any modern mobile phone.
* Support for a wide variety of quantization schemes and quantized operators.
* Supports fp32 and fp16 activations.
* Supports 8-bit quantization.

### Target Requirements
* ARM64 on Android, iOS, macOS, Linux, Windows
* ARMv6 on Linux
* x86 and x86-64 on Windows, Linux, macOS, Android, and iOS simulator

### Using the XNNPACK Backend

To target the XNNPACK backend during the export and lowering process, pass an instance of `XnnpackPartitioner` to `to_edge_transform_and_lower`.

In [None]:
!pip install executorch



In [None]:
import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

In [None]:
mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1,3,224,224),)

In [None]:
et_program = to_edge_transform_and_lower(
    torch.export.export(mobilenet_v2, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

In [None]:
with open("mv2_xnn_pack.pte","wb") as file:
  et_program.write_to_file(file)

## Partitioner API

The XNNPACK partitioner API allows for configuration of the model delegation to XNNPACK.
Passing an `XnnpackPartitioner` instance with no additional parameters will run as much of the model as possible on the XNNPACK backend.

* `configs`: Controls which operators are delegated to XNNPACK. By default, all available operators are delegated.
* `config_precisions`: Filters operators by data type. By default, all precisions are delegated. Accepts one or more of `ConfigPrecisionType.FP32`, `ConfigPrecisionType.STATIC_QUANT`, or `ConfigPrecisionType.DYNAMIC_QUANT`.
* `per_op_mode`: If true, emits an individual delegate call for every operator. Defaults to false.
* `verbose`: If true, prints additional information during lowering.

## Testing the Model

After generating the XNNPACK-delegated .pte file, the model can be tested from Python using the ExecuTorch runtime Python bindings.

## Quantization

The XNNPACK delegate can also be used as a backend to execute symmetrically quantized models.
Models are quantized for XNNPACK with the `XNNPACKQuantizer`. Quantizers are backend specific, which means the `XNNPACKQuantizer` is configured to quantize models so that they leverage the quantized operators offered by the XNNPACK library.

### Supported Quantization Schemes

The XNNPACK delegate supports the following quantization schemes:
* 8-bit symmetric weights with 8-bit asymmetric activations, using the PT2E quantization flow.
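To make this scheme concrete, here is a plain-Python sketch of the underlying affine quantization arithmetic. It assumes int8 ranges of [-127, 127] for symmetric weights and [-128, 127] for asymmetric activations; it illustrates the math only and is not XNNPACK's or PyTorch's actual implementation. All function names are hypothetical.

```python
from typing import List, Tuple

# Assumed int8 ranges: weights use a restricted symmetric range, activations the full range.
W_QMIN, W_QMAX = -127, 127
A_QMIN, A_QMAX = -128, 127


def symmetric_params(values: List[float]) -> Tuple[float, int]:
    """Symmetric (weight) quantization: zero point fixed at 0, scale from max |x|."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / W_QMAX if max_abs > 0 else 1.0
    return scale, 0


def asymmetric_params(lo: float, hi: float) -> Tuple[float, int]:
    """Asymmetric (activation) quantization: map [lo, hi], widened to include 0, onto [A_QMIN, A_QMAX]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (A_QMAX - A_QMIN) if hi > lo else 1.0
    zero_point = round(A_QMIN - lo / scale)
    return scale, max(A_QMIN, min(A_QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int, qmin: int, qmax: int) -> int:
    """q = clamp(round(x / scale) + zero_point); Python's round() rounds half to even."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """x ~= (q - zero_point) * scale."""
    return (q - zero_point) * scale


# Weights: symmetric, so the zero point is always 0.
weights = [-0.5, 0.25, 1.27]
w_scale, w_zp = symmetric_params(weights)
q_weights = [quantize(w, w_scale, w_zp, W_QMIN, W_QMAX) for w in weights]

# Activations: e.g. a post-ReLU range observed during calibration; the zero point shifts.
a_scale, a_zp = asymmetric_params(0.0, 2.55)
q_act = quantize(1.0, a_scale, a_zp, A_QMIN, A_QMAX)

print(q_weights, w_scale, w_zp)
print(q_act, a_scale, a_zp, dequantize(q_act, a_scale, a_zp))
```

The asymmetric zero point lets a one-sided activation range such as a ReLU output use all 256 int8 levels, while a zero point of 0 for weights keeps the integer matrix multiplication simpler.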

### 8-bit Quantization using the PT2E Flow

1. Create an instance of the `XNNPACKQuantizer` class and set its quantization config.
2. Use `torch.export.export_for_training` to capture the model for quantization.
3. Prepare the model for quantization using `prepare_pt2e`.
4. For static quantization, run the prepared model with representative samples to calibrate the quantized tensor activation ranges.
5. Run `convert_pt2e` to quantize the model.
6. Export and lower the model using the standard flow.


In [None]:
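import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
# Take the quantizer and its config from the same ExecuTorch module so their types match.
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import XNNPACKQuantizer, get_symmetric_quantization_config
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower
try:
    # Newer ExecuTorch releases build XNNPACKQuantizer on torchao's PT2E APIs.
    from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
except ImportError:
    # Older releases use the PT2E APIs bundled with PyTorch.
    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e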

In [None]:
model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )

In [None]:
qparams = get_symmetric_quantization_config(is_per_channel=True)
quantizer = XNNPACKQuantizer()
quantizer.set_global(qparams)


In [None]:
train_ep = torch.export.export_for_training(model, sample_inputs).module()
prepared_model = prepare_pt2e(train_ep, quantizer)



In [None]:
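# Calibrate activation ranges; use representative real inputs in practice,
# a single random tensor is only a placeholder.
for cal_sample in [torch.randn(1, 3, 224, 224)]:
  prepared_model(cal_sample)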

In [None]:
quantized_model = convert_pt2e(prepared_model)

et_program = to_edge_transform_and_lower(  # Step 6: export and lower with the standard flow
    torch.export.export(quantized_model, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()
