# L3: Preparing for on-device deployment



## Capture trained model

In [1]:
from qai_hub_models.models.ffnet_40s import Model as FFNet_40s

# Load from pre-trained weights
ffnet_40s = FFNet_40s.from_pretrained()

Loading pretrained model state dict from /home/jovyan/.qaihm/models/ffnet/v1/ffnet40S/ffnet40S_dBBB_cityscapes_state_dict_quarts.pth
Initializing ffnnet40S_dBBB_mobile weights


In [2]:
import torch
input_shape = (1, 3, 1024, 2048)
example_inputs = torch.rand(input_shape)

In [3]:
traced_model = torch.jit.trace(ffnet_40s, example_inputs)

In [4]:
traced_model

FFNet40S(
  original_name=FFNet40S
  (model): FFNet(
    original_name=FFNet
    (backbone_model): ResNetS(
      original_name=ResNetS
      (conv0): Conv2d(original_name=Conv2d)
      (bn0): BatchNorm2d(original_name=BatchNorm2d)
      (relu0): ReLU(original_name=ReLU)
      (conv1): Conv2d(original_name=Conv2d)
      (bn1): BatchNorm2d(original_name=BatchNorm2d)
      (relu1): ReLU(original_name=ReLU)
      (layer1): Sequential(
        original_name=Sequential
        (0): BasicBlock(
          original_name=BasicBlock
          (conv1): Conv2d(original_name=Conv2d)
          (bn1): BatchNorm2d(original_name=BatchNorm2d)
          (conv2): Conv2d(original_name=Conv2d)
          (bn2): BatchNorm2d(original_name=BatchNorm2d)
          (relu): ReLU(original_name=ReLU)
          (downsample): Sequential(
            original_name=Sequential
            (0): Conv2d(original_name=Conv2d)
            (1): BatchNorm2d(original_name=BatchNorm2d)
          )
        )
        (1): BasicBlock

## Compile for device

In [5]:
import qai_hub
import qai_hub_models

from utils import get_ai_hub_api_token
ai_hub_api_token = get_ai_hub_api_token()

!qai-hub configure --api_token $ai_hub_api_token

qai-hub configuration saved to /home/jovyan/.qai_hub/client.ini
[api]
api_token = <redacted>
api_url = https://app.aihub.qualcomm.com
web_url = https://app.aihub.qualcomm.com
verbose = True




In [6]:
for device in qai_hub.get_devices():
    print(device.name)

Google Pixel 3 (Family)
Google Pixel 3
Google Pixel 3a
Google Pixel 3 XL
Google Pixel 4
Google Pixel 4
Google Pixel 4a
Google Pixel 5
Samsung Galaxy Tab S7
Samsung Galaxy Tab A8 (2021)
Samsung Galaxy Note 20 (Intl)
Samsung Galaxy S21 (Family)
Samsung Galaxy S21
Samsung Galaxy S21+
Samsung Galaxy S21 Ultra
Xiaomi Redmi Note 10 5G
Google Pixel 3a XL
Google Pixel 4a
Google Pixel 5 (Family)
Google Pixel 5
Google Pixel 5a 5G
Google Pixel 6
Samsung Galaxy A53 5G
Samsung Galaxy A73 5G
RB3 Gen 2 (Proxy)
QCS6490 (Proxy)
RB5 (Proxy)
QCS8250 (Proxy)
QCS8550 (Proxy)
Samsung Galaxy S21 (Family)
Samsung Galaxy S21
Samsung Galaxy S21 Ultra
Samsung Galaxy S22 (Family)
Samsung Galaxy S22 Ultra 5G
Samsung Galaxy S22 5G
Samsung Galaxy S22+ 5G
Samsung Galaxy Tab S8
Xiaomi 12 (Family)
Xiaomi 12
Xiaomi 12 Pro
Google Pixel 6 (Family)
Google Pixel 6
Google Pixel 6a
Google Pixel 7 (Family)
Google Pixel 7
Google Pixel 7 Pro
Samsung Galaxy A14 5G
Samsung Galaxy S22 5G
QCS8450 (Proxy)
XR2 Gen 2 (Proxy)
Samsung Ga

<p style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> ⏳ <b>Note:</b> To spread the load across various devices, we are selecting a random device. Feel free to change it to any other device you prefer.</p>

In [7]:
devices = [
    "Samsung Galaxy S22 Ultra 5G",
    "Samsung Galaxy S22 5G",
    "Samsung Galaxy S22+ 5G",
    "Samsung Galaxy Tab S8",
    "Xiaomi 12",
    "Xiaomi 12 Pro",
    "Samsung Galaxy S23",
    "Samsung Galaxy S23+",
    "Samsung Galaxy S23 Ultra",
    "Samsung Galaxy S24",
    "Samsung Galaxy S24 Ultra",
    "Samsung Galaxy S24+",
]

import random
selected_device = random.choice(devices)
print(selected_device)

Samsung Galaxy S24 Ultra
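A hard-coded list can drift from what the service currently offers. The following is a minimal sketch, assuming the names printed by `qai_hub.get_devices()` have first been collected into a plain list. `pick_device` is a hypothetical helper written for illustration; it is not part of `qai_hub`.

```python
import random

def pick_device(preferred, available, rng=random):
    """Return a random name from `preferred` that also appears in `available`.

    Hypothetical helper: repeated names are dropped so that every
    remaining device is equally likely to be chosen.
    """
    available_set = set(available)
    # dict.fromkeys keeps first-seen order while removing repeats
    candidates = [name for name in dict.fromkeys(preferred) if name in available_set]
    if not candidates:
        raise ValueError("none of the preferred devices is currently available")
    return rng.choice(candidates)

# Stand-in for [d.name for d in qai_hub.get_devices()]
available_names = ["Samsung Galaxy S23", "Samsung Galaxy S24", "Google Pixel 7"]
preferred_names = ["Samsung Galaxy S24", "Samsung Galaxy S24", "Xiaomi 12"]
chosen = pick_device(preferred_names, available_names)
print(chosen)
```

Failing early with a `ValueError` is a design choice here: it surfaces an unavailable device before any job is submitted.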


In [8]:
device = qai_hub.Device(selected_device)

# Compile for target device
compile_job = qai_hub.submit_compile_job(
    model=traced_model,                        # Traced PyTorch model
    input_specs={"image": input_shape},        # Input specification
    device=device,                             # Device
)

Uploading model: 100%|██████████| 53.6M/53.6M [00:00<00:00, 59.8MB/s]


Scheduled compile job (jegnx6mj5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jegnx6mj5/



In [9]:
# Download and save the target model for use on-device
target_model = compile_job.get_target_model()

Waiting for compile job (jegnx6mj5) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS                          


## Exercise: Try different runtimes 

In [10]:
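# Pick ONE target runtime. Each assignment overwrites the previous one,
# so as written only the last line (QNN) takes effect.
# Comment out the lines you do not want.
compile_options="--target_runtime tflite"                  # Uses TensorFlow Lite
compile_options="--target_runtime onnx"                    # Uses ONNX runtime
compile_options="--target_runtime qnn_lib_aarch64_android" # Runs with Qualcomm AI Engine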

compile_job_expt = qai_hub.submit_compile_job(
    model=traced_model,                        # Traced PyTorch model
    input_specs={"image": input_shape},        # Input specification
    device=device,                             # Device
    options=compile_options,
)

Uploading model: 100%|██████████| 53.6M/53.6M [00:01<00:00, 54.2MB/s]


Scheduled compile job (jopr9v2kp) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jopr9v2kp/
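To compare runtimes side by side rather than one at a time, the submissions can be wrapped in a small loop. This is a sketch only: `submit_for_each` and `fake_submit` are hypothetical names introduced here, and the dry run uses a stand-in function so that nothing is uploaded.

```python
def submit_for_each(submit, flag, values, **job_kwargs):
    """Call `submit` once per value, passing options="<flag> <value>".

    `submit` is any callable that accepts the keyword arguments used in
    this notebook (for example qai_hub.submit_compile_job).
    Returns a dict mapping each value to whatever `submit` returned.
    """
    jobs = {}
    for value in values:
        jobs[value] = submit(options=f"{flag} {value}", **job_kwargs)
    return jobs

# Dry run with a stand-in submit function so the sketch runs offline.
def fake_submit(**kwargs):
    return kwargs["options"]

runtimes = ["tflite", "onnx", "qnn_lib_aarch64_android"]
dry_run = submit_for_each(fake_submit, "--target_runtime", runtimes, model=None)
print(dry_run)
```

In the notebook, passing `qai_hub.submit_compile_job` as `submit` together with `model=traced_model`, `input_specs={"image": input_shape}` and `device=device` should schedule one compile job per runtime. Each submission uploads the model again, so expect three uploads.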



Explore more compiler options <a href="https://app.aihub.qualcomm.com/docs/hub/compile_examples.html#compiling-pytorch-to-tflite">here</a>.

## On-Device Performance Profiling

In [11]:
from qai_hub_models.utils.printing import print_profile_metrics_from_job

# Choose device
device = qai_hub.Device(selected_device)

# Runs a performance profile on-device
profile_job = qai_hub.submit_profile_job(
    model=target_model,                       # Compiled model
    device=device,                            # Device
)

# Print summary
profile_data = profile_job.download_profile()
print_profile_metrics_from_job(profile_job, profile_data)

Scheduled profiling job (jep2jk965) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jep2jk965/

Waiting for profile job (jep2jk965) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS                          

------------------------------------------------------------
Performance results on-device for Job_Jegnx6Mj5_Optimized_Tflite.
------------------------------------------------------------
Device                          : Samsung Galaxy S24 Ultra (14)
Runtime                         : TFLITE                       
Estimated inference time (ms)   : 20.2                         
Estimated peak memory usage (MB): [1, 102]                     
Total # Ops                     : 94                           
Compute Unit(s)                 : NPU (94 ops)                 
------------------------------------------------------------
More details: https://app.aihub.qualcomm.com/jobs/jep2jk965/
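The estimated inference time can be turned into a rough throughput figure. The arithmetic below is a back-of-the-envelope sketch: it assumes inferences run back to back, ignores pre- and post-processing, and the 30 FPS target is an illustrative assumption rather than a requirement stated in this notebook.

```python
inference_ms = 20.2             # "Estimated inference time (ms)" from the summary above
frames_per_second = 1000.0 / inference_ms

budget_ms_30fps = 1000.0 / 30   # time available per frame at an assumed 30 FPS target
headroom_ms = budget_ms_30fps - inference_ms

print(f"{frames_per_second:.1f} inferences/s, {headroom_ms:.1f} ms headroom at 30 FPS")
```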



## Exercise: Try different compute units

In [12]:
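# Pick ONE compute unit. Each assignment overwrites the previous one,
# so as written only the last line (npu) takes effect.
profile_options="--compute_unit cpu"     # Use CPU
profile_options="--compute_unit gpu"     # Use GPU (with CPU fallback)
profile_options="--compute_unit npu"     # Use NPU (with CPU fallback)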

# Runs a performance profile on-device
profile_job_expt = qai_hub.submit_profile_job(
    model=target_model,                     # Compiled model
    device=device,                          # Device
    options=profile_options,
)

Scheduled profiling job (jqpyn1j0g) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jqpyn1j0g/



## On-Device Inference

In [13]:
sample_inputs = ffnet_40s.sample_inputs()
sample_inputs

{'image': [array([[[[0.88226926, 0.91500396, 0.38286376, ..., 0.512196  ,
            0.590911  , 0.97124064],
           [0.732221  , 0.6075113 , 0.39883906, ..., 0.67519605,
            0.2057851 , 0.50269604],
           [0.14578342, 0.9024411 , 0.9216884 , ..., 0.0463466 ,
            0.16189855, 0.39058334],
           ...,
           [0.3613289 , 0.7921508 , 0.87907416, ..., 0.632077  ,
            0.41313088, 0.15653533],
           [0.91184497, 0.04659021, 0.24430835, ..., 0.7468805 ,
            0.18837255, 0.72869563],
           [0.3692274 , 0.43727338, 0.12155724, ..., 0.51339453,
            0.77286595, 0.01265824]],
  
          [[0.6359211 , 0.5285664 , 0.02858812, ..., 0.77734596,
            0.1799339 , 0.8858922 ],
           [0.26677412, 0.3122918 , 0.73481244, ..., 0.83492815,
            0.9238087 , 0.7294582 ],
           [0.61014926, 0.5510514 , 0.3115641 , ..., 0.42396313,
            0.26608187, 0.71161884],
           ...,
           [0.34457868, 0.92750466, 0

In [14]:
torch_inputs = torch.Tensor(sample_inputs['image'][0])
torch_outputs = ffnet_40s(torch_inputs)
torch_outputs

tensor([[[[ -0.0779,  -0.8674,  -0.8154,  ...,   1.6322,   1.2770,   0.6463],
          [  0.1655,  -0.8052,  -0.8711,  ...,   3.4390,   2.9765,   1.9314],
          [  0.0763,  -1.1176,  -0.8498,  ...,   3.9516,   2.9577,   2.5406],
          ...,
          [  2.7787,   3.7968,   5.3963,  ...,   7.6916,   6.3624,   3.6923],
          [  2.6265,   3.8781,   5.6066,  ...,   7.1262,   6.2831,   3.5242],
          [  2.2885,   3.2415,   3.7434,  ...,   3.7370,   3.2692,   2.1062]],

         [[ -5.7459,  -7.7366,  -8.0728,  ...,  -9.5495,  -8.7964,  -5.6498],
          [ -7.5505, -10.1804, -10.6720,  ..., -12.6330, -12.1779,  -8.3495],
          [ -7.8820, -10.9470, -11.4877,  ..., -14.0328, -12.7077,  -8.8373],
          ...,
          [ -9.7284, -14.2336, -16.0370,  ..., -12.5573, -11.8637,  -9.0820],
          [ -8.4841, -13.1502, -14.1376,  ..., -13.0398, -11.3843,  -8.1362],
          [ -5.5457,  -8.4489,  -9.2149,  ...,  -8.7973,  -7.4970,  -5.5476]],

         [[  0.0555,  -0.3228,

In [15]:
inference_job = qai_hub.submit_inference_job(
    model=target_model,          # Compiled model
    inputs=sample_inputs,        # Sample input
    device=device,               # Device
)

Uploading dataset: 100%|██████████| 21.5M/21.5M [00:00<00:00, 40.8MB/s]


Scheduled inference job (j2p0kzl05) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/j2p0kzl05/



In [16]:
ondevice_outputs = inference_job.download_output_data()
ondevice_outputs['output_0']

Waiting for inference job (j2p0kzl05) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS                          


dataset-dz7zwov57.h5: 100%|██████████| 1.11M/1.11M [00:00<00:00, 10.1MB/s]


[array([[[[ -0.07495118,  -0.8657227 ,  -0.8110352 , ...,   1.6279298 ,
             1.2734376 ,   0.64355475],
          [  0.16870119,  -0.8007813 ,  -0.8637696 , ...,   3.4296877 ,
             2.9628909 ,   1.9238282 ],
          [  0.08038331,  -1.1142579 ,  -0.8481446 , ...,   3.9375002 ,
             2.9472659 ,   2.529297  ],
          ...,
          [  2.779297  ,   3.7968752 ,   5.3906255 , ...,   7.6875005 ,
             6.355469  ,   3.685547  ],
          [  2.6250002 ,   3.8750002 ,   5.5976567 , ...,   7.117188  ,
             6.277344  ,   3.5175784 ],
          [  2.2890627 ,   3.2402346 ,   3.7382815 , ...,   3.7265627 ,
             3.2597659 ,   2.0996096 ]],
 
         [[ -5.742188  ,  -7.7343755 ,  -8.070313  , ...,  -9.546876  ,
            -8.789063  ,  -5.648438  ],
          [ -7.5468755 , -10.171876  , -10.671876  , ..., -12.625001  ,
           -12.164063  ,  -8.343751  ],
          [ -7.8789067 , -10.937501  , -11.484376  , ..., -14.015626  ,
           -12

In [17]:
from qai_hub_models.utils.printing import print_inference_metrics
print_inference_metrics(inference_job, ondevice_outputs, torch_outputs)


Comparing on-device vs. local-cpu inference for Job_Jegnx6Mj5_Optimized_Tflite.
+---------------+----------------------------+--------+
| output_name   | shape                      |   psnr |
| output_0      | torch.Size([19, 128, 256]) |  63.37 |
+---------------+----------------------------+--------+

- psnr: Peak Signal-to-Noise Ratio (PSNR). >30 dB is typically considered good.

More details: https://app.aihub.qualcomm.com/jobs/j2p0kzl05/
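For intuition about the PSNR figure, here is a minimal pure-Python sketch. It assumes the peak is the largest absolute value in the reference output, which is one common convention and not necessarily the exact formula used by `print_inference_metrics`. The sample values are the first three entries of each output printed above.

```python
import math

def psnr(reference, test, eps=1e-10):
    """PSNR in dB between two equal-length sequences of floats.

    Assumption: the peak is the largest absolute reference value.
    `eps` keeps the result finite when the two inputs are identical.
    """
    if len(reference) != len(test):
        raise ValueError("inputs must have the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    peak = max(abs(r) for r in reference)
    return 10.0 * math.log10(peak ** 2 / (mse + eps))

# First three values of each output shown above (local PyTorch vs. on-device)
local = [-0.0779, -0.8674, -0.8154]
on_device = [-0.07495118, -0.8657227, -0.8110352]
print(f"{psnr(local, on_device):.1f} dB")
```

Three values give only a rough figure; the 63.37 dB reported above is computed over the full output. To apply the sketch to the notebook's arrays, flatten both outputs to plain lists of equal length first. A NumPy version would be considerably faster at that size.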


## Get ready for deployment!

In [18]:
target_model = compile_job.get_target_model()
_ = target_model.download("FFNet_40s.tflite")

job_jegnx6mj5_optimized_tflite_mjqyj93xq.tflite: 100%|██████████| 53.1M/53.1M [00:00<00:00, 65.7MB/s]
