# Model Profiler API Examples

This notebook demonstrates how to use the [profile_model](https://siliconlabs.github.io/mltk/docs/python_api/operations/profile.html) API.

Refer to the [Model Profiler](https://siliconlabs.github.io/mltk/docs/guides/model_profiler.html) guide for more details.

__NOTES:__  
- Click here: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/siliconlabs/mltk/blob/master/mltk/examples/profile_model.ipynb) to run this example interactively in your browser  
- Refer to the [Notebook Examples Guide](https://siliconlabs.github.io/mltk/docs/guides/notebook_examples_guide.html) for how to run this example locally in VSCode  

## Install MLTK Python Package

```python
# Install the MLTK Python package (if necessary)
!pip install --upgrade silabs-mltk
```

## Import Python Packages

```python
# Import the standard Python packages used by the examples
import os
import urllib.request  # urlopen() lives in the urllib.request submodule, which must be imported explicitly
import shutil
import tempfile

# Import the necessary MLTK APIs
from mltk.core import profile_model
from mltk.utils.commander import query_platform
```

## Download .tflite model file

A `.tflite` model file is required to run these examples.  
The following code downloads a model.

__NOTE:__ Update `TFLITE_MODEL_URL` or `tflite_path` to point to your model if necessary.

```python
# Use the .tflite model found here:
# https://github.com/siliconlabs/mltk/tree/master/mltk/utils/test_helper/data/
# NOTE: Update this URL to point to your model if necessary
TFLITE_MODEL_URL = 'https://github.com/siliconlabs/mltk/raw/master/mltk/utils/test_helper/data/image_example1.tflite'

# Download the .tflite file and save it to the temp directory
tflite_path = os.path.normpath(f'{tempfile.gettempdir()}/image_example1.tflite')
with open(tflite_path, 'wb') as dst:
    with urllib.request.urlopen(TFLITE_MODEL_URL) as src:
        shutil.copyfileobj(src, dst)
```
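Before profiling, it can be worth confirming that the download produced a TensorFlow Lite flatbuffer rather than, for example, an HTML error page. The sketch below assumes the standard TFLite flatbuffer layout, in which the 4-byte file identifier `TFL3` sits at byte offset 4. The helper name `looks_like_tflite` is hypothetical and is not part of the MLTK API.

```python
import os


def looks_like_tflite(path: str) -> bool:
    """Return True if `path` appears to be a TensorFlow Lite flatbuffer.

    Assumption: TFLite files carry the flatbuffer file identifier b'TFL3'
    at byte offset 4 (the first 4 bytes hold the root table offset).
    """
    if not os.path.isfile(path) or os.path.getsize(path) < 8:
        return False
    with open(path, 'rb') as f:
        header = f.read(8)
    return header[4:8] == b'TFL3'
```

For example, `looks_like_tflite(tflite_path)` should return `True` after the download cell above succeeds; a `False` result usually means the URL returned something other than the model file.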

## Example 1: Profile .tflite file in basic simulator

This example profiles the `.tflite` model file in the "basic simulator" of the model profiler.

```python
# Profile the tflite model using the "basic simulator"
# NOTE: Update tflite_path to point to your model if necessary
profiling_results = profile_model(tflite_path, return_estimates=True)

# Print the profiling results
print(profiling_results)
```

```text
Profiling Summary
Name: image_example1
Accelerator: None
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 71.5k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
CPU Cycle Count: 13.1M
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 2.3m
J/Op: 884.5p
J/MAC: 2.0n

Model Layers
+-------+-----------------+--------+--------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode          | # Ops  | # MACs | CPU Cycles | Energy (J) | Input Shape             | Output Shape | Options                                             |
+-------+-----------------+--------+--------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| 0     | conv_2d         | 1.2M   | 497.7k | 10.0M
```
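The summary prints quantities with SI suffixes (`k`, `M`, `m`, `u`, `n`, `p`). If you capture the printed summary and want to post-process it as numbers, a small parser is enough. This is a hedged sketch that works only on the printed text: `parse_si` and `parse_summary` are hypothetical helpers, and they assume every summary line has the `Key: value` form seen above. The example also shows that the reported `J/Op` is consistent with `Energy (J)` divided by `Operation Count`; the printed values are rounded, so the match is approximate.

```python
import re

# Multipliers for the SI suffixes that appear in the profiler summary
SI_SCALE = {'p': 1e-12, 'n': 1e-9, 'u': 1e-6, 'm': 1e-3, '': 1.0, 'k': 1e3, 'M': 1e6, 'G': 1e9}

_NUMBER = re.compile(r'^(-?\d+(?:\.\d+)?)([pnumkMG]?)$')


def parse_si(text: str) -> float:
    """Convert a string such as '71.5k' or '884.5p' to a float."""
    match = _NUMBER.match(text.strip())
    if match is None:
        raise ValueError(f'Not an SI-formatted number: {text!r}')
    value, suffix = match.groups()
    return float(value) * SI_SCALE[suffix]


def parse_summary(summary: str) -> dict:
    """Parse 'Key: value' lines into a dict, converting numeric values to floats."""
    result = {}
    for line in summary.splitlines():
        key, sep, value = line.partition(': ')
        if not sep:
            continue  # skip headings such as 'Profiling Summary'
        try:
            result[key.strip()] = parse_si(value)
        except ValueError:
            result[key.strip()] = value.strip()  # keep non-numeric values such as 'int8'
    return result


# A few lines copied from the Example 1 output above
example1 = """Profiling Summary
Name: image_example1
Operation Count: 2.6M
Energy (J): 2.3m
J/Op: 884.5p"""

stats = parse_summary(example1)
print(stats['Energy (J)'] / stats['Operation Count'])  # approx. 8.85e-10, close to the reported 884.5p
```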

## Example 2: Profile .tflite file in MVP hardware simulator

This example profiles the `.tflite` model file in the MVP hardware accelerator simulator of the model profiler.

```python
# Profile the tflite model using the MVP hardware accelerator simulator
# NOTE: Update tflite_path to point to your model if necessary
profiling_results = profile_model(tflite_path, accelerator='MVP', return_estimates=True)

# Print the profiling results
print(profiling_results)
```

```text
Profiling Summary
Name: image_example1
Accelerator: MVP
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 85.3k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
Accelerator Cycle Count: 1.1M
CPU Cycle Count: 81.3k
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 153.0u
J/Op: 57.9p
J/MAC: 127.8p

Model Layers
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode          | # Ops  | # MACs | Acc Cycles | CPU Cycles | Energy (J) | Input Shape             | Output Shape | Options                                             |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+---------------------------------------
```
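Putting the two simulator summaries side by side makes the accelerator's trade-off explicit. The sketch below simply hardcodes a few values copied from the Example 1 and Example 2 outputs; the `SUMMARIES` dict and the `compare` helper are illustrative only and are not MLTK APIs. With these estimates, the MVP reduces energy per inference by roughly 15x and CPU cycles by roughly 160x, at the cost of about 14k more runtime RAM.

```python
# Values copied from the "basic simulator" (Example 1) and MVP (Example 2) summaries above
SUMMARIES = {
    'basic': {'energy_j': 2.3e-3, 'cpu_cycles': 13.1e6, 'ram_bytes': 71.5e3},
    'mvp': {'energy_j': 153.0e-6, 'cpu_cycles': 81.3e3, 'ram_bytes': 85.3e3},
}


def compare(baseline: dict, candidate: dict) -> dict:
    """Return baseline/candidate ratios for energy and CPU cycles, and the RAM delta in bytes."""
    return {
        'energy_reduction': baseline['energy_j'] / candidate['energy_j'],
        'cpu_cycle_reduction': baseline['cpu_cycles'] / candidate['cpu_cycles'],
        'extra_ram_bytes': candidate['ram_bytes'] - baseline['ram_bytes'],
    }


for name, value in compare(SUMMARIES['basic'], SUMMARIES['mvp']).items():
    print(f'{name}: {value:.1f}')
```

Because the inputs are the rounded figures from the printed summaries, treat the ratios as approximate.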

## Example 3: Profile .tflite file on physical device

This example profiles the `.tflite` model file on a physically connected embedded device.

__NOTE:__ A supported development board must be connected and properly enumerated for this example to work.

```python
import sys

# Determine the currently connected device
# Just print an error and stop if no device is connected
try:
    platform_name = query_platform()
except Exception as e:
    print(f'Failed to determine connected device, err:\n{e}')
    sys.exit(0)

print(f'Connected device platform: {platform_name}')

accelerator = None
if platform_name in ('brd2601a', 'brd4186b'):
    # Use the MVP hardware accelerator if the platform supports it
    accelerator = 'MVP'

# Profile the tflite model on the physical device
profiling_results = profile_model(
    tflite_path,
    accelerator=accelerator,
    use_device=True
)

# Print the profiling results
print(profiling_results)
```

```text
Connected device platform: brd2601
Profiling Summary
Name: image_example1
Accelerator: None
Input Shape: 1x96x96x1
Input Data Type: int8
Output Shape: 1x3
Output Data Type: int8
Flash, Model File Size (bytes): 15.7k
RAM, Runtime Memory Size (bytes): 71.4k
Operation Count: 2.6M
Multiply-Accumulate Count: 1.2M
Layer Count: 8
Unsupported Layer Count: 0
CPU Cycle Count: 9.5M
CPU Utilization (%): 100.0
Clock Rate (hz): 78.0M
Time (s): 119.7m
Ops/s: 22.1M
MACs/s: 10.0M
Inference/s: 8.4

Model Layers
+-------+-----------------+--------+--------+------------+----------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode          | # Ops  | # MACs | CPU Cycles | Time (s) | Input Shape             | Output Shape | Options                                             |
+-------+-----------------+--------+--------+------------+----------+-------------------------+--------------+-----------------------------------------------------+
| 0
```
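On hardware, the profiler reports a measured `Time (s)` instead of an energy estimate, and the throughput lines appear to follow from it: `Inference/s` is the reciprocal of the per-inference time, and `Ops/s` and `MACs/s` are the operation and MAC counts divided by that time. A minimal sketch of those relationships is below; `throughput` is a hypothetical helper, and it uses the rounded numbers printed above, so its results differ slightly from the profiler's own figures, which are presumably computed from unrounded counts.

```python
def throughput(time_s: float, ops: float, macs: float) -> dict:
    """Derive per-second rates from a single-inference latency (hypothetical helper)."""
    if time_s <= 0:
        raise ValueError('time_s must be positive')
    return {
        'inference_per_s': 1.0 / time_s,
        'ops_per_s': ops / time_s,
        'macs_per_s': macs / time_s,
    }


# Rounded values copied from the on-device summary above
rates = throughput(time_s=119.7e-3, ops=2.6e6, macs=1.2e6)
for name, value in rates.items():
    print(f'{name}: {value:,.1f}')
```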

## Example 4: Profile model before training

Training a model can be very time-consuming, and it is useful to know how efficiently a 
model will execute on an embedded device before investing time and energy into training it.  
For this reason, the MLTK [profile_model](https://siliconlabs.github.io/mltk/docs/python_api/operations/profile.html) API features a `build` argument to build a model and profile it _before_ the model is fully trained.

In this example, the [image_example1](https://siliconlabs.github.io/mltk/docs/python_api/models/examples/image_example1.html) model is built
at profiling time and then profiled in the MVP hardware simulator.  
Note that _only_ the [model specification](https://siliconlabs.github.io/mltk/docs/guides/model_specification.html) script is required;
the model does _not_ need to be trained first.

```python
# Build the image_example1 model then profile it using the MVP hardware accelerator simulator
# NOTE: Since build=True, the model does NOT need to be trained first
profiling_results = profile_model('image_example1', accelerator='MVP', build=True, return_estimates=True)

# Print the profiling results
print(profiling_results)
```

```text
Epoch 1/3
Epoch 2/3
Epoch 3/3
INFO:tensorflow:Assets written to: E:\tmpc8yu6n46\assets

Profiling Summary
Name: my_model
Accelerator: MVP
Input Shape: 1x96x96x1
Input Data Type: float32
Output Shape: 1x3
Output Data Type: float32
Flash, Model File Size (bytes): 15.4k
RAM, Runtime Memory Size (bytes): 85.4k
Operation Count: 2.7M
Multiply-Accumulate Count: 1.2M
Layer Count: 10
Unsupported Layer Count: 0
Accelerator Cycle Count: 1.1M
CPU Cycle Count: 415.3k
CPU Utilization (%): 0.0
Clock Rate (hz): 78.0M
Energy (J): 219.6u
J/Op: 82.0p
J/MAC: 183.5p

Model Layers
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-----------------------------------------------------+
| Index | OpCode          | # Ops  | # MACs | Acc Cycles | CPU Cycles | Energy (J) | Input Shape             | Output Shape | Options                                             |
+-------+-----------------+--------+--------+------------+------------+------------+-------------------------+--------------+-------------------------------------
```
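Because `build=True` produces these numbers before any real training, they can serve as an early go/no-go check against a target device's resources. The sketch below shows one way to express that check. The budget figures are placeholder assumptions for illustration, not the limits of any particular Silicon Labs part, and `Budget` and `check_budget` are hypothetical helpers. The measured values are copied from the summary above.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical resource limits for a target device (placeholder values)."""
    max_flash_bytes: float
    max_ram_bytes: float
    max_energy_j: float


def check_budget(summary: dict, budget: Budget) -> list:
    """Return a list of human-readable violations; an empty list means the model fits."""
    problems = []
    if summary['flash_bytes'] > budget.max_flash_bytes:
        problems.append(f"flash {summary['flash_bytes']:.0f} B > {budget.max_flash_bytes:.0f} B")
    if summary['ram_bytes'] > budget.max_ram_bytes:
        problems.append(f"RAM {summary['ram_bytes']:.0f} B > {budget.max_ram_bytes:.0f} B")
    if summary['energy_j'] > budget.max_energy_j:
        problems.append(f"energy {summary['energy_j']:.2e} J > {budget.max_energy_j:.2e} J")
    return problems


# Values copied from the Example 4 (untrained, MVP) summary above
untrained = {'flash_bytes': 15.4e3, 'ram_bytes': 85.4e3, 'energy_j': 219.6e-6}

# Placeholder budget with a deliberately tight 64k RAM limit
tight = Budget(max_flash_bytes=256e3, max_ram_bytes=64e3, max_energy_j=1e-3)
print(check_budget(untrained, tight) or 'fits')
```

With the tight placeholder budget, only the RAM check fails, which would suggest shrinking the model before spending time on training.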