# 1. Install the Python libraries
- Run the install commands in a CMD window if possible; the logs are long.
- Install into a fresh Python environment to avoid version conflicts.
- Ref: https://github.com/ultralytics/ultralytics/tree/main
- Ref: https://github.com/PINTO0309/onnx2tf
- Ref: https://pypi.org/project/ethos-u-vela/

In [None]:
!pip install ultralytics

In [None]:
!pip install ethos-u-vela==3.10.0

In [None]:
!pip install -U tensorflow==2.15.0
!pip install -U onnx==1.15.0
!pip install -U nvidia-pyindex
!pip install -U onnx-graphsurgeon
!pip install -U onnxruntime==1.16.3
!pip install -U onnxsim==0.4.33
!pip install -U simple_onnx_processing_tools
!pip install -U onnx2tf
!pip install -U h5py==3.7.0
!pip install -U psutil==5.9.5
!pip install -U ml_dtypes==0.2.0
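
After installing, it can help to confirm that the pinned versions are the ones actually active in the environment. The helper below is a minimal sketch using only the standard library; `check_versions` and `PINS` are illustrative names, and the pins are copied from the install commands above.

```python
from importlib import metadata

# Pins copied from the pip commands above; extend as needed.
PINS = {
    "ethos-u-vela": "3.10.0",
    "tensorflow": "2.15.0",
    "onnx": "1.15.0",
    "onnxruntime": "1.16.3",
    "onnxsim": "0.4.33",
    "h5py": "3.7.0",
    "psutil": "5.9.5",
    "ml_dtypes": "0.2.0",
}

def check_versions(pins):
    """Return {package: (installed_version_or_None, expected, matches)}."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, expected, installed == expected)
    return report

for name, (installed, expected, ok) in check_versions(PINS).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name:15s} installed={installed} expected={expected} {status}")
```

A `MISMATCH` line does not necessarily mean the pipeline will fail, only that the environment differs from the one used here.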

# 2. Prepare the YOLOv8n model

In [1]:
from ultralytics import YOLO




In [3]:
# Load a model
model = YOLO('yolov8n.pt')  # load an official model
#model = YOLO('path/to/best.pt')  # load a custom trained model

# Export the model as onnx
model.export(format='onnx', imgsz=320)

# Export the model as tflite
# Known issue: the int8 .tflite produced by this export has no quantization parameters, so Vela cannot convert it in the next step.
#model.export(format='tflite', int8=True)

Ultralytics YOLOv8.0.232 🚀 Python-3.10.9 torch-2.1.2+cpu CPU (12th Gen Intel Core(TM) i7-12700)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

PyTorch: starting from 'yolov8n.pt' with input shape (1, 3, 320, 320) BCHW and output shape(s) (1, 84, 2100) (6.2 MB)

ONNX: starting export with onnx 1.15.0 opset 17...
ONNX: export success ✅ 0.4s, saved as 'yolov8n.onnx' (12.1 MB)

Export complete (1.9s)
Results saved to C:\Users\USER\Desktop\ML\ultralytics
Predict:         yolo predict task=detect model=yolov8n.onnx imgsz=320  
Validate:        yolo val task=detect model=yolov8n.onnx imgsz=320 data=coco.yaml  
Visualize:       https://netron.app


'yolov8n.onnx'
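
The export log reports an output shape of (1, 84, 2100) for a 320x320 input. Assuming the standard YOLOv8 detection head (strides 8, 16, and 32, with 4 box coordinates plus 80 COCO class scores per prediction), those numbers can be reproduced with the short sketch below. The function names are illustrative.

```python
def num_predictions(imgsz, strides=(8, 16, 32)):
    """Total grid cells across the detection scales for a square input."""
    return sum((imgsz // s) ** 2 for s in strides)

def output_shape(imgsz, num_classes=80, batch=1):
    """Expected (batch, channels, predictions) shape of the exported head."""
    return (batch, 4 + num_classes, num_predictions(imgsz))

print(output_shape(320))  # (1, 84, 2100): 40*40 + 20*20 + 10*10 = 2100
```

Changing `imgsz` at export time changes only the last dimension; the 84 channels stay fixed for an 80-class model.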

# 3. Convert to an int8 tflite model
- Move `yolov8n.onnx` into a working directory of your choice (the cell below assumes `yolov8n_saved_model_320`).
- Run the conversion in a CMD window if possible; the log is long.
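
Because the conversion log is long, one option is to run the converter as a subprocess and send its output to a file instead of the notebook. The sketch below assumes `onnx2tf` is on the PATH and uses the same flags as the cell that follows; `build_onnx2tf_cmd`, `run_logged`, and the `convert.log` file name are hypothetical helpers, not part of onnx2tf.

```python
import subprocess
from pathlib import Path

def build_onnx2tf_cmd(onnx_name="yolov8n.onnx"):
    """Build the onnx2tf command line with the flags used in this notebook."""
    return ["onnx2tf", "-i", onnx_name, "-nuo", "-oiqt", "-qt", "per-tensor"]

def run_logged(cmd, workdir, log_name="convert.log"):
    """Run cmd inside workdir, writing stdout and stderr to a log file there."""
    workdir = Path(workdir)
    log_path = workdir / log_name
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(cmd, cwd=workdir, stdout=log,
                                stderr=subprocess.STDOUT)
    return result.returncode, log_path

# Example (assumes yolov8n.onnx is already in this directory):
# code, log_path = run_logged(build_onnx2tf_cmd(), "yolov8n_saved_model_320")
```

Passing `cwd=` to `subprocess.run` avoids changing the notebook's own working directory, so there is nothing to restore afterwards.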

In [6]:
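import os

# Directory that holds yolov8n.onnx (see the notes above)
cov_dir_name = 'yolov8n_saved_model_320'
cwd = os.getcwd()
os.chdir(os.path.join(cwd, cov_dir_name))

# -nuo: skip onnxsim, -oiqt: also output integer-quantized tflite, -qt per-tensor: per-tensor quantization
#!onnx2tf -i yolov8n.onnx -oiqt -qt per-tensor
!onnx2tf -i yolov8n.onnx -nuo -oiqt -qt per-tensor

os.chdir(cwd)  # Change back to the original path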


Automatic generation of each OP name complete!


INFO: input_op_name: images shape: [1, 3, 320, 320] dtype: float32

INFO: 2 / 234
INFO: onnx_op_type: Conv onnx_op_name: wa/model.0/conv/Conv
INFO:  input_name.1: images shape: [1, 3, 320, 320] dtype: float32
INFO:  input_name.2: model.0.conv.weight shape: [16, 3, 3, 3] dtype: float32
INFO:  input_name.3: model.0.conv.bias shape: [16] dtype: float32
INFO:  output_name.1: wa/model.0/conv/Conv_output_0 shape: [1, 16, 160, 160] dtype: float32
INFO: tf_op_type: convolution_v2
INFO:  input.1.input: name: tf.compat.v1.pad/Pad:0 shape: (1, 322, 322, 3) dtype: <dtype: 'float32'>
INFO:  input.2.weights: shape: (3, 3, 3, 16)


Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 148, Total Ops 395, % non-converted = 37.47 %
 * 148 ARITH ops

- arith.constant:  148 occurrences  (f32: 131, i32: 17)



Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 148, Total Ops 526, % non-converted = 28.14 %
 * 148 ARITH ops

- arith.constant:  148 occurrences  (f16: 131, i32: 17)



Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 86, Total Ops 395, % non-conver
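
Since a tflite file without quantization parameters cannot be used in the next step, it may be worth checking the converted model before running Vela. The sketch below assumes TensorFlow's `tf.lite.Interpreter` and inspects only the input and output tensors; `is_quantized` and `report_quantization` are illustrative names, and the model path is taken from the Vela cell in section 4.

```python
def is_quantized(detail):
    """True if a tensor-detail dict carries a non-zero quantization scale."""
    scale, _zero_point = detail["quantization"]
    return scale != 0.0

def report_quantization(tflite_path):
    """Print dtype and (scale, zero_point) for the model's inputs and outputs."""
    import tensorflow as tf  # imported lazily so is_quantized works without TF

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    groups = (("input", interpreter.get_input_details()),
              ("output", interpreter.get_output_details()))
    for kind, details in groups:
        for d in details:
            print(kind, d["name"], d["dtype"].__name__,
                  d["quantization"], is_quantized(d))

# Example:
# report_quantization(
#     "yolov8n_saved_model_320/saved_model/yolov8n_full_integer_quant.tflite")
```

An unquantized tensor reports `(0.0, 0)` as its quantization pair, which is what `is_quantized` tests for.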

# 4. Convert to a Vela tflite model
- This YOLOv8n model needs `ethos-u-vela==3.10.0`.

In [7]:
import os

# The quantized models from step 3 are in the saved_model subdirectory
cov_dir_name = 'yolov8n_saved_model_320'
cwd = os.getcwd()
os.chdir(os.path.join(cwd, cov_dir_name, 'saved_model'))

# --optimise selects the optimisation goal: Size or Performance
!vela --accelerator-config ethos-u55-128 yolov8n_full_integer_quant.tflite --optimise Size
#!vela --accelerator-config ethos-u55-128 yolov8n_full_integer_quant.tflite --optimise Performance

os.chdir(cwd)  # Change back to the original path

 - The following shape/permutations are supported for transpose:
        When ifm rank is 2: WxC -> CxW
        When ifm rank is 3: HxWxC -> WxHxC, 1xWxC -> 1xCxW, Hx1xC -> Cx1xH
        When ifm rank is 4: 1xHxWxC -> 1xWxHxC, 1x1xWxC -> 1x1xCxW, 1xHx1xC -> 1xCx1xW
   Op has ifm_shape: [1, 4, 16, 2100] and permutation is: [0 1 3 2]
 - The following shape/permutations are supported for transpose:
        When ifm rank is 2: WxC -> CxW
        When ifm rank is 3: HxWxC -> WxHxC, 1xWxC -> 1xCxW, Hx1xC -> Cx1xH
        When ifm rank is 4: 1xHxWxC -> 1xWxHxC, 1x1xWxC -> 1x1xCxW, 1xHx1xC -> 1xCx1xW
   Op has ifm_shape: [1, 40, 40, 144] and permutation is: [0 3 1 2]
 - The following shape/permutations are supported for transpose:
        When ifm rank is 2: WxC -> CxW
        When ifm rank is 3: HxWxC -> WxHxC, 1xWxC -> 1xCxW, Hx1xC -> Cx1xH
        When ifm rank is 4: 1xHxWxC -> 1xWxHxC, 1x1xWxC -> 1x1xCxW, 1xHx1xC -> 1xCx1xW
   Op has ifm_shape: [1, 20, 20, 144] and permutation is: [0 3 1 2
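
The messages above list the transpose shapes and permutations that Vela supports, followed by the shape and permutation of each op it flagged. The sketch below encodes those listed rules as a plain Python check, which makes it easier to see why each flagged op does not match. It is an illustration of the log text only, not Vela's actual implementation, and the reading of the last rank-4 rule is an assumption noted in the code.

```python
def transpose_supported(ifm_shape, perm):
    """Check a transpose against the shape/permutation rules printed in the log."""
    shape, perm = list(ifm_shape), list(perm)
    rank = len(shape)
    if rank == 2:
        return perm == [1, 0]                                # WxC -> CxW
    if rank == 3:
        return (perm == [1, 0, 2]                            # HxWxC -> WxHxC
                or (perm == [0, 2, 1] and shape[0] == 1)     # 1xWxC -> 1xCxW
                or (perm == [2, 1, 0] and shape[1] == 1))    # Hx1xC -> Cx1xH
    if rank == 4 and shape[0] == 1:
        return (perm == [0, 2, 1, 3]                         # 1xHxWxC -> 1xWxHxC
                or (perm == [0, 1, 3, 2] and shape[1] == 1)  # 1x1xWxC -> 1x1xCxW
                # Assumption: "1xHx1xC -> 1xCx1xW" is read as swapping H and C.
                or (perm == [0, 3, 2, 1] and shape[2] == 1))
    return False

# The three ops reported in the log above
for shape, perm in [([1, 4, 16, 2100], [0, 1, 3, 2]),
                    ([1, 40, 40, 144], [0, 3, 1, 2]),
                    ([1, 20, 20, 144], [0, 3, 1, 2])]:
    print(shape, perm, transpose_supported(shape, perm))
```

All three print `False`: the first has H = 4 rather than 1, and the permutation `[0 3 1 2]` does not appear in the supported list at all.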