### This notebook shows how to convert a Keras .h5 model to ONNX, using YOLOv3 as the example

If you need help converting Darknet weights to Keras, please refer to [this GitHub repository](https://github.com/qqwweee/keras-yolo3).

We will use the [tf2onnx](https://github.com/onnx/tensorflow-onnx) library. If it is not installed, run the following command:

In [None]:
!pip install -U tf2onnx

In [1]:
import tensorflow as tf
import tf2onnx

2023-07-01 18:55:17.679382: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-07-01 18:55:17.716552: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-07-01 18:55:17.866815: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-07-01 18:55:17.867863: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


##### Though Keras models tend to support dynamic shapes, many edge devices do not. We therefore rebuild the model with a static input shape of (1, 416, 416, 3) and preprocess each image to that shape before running inference.
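
With a static input, every image has to be resized to exactly 416 x 416 before it reaches the model. The sketch below shows one possible way to do that with a letterbox resize in plain NumPy. The function name `letterbox`, the gray padding value of 0.5, and the scaling to the [0, 1] range are assumptions for illustration; they should be replaced with whatever preprocessing the model was trained with.

```python
import numpy as np


def letterbox(image: np.ndarray, size: int = 416, pad_value: float = 0.5) -> np.ndarray:
    """Fit an HxWx3 uint8 image into a size x size canvas, keeping its aspect ratio.

    Hypothetical helper: the padding value and the [0, 1] scaling are assumptions.
    """
    h, w = image.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))

    # Nearest-neighbour resize via index lookup; a real pipeline would
    # more likely use an image library for better interpolation.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols].astype(np.float32) / 255.0

    # Paste the resized image into the centre of a padded square canvas.
    canvas = np.full((size, size, 3), pad_value, dtype=np.float32)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized

    # Add the batch dimension so the result has shape (1, size, size, 3).
    return canvas[np.newaxis, ...]
```

The returned array has the same shape and dtype as the static input, so it can be passed to the model directly.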

In [2]:
model = tf.keras.models.load_model('yolov3-aerial.h5')
model_config = model.get_config()
model_config['layers'][0]['config']['batch_input_shape'] = (1, 416, 416, 3)

2023-07-01 18:53:36.898327: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-01 18:53:37.008705: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...




In [3]:
new_model = model.__class__.from_config(model_config)

# copy the weights of every layer after the input layer into the rebuilt model
for old_layer, new_layer in zip(model.layers[1:], new_model.layers[1:]):
    new_layer.set_weights(old_layer.get_weights())
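
Before going further, it may be worth confirming that the copy actually transferred every array. The helper below is a minimal sketch that relies only on the `layers` attribute and the `get_weights()` method that Keras models and layers expose; the name `weights_match` is hypothetical.

```python
import numpy as np


def weights_match(model_a, model_b) -> bool:
    """Return True if both models hold identical weights, layer by layer."""
    if len(model_a.layers) != len(model_b.layers):
        return False
    for layer_a, layer_b in zip(model_a.layers, model_b.layers):
        w_a, w_b = layer_a.get_weights(), layer_b.get_weights()
        if len(w_a) != len(w_b):
            return False
        # Compare each weight array exactly; a copy should be bit-identical.
        if not all(np.array_equal(a, b) for a, b in zip(w_a, w_b)):
            return False
    return True
```

In this notebook it could be called as `assert weights_match(model, new_model)` right after the copy loop.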

##### Rebuilding the model from the modified config should propagate the static input shape to the internal layers. We compile the new model and print its summary to check that every output shape is now fixed.

In [4]:
new_model.compile()

In [5]:
new_model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 image_input (InputLayer)       [(1, 416, 416, 3)]   0           []                               
                                                                                                  
 conv2d (Conv2D)                (1, 416, 416, 32)    864         ['image_input[0][0]']            
                                                                                                  
 batch_normalization (BatchNorm  (1, 416, 416, 32)   128         ['conv2d[0][0]']                 
 alization)                                                                                       
                                                                                                  
 leaky_re_lu (LeakyReLU)        (1, 416, 416, 32)    0           ['batch_normalization[0][0]']

In [6]:
# save compiled model
new_model.save('yolov3-aerial-compiled.h5')

In [2]:
model = tf.keras.models.load_model('yolov3-aerial-compiled.h5')
model.summary()

2023-07-01 18:55:25.458247: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-01 18:55:25.586996: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 image_input (InputLayer)       [(1, 416, 416, 3)]   0           []                               
                                                                                                  
 conv2d (Conv2D)                (1, 416, 416, 32)    864         ['image_input[0][0]']            
                                                                                                  
 batch_normalization (BatchNorm  (1, 416, 416, 32)   128         ['conv2d[0][0]']                 
 alization)                                                                                       
                                                                                                  
 leaky_re_lu (LeakyReLU)        (1, 416, 416, 32)    0           ['batch_normalization[0][0]']

##### We can now convert the model to ONNX format

It is worth noting that, while ONNX tends to be a good intermediate format for various applications, not all targets support the same set of operations. For instance, TensorRT does not support the `tf_half_pixel_for_nn` coordinate transformation mode of the `Resize` operator. We can account for this by setting the target to 'tensorrt', which should make tf2onnx replace unsupported operations with their nearest equivalents. Check which operations your target platform supports before converting the model.

In [3]:
tf2onnx.convert.from_keras(model, output_path="yolov3-aerial.onnx", target="tensorrt")

2023-07-01 18:55:38.774940: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-01 18:55:38.774986: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2023-07-01 18:55:38.775103: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-07-01 18:55:38.775471: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-01 18:55:38.775484: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/
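
Since the choice of target affects how resize layers are exported, it can help to inspect the converted graph rather than assume the result. The sketch below counts the `coordinate_transformation_mode` attribute across all `Resize` nodes of a loaded ONNX graph. The function name `resize_coordinate_modes` is hypothetical, and the fallback to `half_pixel` assumes opset 11 or later, where that is the default when the attribute is absent.

```python
from collections import Counter


def resize_coordinate_modes(graph) -> Counter:
    """Count coordinate transformation modes over all Resize nodes in an ONNX graph."""
    counts = Counter()
    for node in graph.node:
        if node.op_type != "Resize":
            continue
        mode = "half_pixel"  # assumed default (opset 11+) when the attribute is missing
        for attr in node.attribute:
            if attr.name == "coordinate_transformation_mode":
                mode = attr.s.decode("utf-8")  # string attributes are stored as bytes
        counts[mode] += 1
    return counts
```

With the `onnx` package installed, this could be called as `resize_coordinate_modes(onnx.load("yolov3-aerial.onnx").graph)`. Any mode that the target runtime does not support would show up in the returned counter.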

##### We can now test that the model works as expected

In [None]:
import onnxruntime as onnx_rt
import numpy as np

# fall back to the CPU provider when the CUDA provider cannot be created
sess = onnx_rt.InferenceSession("yolov3-aerial.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

2023-05-19 19:26:28.793807100 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:541 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2023-05-19 19:26:28.975900707 [W:onnxruntime:Default, upsamplebase.h:102 UpsampleBase] `tf_half_pixel_for_nn` is deprecated since opset 13, yet this opset 13 model uses the deprecated attribute
2023-05-19 19:26:28.975921657 [W:onnxruntime:Default, upsamplebase.h:102 UpsampleBase] `tf_half_pixel_for_nn` is deprecated since opset 13, yet this opset 13 model uses the deprecated attribute


In [2]:
%%timeit

outputs = sess.run(None, {"image_input": np.random.rand(1, 416, 416, 3).astype(np.float32)})

129 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
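
Timing the session only shows that inference runs; it does not show that the ONNX model reproduces the Keras outputs. One way to check this is to feed the same input to both and compare the results numerically. The sketch below is a hypothetical helper, `compare_outputs`, whose tolerances are assumed values that may need adjusting for a given model and runtime.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-4):
    """Compare two lists of output arrays; return (all_close, max_abs_diff)."""
    if len(reference) != len(candidate):
        raise ValueError("both models must produce the same number of outputs")
    all_close, max_diff = True, 0.0
    for ref, cand in zip(reference, candidate):
        ref, cand = np.asarray(ref), np.asarray(cand)
        if ref.shape != cand.shape:
            raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
        max_diff = max(max_diff, float(np.max(np.abs(ref - cand))))
        all_close = all_close and bool(np.allclose(ref, cand, rtol=rtol, atol=atol))
    return all_close, max_diff


# Possible use in this notebook (assumes both models return outputs in the same order):
# x = np.random.rand(1, 416, 416, 3).astype(np.float32)
# ok, diff = compare_outputs(model.predict(x), sess.run(None, {"image_input": x}))
```

A small maximum difference is expected from floating-point round-off; a large one would suggest that an operation was converted differently for the chosen target.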
