<a href="https://colab.research.google.com/github/JannisWolf/evaluating-edge-accelerators/blob/JannisWolf-models/model_conversion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook the conversion from a pytorch model to keras, tf lite and onnx happens

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as functional
import torch.optim as optim

import numpy as np

In [2]:
# remove if not using colab
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [3]:
inputSize = 3*1024

In [4]:
# auto encoder class
# if relu activation is needed change here in the forward method leaky_relu to relu

class GrindNet(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        sizes = [inputSize, 256]
        self.layers = nn.ModuleList()

        for size1, size2 in zip(sizes, sizes[1:]):
            self.layers.append(nn.Linear(in_features=size1, out_features=size2))
        for size1, size2 in zip(reversed(sizes), list(reversed(sizes))[1:]):
            self.layers.append(nn.Linear(in_features=size1, out_features=size2))
  
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
            x = functional.leaky_relu(x)
        return x

In [11]:
model = GrindNet()
path = F"/content/gdrive/My Drive/data/autoencoder/"
model_name = "net-state-3072-scaled2-standard"

In [12]:
# load pytorch model
model.load_state_dict(torch.load(path+model_name))
model.eval()

GrindNet(
  (layers): ModuleList(
    (0): Linear(in_features=3072, out_features=256, bias=True)
    (1): Linear(in_features=256, out_features=3072, bias=True)
  )
)

In [13]:
# save back to onnx
x=torch.randn(*(inputSize,))
with torch.no_grad():
     torch.onnx.export(model,
     x,
     path + "models/model.onnx",
     export_params=True,
     opset_version=10,
     do_constant_folding=True,
     input_names=['input'],
     output_names=['output'])

In [14]:
# easy way to convert to keras/tensorflow was using pytorch2keras which needed exaclty the right onnx version (optimizer class in newer onnx is deprecated)
!pip install onnx==1.8.1 onnx2keras pytorch2keras



In [15]:
import onnx
from onnx2keras import onnx_to_keras
from pytorch2keras.converter import pytorch_to_keras
from torch.autograd import Variable
import tensorflow as tf

In [16]:
np.random.seed(42)
input_np = np.random.uniform(0, 1, (3072)).astype('float32')
a = Variable(torch.FloatTensor(input_np))
print(a)

tensor([0.3745, 0.9507, 0.7320,  ..., 0.6760, 0.7066, 0.6100])


In [17]:
# convert from pytorch to keras model (needs a typical input -> a)
k_model = pytorch_to_keras(model, a, input_shapes=[(3072,)], verbose=True)

INFO:pytorch2keras:Converter is called.
DEBUG:pytorch2keras:Input_names:
DEBUG:pytorch2keras:['input_0']
DEBUG:pytorch2keras:Output_names:
DEBUG:pytorch2keras:['output_0']
INFO:onnx2keras:Converter is called.
DEBUG:onnx2keras:List input shapes:
DEBUG:onnx2keras:[(3072,)]
DEBUG:onnx2keras:List inputs:
DEBUG:onnx2keras:Input 0 -> input_0.
DEBUG:onnx2keras:List outputs:
DEBUG:onnx2keras:Output 0 -> output_0.
DEBUG:onnx2keras:Gathering weights to dictionary.
DEBUG:onnx2keras:Found weight layers.0.weight with shape (256, 3072).
DEBUG:onnx2keras:Found weight layers.0.bias with shape (256,).
DEBUG:onnx2keras:Found weight layers.1.weight with shape (3072, 256).
DEBUG:onnx2keras:Found weight layers.1.bias with shape (3072,).
DEBUG:onnx2keras:Found input input_0 with shape (3072,)
DEBUG:onnx2keras:######
DEBUG:onnx2keras:...
DEBUG:onnx2keras:Converting ONNX operation
DEBUG:onnx2keras:type: Transpose
DEBUG:onnx2keras:node_name: 5
DEBUG:onnx2keras:node_params: {'perm': [1, 0], 'change_ordering': F

graph(%input_0 : Float(3072, strides=[1], requires_grad=0, device=cpu),
      %layers.0.weight : Float(256, 3072, strides=[3072, 1], requires_grad=1, device=cpu),
      %layers.0.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %layers.1.weight : Float(3072, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %layers.1.bias : Float(3072, strides=[1], requires_grad=1, device=cpu)):
  %5 : Float(3072, 256, strides=[256, 1], device=cpu) = onnx::Transpose[perm=[1, 0]](%layers.0.weight)
  %6 : Float(256, strides=[1], device=cpu) = onnx::MatMul(%input_0, %5)
  %7 : Float(256, strides=[1], requires_grad=1, device=cpu) = onnx::Add(%layers.0.bias, %6) # /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:1847:0
  %8 : Float(256, strides=[1], requires_grad=1, device=cpu) = onnx::LeakyRelu[alpha=0.01](%7) # /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:1474:0
  %9 : Float(256, 3072, strides=[3072, 1], device=cpu) = onnx::Transpose[perm=[1, 0]](%l

DEBUG:onnx2keras:Output TF Layer -> KerasTensor(type_spec=TensorSpec(shape=(None, 256), dtype=tf.float32, name=None), name='6/MatMul:0', description="created by layer '6'")
DEBUG:onnx2keras:######
DEBUG:onnx2keras:...
DEBUG:onnx2keras:Converting ONNX operation
DEBUG:onnx2keras:type: Add
DEBUG:onnx2keras:node_name: 7
DEBUG:onnx2keras:node_params: {'change_ordering': False, 'name_policy': None}
DEBUG:onnx2keras:...
DEBUG:onnx2keras:Check if all inputs are available:
DEBUG:onnx2keras:Check input 0 (name layers.0.bias).
DEBUG:onnx2keras:The input not found in layers / model inputs.
DEBUG:onnx2keras:Found in weights, add as a numpy constant.
DEBUG:onnx2keras:Check input 1 (name 6).
DEBUG:onnx2keras:... found all, continue
DEBUG:onnx2keras:add:Convert inputs to Keras/TF layers if needed.
DEBUG:onnx2keras:Output TF Layer -> KerasTensor(type_spec=TensorSpec(shape=(None, 256), dtype=tf.float32, name=None), name='7/Add:0', description="created by layer '7'")
DEBUG:onnx2keras:######
DEBUG:onnx2ke

Tensor("Placeholder:0", shape=(256,), dtype=float32) Tensor("Placeholder_1:0", shape=(None, 256), dtype=float32)
Tensor("Placeholder:0", shape=(3072,), dtype=float32) Tensor("Placeholder_1:0", shape=(None, 3072), dtype=float32)


In [18]:
# controll if conversion was succesful (Lambda layer with no trainable parameter are apperently only const layer)
k_model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_0 (InputLayer)            [(None, 3072)]       0                                            
__________________________________________________________________________________________________
7_const1 (Lambda)               (256,)               0           input_0[0][0]                    
__________________________________________________________________________________________________
6 (Dense)                       (None, 256)          786432      input_0[0][0]                    
__________________________________________________________________________________________________
7 (Lambda)                      (None, 256)          0           7_const1[0][0]                   
                                                                 6[0][0]                      

In [19]:
# save tensorflow model
k_model.save(path + "models/keras_model")





Tensor("model/7_const1/Const:0", shape=(256,), dtype=float32) Tensor("model/6/MatMul:0", shape=(None, 256), dtype=float32)
Tensor("model/11_const1/Const:0", shape=(3072,), dtype=float32) Tensor("model/10/MatMul:0", shape=(None, 3072), dtype=float32)
Tensor("inputs:0", shape=(256,), dtype=float32) Tensor("inputs_1:0", shape=(None, 256), dtype=float32)
Tensor("inputs:0", shape=(3072,), dtype=float32) Tensor("inputs_1:0", shape=(None, 3072), dtype=float32)
Tensor("inputs:0", shape=(3072,), dtype=float32) Tensor("inputs_1:0", shape=(None, 3072), dtype=float32)
Tensor("inputs:0", shape=(256,), dtype=float32) Tensor("inputs_1:0", shape=(None, 256), dtype=float32)
Tensor("7_const1/Const:0", shape=(256,), dtype=float32) Tensor("6/MatMul:0", shape=(None, 256), dtype=float32)
Tensor("11_const1/Const:0", shape=(3072,), dtype=float32) Tensor("10/MatMul:0", shape=(None, 3072), dtype=float32)
Tensor("7_const1/Const:0", shape=(256,), dtype=float32) Tensor("6/MatMul:0", shape=(None, 256), dtype=float3

INFO:tensorflow:Assets written to: /content/gdrive/My Drive/data/autoencoder/models/keras_model/assets


In [20]:
# just some checks if pytorch and tensorflow output matches and how precisely (can be skipped)

# convert pytorch tensor to tf tensor
a_tf = tf.convert_to_tensor([input_np]) 

# run inference on the model
p = model(a)
k = k_model.predict(a_tf)

# print results
print("Input Pytorch {}".format(a))
print("Input Tensorflow tensor({})".format(a_tf[0]))
print("Output Pytorch {}".format(p))
print("Output Tensorflow tensor({})".format(k[0]))

# equal function as the precision differs
def equal(l1, l2, p=False):
  '''
  Checks to which precision it is equal
  '''
  diff = abs(l1 - l2)
  max_diff = np.max(diff)
  if p:
    print("Maximum difference is {}".format(max_diff))
  for i in range(10):
    prec = 10**-i
    if max_diff > prec:
      p = i
      break
  return "Equal until 10^-{}.".format(p)

# check if the values are the same
print(equal(k,p.detach().numpy()))

Tensor("model/7_const1/Const:0", shape=(256,), dtype=float32) Tensor("model/6/MatMul:0", shape=(None, 256), dtype=float32)
Tensor("model/11_const1/Const:0", shape=(3072,), dtype=float32) Tensor("model/10/MatMul:0", shape=(None, 3072), dtype=float32)
Input Pytorch tensor([0.3745, 0.9507, 0.7320,  ..., 0.6760, 0.7066, 0.6100])
Input Tensorflow tensor([0.37454012 0.9507143  0.7319939  ... 0.6760263  0.7066299  0.6100074 ])
Output Pytorch tensor([ 2.2478,  2.2217,  3.5181,  ..., -0.0254,  3.6830,  5.7795],
       grad_fn=<LeakyReluBackward0>)
Output Tensorflow tensor([ 2.2477582   2.2216947   3.5181472  ... -0.02544572  3.6830046
  5.779535  ])
Equal until 10^-5.


In [21]:
# prerequisites of the tensorflow lite inference
SAVED_MODEL_PATH = path + 'models/model'
TFLITE_FILE_PATH = path + 'models/model.tflite'

tf.saved_model.save(
    k_model, SAVED_MODEL_PATH)

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
tflite_model = converter.convert()

with open(TFLITE_FILE_PATH, 'wb') as f:
  f.write(tflite_model)

Tensor("model/7_const1/Const:0", shape=(256,), dtype=float32) Tensor("model/6/MatMul:0", shape=(None, 256), dtype=float32)
Tensor("model/11_const1/Const:0", shape=(3072,), dtype=float32) Tensor("model/10/MatMul:0", shape=(None, 3072), dtype=float32)
Tensor("inputs:0", shape=(256,), dtype=float32) Tensor("inputs_1:0", shape=(None, 256), dtype=float32)
Tensor("inputs:0", shape=(3072,), dtype=float32) Tensor("inputs_1:0", shape=(None, 3072), dtype=float32)
Tensor("inputs:0", shape=(3072,), dtype=float32) Tensor("inputs_1:0", shape=(None, 3072), dtype=float32)
Tensor("inputs:0", shape=(256,), dtype=float32) Tensor("inputs_1:0", shape=(None, 256), dtype=float32)
Tensor("7_const1/Const:0", shape=(256,), dtype=float32) Tensor("6/MatMul:0", shape=(None, 256), dtype=float32)
Tensor("11_const1/Const:0", shape=(3072,), dtype=float32) Tensor("10/MatMul:0", shape=(None, 3072), dtype=float32)
Tensor("7_const1/Const:0", shape=(256,), dtype=float32) Tensor("6/MatMul:0", shape=(None, 256), dtype=float3


FOR DEVS: If you are overwriting _tracking_metadata in your class, this property has been used to save metadata in the SavedModel. The metadta field will be deprecated soon, so please move the metadata to a different file.


INFO:tensorflow:Assets written to: /content/gdrive/My Drive/data/autoencoder/models/model/assets


INFO:tensorflow:Assets written to: /content/gdrive/My Drive/data/autoencoder/models/model/assets


In [22]:
# Load the TFLite model in TFLite Interpreter
interpreter = tf.lite.Interpreter(TFLITE_FILE_PATH)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], [input_np])

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

[[ 2.2477562   2.2216935   3.518146   ... -0.02544577  3.682998
   5.7795305 ]]


In [23]:
# test output difference of pytorch vs tf vs tflite

print("Pytorch vs. Tensorflow")
print(equal(k,p.detach().numpy(), p=True) + '\n')
print("Pytorch vs. Tensorflow lite")
print(equal(output_data,p.detach().numpy(), p=True) + '\n')
print("Tensorflow vs Tensorflow lite")
print(equal(k, output_data, p=True) + '\n')

Pytorch vs. Tensorflow
Maximum difference is 1.5497207641601562e-05
Equal until 10^-5.

Pytorch vs. Tensorflow lite
Maximum difference is 9.5367431640625e-06
Equal until 10^-6.

Tensorflow vs Tensorflow lite
Maximum difference is 1.71661376953125e-05
Equal until 10^-5.



In [24]:
# test data for quantization
path2data = path + 'testseqs.npz'

def loadTensor(fileName):
  if fileName.endswith("npz"):
    return np.load(fileName, encoding="bytes", allow_pickle=True)["arr_0"]

def flattenData(data):
  data = list(data)
  for i, seq in enumerate(data):
    seq = seq[1:]
    data[i] = seq.flatten()
  data = np.array(data)
  return data

def preprocess(data):
  for seq in data:
    maximum = np.max(seq)
    minimum = np.min(seq)
    seq[:] -= minimum
    seq[:] /= maximum
  return data

testData = loadTensor(path2data)
testData = flattenData(testData)
testData = preprocess(testData)
testData = testData.astype(np.float32)
testData

array([[7.02596903e-02, 7.88275078e-02, 6.28098249e-02, ...,
        8.46290410e-01, 7.18571186e-01, 7.89748132e-01],
       [1.19231574e-01, 1.10167868e-01, 1.11724623e-01, ...,
        1.18165521e-03, 5.64435381e-04, 0.00000000e+00],
       [6.11173287e-02, 5.88974915e-02, 6.58349395e-02, ...,
        8.42723668e-01, 8.39252174e-01, 6.73859239e-01],
       ...,
       [6.24613203e-02, 6.42610490e-02, 6.10087290e-02, ...,
        8.42233777e-01, 8.37665260e-01, 6.41479552e-01],
       [1.16295986e-01, 1.08052090e-01, 1.19746648e-01, ...,
        1.15959579e-03, 1.19444542e-02, 0.00000000e+00],
       [1.02119006e-01, 1.11704506e-01, 9.95167419e-02, ...,
        0.00000000e+00, 2.75775641e-02, 1.08543202e-01]], dtype=float32)

In [25]:
# performs also the same on the real data..
print(model(torch.tensor(testData)))
print(k_model.predict(testData))

tensor([[ 6.6296e-02,  7.4708e-02,  7.6669e-02,  ...,  8.3556e-01,
          8.0965e-01,  7.8180e-01],
        [ 1.0906e-01,  1.1374e-01,  1.2028e-01,  ..., -1.1100e-03,
         -1.4907e-03, -3.5280e-03],
        [ 5.7227e-02,  6.5812e-02,  6.5342e-02,  ...,  8.3165e-01,
          8.0111e-01,  7.6677e-01],
        ...,
        [ 5.4487e-02,  6.3310e-02,  6.4840e-02,  ...,  8.3590e-01,
          7.9097e-01,  7.4725e-01],
        [ 1.0580e-01,  1.1005e-01,  1.1634e-01,  ..., -1.2727e-03,
         -1.7938e-03, -4.0187e-03],
        [ 9.5722e-02,  1.0589e-01,  1.1079e-01,  ...,  2.2146e-01,
         -3.4933e-05, -3.8865e-03]], grad_fn=<LeakyReluBackward0>)
[[ 6.62963837e-02  7.47077242e-02  7.66690522e-02 ...  8.35561275e-01
   8.09653521e-01  7.81801105e-01]
 [ 1.09055340e-01  1.13743551e-01  1.20280266e-01 ... -1.11002685e-03
  -1.49066595e-03 -3.52803431e-03]
 [ 5.72268665e-02  6.58124462e-02  6.53415322e-02 ...  8.31651568e-01
   8.01107287e-01  7.66771555e-01]
 ...
 [ 5.44868559e-02 

In [26]:
# helper function for reconstruction loss
def rmse(a,b):
  tmp = (a-b)**2
  tmp = np.sum(tmp, axis=1)
  return np.sqrt(tmp)/np.shape(a)[1]

print(rmse(testData, k_model.predict(testData)).mean())
print(rmse(model(torch.tensor(testData)).detach().numpy(), testData).mean())

0.0002584081
0.00025840814


In [27]:
# representative dataset generator needed for post training quantization
def representative_dataset():
    for d in testData:
      yield [d.astype(np.float32)]

In [28]:
# here the quantization happens
TFLITE_QUANT_FILE_PATH = path + 'models/model_quant.tflite'

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()

with open(TFLITE_QUANT_FILE_PATH, 'wb') as f:
  f.write(tflite_quant_model)

In [29]:
# Load the TFLite model in TFLite Interpreter
interpreter_quant = tf.lite.Interpreter(TFLITE_QUANT_FILE_PATH)
interpreter_quant.allocate_tensors()

# Get input and output tensors.
input_details = interpreter_quant.get_input_details()
output_details = interpreter_quant.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)

# Check if the input type is quantized, then rescale input data to uint8
if input_details[0]['dtype'] == np.int8:
  input_scale, input_zero_point = input_details[0]["quantization"]
  test = testData[0] / input_scale + input_zero_point

test = np.expand_dims(test, axis=0).astype(input_details[0]["dtype"])

interpreter_quant.set_tensor(input_details[0]['index'], test.astype(np.int8))

interpreter_quant.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter_quant.get_tensor(output_details[0]['index'])

# Convert back to normal precision
if input_details[0]['dtype'] == np.int8:
  output_scale, output_zero_point = output_details[0]["quantization"]
  output_data = (output_data - input_zero_point) * input_scale

print(output_data)

[[ 0.06623438  0.08017846  0.0871505  ... -0.17778702 -0.19870314
  -0.24750742]]


In [30]:
print(model(torch.tensor(testData[0]))[:6].detach().numpy())
print(testData[0][:6])
print(output_data[0][:6])

[0.06629637 0.07470772 0.07666905 0.07890725 0.09392202 0.05697285]
[0.07025969 0.07882751 0.06280982 0.06567997 0.06773302 0.066687  ]
[0.06623438 0.08017846 0.0871505  0.05926234 0.12898274 0.03137418]


In [31]:
input_details

[{'dtype': numpy.int8,
  'index': 0,
  'name': 'serving_default_input_0:0',
  'quantization': (0.0034860200248658657, -128),
  'quantization_parameters': {'quantized_dimension': 0,
   'scales': array([0.00348602], dtype=float32),
   'zero_points': array([-128], dtype=int32)},
  'shape': array([   1, 3072], dtype=int32),
  'shape_signature': array([  -1, 3072], dtype=int32),
  'sparsity_parameters': {}}]

In [32]:
output_details

[{'dtype': numpy.int8,
  'index': 8,
  'name': 'StatefulPartitionedCall:0',
  'quantization': (0.003861609846353531, -125),
  'quantization_parameters': {'quantized_dimension': 0,
   'scales': array([0.00386161], dtype=float32),
   'zero_points': array([-125], dtype=int32)},
  'shape': array([   1, 3072], dtype=int32),
  'shape_signature': array([  -1, 3072], dtype=int32),
  'sparsity_parameters': {}}]

In [33]:
!pip install onnxruntime



In [34]:
# testing the validity of the onnx model file
import onnxruntime

ort_session = onnxruntime.InferenceSession(path + 'models/model.onnx')

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

torch_out = model(x)

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

Exported model has been tested with ONNXRuntime, and the result looks good!


In [35]:
torch_out

tensor([-2.4425e-02,  3.9142e-01, -7.7355e-03,  ...,  3.7237e+01,
         6.7363e+01,  9.6777e+01], grad_fn=<LeakyReluBackward0>)

In [36]:
ort_outs[0]

array([-2.4424529e-02,  3.9141583e-01, -7.7355183e-03, ...,
        3.7236710e+01,  6.7362564e+01,  9.6776894e+01], dtype=float32)

In [37]:
with open('/content/gdrive/My Drive/data/autoencoder/test.npy', 'wb') as f:
    np.save(f, x)

In [38]:
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

!echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

!sudo apt-get update

!sudo apt-get install edgetpu-compiler

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0100  2537  100  2537    0     0  72485      0 --:--:-- --:--:-- --:--:-- 74617
OK
deb https://packages.cloud.google.com/apt coral-edgetpu-stable main
Hit:1 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:2 https://packages.cloud.google.com/apt coral-edgetpu-stable InRelease
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Hit:4 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Hit:5 http://archive.ubuntu.com/ubuntu bionic InRelease
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:7 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
Hit:8 https://developer.download.nvidia.com/compute/cuda/repos/ubun

In [39]:
# convert tf lite to edge tpu model (change the paths accordingly). Here it should be paid attention which operation are running on the TPU and CPU
!edgetpu_compiler "/content/gdrive/My Drive/data/autoencoder/models/model_quant.tflite" -o "/content/gdrive/My Drive/data/autoencoder/models" 

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 216 ms.

Input model: /content/gdrive/My Drive/data/autoencoder/models/model_quant.tflite
Input size: 1.51MiB
Output model: /content/gdrive/My Drive/data/autoencoder/models/model_quant_edgetpu.tflite
Output size: 1.55MiB
On-chip memory used for caching model parameters: 769.00KiB
On-chip memory remaining for caching model parameters: 6.99MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 4
Operation log: /content/gdrive/My Drive/data/autoencoder/models/model_quant_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations tha

And that's it. We generated from a pytorch model: ONNX model, Keras model, TF lite model, TF lite quantized model and a TF lite model for the edge TPU.