<a href="https://colab.research.google.com/github/Danny-Dasilva/Train_Custom_Model/blob/master/Copie_de_EdgeTPU_with_Keras1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Build a model with Keras and convert it to a tflite file compiled for the Edge TPU.

### Install EdgeTPU Compiler

In [1]:
%%bash

echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 6A030B21BA07F4FB

sudo apt update > /dev/null
sudo apt install edgetpu > /dev/null

deb https://packages.cloud.google.com/apt coral-edgetpu-stable main
Executing: /tmp/apt-key-gpghome.p74GU8Z3f4/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys 6A030B21BA07F4FB


gpg: key 6A030B21BA07F4FB: public key "Google Cloud Packages Automatic Signing Key <gc-team@google.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1




debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 3.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 


## Edge TPU with Keras

This notebook builds a very simple model.

- data: CIFAR-10
- input shape: 32 x 32 x 3
- output shape: 10
- layers: 3 Conv2D layers (the first two followed by batch normalization), average pooling, and 1 dense softmax layer

In [4]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import cifar10
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)

1.14.0


In [5]:
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [0]:
train_images = train_images / 255.0
test_images = test_images / 255.0

In [8]:
print(train_images.shape)

(50000, 32, 32, 3)


In [0]:
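# NOTE: leftover from a Fashion-MNIST version of this notebook, kept commented out.
# Running it would overwrite the CIFAR-10 arrays with 28 x 28 x 1 images, which do
# not match the (32, 32, 3) input_shape used by build_keras_model.
# fashion_mnist = keras.datasets.fashion_mnist
# (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
#
# train_images = train_images / 255.0
# test_images = test_images / 255.0
#
# train_images = np.reshape(train_images, [-1, 28, 28, 1])
# test_images = np.reshape(test_images, [-1, 28, 28, 1])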

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


### Build the model

- Define a `build_keras_model` function because the model has to be built twice: once for training and once for evaluation.

In [0]:
def build_keras_model():
    return keras.Sequential([
        tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu", padding="same", use_bias=False, input_shape=(32, 32, 3)),
        keras.layers.BatchNormalization(fused=False),
        tf.keras.layers.Conv2D(32, kernel_size=3, strides=2, activation="relu", padding="same", use_bias=False),
        keras.layers.BatchNormalization(fused=False),
        tf.keras.layers.Conv2D(64, kernel_size=3, strides=2, activation="relu", padding="same", use_bias=False),
        tf.keras.layers.AveragePooling2D(pool_size=7),
        tf.keras.layers.Flatten(),
        keras.layers.Dense(10, activation='softmax')
    ])
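As a sanity check on the architecture above, the spatial size after each layer can be worked out by hand. The sketch below is plain Python and not part of the original notebook; the helper names `same_conv_out` and `valid_pool_out` are made up for illustration. It assumes the Keras conventions that `padding="same"` gives ceil(size / stride) and that `AveragePooling2D` defaults to `padding="valid"` with strides equal to `pool_size`.

```python
import math


def same_conv_out(size, stride=1):
    """Output size of a conv layer with padding="same"."""
    return math.ceil(size / stride)


def valid_pool_out(size, pool, stride=None):
    """Output size of a pooling layer with padding="valid"."""
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1


size = 32                             # CIFAR-10 images are 32 x 32
sizes = [size]
for stride in (1, 2, 2):              # strides of the three Conv2D layers
    size = same_conv_out(size, stride)
    sizes.append(size)
size = valid_pool_out(size, pool=7)   # AveragePooling2D(pool_size=7)
sizes.append(size)

flattened = size * size * 64          # 64 filters in the last Conv2D
print(sizes, flattened)
```

Under those assumptions the 7 x 7 pool reduces the 8 x 8 feature map to 1 x 1, so `Flatten` feeds 64 values into the final `Dense` layer.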

### Train the model and save its checkpoints

- Use a new Session and Graph so that the train and eval phases use exactly the same variable names.
- Call `tf.contrib.quantize.create_training_graph` after building the model, since we want to do quantization-aware training.
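To make the idea behind quantization-aware training concrete: the fake quant ops inserted by `create_training_graph` round values to a limited number of levels in the forward pass, so the model learns weights that tolerate 8-bit rounding. The function below is a simplified pure-Python sketch of that rounding, not TensorFlow's implementation (which, among other things, nudges the range so that zero is exactly representable). `fake_quant` and its parameters are illustrative names.

```python
def fake_quant(x, range_min, range_max, num_bits=8):
    """Quantize x to one of 2**num_bits levels, then map it back to a float."""
    levels = 2 ** num_bits - 1                   # 255 steps for 8 bits
    scale = (range_max - range_min) / levels     # width of one step
    clamped = min(max(x, range_min), range_max)  # values outside the range saturate
    step = round((clamped - range_min) / scale)  # nearest integer level
    return range_min + step * scale


for value in (0.1234, 0.5678, 1.7):
    print(value, '->', fake_quant(value, 0.0, 1.0))
```

In-range values move by at most half a step, while out-of-range values saturate at the range limits. The `quant_delay=100` argument in the training cell postpones this rounding until 100 training steps have run.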

In [11]:
# train
train_graph = tf.Graph()
train_sess = tf.Session(graph=train_graph)

keras.backend.set_session(train_sess)
with train_graph.as_default():
    train_model = build_keras_model()
    
    tf.contrib.quantize.create_training_graph(input_graph=train_graph, quant_delay=100)
    train_sess.run(tf.global_variables_initializer())    

    train_model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    train_model.fit(train_images, train_labels, epochs=5)
    
    # save graph and checkpoints
    saver = tf.train.Saver()
    saver.save(train_sess, 'checkpoints')

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

INFO:tensorflow:Inserting fake quant op activation_Mul_quant after batch_normalization/batchnorm/mul_1
INFO:tensorflow:Inserting fake quant op activation_Add_quant after batch_normalization/batchnorm/add_1
INFO:tensorflow:Inserting fake quant op activation_Mul_quant after batch_normalization_1/batchnorm/mul_1
INFO:tensorflow:Inserting fake quant op activation_Add_quant after batch_normalization_1/batchnorm/add_1
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [19]:
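# Re-bind Keras to train_sess first: if another session was set since training
# (the eval cell does this), predict raises FailedPreconditionError.
keras.backend.set_session(train_sess)
with train_graph.as_default():
    print('sample result of original model')
    print(train_model.predict(test_images[:1]))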

sample result of original model


FailedPreconditionError: ignored

### Freeze model and save it

- Create a new Session and Graph.
- After building the model, call `tf.contrib.quantize.create_eval_graph` and get the graph_def, both before `saver.restore`.
- Call `saver.restore` to load the trained weights.
   - `saver.restore` may add unneeded variables to the graph, so we have to get the graph_def before `saver.restore` is called.
- Use `tf.graph_util.convert_variables_to_constants` to freeze the graph_def.

In [13]:
# eval
eval_graph = tf.Graph()
eval_sess = tf.Session(graph=eval_graph)

keras.backend.set_session(eval_sess)

with eval_graph.as_default():
    keras.backend.set_learning_phase(0)
    eval_model = build_keras_model()
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)
    eval_graph_def = eval_graph.as_graph_def()
    saver = tf.train.Saver()
    saver.restore(eval_sess, 'checkpoints')

    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        eval_sess,
        eval_graph_def,
        [eval_model.output.op.name]
    )

    with open('frozen_model.pb', 'wb') as f:
        f.write(frozen_graph_def.SerializeToString())

INFO:tensorflow:Inserting fake quant op activation_Mul_quant after batch_normalization/batchnorm/mul_1
INFO:tensorflow:Inserting fake quant op activation_Add_quant after batch_normalization/batchnorm/add_1
INFO:tensorflow:Inserting fake quant op activation_Mul_quant after batch_normalization_1/batchnorm/mul_1
INFO:tensorflow:Inserting fake quant op activation_Add_quant after batch_normalization_1/batchnorm/add_1
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from checkpoints
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
INFO:tensorflow:Froze 37 variables.
INFO:tensorflow:Converted 37 variables to const ops.


### Generate tflite file

- Use the QUANTIZED_UINT8 option.
- Quantization-aware training adds min/max information, so we don't need `default_ranges_min` and `default_ranges_max`.
- We don't need to call freeze_graph.py since the graph is already frozen.
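The `--mean_values` and `--std_dev_values` flags passed to `tflite_convert` tell the converter how uint8 inputs map to the float inputs seen during training. Assuming the relation documented for the TF 1.x converter, real_value = (quantized_value - mean_value) / std_dev_value, the following sketch (the function name is illustrative) shows why 0 and 255 match images that were scaled by 1/255:

```python
def uint8_to_real(quantized_value, mean_value=0.0, std_dev_value=255.0):
    """Map a uint8 input value to the float value the model was trained on."""
    return (quantized_value - mean_value) / std_dev_value


print(uint8_to_real(0))      # darkest pixel
print(uint8_to_real(255))    # brightest pixel
print(uint8_to_real(1))      # size of one quantization step
```

One step is 1/255, about 0.0039, which agrees with the input scale the interpreter reports for `conv2d_input` later in the notebook.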

In [14]:
%%bash

tflite_convert \
    --output_file=model.tflite \
    --graph_def_file=frozen_model.pb \
    --inference_type=QUANTIZED_UINT8 \
    --input_arrays=conv2d_input \
    --output_arrays=dense/Softmax \
    --mean_values=0 \
    --std_dev_values=255

2019-10-02 21:00:15.052071: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-10-02 21:00:15.082148: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-02 21:00:15.082949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2019-10-02 21:00:15.083284: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-02 21:00:15.084647: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-02 21:00:15.085943: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10

### Check the generated tflite file

- Use the TFLite Interpreter to check that the generated file is valid.

In [15]:
# Load the TFLite file.
interpreter = tf.lite.Interpreter(model_path='model.tflite')
# Allocate memory for the tensors.
interpreter.allocate_tensors()

# Get the input and output tensor details.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print(input_details)
print(output_details)

[{'name': 'conv2d_input', 'index': 18, 'shape': array([ 1, 32, 32,  3], dtype=int32), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.003921568859368563, 0)}]
[{'name': 'dense/Softmax', 'index': 21, 'shape': array([ 1, 10], dtype=int32), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.00390625, 0)}]


- The `quantization` attribute in input/output_details is a `(scale, zero_point)` pair, where real_value = scale * (quantized_value - zero_point).
  - So if the quantization attribute is (a, b), the input data f should be transformed to (f/a + b) and cast to uint8.

In [0]:
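def quantize(detail, data):
    """Convert real-valued data to the tensor's quantized dtype and shape."""
    shape = detail['shape']
    dtype = detail['dtype']
    a, b = detail['quantization']  # (scale, zero_point)

    # Round to the nearest level and clip to the dtype range;
    # a bare astype() would truncate and could wrap around on overflow.
    info = np.iinfo(dtype)
    quantized = np.clip(np.round(data / a + b), info.min, info.max)
    return quantized.astype(dtype).reshape(shape)


def dequantize(detail, data):
    """Convert quantized values back to real values."""
    a, b = detail['quantization']  # (scale, zero_point)

    # Cast to float first so the subtraction cannot wrap around in uint8.
    return (data.astype(np.float32) - b) * a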

In [17]:
quantized_input = quantize(input_details[0], test_images[:1])
interpreter.set_tensor(input_details[0]['index'], quantized_input)

interpreter.invoke()

# The results are stored at the 'index' given in output_details.
quantized_output = interpreter.get_tensor(output_details[0]['index'])

print('sample result of quantized model')
print(dequantize(output_details[0], quantized_output))

sample result of quantized model
[[0.00390625 0.0390625  0.06640625 0.1015625  0.015625   0.
  0.76953125 0.00390625 0.00390625 0.        ]]
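The dequantized scores above are all multiples of the output scale 0.00390625 (1/256). As a small sketch, using only the printed values and the output scale reported by the interpreter, the raw uint8 codes and the predicted class index can be recovered as follows. The class names are the standard CIFAR-10 label order, which is an addition here and is not printed anywhere in the notebook.

```python
scores = [0.00390625, 0.0390625, 0.06640625, 0.1015625, 0.015625,
          0.0, 0.76953125, 0.00390625, 0.00390625, 0.0]
scale, zero_point = 0.00390625, 0   # from output_details[0]['quantization']

# Undo the dequantization to recover the raw uint8 codes.
codes = [round(score / scale) + zero_point for score in scores]

# Standard CIFAR-10 label order (assumed; not printed by the notebook).
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
predicted = max(range(len(scores)), key=scores.__getitem__)

print(codes)
print(predicted, class_names[predicted])
print(sum(scores))
```

The scores sum to 257/256 rather than exactly 1, a small rounding artifact of the 8-bit softmax output.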


### Compile the tflite file using EdgeTPU Compiler 

In [20]:
%%bash

edgetpu_compiler 'model.tflite'

Edge TPU Compiler version 2.0.267685300

Model compiled successfully in 35 ms.

Input model: model.tflite
Input size: 28.31KiB
Output model: model_edgetpu.tflite
Output size: 76.56KiB
On-chip memory available for caching model parameters: 7.95MiB
On-chip memory used for caching model parameters: 34.25KiB
Off-chip memory used for streaming uncached model parameters: 1.50KiB
Number of Edge TPU subgraphs: 1
Total number of operations: 10
Operation log: model_edgetpu.log
See the operation log file for individual operation details.


We can download the generated file.

In [0]:
from google.colab import files

files.download('model_edgetpu.tflite')