<a class="anchor" id="top"></a>
# QKeras-Mod Explained
Author: Luca Urbinati, Date: 18/01/2024, v.1.0, Email: luca.urbinati@polito.it
***

### Content of this notebook:
[Chapter 1](#ch1): how to design a quantized model (with and without fused batch normalization) using a modified version of QKeras [1] that quantizes weights <b>and activations</b> to integers starting from a Keras model;

[Chapter 2](#ch2): how to <b>extract quantization factors</b> (scaling factors and zero points) from each layer of the QKeras model to match the uniform quantization theory of TensorFlow Lite [2][3];

[Chapter 3](#ch3): <b>compare inference results</b> between the Keras model and the quantized one;

[Chapter 4](#ch4): how to use <b>AutoQKeras</b> to search for the best mixed-precision quantized model.

### Requirements before starting
- Read [this QKeras tutorial](https://github.com/google/qkeras/blob/master/notebook/QKerasTutorial.ipynb) to become confident with QKeras.
- Install the conda environment [qkeras-env.yml](https://github.com/LucaUrbinati44/qkeras-mod/blob/main/qkeras-env.yml) provided in this repo and activate it (_conda activate qkeras-env_).
- Apply the patch to your QKeras installation to access the modified version of QKeras (see the [README](https://github.com/LucaUrbinati44/qkeras-mod/blob/main/README.md)).

### Main features of QKeras:
- quantization-aware training
- per-channel quantization for weights (not for activations [4])
- weight quantization by default; activation quantization too, if properly configured
- affine quantization mapping with uniform quantization, using 2*max(abs(tensor)) as the floating-point range instead of the more common max - min

### Publications using this code
- Luca Urbinati and Mario R. Casu, "High-Level Design of Precision-Scalable DNN Accelerators Based on Sum-Together Multiplier", under review.

### References
[1] QKeras: https://github.com/google/qkeras

[2] B. Jacob et al., "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference," arXiv:1712.05877 [cs, stat], Dec. 2017. Available: http://arxiv.org/abs/1712.05877

[3] L. Mao, “Quantization for Neural Networks,” Lei Mao’s Log Book, May 17, 2020. Available: https://leimao.github.io/article/Neural-Networks-Quantization/

[4] M. Nagel, M. Fournarakis, R. A. Amjad, Y. Bondarenko, M. van Baalen, and T. Blankevoort, “A White Paper on Neural Network Quantization.” arXiv, Jun. 15, 2021. Available: http://arxiv.org/abs/2106.08295

[5] H. Wu, P. Judd, X. Zhang, M. Isaev, and P. Micikevicius, “Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation,” arXiv:2004.09602 [cs, stat], Apr. 2020. Available: http://arxiv.org/abs/2004.09602

***
<a class="anchor" id="ch0"></a>
# 0) Import libraries, define functions and create folders

Go to next: [Ch. 1](#ch1).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

In [1]:
import copy  # needed by copy.deepcopy() in the weight-transfer cell
import random
import numpy as np
import sys
import os
from IPython.utils import io
import math

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model

from qkeras import *
from qkeras.utils import *

tf.keras.backend.set_floatx('float64')
tf.keras.backend.floatx()

np.set_printoptions(threshold=sys.maxsize, precision=128, suppress=True)

if tf.config.list_physical_devices('GPU') == []:
    print("No GPU available")
else:
    print("GPU available")

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

No GPU available


2024-01-18 17:39:57.306367: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set


In [2]:
def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    tf.experimental.numpy.random.seed(seed)
    # When running on the CuDNN backend, two further options must be set
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    # Set a fixed value for the hash seed
    os.environ["PYTHONHASHSEED"] = str(seed)
    print(f"Random seed set as {seed}")

set_seed(0)

Random seed set as 0


In [3]:
from tensorflow.keras.datasets import mnist

def get_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(x_train.shape + (1,)).astype("float32")
    x_test = x_test.reshape(x_test.shape + (1,)).astype("float32")

    x_train /= 256.0
    x_test /= 256.0

    x_mean = np.mean(x_train, axis=0)

    x_train -= x_mean
    x_test -= x_mean

    nb_classes = np.max(y_train)+1
    y_train = to_categorical(y_train, nb_classes)
    y_test = to_categorical(y_test, nb_classes)

    return (x_train, y_train), (x_test, y_test)

input_width = 28
input_channels = 1

(x_train, y_train), (x_test, y_test) = get_data()

print(x_train.shape)
print(y_train.shape)

(60000, 28, 28, 1)
(60000, 10)


In [4]:
class FakeLayer(Layer):

    """Subclass of Layer to create an Identity or Fake layer that does not exist for TensorFlow 2.4.0:
    https://www.tensorflow.org/api_docs/python/tf/keras/layers/Identity"""
    
    def __init__(self, name=None):
        super(FakeLayer, self).__init__(name=name)

    def call(self, inputs):
        return inputs

In [5]:
def my_evaluate(predictions, y):
    index_pred = np.argmax(predictions)
    #print(index_pred)
    index_gold = np.argmax(y)
    #print(index_gold)
    if index_pred != index_gold:
        return 0
    else:
        return 1 # 1 when correct

In [6]:
def calculate_padding(inputs, kernel_size, strides, padding):
    
    # https://www.tensorflow.org/api_docs/python/tf/nn#notes_on_padding_2
    
    in_height = inputs.shape[1]
    in_width = inputs.shape[2]
    filter_height = kernel_size[0]
    filter_width = kernel_size[1]
    stride_height = strides[0]
    stride_width = strides[1]

    if padding == "valid":
        output_height = math.ceil((in_height - filter_height + 1) / stride_height)
        output_width  = math.ceil((in_width - filter_width + 1) / stride_width)

        pad_top = 0
        pad_bottom = 0
        pad_left = 0
        pad_right = 0

    elif padding == "same":
        output_height = math.ceil(in_height / stride_height)
        output_width  = math.ceil(in_width / stride_width)

        if (in_height % stride_height == 0):
            pad_along_height = max(filter_height - stride_height, 0)
        else:
            pad_along_height = max(filter_height - (in_height % stride_height), 0)
        if (in_width % stride_width == 0):
            pad_along_width = max(filter_width - stride_width, 0)
        else:
            pad_along_width = max(filter_width - (in_width % stride_width), 0)

        pad_top = pad_along_height // 2
        pad_bottom = pad_along_height - pad_top
        pad_left = pad_along_width // 2
        pad_right = pad_along_width - pad_left

    else:
        raise ValueError(f"Unsupported padding: {padding!r} (expected 'valid' or 'same')")

    return output_height, output_width, pad_top, pad_bottom, pad_left, pad_right

***
<a class="anchor" id="ch1"></a>
# 1) Quantized network design

Go to next: [Ch. 2](#ch2).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

As an example, the goal of this chapter is to design a convolutional network whose 2D-convolution kernels receive quantized inputs and weights.

Note that the quantized layers of QKeras (```QConv2D```, ```QDepthwiseConv2D```, ```QDense```) internally call the corresponding TensorFlow kernels (```tf.keras.backend.conv2d()```, ```tf.keras.backend.depthwise_conv2d()```, ```tf.keras.backend.dot()```), which are floating-point kernels.

When running one of these quantized layers, QKeras partially uses the technique called ["fake quantization"](https://github.com/google/qkeras/issues/96#issuecomment-1210877800), the same technique used by TensorFlow Lite [2]. It consists of quantizing and dequantizing inputs and weights before running the floating-point kernel: inputs, weights (and therefore outputs) remain floating-point numbers, but they can only take quantized values. There is, however, a difference: QKeras does not fake-quantize the inputs, i.e. they remain "true" floating-point numbers and can take any value in the floating-point range (see the source code of ```QConv2D()``` in [qconvolutional.py#L294](https://github.com/google/qkeras/blob/eb6e0dc86c43128c6708988d9cb54d1e106685a4/qkeras/qconvolutional.py#L294)). The same holds for the outputs: they remain unconstrained floating-point values, because a kernel computed on floating-point inputs and fake-quantized weights produces floating-point outputs.

To tackle this problem, we apply a ```quantized_bits()``` operation to the input feature map tensor by inserting a ```QActivation``` layer in front of the quantized kernel. ```quantized_bits()``` performs a quantization-dequantization (q-deq) operation on the floating-point tensor. Thanks to this ```QActivation``` layer, we can extract the quantization parameters of the input feature map tensor and quantize it to integer values (see [Chapter 2](#ch2)). The output tensor of ```QConv2D``` is now fully fake-quantized, because both the inputs and the weights of this layer are fake-quantized.
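To make the fake-quantization argument concrete, here is a minimal NumPy sketch (not QKeras code). The helper ```fake_quantize_symmetric``` is a hypothetical name; it assumes a symmetric per-tensor q-deq with zero point 0 and scale ```2*max(abs(x))/(beta_q - alpha_q)```. With fake-quantized weights alone the product is unconstrained; once the inputs are fake-quantized too, every output is an integer multiple of ```s_x * s_w```, i.e. the floating-point image of an integer accumulation.

```python
import numpy as np

def fake_quantize_symmetric(x, bits):
    """Quantize-dequantize (q-deq) x on a symmetric per-tensor grid (illustrative assumption)."""
    beta_q = 2 ** (bits - 1) - 1
    alpha_q = -beta_q
    scale = 2.0 * np.max(np.abs(x)) / (beta_q - alpha_q)
    x_q = np.clip(np.round(x / scale), alpha_q, beta_q)  # integer-valued floats
    return x_q * scale, scale  # fake-quantized tensor and its scaling factor

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))   # "true" floating-point inputs
w = rng.normal(size=(4, 3))   # floating-point weights

w_fq, s_w = fake_quantize_symmetric(w, bits=4)
x_fq, s_x = fake_quantize_symmetric(x, bits=4)

# Weights lie on their grid: w_fq / s_w is integer-valued.
print(np.allclose(w_fq / s_w, np.round(w_fq / s_w)))

# Only weights fake-quantized: the output is generally NOT on the s_x * s_w grid.
y_partial = x @ w_fq
# Inputs AND weights fake-quantized: output / (s_x * s_w) is integer-valued.
y_full = x_fq @ w_fq
print(np.allclose(y_full / (s_x * s_w), np.round(y_full / (s_x * s_w))))
```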

The next two layers are a standard ```ReLU``` followed by another ```QActivation``` with ```quantized_bits()```. We could have used ```QActivation("quantized_relu(bits,integer)")```, but ```quantized_relu()``` does not quantize data in the same way as ```quantized_bits()```: it lacks the ```alpha="auto"``` argument, so it does not apply the standard affine quantization mapping formula [2][3] shown below, which is the quantization we want to implement.
$$x_q = \text{clip}\Big( \text{round}\big(\frac{1}{s} x + z\big), \alpha_q, \beta_q \Big)$$
Thus, in order to tell QKeras to use this quantization mapping formula, we have to set ```alpha="auto"``` in ```quantized_bits()``` for all QKeras layers.
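A minimal NumPy sketch of the affine mapping above and its inverse (```affine_quantize``` and ```affine_dequantize``` are hypothetical helper names, not QKeras functions). The float range ```[-1, 3]``` and the unsigned 8-bit quantized range are arbitrary example choices; the zero point follows the usual definition in [3], so that the real value 0 maps exactly to an integer.

```python
import numpy as np

def affine_quantize(x, s, z, alpha_q, beta_q):
    """x_q = clip(round(x / s + z), alpha_q, beta_q)."""
    return np.clip(np.round(x / s + z), alpha_q, beta_q).astype(np.int64)

def affine_dequantize(x_q, s, z):
    """Inverse mapping: x is approximately s * (x_q - z)."""
    return s * (x_q.astype(np.float64) - z)

# Example: map the float range [alpha, beta] = [-1.0, 3.0] to 8-bit unsigned [0, 255].
alpha, beta = -1.0, 3.0
alpha_q, beta_q = 0, 255
s = (beta - alpha) / (beta_q - alpha_q)   # scaling factor
z = int(round(beta_q - beta / s))         # integer zero point

x = np.array([-1.0, 0.0, 0.5, 3.0, 10.0])  # 10.0 lies outside the range and saturates
x_q = affine_quantize(x, s, z, alpha_q, beta_q)
x_hat = affine_dequantize(x_q, s, z)
print(x_q.tolist())  # [0, 64, 96, 255, 255]
print(x_hat)         # 0.0 is recovered exactly; other values within s/2 unless clipped
```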

The only drawback of ```quantized_bits()``` is that it implements only a symmetric quantized range when ```alpha="auto"``` (as stated in the comment at [quantizers.py#L1404](https://github.com/google/qkeras/blob/c5051b51ac5d8db7b5d235419a1538258a35a8a7/qkeras/quantizers.py#L1404)): even if we set ```symmetric=0``` and ```keep_negative=0```, it automatically forces ```symmetric=1``` ([quantizers.py#L524](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L524)) and ```keep_negative=1``` ([quantizers.py#L584](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L584), [quantizers.py#L603](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L603)). Therefore, for example, ```quantized_bits(4,4,0,0,alpha='auto')``` is treated by QKeras as ```quantized_bits(4,4,1,1,alpha='auto')```.
Consequently, our features lose 1 bit after a ReLU activation: the ReLU output is non-negative, so the negative half of the symmetric quantized range is never used.
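A small NumPy check of this 1-bit loss, assuming the symmetric range ```[-(2^(bits-1)-1), 2^(bits-1)-1]``` that ```alpha="auto"``` enforces and a symmetric scale ```2*max(abs(x))/(beta_q - alpha_q)``` (an illustrative sketch, not QKeras code): with 4 bits, 15 levels are available, but ReLU outputs only use the 8 non-negative ones, i.e. effectively 3 bits.

```python
import numpy as np

bits = 4
beta_q = 2 ** (bits - 1) - 1   # 7
alpha_q = -beta_q              # -7: symmetric range forced by alpha="auto"

rng = np.random.default_rng(0)
features = np.maximum(rng.normal(size=1000), 0.0)  # ReLU output: non-negative
scale = 2.0 * np.max(np.abs(features)) / (beta_q - alpha_q)
features_q = np.clip(np.round(features / scale), alpha_q, beta_q)

used_levels = np.unique(features_q)
print(used_levels.min(), used_levels.max())  # only 0..7 appear
print(len(range(alpha_q, beta_q + 1)), "levels available,",
      len(range(0, beta_q + 1)), "usable after ReLU")
```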

Finally, the ```QActivation``` layer that follows the ```ReLU``` can be used to fake-quantize the features to a different bitwidth. ```ReLU``` can NOT be passed to ```QConv2D``` through the ```activation``` argument, because it would be treated as ```quantized_relu()```.

Regarding the number of bits of ```quantized_bits()```, we set ```bits``` = ```integer``` because our target is integer-only arithmetic.

<br><br>
Strange behaviors:
1) Using ```bits``` > 31 produces ```nan``` values after quantization. The cause has not been investigated yet (future work).

2) Always set the keyword argument ```alpha``` of ```quantized_bits()``` explicitly, i.e. never leave it unset, to avoid [strange behaviors](https://github.com/google/qkeras/issues/60).

In [7]:
# POOL PARAMS
pool_size_list = [(4, 4)]

# 2DCONV PARAMS
filters_list = [2, 3]
kernel_size_list = [(3, 3), (3, 3)]
strides_list = [(1, 1), (2, 2)]
pads_list = ["valid", "same"] # "valid" or "same"

# DENSE PARAMS
units_list = [10]

# QUANTIZATION PARAMS
bit_flat = 16

bits_qactiv_list  = [bit_flat, bit_flat, bit_flat, bit_flat]
bits_qweight_list = [bit_flat, bit_flat, bit_flat, 0       ] # last value is dummy and is needed by the next for loop

### Original Keras model that we want to convert to QKeras

In [8]:
use_batchnorm = 1

## Case without fused BN

### Keras model manually modified to prepare it for weight transfer

In [9]:
if use_batchnorm == 0:
    
    model = tf.keras.models.Sequential([

        MaxPooling2D(pool_size=pool_size_list[0], name="pool"),
        FakeLayer(name="act_0"), # fake relu

        # Example of Conv2D without Batchnorm
        Conv2D(filters=filters_list[0], kernel_size=kernel_size_list[0], strides=strides_list[0], padding=pads_list[0], name="conv2d_0"),
              #activation="relu"),
        ReLU(name="relu_0"), # real relu
        FakeLayer(name="act_1"), # fake relu

        # Example of Conv2D with external Batchnorm
        Conv2D(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1], name="conv2d_1"),
        BatchNormalization(name="bn_1"),
        ReLU(name="relu_1"), # real relu
        FakeLayer(name="act_2"), # fake relu

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm
        Dense(units_list[0], name="dense"),
        FakeLayer(name="act_3"), # fake relu

        Activation("softmax", name="softmax")

    ])

    model.build((None,input_width,input_width,input_channels))

    model.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'],
                   run_eagerly=True)

    model.summary()

### QKeras model with the new activation quantizer "quantized_bits_featuremap"

In [10]:
if use_batchnorm == 0:
    
    qmodel = tf.keras.models.Sequential([
    
        MaxPooling2D(pool_size=pool_size_list[0], name="pool"),
        QActivation("quantized_bits_featuremap(bits=%s,integer=%s,symmetric=1,keep_negative=1,alpha='auto',scale_axis=0)" % (bits_qactiv_list[0], bits_qactiv_list[0]), name="act_0"),

        # Example of Conv2D without Batchnorm
        QConv2D(filters=filters_list[0], kernel_size=kernel_size_list[0], strides=strides_list[0], padding=pads_list[0], name="conv2d_0",
              kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % (bits_qweight_list[0], bits_qweight_list[0]), 
              bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')"),
              #activation="relu"), # This way applies quantized_relu() that we do not want
        ReLU(name="relu_0"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % (bits_qactiv_list[1], bits_qactiv_list[1]), name="act_1"),

        # Example of Conv2D with Batchnorm
        #QConv2DBatchnorm(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1], name="conv2d_1",
        #      kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % (bits_qweight_list[1], bits_qweight_list[1]), 
        #      bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')"),
        #FakeLayer(name="bn_1"),
        QConv2D(filters=filters_list[1], kernel_size=kernel_size_list[1], strides=strides_list[1], padding=pads_list[1], name="conv2d_1",
              kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % (bits_qweight_list[1], bits_qweight_list[1]), 
              bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')"),
        BatchNormalization(name="bn_1"),
        ReLU(name="relu_1"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % (bits_qactiv_list[2], bits_qactiv_list[2]), name="act_2"),

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm
        QDense(units_list[0],
             kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % (bits_qweight_list[2], bits_qweight_list[2]),
             bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')",
             name="dense"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % (bits_qactiv_list[3], bits_qactiv_list[3]), name="act_3"),

        Activation("softmax", name="softmax")

    ])

    qmodel.build((None,input_width,input_width,input_channels))

    qmodel.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'],
                   run_eagerly=True)

    qmodel.summary()

## Case with fused BN (BatchNormalization --> FakeLayer)
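For reference, the standard BatchNorm-folding identity (as described in [2]) can be sketched in NumPy. The sketch uses inference-time moving statistics and a ```Dense```-style matrix product in place of a convolution (both are linear, and BN acts per output channel); ```fold_batchnorm``` is a hypothetical helper, and ```QConv2DBatchnorm``` may handle training-time statistics differently. It shows why ```bn_1``` can become an identity ```FakeLayer``` once BN is fused into the preceding layer. ```eps=1e-3``` mirrors the Keras ```BatchNormalization``` default.

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-3):
    """Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into y = x @ w + b.

    w: (in_features, out_features); BN parameters: (out_features,).
    """
    factor = gamma / np.sqrt(var + eps)  # one factor per output channel
    w_fold = w * factor                  # scales each output column
    b_fold = beta + (b - mean) * factor
    return w_fold, b_fold

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
w, b = rng.normal(size=(4, 3)), rng.normal(size=3)
gamma, beta = rng.normal(size=3), rng.normal(size=3)
mean, var = rng.normal(size=3), rng.uniform(0.5, 2.0, size=3)

y_bn = gamma * ((x @ w + b) - mean) / np.sqrt(var + 1e-3) + beta  # layer, then BN
w_fold, b_fold = fold_batchnorm(w, b, gamma, beta, mean, var)
y_folded = x @ w_fold + b_fold                                  # single folded layer
print(np.allclose(y_bn, y_folded))  # True
```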

### Keras model manually modified to prepare it for weight transfer

In [11]:
if use_batchnorm == 1:
    
    model = tf.keras.models.Sequential([

        MaxPooling2D(pool_size=pool_size_list[0], name="pool"),
        FakeLayer(name="act_0"), # fake relu

        # Example of Conv2D without Batchnorm
        Conv2D(filters=filters_list[0], kernel_size=kernel_size_list[0], strides=strides_list[0], padding=pads_list[0], name="conv2d_0"),
              #activation="relu"),
        ReLU(name="relu_0"), # real relu
        FakeLayer(name="act_1"), # fake relu

        # Conv2D whose Batchnorm is fused in qmodel (QConv2DBatchnorm); bn_1 is a placeholder
        Conv2D(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1], name="conv2d_1"),
        FakeLayer(name="bn_1"),
        ReLU(name="relu_1"), # real relu
        FakeLayer(name="act_2"), # fake relu

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm
        Dense(units_list[0], name="dense"),
        FakeLayer(name="act_3"), # fake relu

        Activation("softmax", name="softmax")

    ])

    model.build((None,input_width,input_width,input_channels))

    model.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'],
                   run_eagerly=True)

    model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
pool (MaxPooling2D)          (None, 7, 7, 1)           0         
_________________________________________________________________
act_0 (FakeLayer)            (None, 7, 7, 1)           0         
_________________________________________________________________
conv2d_0 (Conv2D)            (None, 5, 5, 2)           20        
_________________________________________________________________
relu_0 (ReLU)                (None, 5, 5, 2)           0         
_________________________________________________________________
act_1 (FakeLayer)            (None, 5, 5, 2)           0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 3, 3, 3)           57        
_________________________________________________________________
bn_1 (FakeLayer)             (None, 3, 3, 3)           0

2024-01-18 17:39:58.103613: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


### QKeras model with the new activation quantizer "quantized_bits_featuremap"

In [12]:
if use_batchnorm == 1:
    
    qmodel = tf.keras.models.Sequential([

        MaxPooling2D(pool_size=pool_size_list[0], name="pool"),
        QActivation("quantized_bits_featuremap(bits=%s,integer=%s,symmetric=1,keep_negative=1,alpha='auto',scale_axis=0)" % (bits_qactiv_list[0], bits_qactiv_list[0]), name="act_0"),

        # Example of Conv2D without Batchnorm
        QConv2D(filters=filters_list[0], kernel_size=kernel_size_list[0], strides=strides_list[0], padding=pads_list[0], name="conv2d_0",
              kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % (bits_qweight_list[0], bits_qweight_list[0]), 
              bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')"),
              #activation="relu"), # This way applies quantized_relu() that we do not want
        ReLU(name="relu_0"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % (bits_qactiv_list[1], bits_qactiv_list[1]), name="act_1"),

        # Example of Conv2D with fused Batchnorm
        QConv2DBatchnorm(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1], name="conv2d_1",
              kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % (bits_qweight_list[1], bits_qweight_list[1]), 
              bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')"),
        FakeLayer(name="bn_1"),
        #QConv2D(filters=filters_list[1], kernel_size=kernel_size_list[1], strides=strides_list[1], padding=pads_list[1], name="conv2d_1",
        #      kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % (bits_qweight_list[1], bits_qweight_list[1]), 
        #      bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')"),
        #BatchNormalization(name="bn_1"),
        ReLU(name="relu_1"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % (bits_qactiv_list[2], bits_qactiv_list[2]), name="act_2"),

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm
        QDense(units_list[0],
             kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % (bits_qweight_list[2], bits_qweight_list[2]),
             bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')",
             name="dense"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % (bits_qactiv_list[3], bits_qactiv_list[3]), name="act_3"),

        Activation("softmax", name="softmax")

    ])

    qmodel.build((None,input_width,input_width,input_channels))

    qmodel.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'],
                   run_eagerly=True)

    qmodel.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
pool (MaxPooling2D)          (None, 7, 7, 1)           0         
_________________________________________________________________
act_0 (QActivation)          (None, 7, 7, 1)           0         
_________________________________________________________________
conv2d_0 (QConv2D)           (None, 5, 5, 2)           20        
_________________________________________________________________
relu_0 (ReLU)                (None, 5, 5, 2)           0         
_________________________________________________________________
act_1 (QActivation)          (None, 5, 5, 2)           0         
_________________________________________________________________
conv2d_1 (QConv2DBatchnorm)  (None, 3, 3, 3)           70        
_________________________________________________________________
bn_1 (FakeLayer)             (None, 3, 3, 3)          

## Train and save qmodel

In [13]:
train_model = 1
epochs = 30

save_model  = 1
save_path = "./qmodel"
#save_path = "./qmodel.h5"

In [14]:
if train_model == 1:
    
    qmodel.fit(x_train, y_train, batch_size=512,
               epochs=epochs, validation_split=0.25, shuffle=True)    
    
    if save_model == 1:
        
        try:
            os.remove("./qmodel.index")
            os.remove("./qmodel.data-00000-of-00001")
            os.remove("./checkpoint")
            print("all files deleted")
        except OSError:  # e.g. FileNotFoundError when no previous checkpoint exists
            print("nothing to delete")

        qmodel.save_weights(save_path, overwrite=True, save_format="tf")
        #qmodel.save(save_path)
        #os.remove("./checkpoint")

        print("Model saved correctly")
else:
    qmodel.load_weights(save_path, by_name=False).expect_partial()

Epoch 1/30
 1/88 [..............................] - ETA: 11s - loss: 2.8202 - accuracy: 0.1465

2024-01-18 17:40:01.746510: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)


Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
all files deleted
Model saved correctly


### Transfer weights from qmodel to model

In [15]:
for qlayer, layer in zip(qmodel.layers, model.layers):
    
    with io.capture_output(stdout=True, stderr=False) as captured: # to disable all the printings to stdout of the functions inside this statement: https://stackoverflow.com/questions/23610585/ipython-notebook-avoid-printing-within-a-function/23611571#23611571
        
        print(qlayer.__class__.__name__)

        if qlayer.get_weights() and qlayer.__class__.__name__ not in ["BatchNormalization"]:
            
            print(qlayer.name)
            print("qlayer.get_weights()[0].shape:", qlayer.get_weights()[0].shape)
            #print(len(qlayer.get_weights()))
            #print(len(qlayer.get_weights()[0:2]))
            try:
                print(len(qlayer.get_folded_weights())) # folded weights are NOT quantized by quantized_bits
                print("This layer IS FOLDED")
            except AttributeError:  # layers without get_folded_weights(), e.g. QConv2D, QDense
                print("This layer is NOT folded")
            #print(len(qlayer.weights))
            
            print("layer.get_weights()[0][0] MODEL BEFORE")
            print(layer.get_weights()[0][0])
            print("layer.get_weights()[1] MODEL BEFORE")
            print(layer.get_weights()[1])
            layer.set_weights(copy.deepcopy(qlayer.get_weights()[0:2]))
            print("layer.get_weights()[0][0] MODEL AFTER")
            print(layer.get_weights()[0][0])
            print("layer.get_weights()[1] MODEL AFTER")
            print(layer.get_weights()[1])
            
            print("qlayer.get_weights()[0][0] QMODEL") # equal to print("qlayer.weights[0][0]")
            print(qlayer.get_weights()[0][0])
            print("qlayer.get_weights()[1] QMODEL")
            print(qlayer.get_weights()[1])
            
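            # Fail if ANY element differs (the original `(a != b).all()` only failed if ALL differed)
            if not (np.array_equal(layer.get_weights()[0], qlayer.get_weights()[0]) and
                    np.array_equal(layer.get_weights()[1], qlayer.get_weights()[1])):
                raise Exception("Transfer weights failed")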
            

        elif qlayer.get_weights() and qlayer.__class__.__name__ in ["BatchNormalization"]:

            print(qlayer.name)
            #print(len(qlayer.get_weights()))
            print("layer.get_weights()[0] MODEL BEFORE")
            print(layer.get_weights()[0])
            print("layer.get_weights()[1] MODEL BEFORE")
            print(layer.get_weights()[1])
            layer.set_weights(copy.deepcopy(qlayer.get_weights()))
            print("layer.get_weights()[0] MODEL AFTER")
            print(layer.get_weights()[0])
            print("layer.get_weights()[1] MODEL AFTER")
            print(layer.get_weights()[1])
            
            print("qlayer.get_weights()[0] QMODEL")
            print(qlayer.get_weights()[0])
            print("qlayer.get_weights()[1] QMODEL")
            print(qlayer.get_weights()[1])
            
            # Compare all BN weights (gamma, beta, moving mean, moving variance) element-wise
            if not all(np.array_equal(w, qw) for w, qw in zip(layer.get_weights(), qlayer.get_weights())):
                raise Exception("Transfer weights failed")

        print("------------")
    
print("Done")

Done



***
<a class="anchor" id="ch2"></a>
# 2) Extract QKeras quantization factors
Go to next: [Ch. 3](#ch3).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

This is the complete quantization formula for a matrix multiplication (also valid for an FC layer), taken from [[3], Quantized Matrix Multiplication Mathematics](https://leimao.github.io/article/Neural-Networks-Quantization/#Quantized-Matrix-Multiplication-Mathematics) (an equivalent version is Eq. 7 in [2]):
$$\begin{align} Y_{q,i,j} &= z_Y + \frac{s_b}{s_Y} (b_{q, j} - z_b) + \frac{s_X s_W}{s_Y} \Bigg[ \bigg( \sum_{k=1}^{p} X_{q,i,k} W_{q, k,j} \bigg) - \bigg( z_W \sum_{k=1}^{p} X_{q,i,k} \bigg) - \bigg( z_X \sum_{k=1}^{p} W_{q, k,j} \bigg) + p z_X z_W\Bigg] \end{align}$$

Some of these terms can be dropped if the <b>zero points</b> of the weights ```z_W``` and of the biases ```z_b``` <b>are forced to be zero, i.e. if both the quantized range and the fake-quantized/floating-point range of the weights and of the biases are symmetric</b>. With ```z_W = 0``` the second and fourth terms inside the square brackets vanish, and with ```z_b = 0``` the bias term reduces to ```(s_b/s_Y) b_q```. When this happens, affine quantization mapping is called scale quantization mapping [3]. Making the quantized range symmetric is relatively easy (in QKeras, we just pass ```quantized_bits()``` with ```symmetric=1``` and ```keep_negative=1``` to both the ```kernel_quantizer``` and the ```bias_quantizer``` arguments of each QKeras layer), but making the fake-quantized/floating-point range symmetric is not. There are two ways to make the latter symmetric:

1) during training, by constraining weight and bias tensors to a given symmetric range of values;

2) during training, by computing the scaling factor ```s``` differently. Instead of the standard and more general formula:
$$\begin{align} s &= \frac{\beta - \alpha}{\beta_q - \alpha_q} \end{align}$$
it can be computed as:
$$\begin{align} s &= \frac{2 \cdot \max\big(\text{abs}(\text{tensor})\big)}{\beta_q - \alpha_q} \end{align}$$
where ```tensor``` is the floating-point weight/bias tensor to be quantized, ```[alpha; beta]``` is the floating-point range (```alpha``` and ```beta``` being the minimum and maximum values of the entire tensor, so there is a single scalar ```s``` for the whole tensor) and ```[alphaq; betaq]``` is the quantized range (which depends on the number of bits used to represent the quantized data). In the second formula, ```abs()``` computes the absolute value of every element of ```tensor``` and ```max``` extracts the maximum value of each channel, so ```s``` is an array of scaling factors, one per channel. Per-tensor vs per-channel granularity aside, the second formula is more convenient: it skips the search for the minimum and directly assumes a symmetric floating-point range, ```[-max(abs(tensor)); max(abs(tensor))]```, even when the data are not actually symmetric. This yields a zero point of zero without having to constrain ```alpha``` and ```beta``` to be exactly equal and opposite.
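A sketch comparing the two formulas on a toy weight matrix, assuming a signed 8-bit quantized range (```minmax_scale_zero_point``` and ```symmetric_scale``` are hypothetical helper names; the zero-point expression follows the affine mapping in [3]). The min-max version generally yields a non-zero zero point, while the symmetric version has a zero point of 0 by construction and can be evaluated per output channel.

```python
import numpy as np

def minmax_scale_zero_point(t, bits):
    """Standard affine parameters from [alpha, beta] = [min(t), max(t)] (one pair per tensor)."""
    alpha_q, beta_q = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    alpha, beta = float(t.min()), float(t.max())
    s = (beta - alpha) / (beta_q - alpha_q)
    z = int(round((beta * alpha_q - alpha * beta_q) / (beta - alpha)))
    return s, z

def symmetric_scale(t, bits, axis=None):
    """Scale 2*max(|t|)/(beta_q - alpha_q) on a symmetric range; zero point is implicitly 0."""
    beta_q = 2 ** (bits - 1) - 1
    alpha_q = -beta_q
    return 2.0 * np.max(np.abs(t), axis=axis) / (beta_q - alpha_q)

w = np.array([[-0.2, 0.9, 0.1],
              [0.4, -0.3, 0.05]])  # toy weights: 2 inputs x 3 output channels

print(minmax_scale_zero_point(w, bits=8))   # one (s, z) for the tensor; z is not 0
print(symmetric_scale(w, bits=8))           # one s for the tensor; z = 0
print(symmetric_scale(w, bits=8, axis=0))   # one s per output channel; z = 0
```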

[TensorFlow Lite states](https://www.tensorflow.org/lite/performance/quantization_spec#symmetric_vs_asymmetric) that it forces the zero points of the weights to zero, but it does not show how (this may require inspecting the source code: future work). [A Stack Overflow user](https://stackoverflow.com/questions/69746834/tf-lite-model-force-symmetric-filter-weights-in-fully-connected-layers) tried to implement the first approach using [tf.keras.constraints](https://www.tensorflow.org/api_docs/python/tf/keras/constraints) without success; QKeras, instead, follows the second approach, as you can see in the source code of ```quantized_bits()``` in [quantizers.py#L586](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L586).

In light of the above, in the next cells of this notebook we extract and save to .txt files only the following quantization parameters, which will be used for inference in hardware (discussed in [Chapter 3](#ch3)):

- ```fq```, ```scale_f```, ```zeropoint_f``` and ```[alphaq_f, betaq_f]``` are, respectively, the quantized values, the scaling factor, the zero point and the quantized range of the output features of a QActivation layer used for the q-deq operation;

- ```wq``` and ```scale_w``` are the quantized weights and their scaling factors; 

- ```bq``` and ```scale_b``` are the quantized biases and their scaling factors;

- ```subq1``` is the third term in the square brackets of the quantization formula above (i.e. the summation over the quantized weights multiplied by the zero point of the input features):
$$\bigg( z_X \sum_{k=1}^{p} W_{q, k,j} \bigg)$$
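`subq1` depends only on the quantized weights and on the input zero point, so it can be computed offline once per output channel and then subtracted from the integer accumulator. The minimal sketch below uses random integer data and assumes symmetric weights (zero point 0, as in QKeras), which makes the weight zero-point terms vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_out = 8, 3                                # p inputs, n_out output channels
xq = rng.integers(0, 16, size=p)               # asymmetric 4-bit input features
wq = rng.integers(-7, 8, size=(p, n_out))      # symmetric 4-bit weights (z_W = 0)
z_x = 5                                        # zero point of the input features

# Precomputed offline, one value per output channel (what subq1 stores).
subq1 = z_x * wq.sum(axis=0)

# Integer accumulator computed at inference time, corrected with subq1.
acc = xq @ wq - subq1

# Same result as subtracting the zero point from every input first.
assert np.array_equal(acc, (xq - z_x) @ wq)
```

This matches the extraction code below, where `subq1 = zeropoint_of_list[-1] * sum_of_weights`.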

To run the notebook without issues, you should edit the file [quantizers.py](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py) to expose the following internal variables to the outside as attributes (to ease this step, just follow the instructions in the README of this repo):

- ```m_i = K.cast_to_floatx(K.pow(2, self.integer))```;

- ```alphaq = -2**(self.bits-1)+1 if self.symmetric else 0```;

- ```betaq = 2**(self.bits-1)-1 if self.symmetric else (2**self.bits)-1```;

- ```scale1 = (K.max(abs(x), axis=axis, keepdims=True) * 2) / levels```.
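The integer ranges exposed above are easy to tabulate. The sketch below copies the `alphaq`/`betaq` expressions from the list above. The `levels` expression is an assumption taken from the upstream `quantized_bits()` source (it is not listed here); the code checks that it equals `betaq - alphaq` in both the symmetric and asymmetric cases.

```python
def quantized_range(bits, symmetric):
    # Integer range [alphaq, betaq], same expressions as the patched quantizer.
    alphaq = -2 ** (bits - 1) + 1 if symmetric else 0
    betaq = 2 ** (bits - 1) - 1 if symmetric else (2 ** bits) - 1
    return alphaq, betaq

def levels(bits, symmetric):
    # Number of quantization steps used as the denominator of scale1
    # (assumed to mirror upstream quantized_bits()).
    return (2 ** (bits - 1) - 1) * 2 if symmetric else (2 ** bits) - 1

for b in (2, 4, 8):
    for sym in (True, False):
        lo, hi = quantized_range(b, sym)
        assert levels(b, sym) == hi - lo      # levels spans the whole range
        print(b, sym, (lo, hi), levels(b, sym))
```

For example, a 4-bit symmetric quantizer uses `[-7, 7]`, giving up the value `-8` to keep the range symmetric, while the asymmetric one uses `[0, 15]`.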

In particular, we need ```m_i``` and ```scale1``` to compute a scaling factor that matches the definition of scaling factor of TensorFlow Lite [2][3]. In fact, one might imagine that the scaling factor provided by the ```scale``` attribute of ```quantized_bits()``` is the same as the TensorFlow one: unfortunately it is not, as you can see from [quantizers.py#L608](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L608). The correct scaling factor is ```scale = scale1 * m_i``` and has to be computed manually.
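A minimal NumPy sketch of the manual computation is given below. It assumes, as the text implies, that the `x` inside `scale1` has already been divided by `m_i`, and that the quantizer is symmetric. With these assumptions, `scale1 * m_i` reduces to `2 * max(abs(x)) / levels` in the original units of `x`. It only covers the initial scale; the power-of-two refinement done for `alpha="auto_po2"` is not reproduced. The function name `tflite_scale` is hypothetical.

```python
import numpy as np

def tflite_scale(x, integer, bits, axis=None):
    # m_i rescales the input by the number of integer bits.
    m_i = 2.0 ** integer
    levels = (2 ** (bits - 1) - 1) * 2              # symmetric quantizer
    # scale1 as exposed by the patched quantizer, computed on x / m_i.
    scale1 = np.abs(x / m_i).max(axis=axis, keepdims=True) * 2 / levels
    # Multiplying back by m_i gives the TensorFlow Lite-style scale.
    return scale1 * m_i

x = np.array([[0.9, -3.5],
              [2.0, 1.0]])
s = tflite_scale(x, integer=2, bits=8, axis=0)      # one scale per column
print(s, np.round(x / s))                           # integers stay in [-127, 127]
```

The result does not depend on `integer`, because `m_i` cancels out. This is why `scale1` alone is not the TensorFlow Lite scale whenever `m_i != 1`.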

Finally, the following calculations show the <b>quantization of the weights</b>, which is <b>per-channel</b>: weights have as many scaling factors and zero points as output channels, while <b>activations are quantized per-layer</b> (one scaling factor and one zero point for each feature-map tensor). The difference comes from the use of ```scale_axis=0``` in ```quantized_bits()``` for ```QActivation()```. Per-channel quantization of activations is not implemented in QKeras, nor in TensorFlow Lite, because "<i>per-channel quantization of activations is much harder to implement because we cannot factor the scale factor out of the summation and would, therefore, require rescaling the accumulator for each input channel</i>" [4].
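The quoted argument can be checked numerically. In the minimal sketch below (random data, zero points assumed to be 0 for brevity), a per-layer activation scale and per-output-channel weight scales are constant along the summation index, so they are applied once to the integer accumulator. A hypothetical per-input-channel activation scale depends on that index and has to be applied inside the sum.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_out = 6, 4
xq = rng.integers(-7, 8, size=p).astype(np.float64)
wq = rng.integers(-7, 8, size=(p, n_out)).astype(np.float64)
s_x = 0.05                                  # per-layer: one scale for the activations
s_w = np.array([0.01, 0.02, 0.04, 0.08])    # per-channel: one scale per output channel

# Reference result from the dequantized tensors.
y_ref = (s_x * xq) @ (s_w * wq)

# Both scales factor out of the sum: integer MACs, then one rescale per output.
y_int = s_x * s_w * (xq @ wq)
assert np.allclose(y_ref, y_int)

# Hypothetical per-input-channel activation scales: s_x_pc[k] varies with the
# summation index k, so each input must be rescaled before accumulation.
s_x_pc = np.array([0.05, 0.1, 0.02, 0.05, 0.08, 0.03])
y_pc = s_w * ((s_x_pc * xq) @ wq)
```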

In [16]:
folder_generated_txtfiles_from_python = "./generated_txtfiles_from_python/"

folder = "./float/" # contains the feature maps of each layer in floating point
qfolder = "./quantization_parameters/" # contains all the quantized feature maps and weights obtained from inference from Python (not from conversion from float)
qfolder_qdense = folder_generated_txtfiles_from_python + "qdense/"
qfolder_qconv2d = folder_generated_txtfiles_from_python + "qconv2d/"
qfolder_qdepthwise = folder_generated_txtfiles_from_python + "qdepthwise/"
qreverse = "./qreverse/"
qoutput = "./quantized_outputs/" # contains all the quantized output feature maps obtained from inference from Python (not from conversion from float)

folders_list = []
folders_list.append(folder_generated_txtfiles_from_python)
folders_list.append(folder)
folders_list.append(qfolder)
folders_list.append(qfolder_qdense)
folders_list.append(qfolder_qconv2d)
folders_list.append(qfolder_qdepthwise)
folders_list.append(qreverse)
folders_list.append(qoutput)

for f in folders_list:
    try:
        os.mkdir(f)
    except FileExistsError:  # only ignore "already exists", not other errors
        print(f + " exists")

./generated_txtfiles_from_python/ exists
./float/ exists
./quantization_parameters/ exists
./generated_txtfiles_from_python/qdense/ exists
./generated_txtfiles_from_python/qconv2d/ exists
./generated_txtfiles_from_python/qdepthwise/ exists
./qreverse/ exists
./quantized_outputs/ exists


In [17]:
precision_float = '%32.128f'
precision_quantized = '%.1f'

random_data = 0

def generate_txtfiles_from_python(model, x_sample):

    print("generate_txtfiles_from_python")
    
    # Reset all lists

    of_list = []
    w_list = []
    b_list = []

    alphaq_of_list = []
    #alphaq_w_list = []
    #alphaq_b_list = []
    betaq_of_list = []
    #betaq_w_list = []
    #betaq_b_list = []
    fq_list = []
    wq_list = []
    bq_list = []

    #zeropoint_w_list = []
    #zeropoint_b_list = []
    zeropoint_of_list = []
    scale_of_list = []

    scale_w_list = []
    scale_b_list = []
    subq1_list = []
    #subq2_list = []
    #sumq3_list = []

    layer_index = 0
    qdense_index = 0
    qconv2d_index = 0
    qdepthwise_index = 0

    first_layer_flag = 0
    ready1 = 0
    ready2 = 0
    
    previous_layers = ["None", "None", "None"]

    # print the output of the input layer (i.e. the input layer itself)
    if random_data == 0:
        data = np.reshape(x_sample, (1, input_width,input_width,input_channels))
    else:
        data = tf.random.normal((1, input_width,input_width,input_channels))
    of_list.append(data)
    np.savetxt(folder+str(layer_index)+"_input.txt", np.reshape(data, (input_width*input_width*input_channels,)), fmt=precision_float)
    layer_index += 1
    
    #print(model.layers)

    for layer in model.layers:

        print("--> ", layer.name)
        print("    ", layer.__class__.__name__)

        # print dequantized output features
        extractor = tf.keras.Model( inputs=model.inputs,
                                   outputs=model.get_layer(layer.name).output)
        of = extractor(data).numpy()
        #
        #print("of:", of)
        of_list.append(of)
        fname = folder+str(layer_index)+"_of_"+layer.name+".txt"
        np.savetxt(fname, np.reshape(of, (of.size,)), fmt=precision_float)


        ##### WEIGHTS #####
        if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm", "QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm", "QDense", "QDenseBatchnorm"]:

            if layer.__class__.__name__ in ["QConv2D", "QDepthwiseConv2D", "QDense"]:
                parameters = layer.weights
            else:
                parameters = layer.get_folded_weights() # folded weights not quantized

            ##### WEIGHTS (Conv2D, DepthwiseConv2D and Dense) #####
            quantizer_w = layer.get_quantizers()[0]

            # print dequantized weights
            w = parameters[0].numpy()
            #
            print("w.shape:", w.shape)
            w_list.append(w)
            fname = folder+str(layer_index)+"_w_"+layer.name+".txt"
            np.savetxt(fname, np.reshape(w, (w.size,)), fmt=precision_float)
            

            # (not needed)
            #if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm", "QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]: # dwconv has output channel = 1 by default in qkeras
            if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                alpha_w = w.min(axis=(0, 1, 2)) # kernel shape is (kh, kw, in_channels, out_channels): reduce all axes but the output channels
                beta_w = w.max(axis=(0, 1, 2))
            elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                alpha_w = w.min(axis=(0, 1))
                beta_w = w.max(axis=(0, 1))
            elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                alpha_w = w.min(axis=0)
                beta_w = w.max(axis=0)
            alphaq_w = quantizer_w.alphaq
            betaq_w = quantizer_w.betaq
            #
            print("alpha_w: ", alpha_w)
            print("beta_w: ", beta_w)
            print("alphaq_w: ", alphaq_w)
            print("betaq_w: ", betaq_w)
            #alphaq_w_list.append(alphaq_w)
            #betaq_w_list.append(betaq_w)


            # print weights scale
            scale1_w = quantizer_w.scale1.numpy().flatten()
            m_i = quantizer_w.m_i.numpy().flatten()
            scale_w = scale1_w * m_i
            #
            print("axis:", quantizer_w.axis) # for dwconv it has to be [0,1] to have a per-channel quantization of weights. This is a bug of qkeras.
            print("scale_w: ", scale_w)
            scale_w_list.append(scale_w)
            fname = qfolder+str(layer_index)+"_w_s_"+layer.name+".txt"
            np.savetxt(fname, scale_w, fmt=precision_float)
            if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                fname = qfolder_qconv2d+str(qconv2d_index)+"_w_s.txt"
            elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                fname = qfolder_qdense+str(qdense_index)+"_w_s.txt"
            elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_w_s.txt"
            np.savetxt(fname, scale_w, fmt=precision_float)

            
            # print weights zero point # (not needed)
            #tmp = [(beta_w[i]*alphaq_w - alpha_w[i]*betaq_w)/(beta_w[i] - alpha_w[i]) if elem != 0 else 0 for i,elem in enumerate((beta_w - alpha_w))]
            #z_w = np.trunc(tmp + np.sign(tmp)*0.5)
            #print("z_w:", z_w)
            #zeropoint_w_list.append(z_w)
            #fname = qfolder+str(layer_index)+"_w_z_"+layer.name+".txt"
            #np.savetxt(fname, z_w, fmt='%3.3f')


            # print quantized weights
            tmp = np.divide(w, scale_w, dtype=np.float64)
            tmp[np.isnan(tmp)] = 0
            tmp[np.isinf(tmp)] = 0
            wq = np.clip(np.trunc(tmp + np.sign(tmp)*0.5), alphaq_w, betaq_w)
            #
            #print("w.shape:", w.shape)
            #print("w:", w)
            #print("wq:", wq)
            wq_list.append(wq)
            fname = qfolder+str(layer_index)+"_wq_"+layer.name+".txt"
            np.savetxt(fname, np.reshape(wq, (wq.size,)), fmt=precision_quantized)
            if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                fname = qfolder_qconv2d+str(qconv2d_index)+"_wq.txt"
            elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                fname = qfolder_qdense+str(qdense_index)+"_wq.txt"
            elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_wq.txt"
            np.savetxt(fname, np.reshape(wq, (wq.size,)), fmt=precision_quantized)


            # print "subq1" (part 2)
            if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                sum_of_weights = wq.sum(axis=(0, 1, 2))
            elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                sum_of_weights = wq.sum(axis=0)
            elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                sum_of_weights = wq.sum(axis=(0, 1))
            #print("wq.shape:", wq.shape)
            #print("sum_of_weights conv: ", wq.sum(axis=(0, 1, 2)))
            #print("sum_of_weights dense: ", wq.sum(axis=0))
            #print("sum_of_weights dwconv: ", wq.sum(axis=(0, 1)))
            print("sum_of_weights: ", sum_of_weights)
            if ready1 == 1:
                subq1 = (zeropoint_of_list[-1] * sum_of_weights).flatten()
                #
                print("subq1: ", subq1)
                subq1_list.append(subq1)
                fname = qfolder+str(layer_index)+"_subq1_"+layer.name+".txt"
                np.savetxt(fname, np.reshape(subq1, (subq1.size,)), fmt=precision_quantized)
                if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                    fname = qfolder_qconv2d+str(qconv2d_index)+"_subq1.txt"
                elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                    fname = qfolder_qdense+str(qdense_index)+"_subq1.txt"
                elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                    fname = qfolder_qdepthwise+str(qdepthwise_index)+"_subq1.txt"
                np.savetxt(fname, np.reshape(subq1, (subq1.size,)), fmt=precision_quantized)
                ready1 = 0

                
            # print "subq2" (part 2) # (not needed)
            #if ready2 == 1:
            #    subq2 = [z_w[0] * sum_of_ofeatures]
            #    #
            #    #print("subq2: ", subq2)
            #    #subq2_list.append(subq2)
            #    #fname = qfolder+str(layer_index)+"_subq2_"+layer.name+".txt"
            #    #np.savetxt(fname, subq2, fmt='%3.3f')
            #    ready2 = 0

            # (not needed)
            #sumq3 = [input_channels * int(np.sqrt(wq.size))**2 * z_w[0] * zeropoint_of_list[-1]]
            #print("sumq3: ", sumq3)
            #sumq3_list.append(sumq3)
            #fname = qfolder+str(layer_index)+"_sumq3_"+layer.name+".txt"
            #np.savetxt(fname, sumq3, fmt='%3.3f')



            ##### BIASES ######
            quantizer_b = layer.get_quantizers()[1]

            # print biases
            b = parameters[1].numpy()
            #
            #b = np.reshape(b, (b.size,))
            b_list.append(b)
            fname = folder+str(layer_index)+"_b_"+layer.name+".txt"
            np.savetxt(fname, np.reshape(b, (b.size,)), fmt=precision_float)

            # (not needed)
            alpha_b = b.min(axis=0) # b is a 1D array
            beta_b = b.max(axis=0)
            alphaq_b = quantizer_b.alphaq
            betaq_b = quantizer_b.betaq
            #if alphaq_b == -7:
            #    alphaq_b = 0
            #if betaq_b == 15:
            #    betaq_b = 7
            #
            #print("alpha_b: ", alpha_b)
            #print("beta_b: ", beta_b)
            #print("alphaq_b: ", alphaq_b)
            #print("betaq_b: ", betaq_b)
            #alphaq_b_list.append(alphaq_b)
            #betaq_b_list.append(betaq_b)


            # print bias scale
            scale1_b = quantizer_b.scale1.numpy().flatten()
            m_i = quantizer_b.m_i.numpy().flatten()
            scale_b = scale1_b * m_i
            #
            print("scale_b: ", scale_b)
            scale_b_list.append(scale_b)
            fname = qfolder+str(layer_index)+"_b_s_"+layer.name+".txt"
            np.savetxt(fname, scale_b, fmt=precision_float)
            if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                fname = qfolder_qconv2d+str(qconv2d_index)+"_b_s.txt"
            elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                fname = qfolder_qdense+str(qdense_index)+"_b_s.txt"
            elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_b_s.txt"
            np.savetxt(fname, scale_b, fmt=precision_float)

            # print bias zero point # (not needed)
            #if (beta_b - alpha_b) != 0:
            #    tmp = (beta_b*alphaq_b - alpha_b*betaq_b)/(beta_b - alpha_b)
            #    tmp[np.isnan(tmp)] = 0
            #    tmp[np.isinf(tmp)] = 0
            #    z_b = [np.trunc(tmp + np.sign(tmp)*0.5)]
            #else:
            #    z_b = [0]
            #
            #print("z_b:", z_b)
            #zeropoint_b_list.append(z_b)
            #fname = qfolder+str(layer_index)+"_b_z_"+layer.name+".txt"
            #np.savetxt(fname, z_b, fmt='%3.3f')


            # print quantized bias
            if scale_b != 0:
                tmp = np.divide(b, scale_b, dtype=np.float64)
                tmp[np.isnan(tmp)] = 0
                tmp[np.isinf(tmp)] = 0
                bq = np.clip(np.trunc(tmp + np.sign(tmp)*0.5), alphaq_b, betaq_b)
            else:
                bq = np.zeros(b.size)
            #
            print("b:", b)
            print("bq: ", bq)
            bq_list.append(bq)
            fname = qfolder+str(layer_index)+"_bq_"+layer.name+".txt"
            np.savetxt(fname, bq, fmt=precision_quantized)
            if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                fname = qfolder_qconv2d+str(qconv2d_index)+"_bq.txt"
            elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                fname = qfolder_qdense+str(qdense_index)+"_bq.txt"
            elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_bq.txt"
            np.savetxt(fname, bq, fmt=precision_quantized)
            
            

        ##### FEATURES #####
        elif layer.__class__.__name__ in ["QActivation"]:

            # axis 0 is for batch, 1 and 2 are for feature map, 3 is for channels
            quantizer_of = layer.quantizer
            #
            print("of.shape:", of.shape)
            
            alpha_of = quantizer_of.alpha_f.numpy().flatten()
            beta_of = quantizer_of.beta_f.numpy().flatten()
            #
            print("alpha_of_new (input): ", alpha_of)
            print("beta_of_new (input): ", beta_of)
            print("QActivation uses quantized_bits_featuremaps()")
                
            alphaq_of = quantizer_of.alphaq
            betaq_of = quantizer_of.betaq
            # 
            print("alphaq_of: ", alphaq_of)
            print("betaq_of: ", betaq_of)
            alphaq_of_list.append(alphaq_of)
            betaq_of_list.append(betaq_of)
           
            # print features scale (per-layer)
            # is equal to the scale factor calculated inside quantized_bits() when scale_axis=0.
            # TODO: to be replaced with a value computed on the calibration set for each feature layer
            #levels = quantizer_of.levels
            #scale_of = [(2 * abs(of_list[-2]).max(axis=None)) / levels]
            #print("levels:", levels)
            #print("abs(of_list[-2]).max(axis=None):", abs(of_list[-2]).max(axis=None))
            #print("scale_of (per-layer) (output): ", scale_of)
            #scale_of_list.append(scale_of)
            #fname = qfolder+str(layer_index)+"_of_s_"+layer.name+".txt"
            #np.savetxt(fname, scale_of, fmt=precision_float)
            
            # print features scale (per-layer)
            # (not needed for activations, but I use it with quantized_bits(scale_axis=0))
            scale1_of = quantizer_of.scale1.numpy().flatten()
            #m_i = quantizer_of.m_i.numpy().flatten()
            #scale_of2 = scale1_of * m_i
            scale_of2 = scale1_of
            print("scale_of (per-layer) (input): ", scale_of2)
            scale_of_list.append(scale_of2)
            fname = qfolder+str(layer_index)+"_of_s_"+layer.name+".txt"
            np.savetxt(fname, scale_of2, fmt=precision_float)
            
            # print features zero point
            #z_of = [np.around(((beta_of[i]*alphaq_of - alpha_of[i]*betaq_of)/(beta_of[i] - alpha_of[i])), 0) if elem != 0 else 0 for i,elem in enumerate((beta_of - alpha_of))]
            #if (beta_of - alpha_of) != 0: 
            #    z_of = np.asarray([np.around(((beta_of*alphaq_of - alpha_of*betaq_of)/(beta_of - alpha_of)), 0)], dtype=np.float64)
            #else:
            #    z_of = [0]
            #
            #print("z_of (output):", z_of)
            
            # print features zero point
            z_of = quantizer_of.zeropoint.numpy().flatten()
            #print("z_of_new (input):", z_of)
            z_of[np.isnan(z_of)] = 0
            z_of[np.isinf(z_of)] = 0
            #
            print("z_of_new (input):", z_of)
            zeropoint_of_list.append(z_of)
            fname = qfolder+str(layer_index)+"_of_z_"+layer.name+".txt"
            np.savetxt(fname, z_of, fmt=precision_quantized)
            ready1 = 1 # print "subq1" (part 1)
                    
            
            # print quantized output features
            # outputq_1 = np.around((zeropoint_of_list[i+1] + np.divide(output_bias, scale_of_list[i+1])), 0)
            # outputq_2 = np.where(output_bias < 0, zeropoint_of_list[i+1], outputq_1)
            # outputq_3 = np.asarray(np.clip(outputq_2, alphaq_of_list[i+1], betaq_of_list[i+1]), dtype=np.float64)
            #ofq = np.clip(np.around((np.divide(of, scale_of) + z_of), 0), alphaq_of, betaq_of)
            #ofq[np.isnan(ofq)] = 0
            #print("ofq (computed in the notebook)", ofq)
            #
            ofq = quantizer_of.outputq.numpy()
            #print("of (input to this layer):", of_list[-2])
            #print("of (input to this layer, internal) inputx:", quantizer_of.inputx.numpy())
            #print("outputq before_rounding (internal):", quantizer_of.before_rounding.numpy())
            #print("outputq: (computed in the new class)", ofq)
            #print("of (q-dq) (internal/external output):", of)
            #print("ofq (external outputq):", ofq)
            fq_list.append(ofq)
            fname = qfolder+str(layer_index)+"_ofq_"+layer.name+".txt"
            np.savetxt(fname, ofq.flatten(), fmt=precision_quantized)
            
            idx_pl = -2 # index previous layer
            
            if previous_layers[idx_pl] in ["QDense"] or previous_layers[idx_pl+1] in ["QDense"]:
                
                fname = qfolder_qdense+str(qdense_index)+"_of_s_in.txt"
                np.savetxt(fname, scale_of_list[-2], fmt=precision_float)
                fname = qfolder_qdense+str(qdense_index)+"_of_s_out.txt"
                np.savetxt(fname, scale_of_list[-1], fmt=precision_float)
                
                fname = qfolder_qdense+str(qdense_index)+"_of_z_in.txt"
                np.savetxt(fname, zeropoint_of_list[-2], fmt=precision_quantized)
                fname = qfolder_qdense+str(qdense_index)+"_of_z_out.txt"
                np.savetxt(fname, zeropoint_of_list[-1], fmt=precision_quantized)
                
                fname = qfolder_qdense+str(qdense_index)+"_ofq_in.txt"
                np.savetxt(fname, fq_list[-2].flatten(), fmt=precision_quantized)
                fname = qfolder_qdense+str(qdense_index)+"_ofq_out.txt"
                np.savetxt(fname, fq_list[-1].flatten(), fmt=precision_quantized)
                
                # If this is the first layer
                if first_layer_flag == 0 and qdense_index == 0: 
                    fname = qfolder_qdense+str(qdense_index)+"_ofq_in_out.txt"
                    np.savetxt(fname, fq_list[-2].flatten(), fmt=precision_quantized)
                    first_layer_flag = 1
                    
                qdense_index += 1
                
            elif previous_layers[idx_pl] in ["QConv2D", "QConv2DBatchnorm"] or previous_layers[idx_pl-1] in ["QConv2DBatchnorm"]:
                
                fname = qfolder_qconv2d+str(qconv2d_index)+"_of_s_in.txt"
                np.savetxt(fname, scale_of_list[-2], fmt=precision_float)
                fname = qfolder_qconv2d+str(qconv2d_index)+"_of_s_out.txt"
                np.savetxt(fname, scale_of_list[-1], fmt=precision_float)
                
                fname = qfolder_qconv2d+str(qconv2d_index)+"_of_z_in.txt"
                np.savetxt(fname, zeropoint_of_list[-2], fmt=precision_quantized)
                fname = qfolder_qconv2d+str(qconv2d_index)+"_of_z_out.txt"
                np.savetxt(fname, zeropoint_of_list[-1], fmt=precision_quantized)
                
                fname = qfolder_qconv2d+str(qconv2d_index)+"_ofq_in.txt"
                np.savetxt(fname, fq_list[-2].flatten(), fmt=precision_quantized)
                fname = qfolder_qconv2d+str(qconv2d_index)+"_ofq_out.txt"
                np.savetxt(fname, fq_list[-1].flatten(), fmt=precision_quantized)
                
                # If this is the first layer
                if first_layer_flag == 0 and qconv2d_index == 0: 
                    fname = qfolder_qconv2d+str(qconv2d_index)+"_ofq_in_out.txt"
                    np.savetxt(fname, fq_list[-2].flatten(), fmt=precision_quantized)
                    first_layer_flag = 1

                qconv2d_index += 1
                    
            elif previous_layers[idx_pl] in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"] or previous_layers[idx_pl-1] in ["QDepthwiseConv2DBatchnorm"]:
                
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_of_s_in.txt"
                np.savetxt(fname, scale_of_list[-2], fmt=precision_float)
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_of_s_out.txt"
                np.savetxt(fname, scale_of_list[-1], fmt=precision_float)
                
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_of_z_in.txt"
                np.savetxt(fname, zeropoint_of_list[-2], fmt=precision_quantized)
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_of_z_out.txt"
                np.savetxt(fname, zeropoint_of_list[-1], fmt=precision_quantized)
                
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_ofq_in.txt"
                np.savetxt(fname, fq_list[-2].flatten(), fmt=precision_quantized)
                fname = qfolder_qdepthwise+str(qdepthwise_index)+"_ofq_out.txt"
                np.savetxt(fname, fq_list[-1].flatten(), fmt=precision_quantized)
                
                # If this is the first layer
                if first_layer_flag == 0 and qdepthwise_index == 0: 
                    fname = qfolder_qdepthwise+str(qdepthwise_index)+"_ofq_in_out.txt"
                    np.savetxt(fname, fq_list[-2].flatten(), fmt=precision_quantized)
                    first_layer_flag = 1
                    
                qdepthwise_index += 1
                
            else:
                
                print("previous_layers[idx_pl]:", previous_layers[idx_pl])


            # print "subq2" # (not needed)
            #try:
            #    sum_of_ofeatures = ofq.sum(axis=None)
            #    #print("sum_of_ofeatures: ", sum_of_ofeatures)
            #    ready2 = 1 # print "subq2" (part 1)
            #except: # for the last qdense layer
            #    print("sum_of_ofeatures: N.D. for last dense layer")      
                
        layer_index += 1
        previous_layers.append(layer.__class__.__name__)
        
    
    print("------------------------\n")
    return of_list,w_list,b_list,alphaq_of_list,betaq_of_list,fq_list,wq_list,bq_list,zeropoint_of_list,scale_of_list,scale_w_list,scale_b_list,subq1_list
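The function above quantizes with `np.trunc(t + np.sign(t)*0.5)`, which rounds halves away from zero. The commented-out `ofq` line uses `np.around`, which rounds halves to even, so the two can differ by one step on exact halves. The minimal sketch below shows this difference together with the affine quantize/dequantize (q-deq) pair for features. The helper names and the 4-bit asymmetric example values are hypothetical.

```python
import numpy as np

def round_half_away(t):
    # Halves are rounded away from zero (np.round rounds them to even).
    return np.trunc(t + np.sign(t) * 0.5)

def quantize(f, scale, zero_point, alphaq, betaq):
    # Affine quantization: fq = clip(round(f / s + z), alphaq, betaq).
    return np.clip(round_half_away(f / scale + zero_point), alphaq, betaq)

def dequantize(fq, scale, zero_point):
    # Back to floating point: f ~= s * (fq - z).
    return scale * (fq - zero_point)

halves = np.array([0.5, 1.5, 2.5, -2.5])
print(np.round(halves), round_half_away(halves))

f = np.array([-0.3, 0.0, 0.42, 2.0])
fq = quantize(f, scale=0.1, zero_point=3, alphaq=0, betaq=15)  # 4-bit asymmetric
f_hat = dequantize(fq, scale=0.1, zero_point=3)                # 2.0 saturates to 1.2
```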

***
<a class="anchor" id="ch3"></a>
# 3) Run inference and compare model with qmodel

Go to next: [Ch. 4](#ch4).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

In [18]:
offset = 0
samples_to_run = 100
min_samples = offset
max_samples = offset+samples_to_run

print_prediction = False

# number of layers (with weights) of the network
tot_layers = 0
for layer in qmodel.layers:
    if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm", "QDense", "QDenseBatchnorm", "QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
        tot_layers += 1
print("tot_layers:", tot_layers)

test_acc_accumulator_model = 0
test_acc_accumulator_qmodel = 0

iterations = 0

for x, y in zip(x_test[min_samples:max_samples], y_test[min_samples:max_samples]):
        
    # uncomment the next two lines to also run generate_txtfiles_from_python() with its printing to stdout suppressed
    # https://stackoverflow.com/questions/23610585/ipython-notebook-avoid-printing-within-a-function/23611571#23611571
    #with io.capture_output(stdout=True, stderr=False) as captured:
    #    of_list,w_list,b_list,alphaq_of_list,betaq_of_list,fq_list,wq_list,bq_list,zeropoint_of_list,scale_of_list,scale_w_list,scale_b_list,subq1_list = generate_txtfiles_from_python(qmodel, x)
       
    x_reshaped = x.reshape(1, x.shape[0], x.shape[1], x.shape[2])
    
    # Predict the samples with model, i.e. the Keras model
    pred_model = np.asarray(model.predict(x_reshaped, batch_size=1, verbose=0), dtype=np.float64)
    #print(pred_model)
    #print("--> quantize keras output (round and clip)")
    #tmp = zeropoint_of_list[-1] + np.divide(pred_model, scale_of_list[-1], dtype=np.float64)
    #tmp[np.isnan(tmp)] = 0
    #tmp[np.isinf(tmp)] = 0
    #pred_model_q1 = np.trunc(tmp + np.sign(tmp)*0.5)
    #pred_model_q2 = np.asarray(np.clip(pred_model_q1, alphaq_of_list[-1], betaq_of_list[-1]), dtype=np.float64)

    # Predict the samples with qmodel, i.e. the QKeras model
    pred_qmodel = np.asarray(qmodel.predict(x_reshaped, batch_size=1, verbose=0), dtype=np.float64)
    #print("--> quantize qkeras output (round and clip)")
    #tmp = zeropoint_of_list[-1] + np.divide(pred_qmodel, scale_of_list[-1], dtype=np.float64)
    #tmp[np.isnan(tmp)] = 0
    #tmp[np.isinf(tmp)] = 0
    #pred_qmodel_q1 = np.trunc(tmp + np.sign(tmp)*0.5)
    #pred_qmodel_q2 = np.asarray(np.clip(pred_qmodel_q1, alphaq_of_list[-1], betaq_of_list[-1]), dtype=np.float64)

    # Calculate test accuracy
    test_acc_model  = my_evaluate(pred_model,  y)
    test_acc_qmodel = my_evaluate(pred_qmodel, y)

    if print_prediction == True:
        
        print("\npred_model:\n", pred_model)
        #print("pred_model_q2:\n", pred_model_q2)

        print("pred_qmodel:\n", pred_qmodel)
        #print("pred_qmodel_q2:\n", pred_qmodel_q2)
        
        print("test_acc_model:   ", test_acc_model)
        print("test_acc_qmodel:  ", test_acc_qmodel)
        
    test_acc_accumulator_model  += test_acc_model
    test_acc_accumulator_qmodel += test_acc_qmodel

    iterations = iterations + 1
    
    if print_prediction == True:
        print("iteration %d/%d" % (iterations, (max_samples - min_samples)))
        print("-------------------------")
    
print("-------------------------")

print("TOT iterations:        ", iterations)
print("TOT test_acc_model:    ", test_acc_accumulator_model/iterations)
print("TOT test_acc_qmodel:   ", test_acc_accumulator_qmodel/iterations)


tot_layers: 3
-------------------------
TOT iterations:         100
TOT test_acc_model:     0.39
TOT test_acc_qmodel:    0.91
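The loop above calls `predict()` once per sample. The sketch below is an alternative that scores a whole batch at once and also reports how often the two models predict the same class. The definition of `my_evaluate()` is not shown in this section, so the sketch assumes it computes top-1 correctness. The helper names are hypothetical.

```python
import numpy as np

def top1_accuracy(preds, labels):
    # preds: (N, classes) scores; labels: (N, classes) one-hot or (N,) integers.
    pred_cls = np.argmax(preds, axis=-1)
    true_cls = np.argmax(labels, axis=-1) if labels.ndim == preds.ndim else labels
    return float(np.mean(pred_cls == true_cls))

def agreement(preds_a, preds_b):
    # Fraction of samples on which two models predict the same class.
    return float(np.mean(np.argmax(preds_a, -1) == np.argmax(preds_b, -1)))

# Toy example with three samples and two classes.
pa = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
pb = np.array([[0.6, 0.4], [0.8, 0.2], [0.3, 0.7]])
y_onehot = np.array([[0, 1], [1, 0], [1, 0]])
print(top1_accuracy(pa, y_onehot), top1_accuracy(pb, y_onehot), agreement(pa, pb))
```

With the notebook's models, the batch predictions could be obtained with, e.g., `model.predict(x_test[min_samples:max_samples], verbose=0)`.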


<!--Go to next: [Ch. 5](#ch5).-->

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

***
<a class="anchor" id="ch4"></a>
# 4) Quantized network design for AutoQKeras

Go to next: [Ch. 5](#ch5).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

To perform a hyperparameter search on a Keras model with AutoQKeras, we need to pass the Keras model as the first argument of the AutoQKeras class to create an AutoQKeras object. The input Keras model is automatically converted to a QKeras model during the building process by the ```quantize_model()``` method of the AutoQKHyperModel class (see [autoqkeras_internal.py#L570](https://github.com/google/qkeras/blob/1ab354276a041b45cd72c300e89a7c51ec99fa35/qkeras/autoqkeras/autoqkeras_internal.py#L570)). 

The idea is to exploit this automatic conversion to obtain a QKeras model that follows the methodology previously described in this notebook ([Ch. 1](#ch1)), with as few changes as possible to the original Keras model definition, as explained here:

1) Every 2DConv, DWConv and FC layer has to be preceded by an ```Activation``` layer with an arbitrary activation function (its type does not matter, because AutoQKeras replaces it during the search with the ```activation``` values defined in the search space ```quantization_config```). In this example we are going to use "sigmoid", but any other type would be fine;

2) The last 2DConv, DWConv or FC layer of the network has also to be followed by an ```Activation``` layer, as written in the previous point;

3) Every ```BatchNormalization``` layer that follows a 2DConv or DWConv layer has to be fused with the convolution. We can use the flag ```enable_bn_folding=True``` when instantiating the AutoQKeras object to automatically do the batch normalization fusion;

4) Every activation associated with a convolutional or fully-connected layer, whether declared as an argument of the ```Activation``` layer (such as ```Activation(activation="relu")```) or as an argument of the ```Conv2D```, ```DepthwiseConv2D``` or ```Dense``` layer (such as ```Conv2D(activation="relu")```), has to be written in the "direct" form, i.e. using a layer with the same name as the activation (such as ```ReLU()```);

5) Activations not associated with a convolutional or fully-connected layer, usually placed as the last layer of a CNN (such as softmax), must be left unchanged;

6) Any other layer, such as pooling or add layers, must be left unchanged;

7) After average pooling a quantization step is required, because the average of integers is not necessarily an integer (it is preferable to reuse the quantizer of the pooling input, since the output range does not change significantly). Moreover, the activation dynamics change between the input and the output of the pooling layer. Therefore, an ```Activation``` layer is required after every average pooling layer;

8) For residual connections ("add" layers), do as the next cell shows.
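Rules 1, 2 and 7 are purely structural, so they can be sanity-checked before handing a model to AutoQKeras. The helper below is a hypothetical sketch (it is not part of QKeras or AutoQKeras) that works on the list of layer class names of a sequential model, e.g. ```[type(layer).__name__ for layer in model2.layers]```. Following the pattern of ```model2``` in the cell below, it assumes that a ```Flatten``` may sit between a placeholder and a ```Dense``` layer, and that ```BatchNormalization``` and ```ReLU``` may sit between the last weight layer and its placeholder.

```python
# Hypothetical helper, not part of QKeras/AutoQKeras: a rough check of rules 1, 2 and 7
# on the sequence of layer class names of a purely sequential Keras model.
from typing import List, Optional, Set

WEIGHT_LAYERS = {"Conv2D", "DepthwiseConv2D", "Dense"}
SKIP_BEFORE = {"Flatten"}                     # assumed value-preserving, may sit between placeholder and layer
SKIP_AFTER = {"BatchNormalization", "ReLU"}   # folded BN and real ReLU may precede the placeholder


def _neighbour(types: List[str], i: int, step: int, skip: Set[str]) -> Optional[str]:
    """Return the first layer type found from position i in direction `step`, ignoring `skip`."""
    j = i + step
    while 0 <= j < len(types) and types[j] in skip:
        j += step
    return types[j] if 0 <= j < len(types) else None


def check_autoqkeras_layout(types: List[str]) -> List[str]:
    """Return a list of problems; an empty list means no violation of rules 1, 2 and 7 was found."""
    problems = []
    weight_pos = [i for i, t in enumerate(types) if t in WEIGHT_LAYERS]
    for i in weight_pos:
        # Rule 1: every Conv2D / DepthwiseConv2D / Dense needs an Activation placeholder before it
        if _neighbour(types, i, -1, SKIP_BEFORE) != "Activation":
            problems.append(f"rule 1: layer {i} ({types[i]}) is not preceded by Activation")
    if weight_pos:
        last = weight_pos[-1]
        # Rule 2: the last weight layer also needs an Activation placeholder after it
        if _neighbour(types, last, +1, SKIP_AFTER) != "Activation":
            problems.append(f"rule 2: last weight layer {last} ({types[last]}) is not followed by Activation")
    for i, t in enumerate(types):
        # Rule 7: average pooling must be immediately followed by an Activation (re-quantization)
        if t == "AveragePooling2D" and _neighbour(types, i, +1, set()) != "Activation":
            problems.append(f"rule 7: layer {i} (AveragePooling2D) is not followed by Activation")
    return problems


# Layer class names of model2, in the same order as in the cell below
model2_types = ["MaxPooling2D", "Activation", "Conv2D", "ReLU", "Activation",
                "Conv2D", "BatchNormalization", "ReLU", "Activation",
                "Flatten", "Dense", "Activation", "Activation"]
print(check_autoqkeras_layout(model2_types))  # prints []
```

The check is deliberately conservative: it does not look at residual branches (rule 8) or at the activation functions themselves (rule 4).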

Things to know:
- ```quantize_model()``` does not accept an input model that already contains QKeras layers.

To run the next cells without issues, you need to edit the file [autoqkeras/autoqkeras_internal.py](https://github.com/google/qkeras/blob/master/qkeras/autoqkeras/autoqkeras_internal.py) to expose the flag ```enable_bn_folding``` in the AutoQKeras interface (externally) and to forward it to the ```AutoQKHyperModel``` class and to the ```model_quantize()``` function called by its ```quantize_model()``` method (internally).
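Rule 3 relies on folding each ```BatchNormalization``` into the preceding convolution. As a reminder of what folding computes, here is a minimal NumPy sketch of the textbook inference-time formula, using the moving statistics and Keras' default epsilon of 1e-3. It only illustrates the math: it is not necessarily how ```enable_bn_folding``` or QKeras' folded layers (e.g. ```QConv2DBatchnorm```) implement it. The sketch assumes output channels on the last kernel axis (```Dense```, ```Conv2D```); for a ```DepthwiseConv2D``` kernel, shaped (kh, kw, in_channels, depth_multiplier), the per-channel factors would need to be reshaped first.

```python
import numpy as np


def fold_batchnorm(kernel, bias, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Fold inference-mode BatchNormalization into the preceding layer.

    Assumes output channels on the last kernel axis (Dense: (in, out), Conv2D: (kh, kw, in, out)).
    """
    inv_std = gamma / np.sqrt(moving_var + eps)      # one factor per output channel
    folded_kernel = kernel * inv_std                 # broadcasts over the last axis
    folded_bias = beta + (bias - moving_mean) * inv_std
    return folded_kernel, folded_bias


# Check on a Dense-like layer: y = x @ W + b followed by BN
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
kernel, bias = rng.normal(size=(6, 3)), rng.normal(size=3)
gamma, beta = rng.normal(size=3), rng.normal(size=3)
mean, var = rng.normal(size=3), rng.uniform(0.5, 2.0, size=3)

bn_out = gamma * ((x @ kernel + bias) - mean) / np.sqrt(var + 1e-3) + beta
folded_kernel, folded_bias = fold_batchnorm(kernel, bias, gamma, beta, mean, var)
print(np.allclose(x @ folded_kernel + folded_bias, bn_out))  # prints True
```

After folding, a single set of weights and biases is quantized, which is why the calibration code later reads ```get_folded_weights()``` for the ```*Batchnorm``` layers.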

In [19]:
# Modified Keras model for AutoQKeras
model2 = tf.keras.models.Sequential([
    
    MaxPooling2D(pool_size=pool_size_list[0], name="pool_0"),
    Activation("sigmoid", name="act_0"), # placeholder, replaced by an activation quantizer
    
    # Example: conv without batchnorm
    Conv2D(filters=filters_list[0], kernel_size=kernel_size_list[0], strides=strides_list[0], padding=pads_list[0], name="conv2d_0"),
           # activation="relu" is not passed here: it is written in direct form below (rule 4)
    ReLU(name="relu_0"), # real ReLU
    Activation("sigmoid", name="act_1"), # placeholder, replaced by an activation quantizer
    
    # Example: conv with batchnorm (folded via enable_bn_folding=True)
    Conv2D(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1], name="conv2d_1"),
    BatchNormalization(name="bn_1"),
    ReLU(name="relu_1"), # real ReLU
    Activation("sigmoid", name="act_2"), # placeholder, replaced by an activation quantizer
    
    Flatten(name="flatten"),
    
    # Example: dense without batchnorm
    Dense(units_list[0], name="dense"),
    Activation("sigmoid", name="act_3"), # placeholder, replaced by an activation quantizer
  
    Activation("softmax", name="softmax") # left unchanged (rule 5)

])

In [20]:
model2.build((None,input_width,input_width,input_channels))

model2.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
pool_0 (MaxPooling2D)        (None, 7, 7, 1)           0         
_________________________________________________________________
act_0 (Activation)           (None, 7, 7, 1)           0         
_________________________________________________________________
conv2d_0 (Conv2D)            (None, 5, 5, 2)           20        
_________________________________________________________________
relu_0 (ReLU)                (None, 5, 5, 2)           0         
_________________________________________________________________
act_1 (Activation)           (None, 5, 5, 2)           0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 3, 3, 3)           57        
_________________________________________________________________
bn_1 (BatchNormalization)    (None, 3, 3, 3)          

In [21]:
model2.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'],
               run_eagerly=True)

if not tf.config.list_physical_devices('GPU'):
    print("No GPU available")
else:
    print("GPU available")

No GPU available


In [22]:
# scale_axis=0 gives a per-layer quantization, i.e. one scaling factor for the entire layer;
# otherwise QKeras automatically performs a per-channel quantization, i.e. one scaling factor for each channel of the layer.
# We want per-channel quantization for weights/biases and per-layer quantization for activations.
quantization_config = {
        "kernel": {
                "quantized_bits(4,4,1,1,alpha='auto')": 4,
                "quantized_bits(8,8,1,1,alpha='auto')": 8,
                "quantized_bits(16,16,1,1,alpha='auto')": 16,
        },
        "bias": {
                "quantized_bits(16,16,1,1,alpha='auto')": 16,
                "quantized_bits(31,31,1,1,alpha='auto')": 31,
        },
        "activation": {
                "quantized_bits(4,4,1,1,alpha='auto',scale_axis=0)": 4,
                "quantized_bits(8,8,1,1,alpha='auto',scale_axis=0)": 8,
                "quantized_bits(16,16,1,1,alpha='auto',scale_axis=0)": 16
        }
}

# maximum number of bits per layer type: [kernel, bias, activation]
limit = {
    "Dense": [16, 31, 16],
    "Conv2D": [16, 31, 16],
    "DepthwiseConv2D": [16, 31, 16],
    "Activation": [16],
    "BatchNormalization": []
}

goal = {
    "type": "energy",
    "params": {
        "delta_p": 5.0,
        "delta_n": 5.0,
        "rate": 2.0,
        "stress": 1.0,
        "process": "horowitz",
        "parameters_on_memory": ["sram", "sram"],
        "activations_on_memory": ["sram", "sram"],
        "rd_wr_on_io": [False, False],
        "min_sram_size": [0, 0],
        "source_quantizers": ["int8"],
        "reference_internal": "int8",
        "reference_accumulator": "int16"
        }
}

run_config = {
  "output_dir": "./autoqkeras",
  "goal": goal,
  "quantization_config": quantization_config,
  "learning_rate_optimizer": False,
  "transfer_weights": False,
  "mode": "random",
  "seed": 42,
  "limit": limit,
  "tune_filters": "none",
  "tune_filters_exceptions": "^dense",
  "distribution_strategy": tf.distribute.get_strategy(),
  # exclude the first and the last layer (the final softmax) from the search
  "layer_indexes": range(1, len(model2.layers) - 1),
  "max_trials": 5
}

from qkeras.autoqkeras import *

autoqk = AutoQKeras(model2, metrics=["acc"], custom_objects={}, **run_config, enable_bn_folding=True)

Instructions for updating:
Use ref() instead.
operation count for <tensorflow.python.keras.layers.advanced_activations.ReLU object at 0x15e8f40d0> is defaulted to 0
operation count for <tensorflow.python.keras.layers.advanced_activations.ReLU object at 0x15e89e0d0> is defaulted to 0
Limit configuration:{"Dense": [16, 31, 16], "Conv2D": [16, 31, 16], "DepthwiseConv2D": [16, 31, 16], "Activation": [16], "BatchNormalization": []}
operation count for <tensorflow.python.keras.layers.advanced_activations.ReLU object at 0x161542670> is defaulted to 0
operation count for <tensorflow.python.keras.layers.advanced_activations.ReLU object at 0x16155e760> is defaulted to 0
learning_rate: 0.0010000000474974513
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
pool_0_input (InputLayer)    [(None, 28, 28, 1)]       0         
_________________________________________________________________
pool_0 (MaxPool
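The comments in the configuration cell ask for per-channel scales for weights and biases and a single scale for activations (```scale_axis=0``` in the modified QKeras). The NumPy sketch below illustrates why per-channel scales matter for weights, on a toy kernel whose two output channels differ by two orders of magnitude. For simplicity it uses a max-abs symmetric scale, max|x| / (2^(bits-1) - 1); this is an assumption for illustration only, since QKeras' ```alpha='auto'``` estimates its scales differently.

```python
import numpy as np


def symmetric_scale(x, bits, per_channel):
    """Max-abs symmetric scale: one value per output channel (last axis) or one per tensor."""
    qmax = 2 ** (bits - 1) - 1
    if per_channel:
        max_abs = np.max(np.abs(x), axis=tuple(range(x.ndim - 1)))  # reduce all axes but the last
    else:
        max_abs = np.max(np.abs(x))
    return max_abs / qmax


def quantize(x, scale, bits):
    """Symmetric integer quantization with clipping to [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax)


# Toy kernel with 2 output channels of very different magnitude
kernel = np.array([[0.01, 1.0],
                   [-0.02, -0.5],
                   [0.015, 0.25]])

s_layer = symmetric_scale(kernel, bits=4, per_channel=False)   # scalar
s_chan = symmetric_scale(kernel, bits=4, per_channel=True)     # shape (2,)
q_layer = quantize(kernel, s_layer, bits=4)
q_chan = quantize(kernel, s_chan, bits=4)

print(q_layer[:, 0])  # the small channel collapses to zero with a single scale
print(q_chan[:, 0])   # a per-channel scale keeps it representable
err_layer = np.abs(q_layer * s_layer - kernel).sum()
err_chan = np.abs(q_chan * s_chan - kernel).sum()
print(err_chan < err_layer)
```

For activations a single scale (and zero point) per tensor keeps the integer arithmetic simple, which matches the per-layer choice made in ```quantization_config```.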

In [23]:
autoqk.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=1024, epochs=1)

Trial 5 Complete [00h 00m 06s]
val_score: 0.3983521230307274

Best val_score So Far: 0.5100576839010837
Total elapsed time: 00h 00m 29s
INFO:tensorflow:Oracle triggered exit



In [24]:
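import numpy as np
import tensorflow as tf
from copy import deepcopy as dc  # dc is used below to copy the base dictionaries

def extract_calibration_data(model, x_test, test_type):  # see generate_from_python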

    tot_iter = x_test.shape[0]

    # Reset all lists

    of_list = []
    alphaq_of_list = []
    betaq_of_list = []
    fq_list = []
    wq_list = []
    bq_list = []
    zeropoint_of_list = []
    scale_of_list = []
    scale_w_list = []
    scale_b_list = []
    subq1_list = []
    layer_index = 0
    ready1 = 0
    previous_layers = ["None", "None","None"]


    if (test_type!="resnet"):
        layers_list = [layer for layer in model.layers]
    else:   #resnet
        layers_names_list = ["input_1","activation","conv2d","re_lu",                                                           #top
                             "activation_1","conv2d_1","re_lu_1","activation_2","conv2d_2","activation_3",                      #first stack RIGHT, LEFT EMPTY
                             "add",                                                                                             
                             "re_lu_2","activation_4","conv2d_3","re_lu_3","activation_5","conv2d_4","activation_6",            #second stack RIGHT
                             "activation_7","conv2d_5","activation_8",                                                          #second stack LEFT
                             "add_1",   
                             "re_lu_4", "activation_9","conv2d_6","re_lu_5","activation_10","conv2d_7","activation_11",         #third stack RIGHT
                             "activation_12","conv2d_8","activation_13",                                                        #third stack LEFT
                             "add_2",                                                                                           
                             "re_lu_6","average_pooling2d","activation_14","flatten","dense","activation_15","softmax"]         #bottom
        
        layers_list = []
        for name in layers_names_list:
            layers_list.append(model.get_layer(name))


    ##############################CALIBRATION####################################
    ##############################################################################
    # Two separate dictionaries, one for weights and biases and one for activations:
    # 1) weights and biases: scale and zero point are known and fixed (the zero point is actually 0)
    #    ---> extract them by accessing the quantizer attributes
    # 2) activations: scale and zero point vary with the input tensor
    #    ---> extract them by collecting the output feature map (of) of every sample and taking max(|of|);
    #         alphaq and betaq, instead, are fixed by the quantizer
    # Merge them at the end of calibration, after extracting alpha and beta for the activation dictionary.

    base_param_dict = {
               "w_scale": 0,
               "b_scale": 0,
               "SF_IN_W": [],
               "WEIGHTS_CROSSPRODUCT": []
    }

    base_act_dict = {
        "alphaq": 0,
        "betaq": 0,
        "alpha": [],
        "beta": [],
        "scale": [],
        "SF_OUT_INV": [],
        "zero(Z_IN)":[]
    }


    #########DEFINE THE WEIGHT AND BIAS DICTIONARY###########################
    w_layers = []
    for layer in layers_list:
        if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm", "QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm", "QDense", "QDenseBatchnorm"]:
            w_layers.append(layer.name)

    w_dict = {k: dc(base_param_dict) for k in w_layers} #deepcopy base dict otherwise it is always the same object

    ########DEFINE THE ACTIVATION DICTIONARY########################
    a_layers = []
    for layer in layers_list:
        if layer.__class__.__name__ in ["QActivation"]:
            a_layers.append(layer.name)
    
    #inner_a_dict = {k: [] for k in base_act_keys}
    a_dict = {k: dc(base_act_dict) for k in a_layers}   

    for iter, x in enumerate(x_test):

        if (test_type != "auto-encoder"):
            data = x.reshape(1, x.shape[0], x.shape[1], x.shape[2])
        else:
            data = x.reshape(1,640)

        of_list.append(data)
        layer_index += 1

        pred_model = np.asarray(model.predict(data, batch_size=1, verbose=0), dtype=np.float64)

        ############LAYERS LOOP##############################

        for layer in layers_list:
            extractor = tf.keras.Model( inputs=model.inputs,
                                       outputs=model.get_layer(layer.name).output)
            of = extractor(data).numpy()
            of_list.append(of)

            ##### WEIGHTS #####
            if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm", "QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm", "QDense", "QDenseBatchnorm"]:

                if layer.__class__.__name__ in ["QConv2D", "QDepthwiseConv2D", "QDense"]:
                    parameters = layer.weights
                else:
                    parameters = layer.get_folded_weights() # folded weights not quantized

                w = parameters[0].numpy()
                quantizer_w = layer.get_quantizers()[0]
                alphaq_w = quantizer_w.alphaq
                betaq_w = quantizer_w.betaq
                scale1_w = quantizer_w.scale1.numpy().flatten()
                m_i = quantizer_w.m_i.numpy().flatten()
                scale_w = scale1_w * m_i                    #WEIGHT SCALE
                scale_w_list.append(scale_w)

                #QUANTIZED WEIGHTS
                tmp = np.divide(w, scale_w, dtype=np.float64)
                tmp[np.isnan(tmp)] = 0
                tmp[np.isinf(tmp)] = 0
                wq = np.clip(np.trunc(tmp + np.sign(tmp)*0.5), alphaq_w, betaq_w)
                wq_list.append(wq)

    
                # print "subq1" (part 2)
                if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                    sum_of_weights = wq.sum(axis=(0, 1, 2))
                elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                    sum_of_weights = wq.sum(axis=0)
                elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                    sum_of_weights = wq.sum(axis=(0, 1))

                if ready1 == 1:
                    subq1 = (zeropoint_of_list[-1] * sum_of_weights).flatten()
                    sx = float(scale_of_list[-1])
                    sw = float(np.max(np.abs(scale_w)))
                    sx_sw = sx*sw
                    subq1_list.append(subq1)
                    ready1 = 0
                    w_dict[layer.name]["SF_IN_W"].append(sx_sw)
                    w_dict[layer.name]["WEIGHTS_CROSSPRODUCT"].append(np.max(np.abs(subq1)))
                   

                ##### BIASES ######
                quantizer_b = layer.get_quantizers()[1]
                b = parameters[1].numpy()
                alphaq_b = quantizer_b.alphaq
                betaq_b = quantizer_b.betaq
                scale1_b = quantizer_b.scale1.numpy().flatten()
                m_i = quantizer_b.m_i.numpy().flatten()
                scale_b = scale1_b * m_i


                scale_b_list.append(scale_b)
                if np.any(scale_b != 0):  # scale_b is a NumPy array: test its elements explicitly
                    tmp = np.divide(b, scale_b, dtype=np.float64)
                    tmp[np.isnan(tmp)] = 0
                    tmp[np.isinf(tmp)] = 0
                    bq = np.clip(np.trunc(tmp + np.sign(tmp)*0.5), alphaq_b, betaq_b)
                else:
                    bq = np.zeros(b.size)
                bq_list.append(bq)


                #calibration, all constants x layer
                w_dict[layer.name]['w_scale'] = np.max(np.abs(scale_w))
                w_dict[layer.name]["b_scale"] = np.max(np.abs(scale_b))                
                w_dict[layer.name]["BIASQ_SCALE"] = np.max(np.abs(bq*scale_b))


            ##### FEATURES #####
            elif layer.__class__.__name__ in ["QActivation"]:

                # axis 0 is for batch, 1 and 2 are for feature map, 3 is for channels
                quantizer_of = layer.quantizer
                alphaq_of = quantizer_of.alphaq
                betaq_of = quantizer_of.betaq

                # read min and max from the quantizer instead of computing them from of
                alpha_of = quantizer_of.alpha_f.numpy().flatten()
                beta_of = quantizer_of.beta_f.numpy().flatten()


                alphaq_of_list.append(alphaq_of)
                betaq_of_list.append(betaq_of)

                scale1_of = quantizer_of.scale1.numpy().flatten()
                scale_of2 = scale1_of
                scale_of_list.append(scale_of2)

                z_of = quantizer_of.zeropoint.numpy().flatten()
                z_of[np.isnan(z_of)] = 0
                z_of[np.isinf(z_of)] = 0
                zeropoint_of_list.append(z_of)

                ready1 = 1 # print "subq1" (part 1)
                ofq = quantizer_of.outputq.numpy()
                fq_list.append(ofq)

                #calibration
                a_dict[layer.name]["alphaq"] = alphaq_of
                a_dict[layer.name]["betaq"] = betaq_of
                a_dict[layer.name]["alpha"].append(alpha_of[0])   #[0] because all these are arrays with a single value
                a_dict[layer.name]["beta"].append(beta_of[0])
                a_dict[layer.name]["scale"].append(scale_of2[0])
                a_dict[layer.name]["SF_OUT_INV"].append(1/scale_of2[0])
                a_dict[layer.name]["zero(Z_IN)"].append(z_of[0])

            layer_index += 1
            previous_layers.append(layer.__class__.__name__)
        print(f"ITERATION: {iter+1}/{tot_iter}")
    return (w_dict, a_dict)
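Two formulas are at the heart of ```extract_calibration_data```. Weights are quantized with ```np.trunc(tmp + np.sign(tmp)*0.5)```, i.e. rounding to nearest with ties away from zero (unlike ```np.round```, which rounds ties to even), and ```subq1``` is the zero point of the input activation times the sum of the quantized weights of each output channel. The sketch below reproduces both in plain NumPy, assuming the TFLite convention x = s_x * (q_x - z_x) for activations and w = s_w * q_w (zero point 0) for weights [2]. Under that assumption a dot product can be computed with integers only, and the cross term z_x * sum(q_w) does not depend on the input, so it can be precomputed per output channel. The function names are illustrative and not part of QKeras.

```python
import numpy as np


def round_half_away(x):
    """Round to nearest integer, ties away from zero: same as np.trunc(x + np.sign(x) * 0.5)."""
    return np.trunc(x + np.sign(x) * 0.5)


def quantize_weights(w, scale_w, alphaq, betaq):
    """Symmetric weight quantization (zero point 0), as in the WEIGHTS branch above."""
    return np.clip(round_half_away(w / scale_w), alphaq, betaq)


def integer_dot(qx, zx, qw):
    """Integer-only dot product for one output channel, with the zero-point cross term split out."""
    qx = qx.astype(np.int64)
    qw = qw.astype(np.int64)
    subq1 = zx * qw.sum()          # plays the role of subq1 (precomputable per output channel)
    return np.dot(qx, qw) - subq1


ties = np.array([0.5, 1.5, 2.5, -0.5, -2.5])
print(round_half_away(ties))   # ties away from zero
print(np.round(ties))          # NumPy rounds ties to even

# Assumed TFLite-style convention: x = sx * (qx - zx), w = sw * qw
rng = np.random.default_rng(0)
sx, zx, sw = 0.05, 3, 0.01
qx = rng.integers(-128, 128, size=16)
x = sx * (qx - zx)
qw = quantize_weights(rng.normal(scale=0.5, size=16), sw, -127, 127)

float_result = np.dot(x, sw * qw)
int_result = integer_dot(qx, zx, qw)
print(np.isclose(float_result, sx * sw * int_result))
```

Only the product sx * sw has to be applied in floating point (or as a fixed-point multiplier), which is the quantity stored in ```SF_IN_W```.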

In [25]:
import os
import pandas as pd

def generate_csv(w_dict, a_dict, csv_file_path1, csv_file_path2):
    # Outer keys (layer names) become the row index, inner keys become the columns
    for path in (csv_file_path1, csv_file_path2):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)  # create the output folder if missing
    pd.DataFrame.from_dict(w_dict, orient="index").to_csv(csv_file_path1)
    pd.DataFrame.from_dict(a_dict, orient="index").to_csv(csv_file_path2)

In [26]:
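# mobilenet_utils is a user-provided data module (it is not defined in this notebook)
(x_test, y_test) = mobilenet_utils.get_mobilenet_data()
qmodel = qkeras.utils.load_qmodel(...)
qmodel.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'],
               run_eagerly=True)

# any test_type other than "resnet" or "auto-encoder" follows the generic path
(w_dict, a_dict) = extract_calibration_data(qmodel, x_test, test_type="mobilenet")
generate_csv(w_dict, a_dict, "./calibration_results/extracted_weights.csv", "./calibration_results/extracted_activations.csv")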

NameError: name 'mobilenet_utils' is not defined