<a class="anchor" id="top"></a>
# QKeras-Mod Explained
Author: Luca Urbinati, PhD Student @ Politecnico di Torino, luca.urbinati@polito.it. Date: 20/01/2024, v.1.0
***

### Content of this notebook
[Chapter 1](#ch1): how to design a quantized model (with and without fused batch normalization) starting from a Keras model using a <b>modified version of QKeras</b> [1] that <b>quantizes weights and activations to integers</b>, implementing uniform integer quantization;

[Chapter 2](#ch2): <b>compare inference results</b> between the Keras model and the quantized one;

[Chapter 3](#ch3): how to <b>extract quantization factors</b> (scaling factors and zero points) from each layer of the QKeras model so that it behaves similarly to TensorFlow Lite [2][3];

[Chapter 4](#ch4): how to use <b>AutoQKeras</b> to search for the best mixed-precision integer quantized model.

***
    
### Requirements before starting
- Read [this QKeras tutorial](https://github.com/google/qkeras/blob/master/notebook/QKerasTutorial.ipynb) to become familiar with QKeras.
- Install the conda environment [qkeras-env.yml](https://github.com/LucaUrbinati44/qkeras-mod/blob/main/qkeras-env.yml) provided in this repo and activate it (_conda activate qkeras-env_).
- Apply the patch to QKeras' installation to have access to the modified version of QKeras (see the [README](https://github.com/LucaUrbinati44/qkeras-mod/blob/main/README.md)). 

### Publications using this code
- Luca Urbinati and Mario R. Casu, "High-Level Design of Precision-Scalable DNN Accelerators Based on Sum-Together Multiplier", in the review process.

### References
[1] QKeras: https://github.com/google/qkeras

[2] B. Jacob et al., "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference," arXiv:1712.05877 [cs, stat], Dec. 2017. Available: http://arxiv.org/abs/1712.05877

[3] Mao, Lei. "Quantization for Neural Networks". Lei Mao’s Log Book, May 17, 2020, https://leimao.github.io/article/Neural-Networks-Quantization/

[4] M. Nagel, M. Fournarakis, R. A. Amjad, Y. Bondarenko, M. van Baalen, and T. Blankevoort, “A White Paper on Neural Network Quantization.” arXiv, Jun. 15, 2021. Available: http://arxiv.org/abs/2106.08295

[5] H. Wu, P. Judd, X. Zhang, M. Isaev, and P. Micikevicius, “Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation,” arXiv:2004.09602 [cs, stat], Apr. 2020, Accessed: Dec. 22, 2021. [Online]. Available: http://arxiv.org/abs/2004.09602.

***
<a class="anchor" id="ch0"></a>
# 0) Import libraries and data

Go to next: [Ch. 1](#ch1).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

In [1]:
import random
import numpy as np
import sys
import os
from IPython.utils import io
import math
from copy import deepcopy as dc
import pandas as pd
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model

from qkeras import *
from qkeras.utils import *
from qkeras.autoqkeras import *

tf.keras.backend.set_floatx('float64')
tf.keras.backend.floatx()

np.set_printoptions(threshold=sys.maxsize, precision=128, suppress=True)

if tf.config.list_physical_devices('GPU') == []:
    print("No GPU available")
else:
    print("GPU available")

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

No GPU available


2024-01-19 23:36:00.205681: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set


In [2]:
def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    tf.experimental.numpy.random.seed(seed)
    # When running on the CuDNN backend, two further options must be set
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    # Set a fixed value for the hash seed
    os.environ["PYTHONHASHSEED"] = str(seed)
    print(f"Random seed set as {seed}")

set_seed(0)

Random seed set as 0


## Get dummy data

In [3]:
from tensorflow.keras.datasets import mnist

def get_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(x_train.shape + (1,)).astype("float32")
    x_test = x_test.reshape(x_test.shape + (1,)).astype("float32")

    x_train /= 256.0
    x_test /= 256.0

    x_mean = np.mean(x_train, axis=0)

    x_train -= x_mean
    x_test -= x_mean

    nb_classes = np.max(y_train)+1
    y_train = to_categorical(y_train, nb_classes)
    y_test = to_categorical(y_test, nb_classes)

    return (x_train, y_train), (x_test, y_test)

input_width = 28
input_channels = 1

(x_train, y_train), (x_test, y_test) = get_data()

print(x_train.shape)
print(y_train.shape)

(60000, 28, 28, 1)
(60000, 10)


***
<a class="anchor" id="ch1"></a>
# 1) Quantized network design

Go to next: [Ch. 2](#ch2).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

The goal of this chapter is to design a Convolutional network that provides quantized inputs and weights to its 2D-Conv kernels, as an example.

It is important to know that inside the quantized kernels of QKeras (```QConv2D```, ```QDepthwiseConv2D```, ```QDense```) there are the corresponding TensorFlow kernels (```tf.keras.backend.conv2d()```, ```tf.keras.backend.depthwise_conv2d()```, ```tf.keras.backend.dot()```) which are floating-point kernels. 

When running one of these quantized kernels, QKeras partially uses the technique called ["fake quantization"](https://github.com/google/qkeras/issues/96#issuecomment-1210877800), which is the same technique used by TensorFlow Lite [2]. This technique consists of quantizing and then dequantizing inputs and weights before running the floating-point kernel. In this way, inputs, weights (and therefore outputs) remain floating-point numbers but can only take quantized values. However, there is a difference: QKeras does not fake-quantize the inputs, i.e. they remain "true" floating-point numbers that can take any value in the floating-point range (see the source code of ```QConv2D()``` in [qconvolutional.py#L294](https://github.com/google/qkeras/blob/eb6e0dc86c43128c6708988d9cb54d1e106685a4/qkeras/qconvolutional.py#L294)). The same holds for the outputs: they remain unconstrained floating-point values because a kernel with floating-point inputs and fake-quantized weights produces floating-point outputs.

To tackle this problem, we apply a ```quantized_bits()``` operation to the input feature map tensor by inserting a ```QActivation``` layer. ```quantized_bits()``` performs a quantization-dequantization (q-deq) operation on the floating-point tensor. Thanks to this ```QActivation``` layer, we can extract the quantization parameters of the input feature map tensor and quantize it to integer values (see [Chapter 3](#ch3)). With this layer in place, the output tensor of ```QConv2D``` is completely fake-quantized because both the inputs and the weights of this layer are fake-quantized.

The next two layers are a standard ```ReLU``` followed by another ```QActivation``` with ```quantized_bits()```. We could have used ```QActivation("quantized_relu(bits,integer)")```, but ```quantized_relu()``` does not quantize the input data in the same way as ```quantized_bits()```. In particular, the argument ```alpha="auto"``` is not present in ```quantized_relu()```, so it does not quantize with the standard affine quantization mapping formula [2][3] (shown below) which is the quantization we want to implement.
$$x_q = \text{clip}\Big( \text{round}\big(\frac{1}{s} x + z\big), \alpha_q, \beta_q \Big)$$
Thus, in order to tell QKeras to use this quantization mapping formula, we have to set ```alpha="auto"``` in ```quantized_bits()``` for all QKeras layers.
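As an illustration of the mapping formula above, here is a minimal NumPy sketch of affine quantization followed by dequantization (the q-deq operation). It is not the QKeras implementation: the function names, the float range ```[-1, 3]``` and the signed 4-bit quantized range are assumptions chosen only for this example, and the zero-point formula is the one given in [3].

```python
import numpy as np

def affine_quantize(x, s, z, alpha_q, beta_q):
    """x_q = clip(round(x / s + z), alpha_q, beta_q)."""
    return np.clip(np.round(x / s + z), alpha_q, beta_q).astype(np.int64)

def affine_dequantize(x_q, s, z):
    """Inverse mapping: x is approximated by s * (x_q - z)."""
    return s * (x_q.astype(np.float64) - z)

# Hypothetical example: float range [-1, 3] mapped to the signed 4-bit range [-8, 7]
alpha, beta = -1.0, 3.0
alpha_q, beta_q = -8, 7
s = (beta - alpha) / (beta_q - alpha_q)                             # scaling factor
z = int(round((beta * alpha_q - alpha * beta_q) / (beta - alpha)))  # zero point

x = np.array([-1.0, 0.0, 0.5, 3.0, 5.0])
x_q = affine_quantize(x, s, z, alpha_q, beta_q)   # integers: [-8, -4, -2, 7, 7]
x_fake = affine_dequantize(x_q, s, z)             # floats restricted to the quantized grid

print("s =", s, "z =", z)
print("x_q    =", x_q)
print("x_fake =", x_fake)
```

Note that ```5.0``` lies outside the float range and is clipped to ```beta_q```, and that ```0.0``` is mapped exactly to the zero point, so it is recovered without error after dequantization.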

The only drawback of using ```quantized_bits()``` is that it implements only a symmetric quantized range when ```alpha="auto"``` (as written in the comment at [quantizers.py#L1404](https://github.com/google/qkeras/blob/c5051b51ac5d8db7b5d235419a1538258a35a8a7/qkeras/quantizers.py#L1404)), so even if we set ```symmetric=0``` and ```keep_negative=0```, it automatically forces ```symmetric=1``` ([quantizers.py#L524](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L524)) and ```keep_negative=1``` ([quantizers.py#L584](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L524), [quantizers.py#L603](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L524)). Therefore, as an example, ```quantized_bits(4,4,0,0,alpha='auto')``` will be treated by QKeras as ```quantized_bits(4,4,1,1,alpha='auto')```. 
This implies that our features lose 1 bit of precision after a ReLU activation: the features are non-negative, so the negative half of the signed quantized range is never used.
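The lost bit can be made visible with a small NumPy experiment. This is only an imitation of a symmetric signed quantizer, not QKeras code: the 4-bit width, the value of ```levels``` and the synthetic post-ReLU tensor are assumptions made for the example.

```python
import numpy as np

bits = 4
q_max = 2**(bits - 1) - 1          # symmetric signed range [-7, 7]
levels = 2**bits - 1               # number of steps assumed across the range

# Synthetic post-ReLU features: all values are non-negative
x = np.linspace(0.0, 6.0, 1000)

# Symmetric scaling factor computed from the maximum absolute value
scale = 2 * np.max(np.abs(x)) / levels
x_q = np.clip(np.round(x / scale), -q_max, q_max).astype(int)

codes = np.unique(x_q)
print("integer codes in use:", codes)
print("number of codes:", codes.size, "-> effective bits:", int(np.log2(codes.size)))
```

Only the non-negative codes appear, so a nominal 4-bit quantizer effectively behaves as a 3-bit one on this tensor.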

Finally, the ```QActivation``` layer that follows the ```ReLU``` can be used to fake-quantize the features to another bitwidth precision. ```ReLU``` can NOT be passed to ```QConv2D``` in the ```activation``` argument because it would be treated as ```quantized_relu()```.

Regarding the number of bits for ```quantized_bits()```, we want ```bits``` = ```integer``` because our target is to implement integer-only arithmetic. 

<br><br>
STRANGE THINGS.
1) Using ```bits``` > 31 produces ```nan``` values after quantization. The cause has not been investigated yet (future work).

2) Always set the keyword argument ```alpha``` of ```quantized_bits()``` explicitly, i.e. never leave it unspecified, to avoid [strange behaviors](https://github.com/google/qkeras/issues/60).

## Set network hyperparameters

In [4]:
# User settings

# BATCHNORM PARAMS
fused_batchnorm = 1

# POOL PARAMS
pool_size_list = [(4, 4)]

# 2DCONV PARAMS
filters_list = [2, 3]
kernel_size_list = [(3, 3), (3, 3)]
strides_list = [(1, 1), (2, 2)]
pads_list = ["valid", "same"]

# DENSE PARAMS
units_list = [10]

# QUANTIZATION PARAMS
bit_flat = 16

bits_qactiv_list  = [bit_flat, bit_flat, bit_flat, bit_flat]
bits_qweight_list = [bit_flat, bit_flat, bit_flat, 0       ] # last value is dummy

#------------------------------------------

In [5]:
class FakeLayer(Layer):

    """Subclass of Layer to create an Identity or Fake layer that does not exist for TensorFlow 2.4.0:
    https://www.tensorflow.org/api_docs/python/tf/keras/layers/Identity"""
    
    def __init__(self, name=None):
        super(FakeLayer, self).__init__(name=name)

    def call(self, inputs):
        return inputs

# Case without fused BatchNorm

## Original Keras model that we want to convert to QKeras
BatchNorm is not folded

In [None]:
if fused_batchnorm == 0:
    
    model = tf.keras.models.Sequential([

        MaxPooling2D(pool_size=pool_size_list[0], name="pool"),

        # Example of Conv2D without Batchnorm
        Conv2D(filters_list[0], kernel_size_list[0], strides_list[0], pads_list[0], name="conv2d_0"),
        ReLU(name="relu_0"),

        # Example of Conv2D with Batchnorm
        Conv2D(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1], name="conv2d_1"),
        BatchNormalization(name="bn_1"), # BatchNorm is not folded
        ReLU(name="relu_1"),

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm
        Dense(units_list[0], name="dense"),
        Activation("softmax", name="softmax")

    ])

    model.build((None,input_width,input_width,input_channels))

    model.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'], run_eagerly=True)

    model.summary()

## QKeras model with new activation layer "quantized_bits_featuremap" 

In [None]:
if fused_batchnorm == 0:
    
    qmodel = tf.keras.models.Sequential([

        MaxPooling2D(pool_size_list[0], name="pool"),
        
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % \
                    (bits_qactiv_list[0], bits_qactiv_list[0]), name="act_0"),

        # Example of Conv2D without Batchnorm
        QConv2D(filters_list[0], kernel_size_list[0], strides_list[0], pads_list[0],
                kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % \
                                 (bits_qweight_list[0], bits_qweight_list[0]), 
                bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')", name="conv2d_0"),
                #activation="relu"), # This way applies quantized_relu() that we do not want
        ReLU(name="relu_0"), # Use this instead
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % \
                    (bits_qactiv_list[1], bits_qactiv_list[1]), name="act_1"),

        # Example of Conv2D with fused Batchnorm
        #QConv2DBatchnorm(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1],
        #                 kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % \
        #                 (bits_qweight_list[1], bits_qweight_list[1]), 
        #                 bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')", name="conv2d_1"),
        #FakeLayer(name="bn_1"),
        QConv2D(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1],
                kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % \
                (bits_qweight_list[1], bits_qweight_list[1]), 
                bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')", name="conv2d_1"),
        BatchNormalization(name="bn_1"),
        ReLU(name="relu_1"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % \
                    (bits_qactiv_list[2], bits_qactiv_list[2]), name="act_2"),

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm (scale_axis=2 for per-layer quantization for Dense)
        QDense(units_list[0],
               kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto',scale_axis=2)" % \
                                (bits_qweight_list[2], bits_qweight_list[2]),
               bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')", name="dense"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % \
                    (bits_qactiv_list[3], bits_qactiv_list[3]), name="act_3"),

        Activation("softmax", name="softmax")

    ])

    qmodel.build((None,input_width,input_width,input_channels))

    qmodel.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'], run_eagerly=True)

    qmodel.summary()

# Case with fused BN

## Original Keras model that we want to convert to QKeras
BatchNorm is folded

In [None]:
if fused_batchnorm == 1:
    
    model = tf.keras.models.Sequential([

        MaxPooling2D(pool_size=pool_size_list[0], name="pool"),

        # Example of Conv2D without Batchnorm
        Conv2D(filters_list[0], kernel_size_list[0], strides_list[0], pads_list[0], name="conv2d_0"),
        ReLU(name="relu_0"),

        # Example of Conv2D with Batchnorm
        Conv2D(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1], name="conv2d_1"),
        # Not needed because the folded weights of qmodel will be transferred into Conv2D (see later cells)
        #BatchNormalization(name="bn_1"),
        ReLU(name="relu_1"),

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm
        Dense(units_list[0], name="dense"),
        Activation("softmax", name="softmax")

    ])

    model.build((None,input_width,input_width,input_channels))

    model.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'], run_eagerly=True)

    model.summary()

## QKeras model with new activation layer "quantized_bits_featuremap"
quantized_bits_featuremap(bits,integer,symmetric,keep_negative,alpha,scale_axis)

In [None]:
if fused_batchnorm == 1:
    
    qmodel = tf.keras.models.Sequential([

        MaxPooling2D(pool_size_list[0], name="pool"),
        
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % \
                    (bits_qactiv_list[0], bits_qactiv_list[0]), name="act_0"),

        # Example of Conv2D without Batchnorm
        QConv2D(filters_list[0], kernel_size_list[0], strides_list[0], pads_list[0],
                kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % \
                                 (bits_qweight_list[0], bits_qweight_list[0]), 
                bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')", name="conv2d_0"),
                #activation="relu"), # This way applies quantized_relu() that we do not want
        ReLU(name="relu_0"), # Use this instead
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % \
                    (bits_qactiv_list[1], bits_qactiv_list[1]), name="act_1"),

        # Example of Conv2D with fused Batchnorm
        QConv2DBatchnorm(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1],
                         kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % \
                         (bits_qweight_list[1], bits_qweight_list[1]), 
                         bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')", name="conv2d_1"),
        FakeLayer(name="bn_1"),
        #QConv2D(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1],
        #        kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto')" % \
        #        (bits_qweight_list[1], bits_qweight_list[1]), 
        #        bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')", name="conv2d_1"),
        #BatchNormalization(name="bn_1"),
        ReLU(name="relu_1"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % \
                    (bits_qactiv_list[2], bits_qactiv_list[2]), name="act_2"),

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm (scale_axis=2 for per-layer quantization for Dense)
        QDense(units_list[0],
               kernel_quantizer="quantized_bits(%s,%s,1,1,alpha='auto',scale_axis=2)" % \
                                (bits_qweight_list[2], bits_qweight_list[2]),
               bias_quantizer="quantized_bits(31,31,1,1,alpha='auto')", name="dense"),
        QActivation("quantized_bits_featuremap(%s,%s,1,1,alpha='auto',scale_axis=0)" % \
                    (bits_qactiv_list[3], bits_qactiv_list[3]), name="act_3"),

        Activation("softmax", name="softmax")

    ])

    qmodel.build((None,input_width,input_width,input_channels))

    qmodel.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'], run_eagerly=True)

    qmodel.summary()

## Train, save, load qmodel

In [None]:
# User settings

train_model = 1
epochs = 1

save_model  = 1

if fused_batchnorm == 0:
    save_path = "./qmodel"
else:
    save_path = "./qmodel_fusedbatchnorm"
    
#------------------------------------------

if train_model == 1:
    
    qmodel.fit(x_train, y_train, batch_size=512,
               epochs=epochs, validation_split=0.25, shuffle=True)    
    
if save_model == 1:

    qmodel.save_weights(save_path, overwrite=True, save_format="tf")
    os.remove("./checkpoint")

    print("Model saved correctly")

else:
    
    qmodel.load_weights(save_path, by_name=False).expect_partial()
    
    print("Model loaded correctly")

## Transfer weights from qmodel to model

In [None]:
def my_evaluate(predictions, y):
    index_pred = np.argmax(predictions)
    index_gold = np.argmax(y)
    if index_pred != index_gold:
        return 0
    else:
        return 1
    

qlayers = []
for qlayer in qmodel.layers:
    if qlayer.get_weights():
        qlayers.append(qlayer)

layers = []
for layer in model.layers:
    if layer.get_weights():
        layers.append(layer)

        
for qlayer, layer in zip(qlayers, layers):
    
    # To disable all the printings to stdout of the functions inside this statement: 
    # https://stackoverflow.com/questions/23610585/ipython-notebook-avoid-printing-within-a-function/23611571
    with io.capture_output(stdout=True, stderr=False) as captured:
        
        print(qlayer.__class__.__name__)

        if qlayer.get_weights():
            
            print(qlayer.name)
            print("qlayer.get_weights()[0].shape:", qlayer.get_weights()[0].shape)
            print("qlayer.get_weights()[1].shape:", qlayer.get_weights()[1].shape)
            print("layer.get_weights()[0].shape:", layer.get_weights()[0].shape)
            print("layer.get_weights()[1].shape:", layer.get_weights()[1].shape)
            try:
                # Only layers with fused BatchNorm expose get_folded_weights()
                extracted_weights = qlayer.get_folded_weights()
                print("This layer IS FOLDED")
            except Exception:
                print("This layer is NOT folded")
                extracted_weights = qlayer.get_weights()[0:2]
            
            print("layer.get_weights()[0][0] MODEL BEFORE")
            print(layer.get_weights()[0][0])
            print("layer.get_weights()[1] MODEL BEFORE")
            print(layer.get_weights()[1])
            layer.set_weights(dc(extracted_weights))  # dc is deepcopy, imported in Ch. 0
            print("layer.get_weights()[0][0] MODEL AFTER")
            print(layer.get_weights()[0][0])
            print("layer.get_weights()[1] MODEL AFTER")
            print(layer.get_weights()[1])
            
            try:
                print("qlayer.get_folded_weights()[0][0] QMODEL")
                print(qlayer.get_folded_weights()[0][0])
                print("qlayer.get_folded_weights()[1] QMODEL")
                print(qlayer.get_folded_weights()[1])
            except:
                print("qlayer.get_weights()[0][0] QMODEL")
                print(qlayer.get_weights()[0][0])
                print("qlayer.get_weights()[1] QMODEL")
                print(qlayer.get_weights()[1])
            
            result = layer.get_weights()[0:2]
            if not np.array_equal(layer.get_weights()[0], extracted_weights[0]) or \
               not np.array_equal(layer.get_weights()[1], extracted_weights[1]):
                raise Exception("Transfer weights failed")

        print("------------")
    
print("Done")

***
<a class="anchor" id="ch2"></a>
# 2) Run inference and compare model with qmodel

Go to next: [Ch. 3](#ch3).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

In [None]:
# User settings

samples_to_run = 25

offset = 0
min_samples = offset
max_samples = offset+samples_to_run

print_prediction = True

#------------------------------------------

tot_layers = 0
for layer in qmodel.layers:
    if layer.get_weights():
        tot_layers += 1
print("tot_layers:", tot_layers)

iterations = 0
test_acc_accumulator_model = 0
test_acc_accumulator_qmodel = 0

for x, y in zip(x_test[min_samples:max_samples], y_test[min_samples:max_samples]):
        
    x_reshaped = x.reshape(1, x.shape[0], x.shape[1], x.shape[2])
    
    # Predict the samples with model, i.e. the Keras model
    pred_model = np.asarray(model.predict(x_reshaped, batch_size=1, verbose=0), dtype=np.float64)

    # Predict the samples with qmodel, i.e. the modified QKeras model
    pred_qmodel = np.asarray(qmodel.predict(x_reshaped, batch_size=1, verbose=0), dtype=np.float64)
    
    # Calculate test accuracy
    test_acc_model  = my_evaluate(pred_model,  y)
    test_acc_qmodel = my_evaluate(pred_qmodel, y)

    # Check predictions
    if print_prediction == True:
        
        print("\npred_model:\n", pred_model)
        print("pred_qmodel:\n", pred_qmodel)
        
        print("test_acc_model:   ", test_acc_model)
        print("test_acc_qmodel:  ", test_acc_qmodel)
        
    test_acc_accumulator_model  += test_acc_model
    test_acc_accumulator_qmodel += test_acc_qmodel

    iterations = iterations + 1
    
    if print_prediction == True:
        print("iteration %d/%d" % (iterations, (max_samples - min_samples)))
        print("-------------------------")
    
print("-------------------------")

print("TOT iterations:        ", iterations)
print("TOT test_acc_model:    ", test_acc_accumulator_model/iterations)
print("TOT test_acc_qmodel:   ", test_acc_accumulator_qmodel/iterations)



***
<a class="anchor" id="ch3"></a>
# 3) Extract QKeras quantization factors
Go to next: [Ch. 4](#ch4).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).

This is the complete quantization formula for a matrix multiplication operation (also valid for an FC layer), taken from [[3]#Quantized-Matrix-Multiplication-Mathematics](https://leimao.github.io/article/Neural-Networks-Quantization/#Quantized-Matrix-Multiplication-Mathematics) (an equivalent version is Eq. 7 in [2]):
$$\begin{align} Y_{q,i,j} &= z_Y + \frac{s_b}{s_Y} (b_{q, j} - z_b) + \frac{s_X s_W}{s_Y} \Bigg[ \bigg( \sum_{k=1}^{p} X_{q,i,k} W_{q, k,j} \bigg) - \bigg( z_W \sum_{k=1}^{p} X_{q,i,k} \bigg) - \bigg( z_X \sum_{k=1}^{p} W_{q, k,j} \bigg) + p z_X z_W\Bigg] \end{align}$$

Some terms can be dropped if the <b>zero points</b> of weights ```z_W``` and biases ```z_b``` <b>are forced to be zero, i.e. if both the quantized range and the fake-quantized/floating-point range of weights and biases, respectively, are symmetric</b>. In this case, affine quantization mapping is called scale quantization mapping [3]. Making the quantized range symmetric is relatively easy (for example, in QKeras we just need to pass ```quantized_bits()``` with ```symmetric=1``` and ```keep_negative=1``` to both the ```kernel_quantizer``` and ```bias_quantizer``` arguments of each QKeras layer), but this is not the case for the fake-quantized/floating-point range. There are two ways to make the latter symmetric:

1) during training, by constraining weight and bias tensors to a given symmetric range of values;

2) during training, by using a different way to calculate the scaling factor ```s```. Instead of calculating it in the standard and more general way: 
$$\begin{align} s &= \frac{\beta - \alpha}{\beta_q - \alpha_q},\end{align}$$
it can be calculated as: 
$$\begin{align} s &= \frac{2 \cdot \max(\text{abs}(\text{tensor}))}{\beta_q - \alpha_q},\end{align}$$
where ```tensor``` is the floating-point weight/bias tensor to be quantized, ```[alpha; beta]``` is the floating-point range (```alpha``` and ```beta``` are the minimum and maximum values of the entire tensor, so the first formula yields a single scalar ```s``` for the entire tensor), and ```[alphaq; betaq]``` is the quantized range (which depends on the number of bits used to represent the quantized data). Regarding the operations, ```abs()``` calculates the absolute value of all the elements in ```tensor``` and ```max``` extracts the maximum value from each channel (so in the second formula ```s``` is an array of scaling factors). Apart from the per-layer vs per-channel difference, the second formula is more convenient because it removes the need to search for the minimum of the tensor and directly assumes that the floating-point range (numerator) is symmetric, even when it actually is not; in this way it avoids constraining ```alpha``` and ```beta``` to be exactly equal and opposite.
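The two ways of computing ```s``` can be compared with a short NumPy sketch. The weight tensor, its asymmetric value range, the 8-bit symmetric quantized range and the variable names are all assumptions made for illustration; the zero-point expression for the first formula is the one given in [3].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Conv2D kernel (kh, kw, in_ch, out_ch) with an asymmetric float range
w = rng.uniform(-0.3, 1.0, size=(3, 3, 2, 4))

bits = 8
alpha_q, beta_q = -(2**(bits - 1) - 1), 2**(bits - 1) - 1   # symmetric range [-127, 127]

# First formula: true min/max of the whole tensor -> one scalar s and a non-zero zero point
alpha, beta = w.min(), w.max()
s_minmax = (beta - alpha) / (beta_q - alpha_q)
z_minmax = int(round((beta * alpha_q - alpha * beta_q) / (beta - alpha)))

# Second formula: the float range is assumed symmetric -> zero point is 0 by construction,
# and taking the maximum per output channel gives one s per channel
s_sym = 2 * np.max(np.abs(w), axis=(0, 1, 2)) / (beta_q - alpha_q)
z_sym = 0
w_q = np.clip(np.round(w / s_sym + z_sym), alpha_q, beta_q).astype(np.int64)

print("first formula : s =", s_minmax, " z =", z_minmax)
print("second formula: s =", s_sym, " z =", z_sym)
print("largest |w_q| :", np.abs(w_q).max())
```

With the second formula the largest absolute weight of each channel lands exactly on the edge of the quantized range, while part of the negative range stays unused because the tensor is not actually symmetric.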

[TensorFlow Lite states](https://www.tensorflow.org/lite/performance/quantization_spec#symmetric_vs_asymmetric) that the zero points are forced to zero, but it does not show how (looking at the source code may be necessary: future work). [One Stack Overflow user](https://stackoverflow.com/questions/69746834/tf-lite-model-force-symmetric-filter-weights-in-fully-connected-layers) tried to implement the first approach using [tf.keras.constraints](https://www.tensorflow.org/api_docs/python/tf/keras/constraints) without success; QKeras instead follows the second approach, as you can see in the source code of ```quantized_bits()``` in [quantizers.py#L586](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L586).

In light of the above, in the next cells of this notebook we extract and save to csv files only the following quantization parameters, which are needed for the inference phase:

- ```wq``` and ```scale_w``` are the quantized weights and their scaling factors (known after training; the weight zero points are 0);

- ```bq``` and ```scale_b``` are the quantized biases and their scaling factors (known after training; the bias zero points are 0);

- ```subq1``` is the third term in the square brackets in the quantization formula above, i.e. the summation over the quantized weights multiplied by the zero point of the input features (known after training):
$$\bigg( z_X \sum_{k=1}^{p} W_{q, k,j} \bigg)$$

- ```scale_f``` and ```zeropoint_f``` are the scaling factors and the zero points of the activation layers (they require calibration to get the maximum absolute values of alpha and beta for each layer).
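To see how these parameters fit together, the following NumPy sketch checks the quantization formula above in the simplified case where the weight and bias zero points are 0. All tensors and quantization parameters are made-up values for illustration, and the final rounding and clipping of the output to its quantized range are left out on purpose.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 2, 5, 3

# Hypothetical quantized operands
X_q = rng.integers(-128, 128, size=(m, p))    # quantized input features
W_q = rng.integers(-127, 128, size=(p, n))    # quantized weights (wq)
b_q = rng.integers(-1000, 1000, size=n)       # quantized biases (bq)

# Hypothetical quantization parameters
s_X, z_X = 0.02, 5         # activations: affine mapping, non-zero zero point
s_W, s_b = 0.01, 0.0002    # weights and biases: scale mapping, zero points are 0
s_Y, z_Y = 0.05, -3        # output activations

# Floating-point reference built from the dequantized tensors
Y = (s_X * (X_q - z_X)) @ (s_W * W_q) + s_b * b_q
Y_q_ref = z_Y + Y / s_Y

# Integer-side computation: with z_W = z_b = 0 only the subq1 correction survives
subq1 = z_X * W_q.sum(axis=0)    # one value per output column, known after training
acc = X_q @ W_q - subq1          # integer accumulator
Y_q = z_Y + (s_b / s_Y) * b_q + (s_X * s_W / s_Y) * acc

print(np.allclose(Y_q, Y_q_ref))
```

Since ```subq1``` depends only on the trained weights and on the input zero point, it can be computed once offline, which is why it is stored together with ```wq``` and ```bq```.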

To run the notebook without issues, you should edit the file [quantizers.py](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py) to expose the following internal variables to the external world as attributes (to ease this step, just follow the instructions in the readme file of this repo):

- ```m_i = K.cast_to_floatx(K.pow(2, self.integer))```;

- ```scale1 = (K.max(abs(x), axis=axis, keepdims=True) * 2) / levels```.

In particular, we need ```m_i``` and ```scale1``` to compute a scaling factor that matches the definition of scaling factor of TensorFlow Lite [2][3]. In fact, one might imagine that the scaling factor provided by the ```scale``` attribute of ```quantized_bits()``` is the same as the TensorFlow one: unfortunately it is not, as you can see from [quantizers.py#L608](https://github.com/google/qkeras/blob/b91d8815b31f05ddf9c7b6d62381df9be72a570a/qkeras/quantizers.py#L608). The correct scaling factor is ```scale = scale1 * m_i``` and has to be computed manually.

Finally, the following calculations show the <b>quantization of the weights</b>, which follows a <b>per-channel</b> approach, i.e. weights have a number of scaling factors and zero points equal to the number of output channels, while <b>activations are quantized in a per-layer fashion</b> (one scaling factor and one zero point for each feature map tensor). The difference is the use of ```scale_axis=0``` in ```quantized_bits()``` for ```QActivation()```. Per-channel quantization of activations is not implemented in QKeras or in TensorFlow Lite because "<i>per-channel quantization of activations is much harder to implement because we cannot factor the scale factor out of the summation and would, therefore, require rescaling the accumulator for each input channel</i>" [4].
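The practical effect of the two granularities can be sketched in NumPy. This is an assumption-laden illustration rather than the QKeras code path: the kernel shape, the per-channel magnitudes, the 8-bit symmetric range and the helper ```fake_quant_error``` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
bits = 8
steps = 2**bits - 2    # beta_q - alpha_q for the symmetric range [-127, 127]

# Hypothetical Conv2D kernel (kh, kw, in_ch, out_ch) whose output channels
# have very different magnitudes
w = rng.normal(size=(3, 3, 2, 4)) * np.array([0.01, 0.1, 1.0, 10.0])

# Per-channel: reduce over every axis except the output-channel axis
scale_per_channel = 2 * np.max(np.abs(w), axis=(0, 1, 2)) / steps   # shape (4,)

# Per-layer: a single scaling factor for the whole tensor
scale_per_layer = 2 * np.max(np.abs(w)) / steps                     # scalar

def fake_quant_error(w, scale):
    """Maximum absolute q-deq error for each output channel."""
    w_fq = scale * np.round(w / scale)
    return np.abs(w - w_fq).max(axis=(0, 1, 2))

err_pc = fake_quant_error(w, scale_per_channel)
err_pl = fake_quant_error(w, scale_per_layer)
print("per-channel error:", err_pc)
print("per-layer error  :", err_pl)
```

With a single per-layer scale, the channel with the smallest weights is represented very coarsely, whereas per-channel scales keep the error of every channel within half of its own quantization step.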

In [None]:
def extract_calibration_data(model, x_test, calibration_samples, csv_file_path_w, csv_file_path_a):

    """
    Return two dictionaries: one for weights and biases, and one for activations
    1) weight and bias: scale and zero-point are known after training (zero-point = 0)
    2) activations: scale and zero-point vary with the input -->
                   --> need calibration to get maximum absolute values of alpha and beta
    """
    
    
    # Define the weights and biases dictionary
    base_param_dict = {
               "w_scale": 0,
               "b_scale": 0,
               "subq1": [],
               "wq": [],
               "bq": []
    }
    
    w_layers = []
    for layer in model.layers:
        if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm", 
                                        "QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm", 
                                        "QDense", "QDenseBatchnorm"]:
            w_layers.append(layer.name)

    # Deepcopy base dict otherwise it is always the same object
    w_dict = {k: dc(base_param_dict) for k in w_layers}

    
    # Define the activations dictionary
    base_act_dict = {
        "alpha_of_max_abs": [],
        "beta_of_max_abs": [],
        "in_scale": [],
        "in_zeropoint":[]
    }
    
    a_layers = []
    for layer in model.layers:
        if layer.__class__.__name__ in ["QActivation"]:
            a_layers.append(layer.name)
    
    a_dict = {k: dc(base_act_dict) for k in a_layers}   
                       
    
    
    
    # Extract maximum absolute values of alpha and beta of activations with calibration
    for layer in model.layers:
    
        alpha_of_list = []
        beta_of_list = []
        
        ##### FEATURES #####
        if layer.__class__.__name__ in ["QActivation"]:
                
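            for x in x_test[0:calibration_samples]:

                # Add the batch dimension: (H, W, C) -> (1, H, W, C)
                data = x.reshape(1, x.shape[0], x.shape[1], x.shape[2])

                # Forward pass on this sample; the quantizer attributes are read right after it
                pred_model = np.asarray(model.predict(data, batch_size=1, verbose=0), dtype=np.float64)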

                quantizer_of = layer.quantizer # it is quantized_bits_featuremap
                
                alpha_of = quantizer_of.alpha_f.numpy().flatten()
                beta_of = quantizer_of.beta_f.numpy().flatten()
                alpha_of_list.append(alpha_of)
                beta_of_list.append(beta_of)
                        
            alpha_of_max_abs = np.max(np.abs(alpha_of_list))
            beta_of_max_abs = np.max(np.abs(beta_of_list))
            
            print(f"layer.name: {layer.name}, \t alpha_of_max_abs:", alpha_of_max_abs, \
                  "\tbeta_of_max_abs:", beta_of_max_abs)
            
            a_dict[layer.name]["alpha_of_max_abs"].append(alpha_of_max_abs)
            a_dict[layer.name]["beta_of_max_abs"].append(beta_of_max_abs)
    
    
    
    # Extract weights, biases and activations
    zeropoint_of_list = []
    ready1 = 0
    
    x = x_test[0]
    data = x.reshape(1, x.shape[0], x.shape[1], x.shape[2])

    pred_model = np.asarray(model.predict(data, batch_size=1, verbose=0), dtype=np.float64)

    for iter, layer in enumerate(model.layers):
        extractor = tf.keras.Model( inputs=model.inputs,
                                   outputs=model.get_layer(layer.name).output)
        of = extractor(data).numpy()
        
        ##### WEIGHTS AND BIASES #####
        if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm", 
                                        "QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm", 
                                        "QDense", "QDenseBatchnorm"]:

            ##### WEIGHTS #####
            if layer.__class__.__name__ in ["QConv2D", "QDepthwiseConv2D", "QDense"]:
                parameters = layer.weights
            else:
                parameters = layer.get_folded_weights() # folded weights not quantized

            w = parameters[0].numpy()
            quantizer_w = layer.get_quantizers()[0] # it is quantized_bits
            alphaq_w = quantizer_w.alphaq
            betaq_w = quantizer_w.betaq
            scale1_w = quantizer_w.scale1.numpy().flatten()
            m_i = quantizer_w.m_i.numpy().flatten()
            scale_w = scale1_w * m_i # WEIGHT SCALE

            if np.any(scale_w != 0):
                tmp = np.divide(w, scale_w, dtype=np.float64)
                # Channels with a zero scale produce NaN/inf: map them to 0
                tmp[np.isnan(tmp)] = 0
                tmp[np.isinf(tmp)] = 0
                # Round half away from zero, then clip to the integer range
                wq = np.clip(np.trunc(tmp + np.sign(tmp)*0.5), alphaq_w, betaq_w) # QUANTIZED WEIGHTS
            else:
                raise ValueError("All values of scale_w are equal to 0.")
                

            # "subq1" (part 2)
            if layer.__class__.__name__ in ["QConv2D", "QConv2DBatchnorm"]:
                sum_of_weights = wq.sum(axis=(0, 1, 2))
            elif layer.__class__.__name__ in ["QDense", "QDenseBatchnorm"]:
                sum_of_weights = wq.sum(axis=0)
            elif layer.__class__.__name__ in ["QDepthwiseConv2D", "QDepthwiseConv2DBatchnorm"]:
                sum_of_weights = wq.sum(axis=(0, 1))

            if ready1 == 1:
                subq1 = (zeropoint_of_list[-1] * sum_of_weights).flatten()
                ready1 = 0
                w_dict[layer.name]["subq1"] = subq1


            ##### BIASES ######
            quantizer_b = layer.get_quantizers()[1] # it is quantized_bits
            b = parameters[1].numpy()
            alphaq_b = quantizer_b.alphaq
            betaq_b = quantizer_b.betaq
            scale1_b = quantizer_b.scale1.numpy().flatten()
            m_i = quantizer_b.m_i.numpy().flatten()
            scale_b = scale1_b * m_i # BIAS SCALE


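            # np.any() also works when scale_b holds more than one element
            if np.any(scale_b != 0):
                tmp = np.divide(b, scale_b, dtype=np.float64)
                tmp[np.isnan(tmp)] = 0
                tmp[np.isinf(tmp)] = 0
                # Round half away from zero, then clip to the integer range
                bq = np.clip(np.trunc(tmp + np.sign(tmp)*0.5), alphaq_b, betaq_b) # QUANTIZED BIASES
            else:
                bq = np.zeros(b.size)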

            w_dict[layer.name]['w_scale'] = scale_w
            w_dict[layer.name]["b_scale"] = scale_b
            w_dict[layer.name]["wq"] = wq
            w_dict[layer.name]["bq"] = bq

            
        ##### FEATURES #####
        elif layer.__class__.__name__ in ["QActivation"]:

            quantizer_of = layer.quantizer # it is quantized_bits_featuremap
            alphaq_of = quantizer_of.alphaq
            betaq_of = quantizer_of.betaq

            alpha_of_max_abs = a_dict[layer.name]["alpha_of_max_abs"][0]
            beta_of_max_abs = a_dict[layer.name]["beta_of_max_abs"][0]
                        
            scale_of = np.asarray([(beta_of_max_abs - alpha_of_max_abs) / \
                                   (betaq_of - alphaq_of)], dtype=np.float64)
            scale_of[np.isnan(scale_of)] = 0
            scale_of[np.isinf(scale_of)] = 0
            scale_of = scale_of[0] # ACTIVATION SCALE

            z_of = np.asarray([np.around(((beta_of_max_abs*alphaq_of - alpha_of_max_abs*betaq_of)/ \
                                          (beta_of_max_abs - alpha_of_max_abs)), 0)], dtype=np.float64)
            z_of[np.isnan(z_of)] = 0
            z_of[np.isinf(z_of)] = 0
            z_of = z_of[0] # ACTIVATION ZERO-POINT
            zeropoint_of_list.append(z_of)
            
            ready1 = 1 # "subq1" (part 1)

            a_dict[layer.name]["in_scale"].append(scale_of)
            a_dict[layer.name]["in_zeropoint"].append(z_of)

        print(f"{iter+1}/{len(model.layers)} \t{layer.name}")
        
        
 
    # Index will be determined by the first layer of nested dictionaries (layers)
    df_w = pd.DataFrame.from_dict(w_dict, orient="index")
    df_a = pd.DataFrame.from_dict(a_dict, orient="index")
    
    df_w.to_csv(csv_file_path_w)
    df_a.to_csv(csv_file_path_a)
    
    return (df_w, df_a)
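
The two quantization rules used inside ```extract_calibration_data()``` can be tried in isolation. This is a minimal sketch in plain Python, not part of QKeras: the helper names, the integer ranges ```[-7, 7]``` and ```[-128, 127]```, and the sample values are assumptions chosen only for illustration. Here ```r_min``` and ```r_max``` play the role of the two ends of the calibrated range (alpha and beta in the function), while ```q_min``` and ```q_max``` play the role of ```alphaq``` and ```betaq```.

```python
import math


def round_half_away_from_zero(v):
    """Same rounding rule as np.trunc(v + np.sign(v) * 0.5)."""
    return math.trunc(v + math.copysign(0.5, v))


def quantize_symmetric(values, scale, q_min, q_max):
    """Weights and biases: the zero-point is 0, so q = clip(round(v / scale))."""
    return [max(q_min, min(q_max, round_half_away_from_zero(v / scale)))
            for v in values]


def affine_params(r_min, r_max, q_min, q_max):
    """Activations: scale and zero-point from a calibrated range [r_min, r_max]."""
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = round((r_max * q_min - r_min * q_max) / (r_max - r_min))
    return scale, zero_point


# Hypothetical weights with scale 0.5 and a symmetric 4-bit range.
print(quantize_symmetric([1.25, -1.25, 0.3, -9.0], 0.5, -7, 7))

# Hypothetical calibrated activation range [-1, 3] mapped onto 8 bits.
scale, zero_point = affine_params(-1.0, 3.0, -128, 127)
print(scale, zero_point)
```

Note that ```1.25 / 0.5 = 2.5``` is rounded to 3 by this rule, whereas Python's built-in ```round()``` (like ```np.around()```) would round it to the nearest even integer, 2.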

In [None]:
# User settings

calibration_samples = 10

#------------------------------------------

output_folder = "./calibration_results"

# Create the output folder if it does not exist yet
os.makedirs(output_folder, exist_ok=True)
    

(df_w, df_a) = extract_calibration_data(qmodel, x_test, calibration_samples,
                                        output_folder+"/extracted_weights.csv",
                                        output_folder+"/extracted_activations.csv")

print("Extraction complete")

display(df_w)
display(df_a)
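
The ```subq1``` column stores, for each output channel, the zero point of the preceding activation multiplied by the sum of the quantized weights. A plausible reading (an assumption, since the code that consumes the CSV files is not shown here) is that this term is subtracted from the integer accumulator to compensate for the input zero point, in the spirit of the zero-point handling described in [2]. The sketch below, in plain Python with made-up values, checks the underlying identity: subtracting the zero point from every input is equivalent to subtracting a precomputed, input-independent term from the raw accumulator.

```python
# Hypothetical toy layer: 3 inputs, 2 output channels.
w_q = [[2, -1],
       [0, 3],
       [-4, 1]]
x_q = [10, -3, 7]   # quantized input activations
z_x = -64           # zero point of the input activations

n_out = len(w_q[0])

# Precomputed once per layer: it depends only on the weights and on z_x.
sum_of_weights = [sum(row[c] for row in w_q) for c in range(n_out)]
subq1 = [z_x * s for s in sum_of_weights]

# Direct form: remove the zero point from every input before multiplying.
direct = [sum(row[c] * (x - z_x) for row, x in zip(w_q, x_q))
          for c in range(n_out)]

# Corrected form: raw integer accumulation, then a single subtraction.
corrected = [sum(row[c] * x for row, x in zip(w_q, x_q)) - subq1[c]
             for c in range(n_out)]

assert direct == corrected
print(subq1, corrected)
```
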

***
<a class="anchor" id="ch4"></a>
# 4) Quantized network design for AutoQKeras

Go to [Top](#top).

Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

To perform a hyperparameter search on a Keras model with AutoQKeras, we need to pass the Keras model as the first argument of the AutoQKeras class to create an AutoQKeras object. The input Keras model is automatically converted to a QKeras model during the building process by the ```quantize_model()``` method of the AutoQKHyperModel class (see [autoqkeras_internal.py#L570](https://github.com/google/qkeras/blob/1ab354276a041b45cd72c300e89a7c51ec99fa35/qkeras/autoqkeras/autoqkeras_internal.py#L570)). Note that ```quantize_model()``` does not accept an input model that already contains QKeras layers.

The idea is to exploit this automatic conversion to build a QKeras model according to the methodology previously described in this notebook ([Ch. 1](#ch1)) with as few changes as possible to the original Keras model definition, as explained here:

1) Every 2DConv, DWConv and FC layer has to be preceded by an ```Activation``` layer with any activation function (its type does not matter, because AutoQKeras replaces it during the search with the ```activation``` values defined in the search space ```quantization_config```). In this example we use "sigmoid", but any other type would be fine;

2) The last 2DConv, DWConv or FC layer of the network also has to be followed by an ```Activation``` layer, as described in the previous point;

3) Every ```BatchNormalization``` layer that follows a 2DConv or DWConv layer has to be fused with the convolution. Passing the flag ```enable_bn_folding=True``` when instantiating the AutoQKeras object performs the batch normalization fusion automatically;

4) Every activation associated with a convolutional or fully-connected layer, whether declared as an argument of the ```Activation``` layer (such as ```Activation(activation="relu")```) or as an argument of the ```Conv2D```, ```DepthwiseConv2D``` or ```Dense``` layer (such as ```Conv2D(activation="relu")```), has to be written in the "direct" form, i.e. using a layer that has the same name as the activation (such as ```ReLU()```);

5) Activations not associated with a convolutional or fully-connected layer, usually those placed as the last layer of CNNs (such as softmax), must be left unchanged;

6) Any other layer, such as pooling or add layers, must be left unchanged;

7) After average pooling a quantization step is required, since the average of integers is not necessarily an integer (it would be better to reuse the quantizer of the pooling input, because the output range does not change significantly). Moreover, the dynamic range of the activations changes between the input and the output of the pooling. So an ```Activation``` layer is required after an average pooling;

8) For residual connections ("add" layers), do as the next cell shows.

To run the next cells without issues, you should edit the file [autoqkeras/autoqkeras_internal.py](https://github.com/google/qkeras/blob/master/qkeras/autoqkeras/autoqkeras_internal.py) to expose the flag ```enable_bn_folding``` to the AutoQKeras interface (externally) and to connect it to the ```AutoQKHyperModel``` class and its ```model_quantize``` method (internally). (To ease this step, just follow the instructions in the readme file of this repo).
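
As a side note on rule 7, the need for a quantization step after average pooling can be seen with a few numbers. This is a minimal sketch in plain Python with made-up values (the window content and ```in_scale``` are hypothetical); it reuses the scale of the pooling input, as suggested above, and only illustrates the arithmetic, not how QKeras implements it.

```python
# Integer activations inside one 2x2 pooling window (hypothetical values).
window_q = [3, 4, 4, 6]
in_scale = 0.05  # hypothetical scale of the pooling input

# The average of integers is in general not an integer...
mean_q = sum(window_q) / len(window_q)

# ...so it has to be mapped back onto the integer grid (re-quantization).
requantized = round(mean_q)

# The error introduced by reusing the input scale is at most half a step.
real_value = mean_q * in_scale
error = abs(requantized * in_scale - real_value)
assert error <= in_scale / 2

print(mean_q, requantized, error)
```
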

In [6]:
if fused_batchnorm == 1:
    
    model2 = tf.keras.models.Sequential([

        MaxPooling2D(pool_size=pool_size_list[0], name="pool"),
        Activation("sigmoid", name="act_0"), # fake activation layer

        # Example of Conv2D without Batchnorm
        Conv2D(filters_list[0], kernel_size_list[0], strides_list[0], pads_list[0], name="conv2d_0"),
        ReLU(name="relu_0"),
        Activation("sigmoid", name="act_1"), # fake activation layer

        # Example of Conv2D with Batchnorm
        Conv2D(filters_list[1], kernel_size_list[1], strides_list[1], pads_list[1], name="conv2d_1"),
        # This time BatchNormalization is needed because the folding will be carried out by AutoQKeras
        BatchNormalization(name="bn_1"),
        ReLU(name="relu_1"),
        Activation("sigmoid", name="act_2"), # fake activation layer

        Flatten(name="flatten"),

        # Example of Dense without Batchnorm
        Dense(units_list[0], name="dense"),
        Activation("sigmoid", name="act_3"), # fake activation layer
        
        Activation("softmax", name="softmax")

    ])

    model2.build((None,input_width,input_width,input_channels))

    # "learning_rate" replaces the deprecated "lr" argument of Adam
    model2.compile(Adam(learning_rate=0.001), loss=['categorical_crossentropy'], metrics=['accuracy'], run_eagerly=True)

    model2.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
pool (MaxPooling2D)          (None, 7, 7, 1)           0         
_________________________________________________________________
act_0 (Activation)           (None, 7, 7, 1)           0         
_________________________________________________________________
conv2d_0 (Conv2D)            (None, 5, 5, 2)           20        
_________________________________________________________________
relu_0 (ReLU)                (None, 5, 5, 2)           0         
_________________________________________________________________
act_1 (Activation)           (None, 5, 5, 2)           0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 3, 3, 3)           57        
_________________________________________________________________
bn_1 (BatchNormalization)    (None, 3, 3, 3)           1



In [16]:
# User settings

max_trials = 5
epochs_per_trial = 1

#------------------------------------------

# scale_axis=0 does a per-layer quantization, i.e. one scaling factor for the entire layer;
# otherwise QKeras automatically does a per-channel quantization,
# i.e. one scaling factor for each channel in the layer.
# We want a per-channel quantization for weights/biases and a per-layer quantization for activations

quantization_config = {
    "kernel": {
        "quantized_bits(4,4,1,1,alpha='auto')": 4,
        "quantized_bits(8,8,1,1,alpha='auto')": 8,
        "quantized_bits(16,16,1,1,alpha='auto')": 16,
    },
    "bias": {
        "quantized_bits(16,16,1,1,alpha='auto')": 16,
        "quantized_bits(31,31,1,1,alpha='auto')": 31,
    },
    "activation": {
        "quantized_bits_featuremap(4,4,1,1,alpha='auto',scale_axis=0)": 4,
        "quantized_bits_featuremap(8,8,1,1,alpha='auto',scale_axis=0)": 8,
        "quantized_bits_featuremap(16,16,1,1,alpha='auto',scale_axis=0)": 16
    }
}

# Maximum values for w, b, a
limit = {
    "Dense": [16, 31, 16],
    "Conv2D": [16, 31, 16],
    "DepthwiseConv2D": [16, 31, 16],
    "Activation": [16],
    "BatchNormalization": []
}

goal = {
    "type": "bits",
    "params": {
        "delta_p": 5.0,
        "delta_n": 5.0,
        "rate": 2.0,
        "stress": 1.0,
        "input_bits": 8,
        "output_bits": 8,
        "ref_bits": 16,	 
        "config": {
            "default": ["parameters", "activations"]
        } 
    }
}

run_config = {
  "output_dir": "./autoqkeras",
  "goal": goal,
  "quantization_config": quantization_config,
  "learning_rate_optimizer": False,
  "transfer_weights": True,
  "mode": "bayesian",
  "seed": 42,
  "limit": limit,
  "tune_filters": "none",
  "tune_filters_exceptions": "none",
  "distribution_strategy": tf.distribute.get_strategy(),
  "layer_indexes": range(1, len(model2.layers) - 1),
  "max_trials": max_trials
}

autoqk_model = AutoQKeras(model2, metrics=["acc"], custom_objects={}, 
                          **run_config, overwrite=False, enable_bn_folding=True)

Limit configuration:{"Dense": [16, 31, 16], "Conv2D": [16, 31, 16], "DepthwiseConv2D": [16, 31, 16], "Activation": [16], "BatchNormalization": []}
INFO:tensorflow:Reloading Oracle from existing project ././autoqkeras/oracle.json




learning_rate: 0.0010000000474974513
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
pool_input (InputLayer)      [(None, 28, 28, 1)]       0         
_________________________________________________________________
pool (MaxPooling2D)          (None, 7, 7, 1)           0         
_________________________________________________________________
act_0 (QActivation)          (None, 7, 7, 1)           0         
_________________________________________________________________
conv2d_0 (QConv2D)           (None, 5, 5, 2)           20        
_________________________________________________________________
relu_0 (ReLU)                (None, 5, 5, 2)           0         
_________________________________________________________________
act_1 (QActivation)          (None, 5, 5, 2)           0         
_________________________________________________________________
conv2d_1 (QConv2DBatch

INFO:tensorflow:Reloading Tuner from ././autoqkeras/tuner0.json


Search space summary
Default search space size: 10
conv2d_0_kernel_quantizer (Choice)
{'default': "quantized_bits(4,4,1,1,alpha='auto')", 'conditions': [], 'values': ["quantized_bits(4,4,1,1,alpha='auto')", "quantized_bits(8,8,1,1,alpha='auto')", "quantized_bits(16,16,1,1,alpha='auto')"], 'ordered': False}
conv2d_1_kernel_quantizer (Choice)
{'default': "quantized_bits(4,4,1,1,alpha='auto')", 'conditions': [], 'values': ["quantized_bits(4,4,1,1,alpha='auto')", "quantized_bits(8,8,1,1,alpha='auto')", "quantized_bits(16,16,1,1,alpha='auto')"], 'ordered': False}
dense_kernel_quantizer (Choice)
{'default': "quantized_bits(4,4,1,1,alpha='auto')", 'conditions': [], 'values': ["quantized_bits(4,4,1,1,alpha='auto')", "quantized_bits(8,8,1,1,alpha='auto')", "quantized_bits(16,16,1,1,alpha='auto')"], 'ordered': False}
act_0_activation_quantizer (Choice)
{'default': "quantized_bits_featuremap(4,4,1,1,alpha='auto',scale_axis=0)", 'conditions': [], 'values': ["quantized_bits_featuremap(4,4,1,1,alpha
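
The "Default search space size: 10" reported above counts the hyperparameters, not the candidate models. The following back-of-the-envelope sketch in plain Python estimates how many mixed-precision configurations they span. It assumes that each tunable layer picks its quantizer independently and that the 10 hyperparameters are 3 kernel, 3 bias and 4 activation quantizers (the bias entries are not visible in the truncated listing, so this split is an assumption).

```python
# Number of candidate quantizers per kind, as listed in quantization_config.
choices = {"kernel": 3, "bias": 2, "activation": 3}

# Assumed number of tunable layers per kind in model2
# (conv2d_0, conv2d_1, dense; act_0 ... act_3).
tunable = {"kernel": 3, "bias": 3, "activation": 4}

n_hyperparameters = sum(tunable.values())

n_configurations = 1
for kind, n_choices in choices.items():
    n_configurations *= n_choices ** tunable[kind]

print(n_hyperparameters, n_configurations)
```

Under these assumptions, even this tiny network has far more configurations than the ```max_trials``` budget allows, which is why a guided search such as the "bayesian" mode is used instead of an exhaustive one.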

In [14]:
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.25, random_state=42)

# Use the AutoQKeras object created above (autoqk_model)
autoqk_model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=512, epochs=epochs_per_trial)

Trial 5 Complete [00h 00m 06s]
val_score: 0.5053437117068068

Best val_score So Far: 0.5053437117068068
Total elapsed time: 00h 00m 39s
INFO:tensorflow:Oracle triggered exit




In [17]:
qmodel2 = autoqk_model.get_best_model()  
print_qmodel_summary(qmodel2)

learning_rate: 0.0010000000474974513
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
pool_input (InputLayer)      [(None, 28, 28, 1)]       0         
_________________________________________________________________
pool (MaxPooling2D)          (None, 7, 7, 1)           0         
_________________________________________________________________
act_0 (QActivation)          (None, 7, 7, 1)           0         
_________________________________________________________________
conv2d_0 (QConv2D)           (None, 5, 5, 2)           20        
_________________________________________________________________
relu_0 (ReLU)                (None, 5, 5, 2)           0         
_________________________________________________________________
act_1 (QActivation)          (None, 5, 5, 2)           0         
_________________________________________________________________
conv2d_1 (QConv2DBatch


Go to others: [Ch. 0](#ch0), [Ch. 1](#ch1), [Ch. 2](#ch2), [Ch. 3](#ch3), [Ch. 4](#ch4).

Go to [Top](#top).