# Model Serialization: Lessons Learnt from TensorFlow 1.x and 2.x

... And Why I'm So Fucked by TensorFlow

[YouTube Stream](https://www.youtube.com/watch?v=OsYcoPYIoBE)

# About Me

- A Python Developer
- Interested in machine learning, applied math and its development
- Core developer of [`uTensor`](https://utensor.github.io/website/)

- `uTensor`

![utensor](https://raw.githubusercontent.com/uTensor/uTensor/develop/docs/img/uTensorFlow.jpg)

- `utensor_cgen`: code generator for `uTensor`

![utensor-cgen](https://raw.githubusercontent.com/uTensor/utensor_cgen/develop/doc/source/_images/utensor-cli-components.drawio.svg)

# Model Development and Deployment

- define the graph

- train the graph

- **graph transformation**: graph rewriting, including quantization, node fusion, node removal, etc.

- **saving the graph**: model serialization

## Quantization

![weight-quantization](images/weight-quantization.png)
[credit](https://docs.google.com/presentation/d/1zGm5bqGrkAepwJZ5PABiYjrIKq1pDnzafa8ZYeaFhXY/edit)
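
As a rough illustration of the idea in the figure, here is a minimal NumPy sketch of 8-bit affine (min/max) weight quantization. It is an assumption about the general scheme, not the code TensorFlow runs; `quantize_weights` and `dequantize_weights` are hypothetical helper names.

```python
import numpy as np

def quantize_weights(weights):
    """Map a float array to uint8 with an affine (scale, zero_point) scheme."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(float(weights.min()), 0.0)
    w_max = max(float(weights.max()), 0.0)
    # Guard against an all-zero tensor, where the range would be zero.
    scale = (w_max - w_min) / 255.0 or 1.0
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_weights(q, scale, zero_point):
    """Recover approximate float values from the quantized array."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale, zero_point = quantize_weights(w)
w_restored = dequantize_weights(q, scale, zero_point)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error on the order of `scale`.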

```python
import warnings
warnings.filterwarnings('ignore')  # silence NumPy deprecation warnings in TensorFlow 1.x
```

```python
# TensorFlow 1.x
import tensorflow as tf
from tensorflow import import_graph_def
from tensorflow.tools.graph_transforms import TransformGraph

print(tf.__version__)

# parse the serialized GraphDef and import it into a fresh graph
graph = tf.Graph()
with graph.as_default():
    with tf.gfile.GFile("simple_model.pb", "rb") as fid:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(fid.read())
    out_tensor, = import_graph_def(
        graph_def,
        return_elements=["y_pred:0"]
    )
out_tensor
```

```
1.13.0-rc1
<tf.Tensor 'import/y_pred:0' shape=(10,) dtype=int64>
```

float model

![simple-model-float](images/simple-model-float.png)

```python
# Quantization in Tensorflow 1.x
quant_graph_def = TransformGraph(
    graph_def,
    inputs=[],
    outputs=["y_pred"],
    transforms=["quantize_weights", "quantize_nodes"]
)

with open('quant_simple_model.pb', 'wb') as fid:
    fid.write(quant_graph_def.SerializeToString())
```

quantized model

![simple-model-quant](images/simple-model-quant.png)

dynamic quantization

![simple-model-quant-zoom](images/simple-model-quant-zoom.png)
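
A minimal NumPy sketch of what "dynamic" means here, assuming it refers to computing the activation range at run time from each input tensor, as opposed to a static range fixed ahead of time by calibration. The helper names are hypothetical and this is not the actual TensorFlow kernel.

```python
import numpy as np

def quantize_with_range(x, x_min, x_max):
    """Quantize x to uint8 using the given float range."""
    scale = (x_max - x_min) / 255.0 or 1.0
    q = np.clip(np.round((x - x_min) / scale), 0, 255).astype(np.uint8)
    return q, scale

def dynamic_quantize(x):
    # Dynamic: the range comes from the tensor seen at inference time.
    return quantize_with_range(x, float(x.min()), float(x.max()))

def static_quantize(x, calibrated_min, calibrated_max):
    # Static: the range was fixed beforehand, so values outside it saturate.
    return quantize_with_range(x, calibrated_min, calibrated_max)

x = np.array([0.0, 1.0, 4.0], dtype=np.float32)
q_dyn, scale_dyn = dynamic_quantize(x)
q_static, scale_static = static_quantize(x, 0.0, 2.0)  # 4.0 is out of range
```

The dynamic path pays for extra min/max work on every inference; the static path is cheaper at run time but depends on how well the calibrated range fits the real inputs.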

```python
import tensorflow as tf

print(tf.__version__)
```

```
2.3.1
```


```python
# TensorFlow 2.x: TensorFlow Lite
model = ... # A tensorflow.keras.Model instance, **trained**
model.save('model_path') # save the model with the normal Keras save/load API

# trainable graph -> constant graph in TF 2.x
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

model_func = tf.function(lambda x: model(x))
model_func = model_func.get_concrete_function(tf.TensorSpec(...)) # set up the input spec
model_func = convert_variables_to_constants_v2(model_func, lower_control_flow=False)

# save the frozen graph as a pb file
with open('const_graph.pb', 'wb') as fid:
    fid.write(model_func.graph.as_graph_def().SerializeToString())

# create a converter that converts a Keras model to a TFLite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# represent_ds is a callable that returns a generator yielding samples of a representative dataset
converter.representative_dataset = represent_ds
```
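
A minimal sketch of what a `represent_ds` callable could look like. The input shape `(1, 28, 28, 1)`, the sample count, and the random data are all assumptions for illustration; real calibration should yield samples drawn from the training or validation set, shaped to match the model input.

```python
import numpy as np

def represent_ds(num_samples=100, input_shape=(1, 28, 28, 1)):
    """Yield one list of float32 input arrays per calibration step."""
    rng = np.random.default_rng(seed=0)
    for _ in range(num_samples):
        # Placeholder data: replace with real samples from your dataset.
        sample = rng.random(input_shape, dtype=np.float32)
        # One list entry per model input; this sketch assumes a single input.
        yield [sample]
```

Calling `represent_ds()` with no arguments returns a generator, which is the form the converter is handed above.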

```python
# tflite_buffer is a bytes object
tflite_buffer = converter.convert()

with open('model.tflite', 'wb') as fid:
    fid.write(tflite_buffer)
```

References

- [Post-Training Quantization](https://www.tensorflow.org/lite/performance/post_training_quantization)

`Keras` float model

![keras-float](images/keras-float.png)

`Keras` quantized model (TFLite)

![keras-quant](images/keras-quant.png)

## Graph Rewriting

- [Dropout Removal](https://utensor-cgen.readthedocs.io/en/latest/#use-case-dropout-layer-removal)
- [Node Fusion](https://utensor-cgen.readthedocs.io/en/latest/#use-case-node-fusion)

Implemented with isomorphic subgraph matching
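
To make the matching idea concrete, here is a small pure-Python sketch. It is a simplification, not the `utensor_cgen` implementation: graphs are plain dicts, inputs are compared in order (no handling of commutative ops), and `"*"` is a hypothetical wildcard for "any node".

```python
from collections import namedtuple

Node = namedtuple("Node", ["op_type", "inputs"])

def match_at(graph, name, pattern, pname, mapping):
    """Match pattern node `pname` against graph node `name`.

    Returns an extended pattern->graph mapping, or None on failure.
    """
    if pname in mapping:
        return mapping if mapping[pname] == name else None
    if name in mapping.values():
        return None  # keep the mapping one-to-one
    g_node, p_node = graph[name], pattern[pname]
    if p_node.op_type == "*":
        return {**mapping, pname: name}  # wildcard: accept without descending
    if p_node.op_type != g_node.op_type:
        return None
    if len(p_node.inputs) != len(g_node.inputs):
        return None
    mapping = {**mapping, pname: name}
    for g_in, p_in in zip(g_node.inputs, p_node.inputs):
        mapping = match_at(graph, g_in, pattern, p_in, mapping)
        if mapping is None:
            return None
    return mapping

def find_matches(graph, pattern, pattern_root):
    """Try the pattern root against every node in the graph."""
    matches = []
    for name in graph:
        found = match_at(graph, name, pattern, pattern_root, {})
        if found is not None:
            matches.append(found)
    return matches

graph = {
    "x": Node("Placeholder", []),
    "w": Node("Const", []),
    "b": Node("Const", []),
    "matmul": Node("MatMul", ["x", "w"]),
    "add": Node("Add", ["matmul", "b"]),
    "relu": Node("Relu", ["add"]),
}
pattern = {
    "in": Node("*", []),
    "weight": Node("*", []),
    "bias": Node("*", []),
    "mm": Node("MatMul", ["in", "weight"]),
    "plus": Node("Add", ["mm", "bias"]),
}
matches = find_matches(graph, pattern, "plus")
```

Once a match is found, the mapping tells the rewriter which graph nodes to remove or replace.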

# Why I'm Sooo Fucked

![fucked-by-tf](images/fucked_by_TF.png)

## Inconsistent Operation Names

- ex: `Add` vs `QuantizedAdd` vs `AddOp`

- Hard to identify the type of an operation/node in the graph

- Hard to implement/test isomorphic subgraph matching

Operation name/type legalization is required
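
One way legalization could look, as a sketch: a lookup table that maps each framework-specific op type to a canonical type plus a quantized flag. The table contents and the `legalize_op_type` name are hypothetical; only the three op names come from the example above.

```python
# Hypothetical alias table: framework op type -> (canonical type, is_quantized)
LEGAL_OP_TYPES = {
    "Add": ("Add", False),
    "AddOp": ("Add", False),
    "QuantizedAdd": ("Add", True),
}

def legalize_op_type(op_type):
    """Return the canonical (op_type, is_quantized) pair, or fail loudly."""
    try:
        return LEGAL_OP_TYPES[op_type]
    except KeyError:
        raise ValueError(f"unsupported op type: {op_type!r}") from None
```

With every node legalized first, subgraph patterns only need to be written against the canonical names.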

## Fused Operation

- ex: `MatMul + Add + <activation_func> => FullyConnected`

- Hard to define a **generic** intermediate representation
  - `FullyConnected => MatMul + Add + <activation_func>`?
  - `MatMul + Add + <activation_func> => FullyConnected`?
  - Which is better and why?
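
Both directions can be sketched as rewrites over a linear chain of ops. This is a toy under strong assumptions: each op feeds only the next one, `Op` is a hypothetical record type, and the activation set is illustrative. A real rewriter would also have to check that the intermediate tensors have no other consumers.

```python
from dataclasses import dataclass
from typing import Optional

ACTIVATIONS = {"Relu", "Sigmoid", "Tanh"}  # illustrative, not exhaustive

@dataclass(frozen=True)
class Op:
    op_type: str
    activation: Optional[str] = None

def fuse(ops):
    """Rewrite MatMul, Add, <activation> runs into one FullyConnected op."""
    fused, i = [], 0
    while i < len(ops):
        types = [op.op_type for op in ops[i:i + 3]]
        if (len(types) == 3 and types[:2] == ["MatMul", "Add"]
                and types[2] in ACTIVATIONS):
            fused.append(Op("FullyConnected", activation=types[2]))
            i += 3
        else:
            fused.append(ops[i])
            i += 1
    return fused

def unfuse(ops):
    """Lower each FullyConnected op back to MatMul, Add, <activation>."""
    lowered = []
    for op in ops:
        if op.op_type == "FullyConnected":
            lowered += [Op("MatMul"), Op("Add")]
            if op.activation is not None:
                lowered.append(Op(op.activation))
        else:
            lowered.append(op)
    return lowered

chain = [Op("MatMul"), Op("Add"), Op("Relu"), Op("Softmax")]
fused_chain = fuse(chain)
```

Whichever form the intermediate representation treats as canonical, the other direction has to be implemented as a rewrite like one of these.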

## Implementation Differences Across Versions

- Take `tf.nn.dropout` as an example

### `Dropout` in TensorFlow 1.x

![dropout-v1](images/dropout-v1.png)

### `Dropout` in TensorFlow 2.x

![dropout-v2](images/dropout-v2.png)

## Breaking Changes of Frameworks

- Changes in quantization scheme
  - Dynamic Quantization vs. Static Quantization
  - Quantization-Aware Training

- Inconsistent Saving/Loading API

- Undocumented features
  - [Graph freezing tools in Tensorflow 2.0](https://github.com/tensorflow/tensorflow/blob/r2.0/tensorflow/python/tools/freeze_graph.py)

# Q & A

![joker](images/joker.png)