# NNTool API Demonstration

This notebook contains a short demonstration of how to use NNTool programmatically from Python.

First load a sample network. This is BlazeFace from the GitHub-based NNMenu repository of GreenWaves Technologies. NNMenu contains many networks that have already been ported to GAP.

In [None]:
!wget https://github.com/GreenWaves-Technologies/blaze_face/raw/a3645c152d34b34ea437d7d21b67f1f7051f18de/model/face_detection_front.tflite

Now import the NNTool API. Most of the supported API is exposed through the NNGraph class, which is NNTool's internal representation of a neural network graph. Any documented method or property of NNGraph is part of the official API. Logging in NNTool can be controlled with the standard Python logging APIs. The root logger is called 'nntool'.

In [None]:
from nntool.api import NNGraph
from nntool.api.utils import model_settings, quantization_options
import logging
nntool_log = logging.getLogger('nntool')
nntool_log.setLevel(logging.ERROR)
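
Since the text says logging follows the standard Python logging APIs, the usual logger hierarchy rules should apply. The following is a minimal sketch of that behaviour using only the standard library. The child logger name `nntool.example_child` is hypothetical and is used only to illustrate level inheritance.

```python
import logging

# 'nntool' is the root logger named in the text above.
nntool_log = logging.getLogger('nntool')
nntool_log.setLevel(logging.ERROR)

# Hypothetical child logger name, used only to illustrate level inheritance.
child_log = logging.getLogger('nntool.example_child')

# A child with no level of its own inherits the nearest ancestor's level.
effective = logging.getLevelName(child_log.getEffectiveLevel())
print(effective)

# Raising verbosity later only needs a change on the 'nntool' logger.
nntool_log.setLevel(logging.INFO)
effective_after = logging.getLevelName(child_log.getEffectiveLevel())
print(effective_after)
```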

Now we load the graph.

In [None]:
model = NNGraph.load_graph('face_detection_front.tflite')

# Model show returns a table of information on the Graph
# print(model.show())

# Model draw can open or save a PDF with a visual representation of the graph
# model.draw()


Now we define a small dataloader that returns normalized inputs ready for the executer. It could be extended to provide random samples from the full set, for example. It would also be better to cache the files locally rather than downloading them each time.

If you just want to import local data you can use the import_data function or FileImporter dataloader.

In [None]:
from nntool.api.utils import import_data, FileImporter

# data = import_data('path/to/file.jpg', norm_func=lambda x: x/255)
# FileImporter.from_wildcard('/path/to/files/*.jpg')
# FileImporter.from_wildcards(('/path/to/input1_files/*.jpg', '/path/to/input2_files/*.jpg'))
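
The local caching suggested above can be sketched with the standard library alone. This is an illustration under stated assumptions, not part of the NNTool API: `cached_fetch` and `fake_fetch` are hypothetical names, and the download is replaced by a stand-in function so that the sketch runs without network access.

```python
import hashlib
import tempfile
from pathlib import Path

def cached_fetch(url, cache_dir, fetch):
    """Return the bytes for url, calling fetch(url) only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not path.exists():
        path.write_bytes(fetch(url))
    return path.read_bytes()

# Stand-in for a real download (hypothetical helper for this sketch).
calls = []
def fake_fetch(url):
    calls.append(url)
    return b"image-bytes"

with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("https://example.com/0000.pgm", tmp, fake_fetch)
    second = cached_fetch("https://example.com/0000.pgm", tmp, fake_fetch)

# The second request is served from disk, so fetch ran only once.
print(len(calls))
```

With `requests`, the `fetch` argument could be something like `lambda u: requests.get(u).content`, and the cached bytes could then be opened with PIL through an `io.BytesIO` wrapper.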

In [None]:
import requests
from PIL import Image
import numpy as np

class GitHubDataLoader():
    def __init__(self, max_idx, normalize=True, return_index=False):
        self._idx = 0
        self._max_idx = max_idx
        self._normalize = normalize
        self._return_index = return_index
        self._filter = None

    def _get_url(self, idx):
        pass

    def _normalize_func(self, val):
        pass

    def _get_name(self, idx):
        return idx

    def set_filter(self, labels):
        self._filter = labels

    def __iter__(self):
        self._idx = 0
        return self

    def __next__(self):
        while True:
            if self._idx > self._max_idx:
                raise StopIteration()
            idx = self._idx
            label = self._get_name(idx)
            self._idx += 1
            if self._filter is None or label in self._filter:
                break
        # print(f"get {self._get_url(idx)}")
        with requests.get(self._get_url(idx), stream=True) as r:
            r.raise_for_status()
            r.raw.decode_content = True
            image = Image.open(r.raw)
            image = image.resize((128, 128))
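            # convert() returns a real RGB image; assigning to image.mode does not convert the pixel data
            image = image.convert("RGB")
        # Load as float so that the normalization arithmetic cannot wrap around in an 8-bit integer type
        val = np.array(image, dtype=np.float32).transpose((2, 0, 1))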
        if self._normalize:
            val = self._normalize_func(val)
        if self._return_index:
            return label, [val]
        return [val]

class BlazeFaceDataLoader(GitHubDataLoader):
    def __init__(self, last=130, **kwargs):
        super().__init__(last, **kwargs)

    def _normalize_func(self, val):
        return (val - 128)/128

    def _get_name(self, idx):
        return f'{idx:04d}.pgm'

    def _get_url(self, idx):
        return f"https://github.com/GreenWaves-Technologies/blaze_face/raw/a3645c152d34b34ea437d7d21b67f1f7051f18de/eval_dataset/{self._get_name(idx)}"
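
The normalization used by `BlazeFaceDataLoader` maps 8-bit pixel values to roughly [-1, 1). A minimal standalone sketch of that mapping follows. The function name `normalize` is hypothetical, and the arithmetic is done in floating point so that subtracting 128 cannot wrap around in a narrow integer type.

```python
import numpy as np

def normalize(pixels):
    """Map 8-bit pixel values in [0, 255] to floats in [-1, 1)."""
    # Convert to float first so that subtracting 128 cannot wrap around.
    return (np.asarray(pixels, dtype=np.float32) - 128) / 128

pixels = np.array([0, 128, 255], dtype=np.uint8)
normalized = normalize(pixels)
print(normalized)
```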


In [None]:
# The equivalent of the adjust command
model.adjust_order()

# The equivalent of the fusions --scale8 command. The fusions method can be given a series of fusions to apply
# fusions('name1', 'name2', etc)
model.fusions('scaled_match_group')

# draw the model here again to see the adjusted and fused graph
# model.draw()

# Let's load an image and execute the graph in float on the normalized data
data = next(BlazeFaceDataLoader())

# The executer returns the outputs of every layer. Each layer's entry is a list with one array per output of that layer.
# Most layers have a single output but some (a split, for example) have several.
# Here we select the first output of the last layer, which in a graph with one output is always the
# graph output.
model.execute(data)[-1][0]
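
The nested structure described in the comments above can be pictured with plain NumPy arrays. The values and shapes below are made up purely to show the indexing, and `layer_outputs` is a stand-in, not a real NNTool result.

```python
import numpy as np

# Stand-in for the executer result: one entry per layer,
# each entry being a list with one array per output of that layer.
layer_outputs = [
    [np.zeros((3, 4))],                    # layer with a single output
    [np.ones((3, 2)), np.ones((3, 2))],    # split-like layer with two outputs
    [np.full((2,), 7.0)],                  # last layer: the graph output
]

# First output of the last layer, as in model.execute(data)[-1][0]
graph_output = layer_outputs[-1][0]
outputs_per_layer = [len(outs) for outs in layer_outputs]
print(outputs_per_layer, graph_output)
```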

Now let's quantize the graph using a small amount of sample data. You may want to use more.

In [None]:
statistics = model.collect_statistics(BlazeFaceDataLoader(last=3))
# The resulting statistics contain detailed information for each layer
statistics['input_1']
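
As a rough illustration of the kind of information range statistics hold, the sketch below accumulates a running minimum and maximum over several samples. The dictionary layout and the name `collect_ranges` are assumptions made for this example. The statistics returned by `collect_statistics` are more detailed and may be organised differently.

```python
import numpy as np

def collect_ranges(samples):
    """Accumulate the running min and max over an iterable of arrays."""
    stats = {"min": np.inf, "max": -np.inf}
    for sample in samples:
        stats["min"] = min(stats["min"], float(np.min(sample)))
        stats["max"] = max(stats["max"], float(np.max(sample)))
    return stats

rng = np.random.default_rng(0)
# Made-up samples standing in for normalized input tensors.
samples = [rng.uniform(-1.0, 1.0, size=(3, 8, 8)) for _ in range(4)]
ranges = collect_ranges(samples)
print(ranges)
```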

Now we can use the statistics to quantize the graph and execute it. The quantization information is saved in the model. The quantize option on execute quantizes the input data. The dequantize option dequantizes the output after execution.

In [None]:
# quantize the model. quantization options can be supplied for a layer or for the whole model
model.quantize(statistics, schemes=['scaled'])
data = next(BlazeFaceDataLoader())

# Now execute the quantized graph, outputting quantized values
print("execute model without dequantizing data")
print(model.execute(data, quantize=True)[-1][0])

# Now execute the graph twice with float and quantized versions and compare the results
print("execute model comparing float and quantized execution and showing Cosine Similarity")
cos_sim = model.cos_sim(model.execute(data), model.execute(data, quantize=True, dequantize=True))
print(cos_sim)
# the step idx can be used to index the model to find the layer with the worst cos_sim
model[np.argmin(cos_sim)]
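
The comparison above can be illustrated without NNTool. The sketch below applies a simple symmetric scaled quantization round trip to made-up per-layer outputs, computes the cosine similarity for each layer, and uses `np.argmin` to find the worst one. The helper names are hypothetical, and the quantization shown is a generic textbook scheme rather than NNTool's implementation.

```python
import numpy as np

def quantize_dequantize(x, bits=8):
    """Symmetric scaled quantization to signed integers and back to float."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def cosine_similarity(a, b):
    """Cosine of the angle between two tensors viewed as flat vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Made-up float outputs for three layers.
float_outs = [rng.normal(size=16) for _ in range(3)]
# Use fewer bits on the middle layer so that it is clearly the worst one.
quant_outs = [quantize_dequantize(x, bits)
              for x, bits in zip(float_outs, (8, 3, 8))]

cos_sim = [cosine_similarity(f, q) for f, q in zip(float_outs, quant_outs)]
worst_step = int(np.argmin(cos_sim))
print(cos_sim, worst_step)
```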


Now let's look at how we can compress the parameters of the graph using the LUT compressor in GAP9.

First we must create a Validator that validates the compressed output of the graph. In this BlazeFace case we are going to validate the QSNR of the output of the graph. The validator may be called many times by the AutoCompressor so we will cache the results of the uncompressed execution of the graph.

This is a somewhat artificial example since we are using the QSNR of the output to validate the graph. Normally the validator should use the actual labels or if audio, PESQ, etc. to validate.

In [None]:
LAST_IDX = 5

outputs = {}
for filename, data in BlazeFaceDataLoader(LAST_IDX, return_index=True):
    outputs[filename] = model.execute(data)[-1][0]

And now the validator class

In [None]:
from nntool.utils.validation_utils import ValidateBase, ValidationResultBase
from nntool.api.utils import qsnrs

class QSNRResult(ValidationResultBase):
    def __init__(self, qsnr, label, margin):
        self._qsnr = qsnr
        self._label = label
        self._margin = margin

    @property
    def validated(self):
        return self._margin >= 0

    @property
    def margin(self):
        return self._margin

class QSNRValidator(ValidateBase):

    def __init__(self, outputs, min_qsnr):
        self._outputs = outputs
        self._qsnr = min_qsnr

    def _validate(self,
                  input_tensors,
                  output_tensors,
                  input_name):
        qsnr = qsnrs([self._outputs[input_name]], output_tensors[-1])[0]
        if qsnr > 100:
            margin = 1.0
        else:
            margin = (qsnr - self._qsnr)/self._qsnr
        return QSNRResult(qsnr, input_name, margin)        
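
To make the margin rule in the validator above concrete, here is a minimal sketch using one common definition of QSNR, the ratio of signal energy to error energy in dB. The functions `qsnr_db` and `margin` are hypothetical helpers written for this example. The `qsnrs` function from NNTool may differ in detail, for instance in rounding or in how it handles zero error.

```python
import numpy as np

def qsnr_db(reference, test):
    """Signal-to-quantization-noise ratio in dB between two tensors."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    noise = np.sum((reference - test) ** 2)
    if noise == 0:
        return float("inf")
    return float(10 * np.log10(np.sum(reference ** 2) / noise))

def margin(qsnr, min_qsnr):
    # Same rule as the validator above: relative distance from the target,
    # saturated to 1.0 for very high QSNR values.
    return 1.0 if qsnr > 100 else (qsnr - min_qsnr) / min_qsnr

reference = np.array([1.0, -2.0, 3.0, -4.0])
test = reference + 0.1
value = qsnr_db(reference, test)
print(value, margin(value, 15))
```

A non-negative margin means the output passes validation, while a negative one means the QSNR fell below the target.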

Now we can run the compressor. Since it can take a little time, a progress function is provided.

In [None]:
from nntool.api.compression import AutoCompress, print_progress

TARGET_QSNR = 15

autocompress = AutoCompress(
    model,
    BlazeFaceDataLoader(LAST_IDX, return_index=True),
    QSNRValidator(outputs, TARGET_QSNR))
autocompress.tune_all(model.all_constants, progress=print_progress)
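
The general idea behind lookup-table compression can be shown with a toy example: each weight is replaced by a small index into a table of representative values. This is only an illustration under simplifying assumptions (an evenly spaced table, hypothetical function names). It is not the AutoCompress algorithm and does not reflect the GAP9 hardware format.

```python
import numpy as np

def lut_compress(weights, bits):
    """Replace each weight with an index into a small table of values."""
    # Evenly spaced table entries; real compressors may pick entries by clustering.
    table = np.linspace(weights.min(), weights.max(), 2 ** bits)
    # Index of the nearest table entry for every weight.
    indices = np.argmin(np.abs(weights[..., None] - table), axis=-1)
    # uint8 indices are sufficient while bits <= 8.
    return table, indices.astype(np.uint8)

def lut_decompress(table, indices):
    """Rebuild an approximate weight tensor from the table and indices."""
    return table[indices]

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8))
table, indices = lut_compress(weights, bits=4)
restored = lut_decompress(table, indices)
max_error = float(np.max(np.abs(weights - restored)))
print(table.size, max_error)
```

Fewer bits give a smaller table and smaller indices at the cost of a larger reconstruction error, which is the trade-off a validator has to judge.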

## Loading and executing a graph

Let's look at how we can load a graph and run it on GAP. First let's retrieve a network from NNMenu.

In [None]:
!wget https://github.com/GreenWaves-Technologies/image_classification_networks/raw/50ad1beb9ac784b1f5f3574beb2a4c39a46b2fbc/models/tflite_models/mobilenet_v1_1_0_224_quant.tflite


Now load into nntool and process the graph. For this to complete successfully the GAP_SDK_HOME environment variable must point to your GAP SDK directory. This can be set either in your shell startup script or in the notebook.
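
Setting the variable from inside the notebook could look like the sketch below. The path is a placeholder that must be replaced with the location of your own SDK, and `DEFAULT_SDK_PATH` is a name chosen for this example.

```python
import os

# Placeholder path: replace it with the location of your own GAP SDK directory.
DEFAULT_SDK_PATH = "/path/to/gap_sdk"

# setdefault keeps any value already exported by the shell.
sdk_home = os.environ.setdefault("GAP_SDK_HOME", DEFAULT_SDK_PATH)
if not os.path.isdir(sdk_home):
    print(f"warning: GAP_SDK_HOME points to a missing directory: {sdk_home}")
```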

In [None]:
G = NNGraph.load_graph("mobilenet_v1_1_0_224_quant.tflite", load_quantization=True)
G.adjust_order()
G.fusions('scaled_match_group')

Now we can execute the graph on GVSOC. This method creates a project, builds it and executes it. It can parse some of the output, including Autotiler output and performance data. The GAP SDK must be sourced to use this API. Expect this step to take some time.

In [None]:
res = G.execute_on_target(pretty=True, at_log=True, at_loglevel=1)

Once the graph run has finished, the res object will contain the requested information. Run the cell below to see the performance information as a pretty table (because pretty=True was passed above).

In [None]:
print(res.performance)

Now let's execute the same model again but changing some quantization and model generation settings. The nntool API defines some helper functions to create the settings dictionaries. In this case we target the graph on the NE16. The GVSOC simulation of the NE16 is quite slow so we enable output to see execution progress.

In [None]:
# NBVAL_SKIP
G.quantize(graph_options=quantization_options(use_ne16=True))
G.adjust_order()
res = G.execute_on_target(
    pretty=True,
    at_log=True,
    at_loglevel=1,
    print_output=True,
    settings=model_settings(l1_size=128000, l2_size=1000000, graph_trace_exec=True))

## Saving a graph

The intermediate graph state inside NNTool can be saved and reloaded using NNGraph.write_graph_state and NNGraph.read_graph_state. The write process also returns a string containing a function that can be used to recreate the graph and a tensors dictionary that it requires as a parameter.

In [None]:
import os
notebook_dir = os.getcwd()
graph_file = os.path.join(notebook_dir, "mygraph.zip")
graph_function, tensors = G.write_graph_state(graphpath=graph_file)
with open(os.path.join(notebook_dir, "graph_function.py"), 'w') as file:
    file.write(graph_function)


Now the saved graph can be read back with NNGraph.read_graph_state. The graph state file is actually a Python zip module containing the function and the tensors file, so it can also be imported. Finally, the function returned by write_graph_state was saved to a file in the cell above, so the creation function can also be imported from that file.

The creation function is a good example of how a synthetic graph can be created using the NNGraph API.

In [None]:
# create using NNGraph API
new_graph = NNGraph.read_graph_state(graph_file)
new_graph

In [None]:
# create using written graph function
from graph_function import create_graph
new_graph = create_graph(tensors)
new_graph

In [None]:
# create by importing
import sys
sys.path.insert(0, graph_file)
from mygraph import mygraph
mygraph
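
Importing from the graph state file works because Python can import modules directly from a zip archive placed on `sys.path`. The standalone sketch below shows that mechanism with the standard library only. The archive name, module name and module content are hypothetical and unrelated to the actual contents of the saved graph file.

```python
import importlib
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo_archive.zip")
    # Hypothetical module name and content, used only for this illustration.
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_module.py", "ANSWER = 42\n")

    # A zip file on sys.path is searched like a directory.
    sys.path.insert(0, archive)
    try:
        demo_module = importlib.import_module("demo_module")
        answer = demo_module.ANSWER
    finally:
        # Remove the entry again so the temporary archive is not left on the path.
        sys.path.remove(archive)

print(answer)
```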