# Dense layer tutorial

This tutorial shows how a quantized dense (fully connected) layer can be compiled for execution on the Gemmini accelerator. The generated baremetal C code is then tested on the Spike RISC-V ISA simulator. Before starting this tutorial, you should have downloaded the Chipyard repository and installed the Spike simulator with the Gemmini extension.

In [None]:
import tensorflow as tf
import numpy as np
import os
import tvm.contrib.gemmini as gemmini
from tvm import relay
import tvm

We need to set the environment variable CHIPYARD_HOME so that the Spike simulator can be run correctly. The empty string below is a placeholder; replace it with the path to your Chipyard installation.

In [None]:
os.environ["CHIPYARD_HOME"] = ""
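
Because the value above must be filled in by hand, a quick sanity check before continuing can avoid a confusing failure later on. The following is only a sketch: `resolve_chipyard_home` is a hypothetical helper (it is not part of TVM, Gemmini or Chipyard), and it assumes that the variable should name an existing directory.

```python
import os
import pathlib


def resolve_chipyard_home(environ=os.environ):
    """Return CHIPYARD_HOME as a Path, or raise if it is unset or not a directory.

    Hypothetical helper for illustration; not part of TVM or Chipyard.
    """
    value = environ.get("CHIPYARD_HOME", "")
    if not value:
        raise RuntimeError("CHIPYARD_HOME is not set")
    path = pathlib.Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"CHIPYARD_HOME does not point to a directory: {path}")
    return path
```

Calling `resolve_chipyard_home()` right after setting the variable would then fail early, with a readable message, if the path is missing.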

Then we define the parameters of the layer we want to test. In this case, a 32x32 input is multiplied by a 32x32 weight matrix:

In [None]:
input_height = 32
input_width = 32
output_width = 32
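
To make the role of the three parameters concrete, here is a minimal pure-Python sketch of what a dense layer computes: an input of shape (input_height, input_width) is multiplied by a weight matrix of shape (input_width, output_width) and a bias of length output_width is added. The `dense` function and the tiny 2x3x2 example are illustrative assumptions, not code from the tutorial.

```python
def dense(x, w, b):
    """Compute x @ w + b for nested lists: x is (H, K), w is (K, N), b is (N,)."""
    k, n = len(w), len(w[0])
    assert all(len(row) == k for row in x), "inner dimensions must match"
    assert len(b) == n, "bias length must equal the output width"
    return [
        [sum(row[i] * w[i][j] for i in range(k)) + b[j] for j in range(n)]
        for row in x
    ]


# Tiny stand-in for the 32x32x32 layer: H=2 (height), K=3 (input width), N=2 (output width).
x = [[1, 2, 3], [4, 5, 6]]
w = [[1, 0], [0, 1], [1, 1]]
b = [10, 20]
y = dense(x, w, b)
print(y)  # [[14, 25], [20, 31]]
```

The result has shape (input_height, output_width), which is why only three numbers are needed to describe the layer.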

We will generate a prequantized TFLite model, because for now the Gemmini integration only supports models that were quantized with specific flags as input.

In [None]:
class Model(tf.Module):
    def __init__(self, name=None):
        super().__init__(name)
        self.w = tf.Variable(tf.random.normal([input_width, output_width]), name="w")
        self.b = tf.Variable(tf.random.normal([output_width]), name="b")

    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[input_height, input_width], dtype=tf.float32),
        ]
    )
    def matmul(self, x):
        return tf.linalg.matmul(x, self.w, transpose_b=False) + self.b

model = Model()

# Convert the concrete functions using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.matmul.get_concrete_function()], model
)


def representative_data_gen():
    dataset = [
        (
            np.array(
                np.random.randint(-127, 128, size=(input_height, input_width)), dtype=np.float32
            ),
            np.array(
                np.random.randint(-127, 128, size=(input_width, output_width)), dtype=np.float32
            ),
        )
        for s in range(100)
    ]
    for input_value in dataset:
        # Model has only one input so each data point has one element.
        yield [input_value[0]]


converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.int8
converter.representative_dataset = representative_data_gen
converter._experimental_disable_per_channel = True

tflite_model = converter.convert()

# Save the model.
with open("matmul.tflite", "wb") as f:
    f.write(tflite_model)
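
Since per-channel quantization is disabled above, each tensor is described by a single scale and zero point. The sketch below illustrates the usual affine mapping, real = scale * (q - zero_point), for one value. The function names and the example scale and zero point are made up for illustration, and Python's `round` rounds halves to even, which may differ from the rounding used by the converter.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to an integer code and clamp it to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to the real value it represents."""
    return scale * (q - zero_point)


# Illustrative parameters only; real values come from the converted model.
scale, zero_point = 0.5, 3
q = quantize(-1.0, scale, zero_point)
print(q, dequantize(q, scale, zero_point))  # 1 -1.0
```

Values outside the representable range saturate; for example, `quantize(100.0, 0.25, 0)` is clamped to 127.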


Now that we have created the model, we load it and run it with the TFLite interpreter. We store the output so that we can later compare it with the output obtained from the Gemmini accelerator.

In [None]:
os.system("rm -rf model.tar dev/ include/ generated-project/")

tflite_file = "./matmul.tflite"
with open(tflite_file, "rb") as f:
    tflite_model_buf = f.read()
input_tensor = "layer1_input"
input_dtype = "uint8"

os.system("mkdir -p include")

try:
    import tflite

    tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)
except AttributeError:
    import tflite.Model

    tflite_model = tflite.Model.Model.GetRootAsModel(tflite_model_buf, 0)

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path=tflite_file, experimental_preserve_all_tensors=True)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
tensor_details = interpreter.get_tensor_details()

input1 = np.random.randint(0, 255, (input_height, input_width), dtype=np.uint8)
interpreter.set_tensor(input_details[0]["index"], input1)

interpreter.invoke()
expected_output = interpreter.get_tensor(output_details[0]["index"])

Here, we create C files and headers with the inputs and expected output, so that we can then execute the same operation on the Gemmini accelerator, and compare the expected output with the actual predicted one.

In [None]:
gemmini.create_header_file("inputs", "data", "input", input1, "./include")
gemmini.create_header_file("outputs", "data", "output", expected_output, "./include")
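
To give an idea of what embedding a tensor in a C header involves, here is a small sketch that renders a flat list of integers as a C array. `render_c_array` is a hypothetical function written for this illustration; the files actually written by `gemmini.create_header_file` may use a different layout, naming and attributes.

```python
def render_c_array(name, values, c_type="int8_t"):
    """Render a flat list of integers as a C array definition (illustrative format only)."""
    body = ", ".join(str(v) for v in values)
    return (
        "#include <stdint.h>\n"
        f"#define {name.upper()}_LEN {len(values)}\n"
        f"const {c_type} {name}[{len(values)}] = {{{body}}};\n"
    )


print(render_c_array("output", [1, -2, 3]))
```

The baremetal program can then include such a header and compare the array element by element with the result computed on the accelerator.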

The Gemmini environment class must be initialized with the parameters of the Gemmini accelerator on which we want to execute our operation. Here we use the default parameters.

In [None]:
gemmini.Environment.init_overwrite(dim=16, acc_rows=1024, bank_rows=4096)
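
As a rough guide to what these numbers mean, the sketch below estimates the on-chip memory they imply and how many dim x dim tiles the 32x32 output of this tutorial splits into. It assumes 4 scratchpad banks, 1-byte (int8) scratchpad elements and 4-byte (int32) accumulator elements; these values, and both helper functions, are assumptions made for illustration and are not read from the TVM integration.

```python
import math


def gemmini_capacity(dim, bank_rows, acc_rows, banks=4, input_bytes=1, acc_bytes=4):
    """Return (scratchpad_bytes, accumulator_bytes) under the stated assumptions."""
    scratchpad_bytes = banks * bank_rows * dim * input_bytes
    accumulator_bytes = acc_rows * dim * acc_bytes
    return scratchpad_bytes, accumulator_bytes


def tile_count(rows, cols, dim):
    """Number of dim x dim tiles needed to cover a rows x cols matrix."""
    return math.ceil(rows / dim) * math.ceil(cols / dim)


print(gemmini_capacity(dim=16, bank_rows=4096, acc_rows=1024))  # (262144, 65536)
print(tile_count(32, 32, dim=16))  # 4
```

Under these assumptions the parameters correspond to a 256 KiB scratchpad and a 64 KiB accumulator.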

The TFLite model generated in the previous steps is now imported into TVM.

In [None]:
mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={
        "serving_default_x:0": (input_height, input_width),
    },
    dtype_dict={
        "serving_default_x:0": input_dtype,
    },
)
mod["main"]

In order to build a model for the Gemmini accelerator, we need to replace all supported layers with the Gemmini-specific operators. This is done using the __gemmini.preprocess_pass__ function. Notice the changes in the "main" function after running the preprocess pass.

In [None]:
mod = gemmini.preprocess_pass(mod)
mod["main"]

Now, we build the Relay graph. Notice that we are using the CRT runtime and that the target is C, because we want to generate C code (but the device is Gemmini). We also use the AOT executor and the USMP feature in order to get complete baremetal C code, without calls to memory allocator APIs.

The __gemmini.build_config__ function returns a PassContext object containing the specific parameters needed to correctly build the model for the Gemmini accelerator.

In [None]:
RUNTIME = tvm.relay.backend.Runtime("crt", {"system-lib": False})
TARGET = tvm.target.target.Target({"kind": "c", "device": "gemmini"})
EXECUTOR = tvm.relay.backend.Executor("aot", options={"interface-api": "c", "unpacked-api": 1})

with gemmini.build_config(usmp_alg="hill_climb", opt_level=3, disabled_pass=["AlterOpLayout"]):
    module = relay.build(mod, executor=EXECUTOR, runtime=RUNTIME, target=TARGET, params=params)

The built model is exported to the Model Library Format. This will be used in the next steps to generate the baremetal project.

In [None]:
import pathlib

os.system("mkdir dev")
model_library_format_tar_path = pathlib.Path(pathlib.Path.cwd(), "dev/model.tar")
tvm.micro.export_model_library_format(module, model_library_format_tar_path)

import tarfile

with tarfile.open(model_library_format_tar_path, "r:*") as tar_f:
    print("\n".join(f" - {m.name}" for m in tar_f.getmembers()))

Here, we create the test project, using the example project provided for this tutorial in the Gemmini microTVM template projects.

In [None]:
template_project_path = pathlib.Path(tvm.micro.get_microtvm_template_projects("gemmini"))
project_options = {"project_type": "dense_example"}

generated_project_dir = pathlib.Path(pathlib.Path.cwd(), "generated-project")
generated_project = tvm.micro.generate_project(
    template_project_path, module, generated_project_dir, project_options
)

We build the project. This will generate an executable we can run on the Spike simulator.

In [None]:
generated_project.build()

Finally, we execute the compiled baremetal project on the Spike simulator.

Note: if the run reports mismatches between the expected and the actual output, they may be caused by rounding differences.

In [None]:
generated_project.flash()
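
When judging whether a mismatch is only a rounding difference, it can help to count how many elements differ by more than one integer step. The sketch below shows one way to do this on plain Python lists. The function, the tolerance of 1 and the sample values are assumptions for illustration; the check performed by the generated project may be stricter or different.

```python
def count_mismatches(expected, actual, tolerance=1):
    """Count elements whose absolute difference exceeds `tolerance` integer steps."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    return sum(1 for e, a in zip(expected, actual) if abs(e - a) > tolerance)


# Made-up values: one element is off by 1 (tolerated), one is off by 2 (counted).
expected = [10, -5, 127, 0]
actual = [11, -5, 125, 0]
print(count_mismatches(expected, actual))  # 1
```

A count of zero with `tolerance=1` but a non-zero count with `tolerance=0` would suggest that the differences are off-by-one rounding effects rather than a functional error.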