# Example: Auto-Tuning using TVM Platform

Autotuning is a powerful method for optimizing a given model for a specific target. This example briefly explains how to use TVM's tuning feature.

*Warning:* This example only covers the TVM platform, which should not be confused with the MicroTVM platform. Hence, only the `tvm_cpu` (host) target can be used for demonstration purposes.

## Supported components

**Models:** Any (`toycar` used below)

**Frontends:** Any (`tflite` used below)

**Frameworks/Backends:** `tvmllvm` backend only

**Platforms/Targets:** `tvm_cpu` target only

**Features:** The `autotune` and `autotuned` features have to be enabled.

## Prerequisites

Set up MLonMCU as usual, i.e. initialize an environment and install all required dependencies. Feel free to use the following minimal `environment.yml.j2` template:

```yaml
---
home: "{{ home_dir }}"
logging:
  level: DEBUG
  to_file: false
  rotate: false
cleanup:
  auto: true
  keep: 10
paths:
  deps: deps
  logs: logs
  results: results
  plugins: plugins
  temp: temp
  models:
    - "{{ home_dir }}/models"
    - "{{ config_dir }}/models"
repos:
  tvm:
    url: "https://github.com/apache/tvm.git"
    ref: de6d8067754d746d88262c530b5241b5577b9aae
frameworks:
  default: tvm
  tvm:
    enabled: true
    backends:
      default: tvmllvm
      tvmllvm:
        enabled: true
        features:
          autotuned: true
    features: []
frontends:
  tflite:
    enabled: true
    features: []
toolchains:
  gcc: true
platforms:
  tvm:
    enabled: true
    features:
      autotune: true
targets:
  tvm_cpu:
    enabled: true
```

Do not forget to set your `MLONMCU_HOME` environment variable first if not using the default location!

## Usage

In addition to the `TUNE` stage of the MLonMCU flow, which is skipped by default, MLonMCU provides two tuning-related features:
- `autotune`: Use this to enable the `TUNE` stage. Tuning records will be written as an artifact but ignored in later stages.
- `autotuned`: If this is enabled, the provided tuning records/metrics are used by TVM in the `BUILD` stage. If no tuning was executed in the previous stage, it will instead accept tuning logs provided by the user.
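
How the two features combine on the command line can be sketched with a small helper that assembles the argument list. The helper name `build_flow_command` is hypothetical and not part of MLonMCU; it only relies on the `-b`, `-t`, `-f` and `-c` options that appear in the commands of this example.

```python
import shlex


def build_flow_command(model, backend, target, features=(), config=None):
    """Assemble an MLonMCU `flow run` call as an argument list (hypothetical helper)."""
    cmd = ["python", "-m", "mlonmcu.cli.main", "flow", "run", model,
           "-b", backend, "-t", target]
    # Each enabled feature becomes its own `-f NAME` pair.
    for feature in features:
        cmd += ["-f", feature]
    # Each config entry becomes a `-c KEY=VALUE` pair.
    for key, value in (config or {}).items():
        cmd += ["-c", f"{key}={value}"]
    return cmd


# Baseline run: the TUNE stage is skipped because no feature is enabled.
baseline = build_flow_command("toycar", "tvmllvm", "tvm_cpu")

# Tuned run: `autotune` enables the TUNE stage, `autotuned` feeds its records into BUILD.
tuned = build_flow_command(
    "toycar", "tvmllvm", "tvm_cpu",
    features=["autotune", "autotuned"],
    config={"autotune.trials": 100},
)

print(shlex.join(baseline))
print(shlex.join(tuned))
```

The resulting lists can be handed to `subprocess.run` unchanged, which avoids shell quoting issues when config values contain spaces.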

### A) Command Line Interface

Let's run a benchmark without tuning first.

In [1]:
!python -m mlonmcu.cli.main flow run toycar -b tvmllvm -t tvm_cpu

INFO - Loading environment cache from file
INFO - Successfully initialized cache
INFO - Loading extensions.py (User)
INFO - [session-393]  Processing stage LOAD
INFO - [session-393]  Processing stage BUILD
INFO - [session-393]  Processing stage RUN
INFO - All runs completed successfuly!
INFO - Postprocessing session report
INFO - [session-393] Done processing runs
INFO - Report:
   Session  Run   Model Frontend Framework  Backend Platform   Target  Runtime [s] Features                                             Config Postprocesses Comment
0      393    0  toycar   tflite       tvm  tvmllvm      tvm  tvm_cpu     0.000128       []  {'tflite.use_inout_data': False, 'tflite.visua...            []       -


Now we enable and configure the tuning as follows:

In [4]:
!python -m mlonmcu.cli.main flow run toycar -b tvmllvm -t tvm_cpu \
        -f autotune -f autotuned -c autotune.trials=100 -c tvm.print_outputs=1

INFO - Loading environment cache from file
INFO - Successfully initialized cache
INFO - Loading extensions.py (User)
INFO - [session-396]  Processing stage LOAD
INFO - [session-396]  Processing stage TUNE
INFO - [session-396]  Processing stage BUILD
INFO - [session-396]  Processing stage RUN
INFO - All runs completed successfuly!
INFO - Postprocessing session report
INFO - [session-396] Done processing runs
INFO - Report:
   Session  Run   Model Frontend Framework  Backend Platform   Target  Runtime [s]               Features                                             Config Postprocesses Comment
0      396    0  toycar   tflite       tvm  tvmllvm      tvm  tvm_cpu     0.000078  [autotuned, autotune]  {'tflite.use_inout_data': False, 'tflite.visua...            []       -
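
To put the two reported runtimes side by side, the relative change can be computed directly. The values below are copied from the `Runtime [s]` column of the two reports above; each comes from a single run, so the result is only indicative.

```python
# Runtimes in seconds, copied from the two session reports above.
baseline_s = 0.000128  # session 393, no tuning
tuned_s = 0.000078     # session 396, autotune + autotuned

# Speedup: how many times faster the tuned run is.
speedup = baseline_s / tuned_s
# Reduction: fraction of the baseline runtime that was saved.
reduction = 1.0 - tuned_s / baseline_s

print(f"speedup: {speedup:.2f}x, runtime reduction: {reduction:.1%}")
```

For these two measurements this works out to a speedup of roughly 1.64x, i.e. about 39% less runtime.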


It seems like we already achieved a nice performance improvement. Feel free to have a look at the generated tuning records as well:

In [8]:
!mlonmcu export /tmp/exported --run -f
!head /tmp/exported/best_tuning_results.log.txt

INFO - Loading environment cache from file
INFO - Successfully initialized cache
INFO - Loading extensions.py (User)
Creating directory: /tmp/exported
Done
{"input": ["llvm -keys=cpu ", "dense_nopack.x86", [["TENSOR", [1, 640], "int16"], ["TENSOR", [128, 640], "int16"], null, "int32"], {}], "config": {"index": 52, "code_hash": null, "entity": [["tile_y", "sp", [-1, 1]], ["tile_x", "sp", [-1, 16]], ["tile_k", "sp", [-1, 16]]]}, "result": [[1.6941999999999998e-05], 0, 0.1651315689086914, 1675771354.8624063], "version": 0.2, "tvm_version": "0.10.dev0"}
{"input": ["llvm -keys=cpu ", "dense_pack.x86", [["TENSOR", [1, 640], "int16"], ["TENSOR", [128, 640], "int16"], null, "int32"], {}], "config": {"index": 411, "code_hash": null, "entity": [["tile_y", "sp", [-1, 1, 1]], ["tile_x", "sp", [-1, 1, 4]], ["tile_k", "sp", [-1, 80]], ["tile_inner", "sp", [-1, 1]]]}, "result": [[1.0921000000000001e-05], 0, 0.28124165534973145, 1675771365.984978], "version": 0.2, "tvm_version": "0.10.dev0"}
{"input":

Alternatively, we can pass previously generated tuning logs to MLonMCU using `-c autotuned.results_file=/path/to/records.txt`.
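
Such a records file holds one JSON object per line, as the `head` output above shows. As a minimal sketch, assuming the AutoTVM layout in which `result` is `[costs, error_no, all_cost, timestamp]`, the fastest measurement per task can be extracted with the standard library alone. The function name `best_per_task` is hypothetical, and the two sample lines are copied from the output above.

```python
import json

# Two records copied verbatim from the exported log shown above.
SAMPLE = [
    '{"input": ["llvm -keys=cpu ", "dense_nopack.x86", [["TENSOR", [1, 640], "int16"], ["TENSOR", [128, 640], "int16"], null, "int32"], {}], "config": {"index": 52, "code_hash": null, "entity": [["tile_y", "sp", [-1, 1]], ["tile_x", "sp", [-1, 16]], ["tile_k", "sp", [-1, 16]]]}, "result": [[1.6941999999999998e-05], 0, 0.1651315689086914, 1675771354.8624063], "version": 0.2, "tvm_version": "0.10.dev0"}',
    '{"input": ["llvm -keys=cpu ", "dense_pack.x86", [["TENSOR", [1, 640], "int16"], ["TENSOR", [128, 640], "int16"], null, "int32"], {}], "config": {"index": 411, "code_hash": null, "entity": [["tile_y", "sp", [-1, 1, 1]], ["tile_x", "sp", [-1, 1, 4]], ["tile_k", "sp", [-1, 80]], ["tile_inner", "sp", [-1, 1]]]}, "result": [[1.0921000000000001e-05], 0, 0.28124165534973145, 1675771365.984978], "version": 0.2, "tvm_version": "0.10.dev0"}',
]


def best_per_task(lines):
    """Map each task name to (mean cost in seconds, config index) of its fastest valid record."""
    best = {}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or truncated lines
        costs, error_no = record["result"][:2]
        if error_no != 0:
            continue  # assumed: a non-zero error code marks a failed measurement
        mean_cost = sum(costs) / len(costs)
        task = record["input"][1]
        if task not in best or mean_cost < best[task][0]:
            best[task] = (mean_cost, record["config"]["index"])
    return best


for task, (cost, index) in best_per_task(SAMPLE).items():
    print(f"{task}: {cost * 1e6:.2f} us (config index {index})")
```

To inspect a real file, pass an open file handle instead of `SAMPLE`, for example `open("/tmp/exported/best_tuning_results.log.txt")`, since iterating over a file yields its lines.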