# Supertonic-2 QNN Porting Guide

This notebook walks through porting the **Supertone/supertonic-2** TTS model to Qualcomm QNN for deployment on the QCS6490.

**Model**: [Supertone/supertonic-2](https://huggingface.co/Supertone/supertonic-2) (66M params, ONNX)

## Pipeline Architecture

```
Text Input → [Duration Predictor] → duration
          → [Text Encoder] → text_emb  
          → [Vector Estimator x N steps] → denoised_latent
          → [Vocoder] → Audio WAV (44.1kHz)
```
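
The data flow in the diagram can be sketched as a small driver function. This is only an illustration of the stage order and of the repeated Vector Estimator call: the four stage callables, the `run_pipeline` name, and the zero-based step counter are hypothetical, and how the predicted duration determines the latent length is not covered here.

```python
def run_pipeline(duration_predictor, text_encoder, vector_estimator, vocoder,
                 text_inputs, noisy_latent, total_steps):
    """Run the four stages in the order shown in the diagram.

    Each stage is passed in as a plain callable (hypothetical wrappers around
    the real models), so the control flow can be read without any SDK.
    """
    duration = duration_predictor(text_inputs)
    text_emb = text_encoder(text_inputs)

    # The Vector Estimator is applied repeatedly, feeding its output back in.
    latent = noisy_latent
    for step in range(total_steps):  # assumption: steps are counted from 0
        latent = vector_estimator(latent, text_emb, step, total_steps)

    return duration, vocoder(latent)
```

Passing the stages as callables keeps the loop identical whether they wrap ONNX Runtime sessions on the host or QNN graphs on the device.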

## Model Specifications (Actual ONNX Shapes)

| Model | Size | Key Inputs | Output |
|-------|------|------------|--------|
| Duration Predictor | 1.5 MB | text_ids [1,seq], style_dp [1,8,16], text_mask [1,1,seq] | duration [1] |
| Text Encoder | 27 MB | text_ids [1,seq], style_ttl [1,50,256], text_mask [1,1,seq] | text_emb [1,256,seq] |
| Vector Estimator | 132 MB | noisy_latent [1,144,lat], text_emb [1,256,seq], style_ttl [1,50,256], latent_mask [1,1,lat], text_mask [1,1,seq], current_step [1], total_step [1] | denoised_latent [1,144,lat] |
| Vocoder | 101 MB | latent [1,144,lat] | wav_tts [1,samples] |

Note: `seq` = text sequence length (dynamic), `lat` = latent length (dynamic)
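
Shapes like the ones in the table can be read from an ONNX Runtime session. The sketch below assumes only that the session object exposes `get_inputs()`, as `onnxruntime.InferenceSession` does; the helper names `describe_inputs` and `dynamic_axes` are made up for this example.

```python
def describe_inputs(session):
    """Return (name, shape, dtype string) for every graph input of a session."""
    return [(arg.name, list(arg.shape), arg.type) for arg in session.get_inputs()]


def dynamic_axes(shape):
    """Indices of axes that are not fixed integers (symbolic names or None)."""
    return [i for i, dim in enumerate(shape) if not isinstance(dim, int)]


# Typical use on the host (requires onnxruntime and the model file):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("./model/onnx/duration_predictor.onnx",
#                               providers=["CPUExecutionProvider"])
#   for name, shape, dtype in describe_inputs(sess):
#       print(name, shape, dtype, "dynamic axes:", dynamic_axes(shape))
```

Any axis reported as dynamic here has to be pinned to a concrete size with `--input_dim` during conversion.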

## Step 0: Environment Setup

In [9]:
import os
import json
import numpy as np
import onnxruntime as ort

## Step 1: Generate Calibration Data

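Run the ONNX models with representative inputs to create calibration data for QNN quantization. Step 2 expects one input list per model at `./calibration_data/<model>_input_list.txt`. The cell below only initializes the QAIRT SDK environment.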

In [1]:
%%bash
unset PYTHONPATH
unset LD_LIBRARY_PATH
source /opt/qcom/aistack/qairt/2.37.1.250807/bin/envsetup.sh

[INFO] QAIRT_SDK_ROOT=/opt/qcom/aistack/qairt/2.37.1.250807
[WARN] QNN_SDK_ROOT/SNPE_ROOT set to QAIRT_SDK_ROOT for backwards compatibility and will be deprecated in a future release.
[INFO] QAIRT SDK environment setup complete
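
A minimal sketch of how the calibration files could be written follows. It assumes the converter reads headerless `.raw` tensors and an input list with one line per sample in the `name:=path` form used for multi-input models. The helper names and the per-file naming scheme are hypothetical; the tensors themselves should come from real text run through the ONNX models, not from placeholders.

```python
import os

import numpy as np


def write_calibration_sample(out_dir, model_name, index, tensors):
    """Dump one sample's inputs as .raw files and return its input-list line."""
    entries = []
    for name, array in tensors.items():
        path = os.path.join(out_dir, f"{model_name}_{index:03d}_{name}.raw")
        # Raw files carry no header: just the tensor bytes in C order.
        np.ascontiguousarray(array).tofile(path)
        entries.append(f"{name}:={path}")
    return " ".join(entries)


def write_input_list(out_dir, model_name, samples):
    """Write every sample plus <model_name>_input_list.txt; return the list path."""
    os.makedirs(out_dir, exist_ok=True)
    lines = [write_calibration_sample(out_dir, model_name, i, sample)
             for i, sample in enumerate(samples)]
    list_path = os.path.join(out_dir, f"{model_name}_input_list.txt")
    with open(list_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return list_path
```

Each `samples` entry is a dict keyed by the model's input names, for example `text_ids`, `style_dp`, and `text_mask` for the Duration Predictor.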


## Step 2: ONNX → QNN Conversion

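Convert each ONNX model to QNN format for the HTP (Hexagon Tensor Processor) with quantized weights and activations. All cells use 8-bit weights. Activations are 8-bit for the Text Encoder and Vector Estimator and 16-bit for the Duration Predictor.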

In [None]:
%%bash
# Convert Duration Predictor
unset PYTHONPATH
unset LD_LIBRARY_PATH
source /opt/qcom/aistack/qairt/2.37.1.250807/bin/envsetup.sh

qnn-onnx-converter \
    --input_network ./model/onnx/duration_predictor.onnx \
    --input_dim text_ids 1,128 \
    --input_dim style_dp 1,8,16 \
    --input_dim text_mask 1,1,128 \
    --output_path ./QNN_Models/duration_predictor_htp.cpp \
    --input_list ./calibration_data/duration_predictor_input_list.txt \
    --param_quantizer tf \
    --act_quantizer tf \
    --weights_bitwidth 8 \
    --act_bitwidth 16 

In [None]:
%%bash
# Step 3: compile all 4 models for the aarch64 target.
# Run this only after every Step 2 conversion has produced its .cpp and .bin files.
unset LD_LIBRARY_PATH
source /opt/qcom-sdk/environment-setup-armv8-2a-qcom-linux
source /opt/qcom/aistack/qairt/2.37.1.250807/bin/envsetup.sh

MODELS=("duration_predictor_htp" "text_encoder_htp" "vector_estimator_htp" "vocoder_htp")

for MODEL in "${MODELS[@]}"; do
    echo "=== Compiling $MODEL ==="
    qnn-model-lib-generator \
        -c ./QNN_Models/${MODEL}.cpp \
        -b ./QNN_Models/${MODEL}.bin \
        -o ./QNN_Model_lib/ \
        -t aarch64-oe-linux-gcc11.2
done

echo "=== All models compiled ==="
ls -lh QNN_Model_lib/aarch64-oe-linux-gcc11.2/*.so
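
After the loop finishes, a short check can confirm that a library exists for each model. The sketch assumes the generator names its outputs `lib<model>.so` inside the target subdirectory; adjust the pattern if the `ls` output above shows different names. `missing_libraries` is a hypothetical helper.

```python
from pathlib import Path

MODELS = ["duration_predictor_htp", "text_encoder_htp",
          "vector_estimator_htp", "vocoder_htp"]


def missing_libraries(lib_root, target="aarch64-oe-linux-gcc11.2", models=MODELS):
    """Return the models whose shared library is absent under lib_root/target."""
    target_dir = Path(lib_root) / target
    # Assumption: output files follow the lib<model>.so naming pattern.
    return [m for m in models if not (target_dir / f"lib{m}.so").is_file()]


if __name__ == "__main__":
    missing = missing_libraries("./QNN_Model_lib")
    print("All libraries present" if not missing else f"Missing: {missing}")
```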

## Key Configuration Notes

**Quantization Settings:**
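- Weights: 8-bit (`--weights_bitwidth 8`)
- Activations: 8-bit (`--act_bitwidth 8`) for the Text Encoder and Vector Estimator; the Duration Predictor cell uses 16-bit activations
- Float operations: 32-bit (`--float_bitwidth 32`); the Duration Predictor cell does not pass this flag
- Quantizers: TensorFlow-style (`tf`) for both parameters and activations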
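
To show what the bitwidth settings trade off, here is a sketch of plain min/max affine quantization, which is the general idea behind a `tf`-style quantizer. It is not the SDK's implementation; the function names and the choice to always include zero in the range are assumptions made for the example.

```python
import numpy as np


def minmax_quantize(x, bitwidth=8):
    """Map floats to unsigned integers using the observed min/max of x."""
    qmax = 2 ** bitwidth - 1
    lo = min(float(x.min()), 0.0)  # assumption: range always contains 0
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.int64)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    x = np.linspace(-1.0, 3.0, 101)
    for bits in (8, 16):
        q, scale, zp = minmax_quantize(x, bits)
        err = np.abs(dequantize(q, scale, zp) - x).max()
        print(f"{bits}-bit: scale={scale:.6g}, max abs error={err:.6g}")
```

The step size shrinks by a factor of about 256 when moving from 8 to 16 bits, which is one reason to keep a sensitive model at 16-bit activations.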

**Critical Requirement:**
- The QNN converter changes INT64 graph inputs to INT32
- Every `text_ids` file in the calibration data must therefore be written as INT32, not INT64
- The other inputs are FP32 and need no conversion
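
The dtype requirement can be enforced with a small cast before any tensor is written to disk. This is a sketch; `to_qnn_input` is a hypothetical helper, and the extra FP64 to FP32 cast is a convenience that the notes above do not require.

```python
import numpy as np


def to_qnn_input(array):
    """Cast a tensor to the dtype the converted graph expects on disk."""
    array = np.asarray(array)
    if array.dtype == np.int64:
        limits = np.iinfo(np.int32)
        if array.size and (array.min() < limits.min or array.max() > limits.max):
            raise ValueError("value does not fit in INT32")
        return array.astype(np.int32)
    if array.dtype == np.float64:
        # Convenience only: keep floating-point inputs at FP32.
        return array.astype(np.float32)
    return array
```

A `[1, 128]` `text_ids` tensor cast this way occupies 512 bytes as a raw file, compared with 1024 bytes for INT64, so the file size is a quick check.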

**Fixed Input Dimensions:**
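- text_ids: [1, 128]
- text_mask: [1, 1, 128]
- text_emb: [1, 256, 128]
- style_dp: [1, 8, 16]
- style_ttl: [1, 50, 256]
- noisy_latent: [1, 144, 192]
- latent_mask: [1, 1, 192]
- latent: [1, 144, 192]
- current_step: [1]
- total_step: [1]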
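
Because the converted graphs no longer accept variable lengths, shorter text has to be padded to the fixed size. The sketch below pads token IDs to `[1, 128]` and builds a matching `[1, 1, 128]` mask. The padding ID of 0, the mask convention (1.0 for real tokens, 0.0 for padding), and the `pad_text_inputs` name are assumptions to check against the original ONNX preprocessing.

```python
import numpy as np

MAX_TEXT_LEN = 128  # fixed sequence length used at conversion time


def pad_text_inputs(token_ids, max_len=MAX_TEXT_LEN, pad_id=0):
    """Pad token IDs to [1, max_len] and build a [1, 1, max_len] FP32 mask."""
    n = len(token_ids)
    if n > max_len:
        raise ValueError(f"text has {n} tokens, limit is {max_len}")
    text_ids = np.full((1, max_len), pad_id, dtype=np.int32)
    text_ids[0, :n] = token_ids
    # Assumption: 1.0 marks real tokens and 0.0 marks padding.
    text_mask = np.zeros((1, 1, max_len), dtype=np.float32)
    text_mask[0, 0, :n] = 1.0
    return text_ids, text_mask
```

The same pattern applies to the latent axis, with 192 in place of 128.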

## Step 3: Compile to Shared Libraries

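Compile all QNN models into `.so` shared libraries for ARM64 deployment. In this notebook the Vector Estimator and Text Encoder conversion cells appear after this heading; they belong to Step 2 and must finish before the compile cells are run.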

In [None]:
%%bash
# Convert Vector Estimator
unset PYTHONPATH
unset LD_LIBRARY_PATH
source /opt/qcom/aistack/qairt/2.37.1.250807/bin/envsetup.sh

qnn-onnx-converter \
    --input_network ./model/onnx/vector_estimator.onnx \
    --input_dim noisy_latent 1,144,192 \
    --input_dim text_emb 1,256,128 \
    --input_dim style_ttl 1,50,256 \
    --input_dim latent_mask 1,1,192 \
    --input_dim text_mask 1,1,128 \
    --input_dim current_step 1 \
    --input_dim total_step 1 \
    --output_path ./QNN_Models/vector_estimator_htp.cpp \
    --input_list ./calibration_data/vector_estimator_input_list.txt \
    --param_quantizer tf \
    --act_quantizer tf \
    --weights_bitwidth 8 \
    --act_bitwidth 8 \
    --float_bitwidth 32

In [None]:
%%bash
# Convert Text Encoder
unset PYTHONPATH
unset LD_LIBRARY_PATH
source /opt/qcom/aistack/qairt/2.37.1.250807/bin/envsetup.sh

qnn-onnx-converter \
    --input_network ./model/onnx/text_encoder.onnx \
    --input_dim text_ids 1,128 \
    --input_dim style_ttl 1,50,256 \
    --input_dim text_mask 1,1,128 \
    --output_path ./QNN_Models/text_encoder_htp.cpp \
    --input_list ./calibration_data/text_encoder_input_list.txt \
    --param_quantizer tf \
    --act_quantizer tf \
    --weights_bitwidth 8 \
    --act_bitwidth 8 \
    --float_bitwidth 32
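
The conversion cells differ only in model name, input dimensions, and activation bitwidth, so the command line can be generated from a small spec. The sketch below uses only flags that appear in the cells above and mirrors the Text Encoder and Vector Estimator cells by always passing `--float_bitwidth 32`. The Vocoder example is an assumption: its ONNX file name and input list path simply follow the pattern of the other models, and its `latent` input name and `[1, 144, 192]` shape are taken from the tables in this notebook.

```python
import shlex


def converter_command(model, input_dims, act_bitwidth=8):
    """Build the qnn-onnx-converter argument list for one model."""
    cmd = ["qnn-onnx-converter", "--input_network", f"./model/onnx/{model}.onnx"]
    for name, dims in input_dims.items():
        cmd += ["--input_dim", name, ",".join(str(d) for d in dims)]
    cmd += [
        "--output_path", f"./QNN_Models/{model}_htp.cpp",
        "--input_list", f"./calibration_data/{model}_input_list.txt",
        "--param_quantizer", "tf",
        "--act_quantizer", "tf",
        "--weights_bitwidth", "8",
        "--act_bitwidth", str(act_bitwidth),
        "--float_bitwidth", "32",
    ]
    return cmd


# Assumed Vocoder invocation; paths follow the naming used by the other models.
vocoder_cmd = converter_command("vocoder", {"latent": (1, 144, 192)})
print(shlex.join(vocoder_cmd))
```

The returned list can be passed to `subprocess.run` from a shell where the QAIRT `envsetup.sh` has already been sourced.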

In [8]:
%%bash
# Compile only the Duration Predictor (single-model form of the compile loop)
unset LD_LIBRARY_PATH
source /home/advantech/qcom-wayland_sdk/environment-setup-armv8-2a-qcom-linux
source /opt/qcom/aistack/qairt/2.37.1.250807/bin/envsetup.sh

qnn-model-lib-generator \
    -c ./QNN_Models/duration_predictor_htp.cpp \
    -b ./QNN_Models/duration_predictor_htp.bin \
    -o ./QNN_Model_lib/ \
    -t aarch64-oe-linux-gcc11.2

SDK environment now set up; additionally you may now run devtool to perform development tasks.
Run devtool --help for further details.
[INFO] QAIRT_SDK_ROOT=/opt/qcom/aistack/qairt/2.37.1.250807
[WARN] QNN_SDK_ROOT/SNPE_ROOT set to QAIRT_SDK_ROOT for backwards compatibility and will be deprecated in a future release.
[INFO] QAIRT SDK environment setup complete
2026-02-15 23:04:47,744 -    INFO - qnn-model-lib-generator: Model cpp file path  : QNN_Models/duration_predictor_htp.cpp
2026-02-15 23:04:47,744 -    INFO - qnn-model-lib-generator: Model bin file path  : QNN_Models/duration_predictor_htp.bin
2026-02-15 23:04:47,744 -    INFO - qnn-model-lib-generator: Library target       : [['aarch64-oe-linux-gcc11.2']]
2026-02-15 23:04:47,744 -    INFO - qnn-model-lib-generator: Library name         : duration_predictor_htp
2026-02-15 23:04:47,744 -    INFO - qnn-model-lib-generator: Output directory     : QNN_Model_lib
2026-02-15 23:04:47,744 -    INFO - qnn-model-lib-generator: Output libra