The Qualcomm AI Engine Direct SDK allows clients to run ML models on HTP (Hexagon Tensor Processor) hardware. The following steps describe how to prepare the Stable Diffusion models on a Linux host for execution on devices with HTP capability.

This document uses the terms Qualcomm Neural Network (QNN) and Qualcomm AI Engine Direct SDK interchangeably.


# Prerequisites

1. Qualcomm AI Engine Direct SDK (with Ubuntu Linux support)
2. Ubuntu 20.04 installation with required packages for QNN Tools
3. Android Platform tools version 31 or greater
4. This notebook can be executed with Anaconda (using the supplied `environment.yaml`) or a Python virtual environment (venv)
5. Stable Diffusion `.onnx` files and their corresponding AIMET encodings (generated via the AIMET workflow)

This workflow assumes that you have generated the Stable Diffusion model artifacts by following the AIMET Stable Diffusion workflow:

- Stable Diffusion text encoder model and its AIMET encodings
- Stable Diffusion U-Net model and its AIMET encodings
- Stable Diffusion Variational Auto Encoder (VAE) model and its AIMET encodings
- `fp32.npy` file: a NumPy object array, saved as a Python pickle, that contains data required by the model conversion step.
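
Before starting, it can help to confirm that these artifacts are where the later cells expect them. The sketch below assumes the directory layout used by the conversion cells in this notebook (`<model>_onnx/<model>.onnx` and `<model>_onnx/<model>.encodings` for each model, plus `fp32.npy` at the top level); `find_missing_artifacts` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path

# Model names as used by the conversion cells later in this notebook.
MODEL_NAMES = ("text_encoder", "unet", "vae_decoder")


def find_missing_artifacts(models_dir):
    """Return the expected AIMET artifacts that are missing under models_dir.

    Assumes the layout <model>_onnx/<model>.onnx and <model>_onnx/<model>.encodings
    for each model, plus fp32.npy at the top level (hypothetical helper).
    """
    root = Path(models_dir)
    expected = [root / "fp32.npy"]
    for name in MODEL_NAMES:
        expected.append(root / f"{name}_onnx" / f"{name}.onnx")
        expected.append(root / f"{name}_onnx" / f"{name}.encodings")
    # Only plain files count; a directory with the same name is reported missing.
    return [str(p) for p in expected if not p.is_file()]
```

Once `STABLE_DIFFUSION_MODELS` is defined later in the notebook, `find_missing_artifacts(STABLE_DIFFUSION_MODELS)` should return an empty list if everything is in place.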


# Tested Environment

**Linux x86 PC**

- Distributor ID: Ubuntu
- Description:    Ubuntu 20.04.5 LTS
- Release:        20.04
- Platform: x86_64 AMD



# Workflow


The three models and encodings are processed independently via different executable QNN utilities available in the Qualcomm AI Engine Direct SDK.

Preparing the Stable Diffusion models for inference requires an Ubuntu 20.04 environment, where the QNN executable utilities perform the following steps:

1. Convert the `.onnx` files to their equivalent QNN representation with `A16W8` precision (16-bit activations and 8-bit weights)
2. Generate the QNN model libraries
3. Generate the QNN context binaries for the QNN HTP backend

After preparing the Stable Diffusion models for inference, the next step is to execute the QNN context binaries for inference on a Snapdragon Android device. See qnn_model_execution_on_android.ipynb.
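
Each model passes through the three stages independently, and each stage's output is the next stage's input. The following sketch lists the files each stage produces per model, using the output paths that appear in the cells of this notebook. `stage_outputs` is a hypothetical bookkeeping helper, not part of the SDK, and the context-binary entry records the `--binary_file` name only, since the generator may add an extension to it.

```python
from pathlib import Path

MODEL_NAMES = ("text_encoder", "unet", "vae_decoder")


def stage_outputs(models_dir, name):
    """Map each workflow stage to the files it produces for one model.

    Paths mirror the cells in this notebook: converter outputs are renamed
    to <name>.cpp/.bin, the model library is built for x86_64-linux-clang,
    and context binaries go to serialized_binaries/.
    """
    root = Path(models_dir)
    converted = root / f"converted_{name}"
    return {
        "convert": [converted / f"{name}.cpp", converted / f"{name}.bin"],
        "model_lib": [converted / "x86_64-linux-clang" / f"lib{name}.so"],
        # Value passed as --binary_file; the tool may append an extension.
        "context_binary": [root / "serialized_binaries" / f"{name}.serialized"],
    }


for model in MODEL_NAMES:
    for stage, files in stage_outputs("/models", model).items():
        print(f"{model:>12} {stage:<15} {', '.join(str(f) for f in files)}")
```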


![QNN Work flow](./jupyter_notebook_assets/qnn-workflow.png)

The Python environment can be set up using either Anaconda or Python virtual environment (venv).

**Note:** One of the following two Python environment setups (Anaconda or venv) must be completed before you run the notebook.

If you have already started the Jupyter notebook, configure the Python environment before you continue. After configuring the Python environment, restart the notebook server and select the correct kernel.

# Setup

### Set up Anaconda in an Ubuntu 20.04 terminal

1. Download the Anaconda installer from https://repo.anaconda.com/archive/Anaconda3-2023.03-1-Linux-x86_64.sh.

2. Execute the setup script with the following command.

    `chmod a+x Anaconda3-2023.03-1-Linux-x86_64.sh && bash Anaconda3-2023.03-1-Linux-x86_64.sh`

3. Configure an Anaconda environment with the following commands in the Ubuntu 20.04 terminal.

    `conda create --name stable_diffusion_env python=3.8`
    
    `conda activate stable_diffusion_env`
    
    `conda install ipykernel` 
    
    `ipython kernel install --user --name=stable_diffusion_env` 

### Set up venv (non-Anaconda) in an Ubuntu 20.04 terminal

The following steps install the packages required to use the QNN tools in an Ubuntu 20.04 environment (Ubuntu terminal window).

1. Update the package index files.

    `sudo apt-get update`

2. Install Python3.8 and necessary packages.

    Ubuntu 20.04 ships with Python 3.8 by default, so you do not need to install it again. To reinstall it, run the following command.

    `sudo bash -c 'apt-get update && apt-get install software-properties-common && add-apt-repository ppa:deadsnakes/ppa && apt-get install python3.8 python3.8-distutils libpython3.8'`

3. Install python3-pip.

    `sudo apt-get install python3-pip`

4. Install Python 3 virtual environment support.

    `sudo apt install python3-virtualenv`

5. Create and activate a Python 3.8 virtual environment by executing the following commands.
    ```
    virtualenv -p /usr/bin/python3.8 venv_stable_diffusion
    source venv_stable_diffusion/bin/activate
    ```
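
After activating either environment, you can confirm from Python that the notebook kernel uses the intended interpreter. This sketch assumes Python 3.8 is the target (as in the steps above) and treats `sys.prefix` differing from `sys.base_prefix` as a venv, or a set `CONDA_PREFIX` as an Anaconda environment; `describe_environment` is a hypothetical helper.

```python
import os
import sys


def describe_environment():
    """Report the interpreter version and which kind of environment is active."""
    if sys.prefix != sys.base_prefix:
        kind = "venv"    # virtualenv/venv redirect sys.prefix
    elif os.environ.get("CONDA_PREFIX"):
        kind = "conda"   # conda envs are full installs, so check the variable
    else:
        kind = "system"
    return {
        "version": sys.version_info[:2],
        "is_py38": sys.version_info[:2] == (3, 8),
        "kind": kind,
        "prefix": sys.prefix,
    }


info = describe_environment()
print(info)
if not info["is_py38"]:
    print("Warning: the steps above target Python 3.8")
```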

### Install the required Python packages

In [None]:
%pip install --quiet -r requirements.txt

## Set up the Qualcomm AI Engine Direct SDK

The following steps configure the Qualcomm AI Engine Direct SDK, which enables running Stable Diffusion on the device. 
Execute the following on an Ubuntu 20.04 terminal. 

**NOTE:** These steps require sudo or root privileges.

1. After setting up Python and pip in Ubuntu, check QNN tool dependencies; see <QNN_SDK>/docs/QNN/general/setup.html for more information about QNN setup and the ML framework dependencies. 
2. Set the `QNN_SDK_ROOT` environment variable to the location of the Qualcomm AI Engine Direct SDK. For example, on **Linux**: `export QNN_SDK_ROOT="./assets/qnn_assets/unzipped_qnn_sdk_linux/"`
3. Check and install Linux dependencies.

    ```
    source $QNN_SDK_ROOT/bin/check-linux-dependency.sh
    sudo apt-get install -y libtinfo5
    ```
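
Because every later cell invokes tools by path under `QNN_SDK_ROOT`, checking that those paths exist catches a wrong `QNN_SDK_ROOT` early. The tool names below are the ones this notebook calls; `missing_qnn_tools` is a hypothetical helper that only tests for file presence, not for version compatibility.

```python
import os

# Tools invoked by this notebook, relative to QNN_SDK_ROOT.
QNN_TOOLS = (
    "bin/x86_64-linux-clang/qnn-onnx-converter",
    "bin/x86_64-linux-clang/qnn-model-lib-generator",
    "bin/x86_64-linux-clang/qnn-context-binary-generator",
    "bin/check-linux-dependency.sh",
    "bin/check-python-dependency",
)


def missing_qnn_tools(sdk_root):
    """Return the tool paths (relative to sdk_root) that do not exist."""
    return [tool for tool in QNN_TOOLS
            if not os.path.isfile(os.path.join(sdk_root, tool))]


sdk_root = os.environ.get("QNN_SDK_ROOT", "")
if not sdk_root:
    print("QNN_SDK_ROOT is not set")
else:
    print(missing_qnn_tools(sdk_root) or "All QNN tools found")
```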

## Set up Python dependencies for the Qualcomm AI Engine Direct SDK

In [None]:
import os
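# Check and install Python dependencies
# Use the exported QNN_SDK_ROOT if set; otherwise update this default to your SDK installation path
QNN_SDK_ROOT = os.environ.get("QNN_SDK_ROOT", "/opt/qcom/aistack/qairt/2.21.0.240401/")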
!python $QNN_SDK_ROOT/bin/check-python-dependency

# Prepare Stable Diffusion Models for Inference

The following section uses the Qualcomm AI Engine Direct SDK to prepare the Stable Diffusion models for on-target inference.

In [None]:
# Path to the directory that contains the Stable Diffusion model artifacts (adjust as needed)
STABLE_DIFFUSION_MODELS = os.path.join(os.getcwd(), "../landscapePhotoreal_v1/")

## Convert the model from ONNX representation to QNN representation

The Qualcomm AI Engine Direct SDK `qnn-onnx-converter` tool converts a model from its ONNX representation to the equivalent QNN representation in `A16W8` precision. The encoding files generated by the AIMET workflow are passed to this step with the `--quantization_overrides <model>.encodings` option.

This step generates a `.cpp` file that represents the model as a series of QNN API calls, and a `.bin` file containing static data (typically the model weights) that the `.cpp` file references. It also writes a `qnn_model_net.json` file, which the cells below rename along with the other outputs.

This step must be done independently for all three models.
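
The three conversion cells differ only in the model name and, for U-Net, an extra input-layout option, so the command line can be built by a single function. This sketch reproduces the arguments used in the cells in this section and introduces no other converter flags; `converter_command` is a hypothetical helper.

```python
import os


def converter_command(sdk_root, models_dir, name, extra_args=()):
    """Build the qnn-onnx-converter argv used in this notebook for one model."""
    onnx_dir = os.path.join(models_dir, f"{name}_onnx")
    return [
        os.path.join(sdk_root, "bin/x86_64-linux-clang/qnn-onnx-converter"),
        "-o", os.path.join(models_dir, f"converted_{name}", "qnn_model.cpp"),
        "-i", os.path.join(onnx_dir, f"{name}.onnx"),
        "--input_list", os.path.join(onnx_dir, f"{name}_input_list.txt"),
        "--act_bw", "16",   # 16-bit activations (the "A16" in A16W8)
        "--bias_bw", "32",
        "--quantization_overrides", os.path.join(onnx_dir, f"{name}.encodings"),
        *extra_args,
    ]


# U-Net additionally sets the layout of its "input_3" input, as in its cell.
unet_cmd = converter_command("/sdk", "/models", "unet", ["-l", "input_3", "NONTRIVIAL"])
print(" ".join(unet_cmd))
```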

### Generate model inputs list/data

In [None]:
inputs_pickle_path=STABLE_DIFFUSION_MODELS + '/fp32.npy'
!python3 generate_inputs.py --pickle_path $inputs_pickle_path --working_dir $STABLE_DIFFUSION_MODELS
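
The converter's `--input_list` option takes a text file that lists raw input tensors, one inference per line. `generate_inputs.py` produces these files from `fp32.npy`; its exact output is not shown here, so the sketch below is an assumption about the general format: it writes each input as a headerless, native-endian float32 `.raw` file and lists the files on one line, separated by spaces, in the model's input order. `write_input_list` is a hypothetical helper.

```python
import os
from array import array


def write_input_list(out_dir, inputs, list_name="input_list.txt"):
    """Write each input as a raw float32 file and list them on one line.

    inputs: sequence of (name, values) pairs, in the model's input order.
    Returns the path of the generated input-list file.
    """
    os.makedirs(out_dir, exist_ok=True)
    raw_paths = []
    for name, values in inputs:
        raw_path = os.path.join(out_dir, f"{name}.raw")
        with open(raw_path, "wb") as f:
            array("f", values).tofile(f)  # native-endian float32, no header
        raw_paths.append(raw_path)
    list_path = os.path.join(out_dir, list_name)
    with open(list_path, "w") as f:
        f.write(" ".join(raw_paths) + "\n")  # one line = one inference
    return list_path
```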

### Set up environment variables for the Qualcomm AI Engine Direct SDK tools

In [None]:
env = os.environ.copy()
env["QNN_SDK_ROOT"] = QNN_SDK_ROOT
env["PYTHONPATH"] = QNN_SDK_ROOT + "/benchmarks/QNN/:" + QNN_SDK_ROOT + "/lib/python"
env["PATH"] = QNN_SDK_ROOT + "/bin/x86_64-linux-clang:" + env["PATH"]
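# CONDA_PREFIX is only set in an Anaconda environment; skip it when using venv
ld_library_paths = [QNN_SDK_ROOT + "/lib/x86_64-linux-clang"]
if "CONDA_PREFIX" in os.environ:
    ld_library_paths.append(os.environ["CONDA_PREFIX"] + "/lib")
env["LD_LIBRARY_PATH"] = ":".join(ld_library_paths)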
env["HEXAGON_TOOLS_DIR"] = QNN_SDK_ROOT + "/bin/x86_64-linux-clang"
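
With `env` in place, you can check that the tools resolve through the `PATH` that the subprocess calls will see. `shutil.which` accepts an explicit `path` argument, so the lookup uses the given environment's `PATH` rather than the notebook's own. `resolve_tools` is a hypothetical helper.

```python
import os
import shutil


def resolve_tools(env, names):
    """Map each tool name to its resolved path on env["PATH"], or None."""
    search_path = env.get("PATH", "")
    return {name: shutil.which(name, path=search_path) for name in names}


tools = ("qnn-onnx-converter", "qnn-model-lib-generator", "qnn-context-binary-generator")
print(resolve_tools(dict(os.environ), tools))
```

In the notebook, `resolve_tools(env, tools)` should report a path for each tool; a `None` usually means `QNN_SDK_ROOT` points at the wrong directory.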

### Convert the text encoder

In [None]:
import subprocess

!mkdir -p $STABLE_DIFFUSION_MODELS/converted_text_encoder
proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-onnx-converter",
                         "-o", STABLE_DIFFUSION_MODELS + "/converted_text_encoder/qnn_model.cpp",
                         "-i", STABLE_DIFFUSION_MODELS + "/text_encoder_onnx/text_encoder.onnx",
                         "--input_list", STABLE_DIFFUSION_MODELS + "/text_encoder_onnx/text_encoder_input_list.txt",
                         "--act_bw", "16",
                         "--bias_bw", "32",
                         "--quantization_overrides", STABLE_DIFFUSION_MODELS + "/text_encoder_onnx/text_encoder.encodings"
                        ], stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())

# Rename the model files to make them unique and helpful for subsequent stages
!mv $STABLE_DIFFUSION_MODELS/converted_text_encoder/qnn_model.cpp $STABLE_DIFFUSION_MODELS/converted_text_encoder/text_encoder.cpp
!mv $STABLE_DIFFUSION_MODELS/converted_text_encoder/qnn_model.bin $STABLE_DIFFUSION_MODELS/converted_text_encoder/text_encoder.bin
!mv $STABLE_DIFFUSION_MODELS/converted_text_encoder/qnn_model_net.json $STABLE_DIFFUSION_MODELS/converted_text_encoder/text_encoder_net.json
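
The cells in this section print the tool output but do not stop when a tool fails, so a failed conversion can go unnoticed until a later stage. The following is a sketch of a wrapper around `subprocess.run` that prints the output and raises on a non-zero exit status; `run_qnn_tool` is a hypothetical helper that you would call with the same argument lists and `env` as the `Popen` calls.

```python
import subprocess


def run_qnn_tool(cmd, env=None):
    """Run a QNN tool, print its output, and raise if it exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True, env=env)
    # Print first so the tool's diagnostics are visible even on failure.
    print(result.stdout, result.stderr)
    if result.returncode != 0:
        raise RuntimeError(f"{cmd[0]} failed with exit code {result.returncode}")
    return result
```

Printing before raising is deliberate: `subprocess.run(..., check=True)` would also raise, but the converter's error messages would then only be available on the exception object.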

### Convert U-Net
Expected execution time: ~1.5 hours

In [None]:
!mkdir -p $STABLE_DIFFUSION_MODELS/converted_unet

proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-onnx-converter",
                         "-o", STABLE_DIFFUSION_MODELS + "/converted_unet/qnn_model.cpp",
                         "-i", STABLE_DIFFUSION_MODELS + "/unet_onnx/unet.onnx",
                         "--input_list", STABLE_DIFFUSION_MODELS + "/unet_onnx/unet_input_list.txt",
                         "--act_bw", "16",
                         "--bias_bw", "32",
                         "--quantization_overrides", STABLE_DIFFUSION_MODELS + "/unet_onnx/unet.encodings",
                         # -l (--input_layout): set the layout of input "input_3" to NONTRIVIAL
                         "-l", "input_3", "NONTRIVIAL"
                        ], stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())

# Rename the model files to make them unique and helpful for subsequent stages
!mv $STABLE_DIFFUSION_MODELS/converted_unet/qnn_model.cpp $STABLE_DIFFUSION_MODELS/converted_unet/unet.cpp
!mv $STABLE_DIFFUSION_MODELS/converted_unet/qnn_model.bin $STABLE_DIFFUSION_MODELS/converted_unet/unet.bin
!mv $STABLE_DIFFUSION_MODELS/converted_unet/qnn_model_net.json $STABLE_DIFFUSION_MODELS/converted_unet/unet_net.json
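
The three `!mv` commands after each conversion follow the same pattern. Below is a Python equivalent using `pathlib`, assuming the converter writes `qnn_model.cpp`, `qnn_model.bin`, and `qnn_model_net.json` as the cells above expect; `rename_converted` is a hypothetical helper.

```python
from pathlib import Path

# Suffixes of the converter outputs that the notebook renames.
CONVERTED_SUFFIXES = (".cpp", ".bin", "_net.json")


def rename_converted(converted_dir, name, base="qnn_model"):
    """Rename <base>.cpp/.bin/_net.json in converted_dir to <name>.* and return new paths."""
    converted = Path(converted_dir)
    renamed = []
    for suffix in CONVERTED_SUFFIXES:
        src = converted / f"{base}{suffix}"
        dst = converted / f"{name}{suffix}"
        # Raises FileNotFoundError if the converter did not produce src;
        # files renamed before the failure stay renamed.
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

For example, `rename_converted(STABLE_DIFFUSION_MODELS + "/converted_unet", "unet")` performs the same renames as the cell above.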

### Convert the VAE decoder
Expected execution time: ~25 minutes

In [None]:
!mkdir -p $STABLE_DIFFUSION_MODELS/converted_vae_decoder

proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-onnx-converter",
                         "-o", STABLE_DIFFUSION_MODELS + "/converted_vae_decoder/qnn_model.cpp",
                         "-i", STABLE_DIFFUSION_MODELS + "/vae_decoder_onnx/vae_decoder.onnx",
                         "--input_list", STABLE_DIFFUSION_MODELS + "/vae_decoder_onnx/vae_decoder_input_list.txt",
                         "--act_bw", "16",
                         "--bias_bw", "32",
                         "--quantization_overrides", STABLE_DIFFUSION_MODELS + "/vae_decoder_onnx/vae_decoder.encodings"
                        ], stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())


# Rename for uniqueness
!mv $STABLE_DIFFUSION_MODELS/converted_vae_decoder/qnn_model.cpp $STABLE_DIFFUSION_MODELS/converted_vae_decoder/vae_decoder.cpp
!mv $STABLE_DIFFUSION_MODELS/converted_vae_decoder/qnn_model.bin $STABLE_DIFFUSION_MODELS/converted_vae_decoder/vae_decoder.bin
!mv $STABLE_DIFFUSION_MODELS/converted_vae_decoder/qnn_model_net.json $STABLE_DIFFUSION_MODELS/converted_vae_decoder/vae_decoder_net.json

## QNN model library

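The Qualcomm AI Engine Direct SDK `qnn-model-lib-generator` tool compiles the model `.cpp` and `.bin` files into a shared object library for a specific target. This example generates a shared object library for the `x86_64-linux-clang` target, because the `qnn-context-binary-generator` tool in the next step loads the model library on the x86_64 host.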

The inputs to this stage are the renamed `.cpp` and `.bin` files generated in the previous step (for example, `text_encoder.cpp` and `text_encoder.bin`).
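
As with conversion, the three library-generation cells differ only in the model name. This sketch builds the same `qnn-model-lib-generator` arguments and also returns the library path that the context-binary cells later load (`<output>/x86_64-linux-clang/lib<name>.so`, as referenced in this notebook); `model_lib_command` is a hypothetical helper.

```python
import os

TARGET = "x86_64-linux-clang"


def model_lib_command(sdk_root, models_dir, name):
    """Return (argv, expected_library_path) for building one model library."""
    out_dir = os.path.join(models_dir, f"converted_{name}")
    argv = [
        os.path.join(sdk_root, "bin", TARGET, "qnn-model-lib-generator"),
        "-c", os.path.join(out_dir, f"{name}.cpp"),
        "-b", os.path.join(out_dir, f"{name}.bin"),
        "-t", TARGET,
        "-o", out_dir,
    ]
    # Library location as used by the qnn-context-binary-generator cells.
    library = os.path.join(out_dir, TARGET, f"lib{name}.so")
    return argv, library
```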

### Generate the text encoder model library

In [None]:
proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-model-lib-generator",
                        "-c", STABLE_DIFFUSION_MODELS + "/converted_text_encoder/text_encoder.cpp",
                        "-b", STABLE_DIFFUSION_MODELS + "/converted_text_encoder/text_encoder.bin",
                        "-t", "x86_64-linux-clang",
                        "-o", STABLE_DIFFUSION_MODELS + "/converted_text_encoder"
                        ],stdout=subprocess.PIPE, stderr=subprocess.PIPE,env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())

### Generate the U-Net model library
Expected execution time: ~25 minutes

In [None]:
proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-model-lib-generator",
                        "-c", STABLE_DIFFUSION_MODELS + "/converted_unet/unet.cpp",
                        "-b", STABLE_DIFFUSION_MODELS + "/converted_unet/unet.bin",
                        "-t", "x86_64-linux-clang",
                        "-o", STABLE_DIFFUSION_MODELS + "/converted_unet"
                        ],stdout=subprocess.PIPE, stderr=subprocess.PIPE,env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())

### Generate the VAE decoder model library

In [None]:
proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-model-lib-generator",
                        "-c", STABLE_DIFFUSION_MODELS + "/converted_vae_decoder/vae_decoder.cpp",
                        "-b", STABLE_DIFFUSION_MODELS + "/converted_vae_decoder/vae_decoder.bin",
                        "-t", "x86_64-linux-clang",
                        "-o", STABLE_DIFFUSION_MODELS + "/converted_vae_decoder"
                        ],stdout=subprocess.PIPE, stderr=subprocess.PIPE,env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())

## QNN HTP context binary

The Qualcomm AI Engine Direct SDK `qnn-context-binary-generator` tool creates a QNN context binary for the QNN HTP backend. This binary can be deployed to a Snapdragon Gen2 device that runs Android. This step requires the model shared object library from the previous step and the `libQnnHtp.so` library, available in the Qualcomm AI Engine Direct SDK.

Provide additional options for the QNN HTP backend by passing the `libQnnHtpNetRunExtensions.so` library, which implements extensions for the QNN HTP backend and is available in the Qualcomm AI Engine Direct SDK. The library path and the backend configuration are specified in `.json` files, as shown in the next cell. Documentation on backend extensions and configuration parameters is available in the Qualcomm AI Engine Direct SDK documentation.

In [None]:
# HTP backend extensions config file (htp_backend_extensions.json) example
htp_backend_extensions_data = '''
{
    "backend_extensions": {
        "shared_library_path": "libQnnHtpNetRunExtensions.so",
        "config_file_path": "/tmp/htp_config.json"
    }
}
'''

# HTP backend config file (htp_config.json) example
htp_backend_config_data = '''
{
    "graphs": [{
        "vtcm_mb":8,
        "graph_names":["qnn_model"]
    }],
    "devices": [
        {
            "soc_id": 57,
            "dsp_arch": "v73",
            "cores":[{
                "core_id": 0,
                "perf_profile": "burst",
                "rpc_control_latency":100
            }]
        }
    ]
}
'''
# Write the config files to a temporary location
with open('/tmp/htp_backend_extensions.json','w') as f:
    f.write(htp_backend_extensions_data)
with open('/tmp/htp_config.json','w') as f:
    f.write(htp_backend_config_data)
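
Building the configuration as Python dictionaries and serializing it with the `json` module guarantees valid JSON and keeps the cross-reference between the two files (`config_file_path`) in one place. This sketch reproduces the values from the cell above; `write_htp_configs` is a hypothetical helper, and the default `/tmp` location is kept only for consistency with the notebook.

```python
import json
import os


def write_htp_configs(config_dir="/tmp"):
    """Write htp_config.json and htp_backend_extensions.json; return the extensions path."""
    config_path = os.path.join(config_dir, "htp_config.json")
    extensions_path = os.path.join(config_dir, "htp_backend_extensions.json")
    htp_config = {
        "graphs": [{"vtcm_mb": 8, "graph_names": ["qnn_model"]}],
        "devices": [{
            "soc_id": 57,
            "dsp_arch": "v73",
            "cores": [{"core_id": 0, "perf_profile": "burst", "rpc_control_latency": 100}],
        }],
    }
    extensions = {
        "backend_extensions": {
            "shared_library_path": "libQnnHtpNetRunExtensions.so",
            "config_file_path": config_path,  # points at the htp_config.json written here
        }
    }
    with open(config_path, "w") as f:
        json.dump(htp_config, f, indent=4)
    with open(extensions_path, "w") as f:
        json.dump(extensions, f, indent=4)
    return extensions_path
```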

In [None]:
# Create a path under the models directory for serialized binaries
!mkdir -p $STABLE_DIFFUSION_MODELS/serialized_binaries

### Generate the QNN context binary for the text encoder

In [None]:
proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-context-binary-generator",
                             "--model", STABLE_DIFFUSION_MODELS + "/converted_text_encoder/x86_64-linux-clang/libtext_encoder.so",
                             "--backend", "libQnnHtp.so",
                             "--output_dir",  STABLE_DIFFUSION_MODELS + "/serialized_binaries",
                             "--binary_file", "text_encoder.serialized",
                             "--config_file", "/tmp/htp_backend_extensions.json",
                             "--log_level", "verbose"
                        ],stdout=subprocess.PIPE, stderr=subprocess.PIPE,env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())

### Generate the QNN context binary for U-Net
Expected execution time: ~2 minutes

In [None]:
proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-context-binary-generator",
                             "--model", STABLE_DIFFUSION_MODELS + "/converted_unet/x86_64-linux-clang/libunet.so",
                             "--backend", "libQnnHtp.so",
                             "--output_dir",  STABLE_DIFFUSION_MODELS + "/serialized_binaries",
                             "--binary_file", "unet.serialized",
                             "--config_file", "/tmp/htp_backend_extensions.json"
                        ],stdout=subprocess.PIPE, stderr=subprocess.PIPE,env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())

### Generate the QNN context binary for the VAE decoder
Expected execution time: ~1.5 minutes

In [None]:
proc = subprocess.Popen([QNN_SDK_ROOT + "/bin/x86_64-linux-clang/qnn-context-binary-generator",
                             "--model", STABLE_DIFFUSION_MODELS + "/converted_vae_decoder/x86_64-linux-clang/libvae_decoder.so",
                             "--backend", "libQnnHtp.so",
                             "--output_dir",  STABLE_DIFFUSION_MODELS + "/serialized_binaries",
                             "--binary_file", "vae_decoder.serialized",
                             "--config_file", "/tmp/htp_backend_extensions.json"
                        ],stdout=subprocess.PIPE, stderr=subprocess.PIPE,env=env)
output, error = proc.communicate()
print(output.decode(),error.decode())
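
The three context-binary cells differ only in the model name (and the text encoder cell adds `--log_level verbose`). This sketch builds all three argument lists in one loop, reusing only the options already shown in this section; `context_binary_command` is a hypothetical helper.

```python
import os

TARGET = "x86_64-linux-clang"


def context_binary_command(sdk_root, models_dir, name, config_file, verbose=False):
    """Build the qnn-context-binary-generator argv used in this notebook."""
    argv = [
        os.path.join(sdk_root, "bin", TARGET, "qnn-context-binary-generator"),
        "--model", os.path.join(models_dir, f"converted_{name}", TARGET, f"lib{name}.so"),
        "--backend", "libQnnHtp.so",
        "--output_dir", os.path.join(models_dir, "serialized_binaries"),
        "--binary_file", f"{name}.serialized",
        "--config_file", config_file,
    ]
    if verbose:
        argv += ["--log_level", "verbose"]
    return argv


commands = {
    name: context_binary_command("/sdk", "/models", name,
                                 "/tmp/htp_backend_extensions.json",
                                 verbose=(name == "text_encoder"))
    for name in ("text_encoder", "unet", "vae_decoder")
}
```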

After these steps complete, the QNN context binaries for the three models are available in `$STABLE_DIFFUSION_MODELS/serialized_binaries/`.
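
Before moving to the device, you can list what was produced. The generator may add a file extension to the `--binary_file` name, so this sketch matches on the `<name>.serialized` prefix rather than an exact file name; `find_context_binaries` is a hypothetical helper.

```python
from pathlib import Path


def find_context_binaries(models_dir, names=("text_encoder", "unet", "vae_decoder")):
    """Map each model name to the matching files in serialized_binaries/ (empty if none)."""
    out_dir = Path(models_dir) / "serialized_binaries"
    return {name: sorted(out_dir.glob(f"{name}.serialized*")) for name in names}
```

An empty list for any model means its context binary was not generated and the corresponding cell's output should be checked.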

The next step is to execute the prepared models (now represented as serialized context binaries) on a Snapdragon Gen2 Android device using executable utilities available in the Qualcomm AI Engine Direct SDK.


Copyright (c) 2023 Qualcomm Technologies, Inc. and/or its subsidiaries.