diff --git a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/1-prerequisites.md b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/1-prerequisites.md index dc71fb22b3..a5435413d0 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/1-prerequisites.md +++ b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/1-prerequisites.md @@ -19,7 +19,7 @@ Your first task is to prepare a development environment with the required softwa ### Create workspace directory -Create a separate directory for all dependencies and repositories that this Learning Path uses. +Create a separate directory for all the dependencies and repositories that this Learning Path uses. Export the `WORKSPACE` variable to point to this directory, which you will use in the following steps: @@ -74,7 +74,7 @@ See the [CMake install guide](/install-guides/cmake/) for troubleshooting instru ### Install Bazel -Bazel is an open-source build tool which we will use to build LiteRT libraries. +Bazel is an open-source build tool which you will use to build LiteRT libraries. {{< tabpane code=true >}} {{< tab header="Linux">}} @@ -98,22 +98,24 @@ wget https://dl.google.com/android/repository/android-ndk-r25b-linux.zip unzip android-ndk-r25b-linux.zip {{< /tab >}} {{< tab header="MacOS">}} -brew install --cask android-studio temurin +wget https://dl.google.com/android/repository/android-ndk-r25b-darwin.zip +unzip android-ndk-r25b-darwin +mv android-ndk-r25b-darwin ~/Library/Android/android-ndk-r25b {{< /tab >}} {{< /tabpane >}} -For easier access and execution of Android NDK tools, add these to the `PATH` and set the `ANDROID_NDK` variable: +For easier access and execution of Android NDK tools, add these to the `PATH` and set the `NDK_PATH` variable: {{< tabpane code=true >}} {{< tab header="Linux">}} -export ANDROID_NDK=$WORKSPACE/android-ndk-r25b/ -export PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH +export NDK_PATH=$WORKSPACE/android-ndk-r25b/ +export PATH=$NDK_PATH/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH {{< /tab >}} {{< tab header="MacOS">}} -export ANDROID_NDK=~/Library/Android/sdk/ndk/27.0.12077973/ -export PATH=$PATH:$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin +export NDK_PATH=~/Library/Android/android-ndk-r25b +export PATH=$PATH:$NDK_PATH/toolchains/llvm/prebuilt/darwin-x86_64/bin export PATH=$PATH:~/Library/Android/sdk/cmdline-tools/latest/bin {{< /tab >}} {{< /tabpane >}} -Now that your development environment is ready and all pre-requisites installed, you can test the Audio Stable Open model. +Now that your development environment is ready and all the prerequisites are installed, you can move on to test the Stable Audio Open Small model. 
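Before you move on, you can optionally run a few quick checks to confirm that the tools installed above are reachable from your shell. This is only a sanity-check sketch; the exact version numbers and the prebuilt host directory name (for example `linux-x86_64` or `darwin-x86_64`) depend on your platform.

```bash
# Optional checks - output will vary by platform and tool version
cmake --version
bazel --version

# NDK_PATH should point at the android-ndk-r25b directory you set up above
echo $NDK_PATH
ls $NDK_PATH/toolchains/llvm/prebuilt
```

If any of these commands fail, revisit the corresponding install step before continuing.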
diff --git a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/2-testing-model.md b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/2-testing-model.md index ca6758fb39..beef52daea 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/2-testing-model.md +++ b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/2-testing-model.md @@ -8,12 +8,12 @@ layout: learningpathall ## Download the model -Stable Audio Open is an open-source model optimized for generating short audio samples, sound effects, and production elements using text prompts. +Stable Audio Open Small is an open-source model optimized for generating short audio samples, sound effects, and production elements using text prompts. [Log in](https://huggingface.co/login) to HuggingFace and navigate to the model landing page: ```bash -https://huggingface.co/stabilityai/stable-audio-open-small +https://huggingface.co/stabilityai/stable-audio-open-small/tree/main ``` You may need to fill out a form with your contact information to use the model: @@ -26,15 +26,11 @@ Download and copy the configuration file `model_config.json` and the model itsel ls $WORKSPACE/model_config.json $WORKSPACE/model.ckpt ``` -## Test the model +You can learn more about this model [here](https://huggingface.co/stabilityai/stable-audio-open-small). -To test the model, use the Stable Audio demo site, which lets you experiment directly through a web-based interface: +### Good prompting practices -```bash -https://stableaudio.com/ -``` - -Use the UI to enter a prompt. A good prompt can include: +A good prompt for the Stable Audio Open Small model can include the following elements: * Music genre and subgenre. * Musical elements (texture, rhythm and articulation). @@ -45,5 +41,5 @@ The order of prompt parameters matters. For more information, see the [Prompt st You can explore training and inference code for audio generation models in the [Stable Audio Tools repository](https://github.com/Stability-AI/stable-audio-tools). -Now that you've downloaded and tested the model, continue to the next section to convert the model to LiteRT. +Now that you've downloaded the model, you're ready to convert it to LiteRT format in the next step. diff --git a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/3-converting-model.md b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/3-converting-model.md index 18c9653308..c180670e3e 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/3-converting-model.md +++ b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/3-converting-model.md @@ -1,12 +1,15 @@ --- -title: Convert Open Stable Audio Small model to LiteRT +title: Convert Stable Audio Open Small model to LiteRT weight: 4 ### FIXED, DO NOT MODIFY layout: learningpathall --- +In this section, you will learn about the audio generation model. You will then clone a repository that contains the scripts required to convert the model submodules into LiteRT format and generate the inference application. -## Stable Audio Open Small Model +## Stable Audio Open Small + +The open-source model consists of three main submodules. They are described in the table below, and come together through the pipeline shown in the image. 
|Submodule|Description| |------|------| @@ -14,18 +17,21 @@ layout: learningpathall |Diffusion Transformer (DiT)| Denoises random noise over multiple steps to produce structured latent audio, guided by conditioner embeddings. | |AutoEncoder| Compresses audio waveforms into a latent representation for processing by the DiT model, and decompresses the output back into audio. | -The submodules work together to provide the pipeline as shown below: + ![Model structure#center](./model.png) -As part of this section, you will covert each of the three submodules into [LiteRT](https://ai.google.dev/edge/litert) format, using two separate conversion routes: -1. Conditioners submodule - ONNX to LiteRT using [onnx2tf](https://github.com/PINTO0309/onnx2tf) tool. -2. DiT and AutoEncoder submodules - PyTorch to LiteRT using Google AI Edge Torch tool. +In this section, you will explore two different conversion routes, to convert the submodules to [LiteRT](https://ai.google.dev/edge/litert) format. Both methods will be run using Python wrapper scripts from the examples repository. + +1. **ONNX to LiteRT**: using the `onnx2tf` tool. This is the traditional two-step approach (PyTorch -> ONNX -> LiteRT). You will use it to convert the Conditioners submodule. + +2. **PyTorch to LiteRT**: using the Google AI Edge Torch tool. You will use this tool to convert the DiT and AutoEncoder submodules. + -### Create virtual environment and install dependencies +## Download the sample code The Conditioners submodule is made of the T5Encoder model. You will use the ONNX to TFLite conversion for this submodule. -To avoid dependency issues, create a virtual environment. In this guide, we will use `virtualenv`: +To avoid dependency issues, create a virtual environment. For example, you can use the following command: ```bash cd $WORKSPACE @@ -37,11 +43,11 @@ Clone the examples repository: ```bash cd $WORKSPACE -git clone https://github.com/ARM-software/ML-examples/tree/main/kleidiai-examples/audiogen -cd audio-stale-open-litert +git clone https://github.com/ARM-software/ML-examples.git +cd ML-examples/kleidiai-examples/audiogen/ ``` -We now install the needed python packages for this, including *onnx2tf* and *ai_edge_litert* +Install the required Python packages for this, including *onnx2tf* and *ai_edge_litert* ```bash bash install_requirements.sh @@ -61,20 +67,19 @@ ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler ($WORKSPACE/env/lib/python3.10/site-packages/triton/compiler/compiler.py) ``` -Install the following dependency and rerun the script: +Reinstall the following dependency: ```bash pip install triton==3.2.0 -bash install_requirements.sh ``` {{% /notice %}} ### Convert Conditioners Submodule -The Conditioners submodule is based on the T5Encoder model. We convert it first to ONNX, then to LiteRT. +The Conditioners submodule is based on the T5Encoder model. First, convert it to ONNX, then to LiteRT. -For this conversion we include the following steps: -1. Load the Conditioners submodule from the Stable Audio Open model configuration and checkpoint. +For this conversion, the following steps are required: +1. Load the Conditioners submodule from the Stable Audio Open Small model configuration and checkpoint. 2. Export the Conditioners submodule to ONNX via *torch.onnx.export()*. 3. Convert the resulting ONNX file to LiteRT using *onnx2tf*. 
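For reference, the final ONNX-to-LiteRT step corresponds to an `onnx2tf` invocation along the lines of the sketch below. The file and folder names here are placeholders, and you do not need to run this yourself, because the conversion script used in the next command handles both the ONNX export and the `onnx2tf` conversion.

```bash
# Illustrative only: convert an already-exported ONNX file to LiteRT (.tflite) models.
# The provided export script performs torch.onnx.export() and then drives onnx2tf for you.
onnx2tf -i conditioners.onnx -o tflite_conditioners
```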
@@ -84,28 +89,29 @@ You can use the provided script to convert the Conditioners submodule: python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt" ``` -After successful conversion, you now have a `conditioners.onnx` model in your current directory. +After successful conversion, you now have a `tflite_conditioners` directory containing models with different precisions (e.g., float16, float32). + +You will be using the float32.tflite model for on-device inference. ### Convert DiT and AutoEncoder -To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to tflite using three main steps: +To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to `.tflite` using three main steps: 1. Model re-authoring. 2. Quantization. 3. Conversion. -Convert the DiT and AutoEncoder submodules using the provided python script: +Convert the DiT and AutoEncoder submodules using the provided Python script: + ```bash -CUDA_VISIBLE_DEVICES="" python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt" +python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt" ``` -After successful conversion, you now have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory and can deactivate the virtual environment: +After successful conversion, you now have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory. -```bash -deactivate -``` +A more detailed explanation of the above scripts is available [here](https://github.com/ARM-software/ML-examples/blob/main/kleidiai-examples/audiogen/scripts/README.md). -For easier access, we add all needed models to one directory: +For easy access, add all the required models to one directory: ```bash export LITERT_MODELS_PATH=$WORKSPACE/litert-models @@ -115,7 +121,7 @@ cp dit_model.tflite $LITERT_MODELS_PATH cp autoencoder_model.tflite $LITERT_MODELS_PATH ``` -With all three submodules converted to LiteRT format, you're ready to build LiteRT and run the model on a mobile device in the next step. +With all three submodules now converted to LiteRT format, you're ready to build the runtime and run Stable Audio Open Small directly on an Android device in the next step. diff --git a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/4-building-litert.md b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/4-building-litert.md index c3797a24b9..f61a11611b 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/4-building-litert.md +++ b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/4-building-litert.md @@ -8,7 +8,7 @@ layout: learningpathall ## LiteRT -LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. 
+LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. Designed for low-latency, resource-efficient execution, LiteRT is optimized for mobile and embedded environments — making it a natural fit for Arm CPUs running models like Stable Audio Open Small. You’ll build the runtime using the Bazel build tool. ## Build LiteRT libraries @@ -20,55 +20,51 @@ git clone https://github.com/tensorflow/tensorflow.git tensorflow_src cd tensorflow_src ``` -We will use a specific commit of tensorflow for build so you can checkout and set the `TF_SRC_PATH`: +Check out the specified commit of TensorFlow, and set the `TF_SRC_PATH`: ```bash git checkout 84dd28bbc29d75e6a6d917eb2998e4e8ea90ec56 export TF_SRC_PATH=$(pwd) ``` -We can use `bazel` to build LiteRT libraries, first we use configure script to create a custom configuration for this: +A script is available to configure the `bazel` build environment. Run it to create a custom TFLite build for Android: -You can now create a custom TFLite build for android: - -Ensure the `ANDROID_NDK` variable is set to your previously installed Android NDK: +{{% notice Reminder %}} +Ensure the `NDK_PATH` variable is set to your previously installed Android NDK: {{< tabpane code=true >}} {{< tab header="Linux">}} -export ANDROID_NDK=$WORKSPACE/android-ndk-r25b/ -export PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH +export NDK_PATH=$WORKSPACE/android-ndk-r25b/ +export PATH=$NDK_PATH/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH {{< /tab >}} {{< tab header="MacOS">}} -export TF_CXX_FLAGS="-DTF_MAJOR_VERSION=0 -DTF_MINOR_VERSION=0 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=''" -export ANDROID_NDK=~/Library/Android/sdk/ndk/27.0.12077973/ -export PATH=$PATH:$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin -export PATH=$PATH:~/Library/Android/sdk/cmdline-tools/latest/bin +export NDK_PATH=~/Library/Android/android-ndk-r25b +export PATH=$PATH:$NDK_PATH/toolchains/llvm/prebuilt/darwin-x86_64/bin {{< /tab >}} {{< /tabpane >}} +{{% /notice %}} -Set the TensorFlow version +The configuration script is interactive. Run it using the command below, and use the table to set the parameters for this Learning Path use-case. ```bash -export TF_CXX_FLAGS="-DTF_MAJOR_VERSION=0 -DTF_MINOR_VERSION=0 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=''" -``` - - -Now you can configure TensorFlow. Here you can set the custom build parameters needed as follows: - -```bash { output_lines = "2-14" } python3 ./configure.py -Please specify the location of python. [Default is $WORKSPACE/bin/python3]: -Please input the desired Python library path to use. Default is [$WORKSPACE/lib/python3.10/site-packages] -Do you wish to build TensorFlow with ROCm support? [y/N]: n -Do you wish to build TensorFlow with CUDA support? [y/N]: n -Do you want to use Clang to build TensorFlow? [Y/n]: n -Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: y -Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]: /home/user/Workspace/tools/ndk/android-ndk-r25b -Please specify the (min) Android NDK API level to use. [Available levels: [16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33]] [Default is 21]: 30 -Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]: -Please specify the Android SDK API level to use. 
[Available levels: ['31', '33', '34', '35']] [Default is 35]: -Please specify an Android build tools version to use. [Available versions: ['30.0.3', '34.0.0', '35.0.0']] [Default is 35.0.0]: ``` -Once the bazel configuration is complete, you can build TFLite as follows: +|Question|Input| +|---|---| +|Please specify the location of python. [Default is $WORKSPACE/bin/python3]:| Enter (default) | +|Please input the desired Python library path to use[$WORKSPACE/lib/python3.10/site-packages] | Enter | +|Do you wish to build TensorFlow with ROCm support? [y/N]|N (No)| +|Do you wish to build TensorFlow with CUDA support?|N| +|Do you want to use Clang to build TensorFlow? [Y/n]|N| +|Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]|y (Yes) | +|Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]| Enter | +|Please specify the (min) Android NDK API level to use. [Default is 21] | 27 | +|Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]| Enter | +|Please specify the Android SDK API level to use. [Default is 35]| Enter | +|Please specify an Android build tools version to use. [Default is 35.0.0]| Enter | +|Do you wish to build TensorFlow with iOS support? [y/N]:| n | + +Once the Bazel configuration is complete, you can build TFLite as follows: + ```console bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so \ --define tflite_with_xnnpack=true \ @@ -77,7 +73,15 @@ bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so --define tflite_with_xnnpack_qu8=true ``` -This will produce a `libtensorflowlite.so` shared library for android with XNNPack enabled, which we will use to build the example next. +The final step is to build flatbuffers used by the application: +``` +cd $WORKSPACE/tensorflow_src +mkdir flatc-native-build && cd flatc-native-build +cmake ../tensorflow/lite/tools/cmake/native_tools/flatbuffers +cmake --build . +``` + +Now that LiteRT and FlatBuffers are built, you're ready to compile and deploy the Stable Audio Open Small inference application on your Android device. diff --git a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/5-creating-simple-program.md b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/5-creating-simple-program.md index e60cd8027e..2696b39b8a 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/5-creating-simple-program.md +++ b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/5-creating-simple-program.md @@ -8,15 +8,17 @@ layout: learningpathall ## Create and build a simple program -You'll now build a simple program that runs inference on all three submodules directly on an Android device. +As a final step, you’ll build a simple program that runs inference on all three submodules directly on an Android device. The program takes a text prompt as input and generates an audio file as output. 
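Before configuring the build, you can optionally confirm that the artifacts from the previous sections are where the later commands expect them: the converted LiteRT models and the `libtensorflowlite.so` library you built with Bazel. This is just a quick sanity check.

```bash
# Optional: the converted submodules should be in $LITERT_MODELS_PATH
ls $LITERT_MODELS_PATH

# Optional: the LiteRT shared library built with Bazel in the previous section
ls ${TF_SRC_PATH}/bazel-bin/tensorflow/lite/libtensorflowlite.so
```

Next, create a build directory inside the example application's folder: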
+ ```bash -cd $WORKSPACE/audio-stale-open-litert/app +cd $WORKSPACE/ML-examples/kleidiai-examples/audiogen/app mkdir build && cd build ``` -Ensure the NDK path is set correctly and build with cmake: +Ensure the NDK path is set correctly and build with `cmake`: + ```bash cmake -DCMAKE_TOOLCHAIN_FILE=$NDK_PATH/build/cmake/android.toolchain.cmake \ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 \ @@ -26,34 +28,39 @@ cmake -DCMAKE_TOOLCHAIN_FILE=$NDK_PATH/build/cmake/android.toolchain.cmake \ -DFLATBUFFER_INCLUDE_PATH=$TF_SRC_PATH/flatc-native-build/flatbuffers/include \ .. -cmake --build . -j1 +make -j ``` +After the example application builds successfully, a binary file named `audiogen` is created. + +The audiogen application uses a SentencePiece model as its subword tokenizer, so you'll need to download the *spiece.model* file from the `google-t5/t5-base` repository on Hugging Face: -Since the tokenizer used in the audiogen application is based on SentencePiece, you'll need to download the spiece.model file from: ```bash -https://huggingface.co/google-t5/t5-base/tree/main +cd $WORKSPACE +wget https://huggingface.co/google-t5/t5-base/resolve/main/spiece.model ``` -we will save this model in `WORKSPACE` for ease of access. + +Verify that the model was downloaded to your `WORKSPACE`: + ```text -cp spiece.moel $WORKSPACE +ls $WORKSPACE/spiece.model ``` -After the SAO example builds successfully, a binary file named `audiogen_main` is created. -Now use adb (Android Debug Bridge) to push the necessary files to the device: +Connect your Android device to your development machine using a cable. adb (Android Debug Bridge) is available as part of the Android SDK. + +You should see your device listed when you run the following command: ```bash -adb shell +adb devices ``` -Create a directory for all the required resources: -```bash -cd /data/local/tmp -mkdir audiogen -exit +```output + device ``` -Push all necessary files into the `audiogen` folder on Android: + +Note that you may have to approve the connection on your phone for this to work. Now, use `adb` to push all the necessary files into the `/data/local/tmp/app` folder on the Android device: + ```bash -cd $WORKSPACE/audio-stale-open-litert/app/build +cd $WORKSPACE/ML-examples/kleidiai-examples/audiogen/app/build adb shell mkdir -p /data/local/tmp/app adb push audiogen /data/local/tmp/app adb push $LITERT_MODELS_PATH/conditioners_float32.tflite /data/local/tmp/app @@ -63,14 +70,24 @@ adb push $WORKSPACE/spiece.model /data/local/tmp/app adb push ${TF_SRC_PATH}/bazel-bin/tensorflow/lite/libtensorflowlite.so /data/local/tmp/app ``` -Finally, run the program on your Android device: -``` +Start a new shell to access the device's system from your development machine: + +```bash adb shell +``` + +Finally, run the program on your Android device. Try out different prompts, following the prompting advice from the [Download the model](../2-testing-model) section. + +```bash cd /data/local/tmp/app LD_LIBRARY_PATH=. ./audiogen . "warm arpeggios on house beats 120BPM with drums effect" 4 +exit ``` -The successful execution of the app will create `output.wav` of your chosen audio defined by the prompt, you can pull it back to your host machine and enjoy! +You can now pull the generated `output.wav` back to your host machine and listen to the result. + ```bash adb pull /data/local/tmp/app/output.wav ``` + +You should now have gained hands-on experience running the Stable Audio Open Small model with LiteRT on Arm-based devices.
This includes setting up the environment, optimizing the model for on-device inference, and understanding how efficient runtimes like LiteRT make low-latency generative AI possible at the edge. You’re now better equipped to explore and deploy AI-powered audio applications on mobile and embedded platforms. \ No newline at end of file diff --git a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/_index.md b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/_index.md index 5ddfad3b6d..f20be9c201 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/_index.md +++ b/content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/_index.md @@ -6,12 +6,12 @@ minutes_to_complete: 30 who_is_this_for: This is an introductory topic for developers looking to deploy the Stable Audio Open Small text-to-audio model using LiteRT on an Android device. learning_objectives: - - Deploy the Stable Audio Open Small model on Android using LiteRT. - - Build a simple program to generate audio. - - Compile the application and for an Arm CPU. + - Download and test the Stable Audio Open Small model. + - Convert the Stable Audio Open Small model to the LiteRT (.tflite) format. + - Compile the application for an Arm CPU. + - Create a simple application that generates audio. - Run the application on an Android smartphone and generate an audio snippet. - prerequisites: - A Linux-based x86 or macOS development machine with at least 8 GB of RAM (tested on Ubuntu 20.04.4 LTS with x86_64). - A [HuggingFace](https://huggingface.co/) account. @@ -38,9 +38,9 @@ operatingsystems: further_reading: - resource: - title: Introducing Stable Audio 2.0 - link: https://stability.ai/news/stable-audio-2-0 - type: documentation + title: Stability AI and Arm Collaborate to Release Stable Audio Open Small, Enabling Real-World Deployment for On-Device Audio Generation + link: https://stability.ai/news/stability-ai-and-arm-release-stable-audio-open-small-enabling-real-world-deployment-for-on-device-audio-control + type: blog - resource: title: Stability AI optimized its audio generation model to run on Arm chips link: https://techcrunch.com/2025/03/03/stability-ai-optimized-its-audio-generation-model-to-run-on-arm-chips/ diff --git a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/1-devenv-and-model.md b/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/1-devenv-and-model.md index f78efcf40f..22fa2701a4 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/1-devenv-and-model.md +++ b/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/1-devenv-and-model.md @@ -1,5 +1,5 @@ --- -title: Build the MNN Android Demo with GUI +title: Environment setup and prepare model weight: 3 ### FIXED, DO NOT MODIFY @@ -9,7 +9,7 @@ layout: learningpathall In this section, you'll set up your development environment by installing dependencies and preparing the Qwen vision model. -Install the Android NDK (Native Development Kit) and git-lfs. This Learning Path was tested with NDK version `28.0.12916984` and CMake version `3.31.6`. +Install the Android NDK (Native Development Kit) and git-lfs. This Learning Path was tested with NDK version `28.0.12916984` and CMake version `4.0.0-rc1`. 
For Ubuntu or Debian systems, install CMake and git-lfs with the following commands: @@ -18,9 +18,9 @@ sudo apt update sudo apt install cmake git-lfs -y ``` -You can use Android Studio to obtain the NDK. +You can use Android Studio to obtain the NDK. -Click **Tools > SDK Manager** and navigate to the **SDK Tools** tab. +Click **Tools > SDK Manager** and navigate to the **SDK Tools** tab. Select the **NDK (Side by side)** and **CMake** checkboxes, as shown below: @@ -48,7 +48,7 @@ If Python 3.x is not the default version, try running `python3 --version` and `p ## Set up Phone Connection -You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files. +You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files. Connect your phone to your computer using a USB cable, and enable USB debugging on your phone. To do this, tap the **Build Number** in your **Settings** app 7 times, then enable **USB debugging** in **Developer Options**. @@ -65,9 +65,18 @@ List of devices attached device ``` -## Download and Convert the Model +## Download the Quantized Model -The following commands download the model from Hugging Face, and clone a tool for exporting the LLM model to the MNN framework. +The pre-quantized model is available on Hugging Face. You can download it with the following commands: + +```bash +git lfs install +git clone https://huggingface.co/taobao-mnn/Qwen2.5-VL-3B-Instruct-MNN +git -C Qwen2.5-VL-3B-Instruct-MNN checkout 9057334b3f85a7f106826c2fa8e57c1aee727b53 +``` + +## (Optional) Download and Convert the Model +If you need to quantize the model with customized parameters, the following commands download the model from Hugging Face and clone a tool for exporting the LLM model to the MNN framework. ```bash cd $HOME @@ -95,11 +104,13 @@ To learn more about the parameters, see the [transformers README.md](https://git Verify that the model was built correctly by checking that the `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory is at least 1 GB in size. +## Push the model to the Android device + Push the model onto the device: ```shell adb shell mkdir /data/local/tmp/models/ -adb push Qwen2-VL-2B-Instruct-convert-4bit-per_channel /data/local/tmp/models +adb push Qwen2.5-VL-3B-Instruct-MNN /data/local/tmp/models ``` -With the model set up, you're ready to use Android Studio to build and run an example application. +With the model set up, you're ready to build and run an example application.
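As a quick, optional check before moving on, you can list the pushed directory on the device; you should see `config.json`, which the demo application loads in a later step.

```bash
# Optional: confirm the model files arrived on the device
adb shell ls /data/local/tmp/models/Qwen2.5-VL-3B-Instruct-MNN
```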
diff --git a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/3-benchmark.md b/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-benchmark.md similarity index 75% rename from content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/3-benchmark.md rename to content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-benchmark.md index 658deb17ab..b831ee28bf 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/3-benchmark.md +++ b/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-benchmark.md @@ -1,15 +1,15 @@ --- title: Build the MNN Command-line ViT Demo -weight: 5 +weight: 4 ### FIXED, DO NOT MODIFY layout: learningpathall --- ## Prepare an Example Image -In this section, you'll benchmark model performance with and without KleidiAI kernels. To run optimized inference, you'll first need to compile the required library files. You'll also need an example image to run command-line prompts. +In this section, you'll benchmark model performance with and without KleidiAI kernels. To run optimized inference, you'll first need to compile the required library files. You'll also need an example image to run command-line prompts. -You can use the provided image of the tiger below that this Learning Path uses, or choose your own. +You can use the provided image of the tiger below that this Learning Path uses, or choose your own. Whichever you select, rename the image to `example.png` to use the commands in the following sections. @@ -23,24 +23,30 @@ adb push example.png /data/local/tmp/ ## Build Binaries for Command-line Inference -Navigate to the Vision Language Models project that you cloned in the previous section. +Run the following commands to clone the MNN repository and checkout the source tree: + +```bash +cd $HOME +git clone https://github.com/alibaba/MNN.git +cd MNN +git checkout 282cebeb785118865b9c903decc4b5cd98d5025e +``` + +Create a build directory and run the build script. The first time that you do this, build the binaries with the `-DMNN_KLEIDIAI` flag set to `FALSE`. ```bash -cmake ./vit/ -B build \ --DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \ --DCMAKE_BUILD_TYPE=Release \ --DANDROID_ABI="arm64-v8a" \ --DANDROID_STL=c++_static \ --DANDROID_NATIVE_API_LEVEL=android-21 \ --DMNN_BUILD_OPENCV=true \ --DMNN_IMGCODECS=true \ --DMNN_KLEIDIAI=false -cmake --build ./build +cd $HOME/MNN/project/android +mkdir build_64 && cd build_64 + +../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=FALSE \ + -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \ + -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \ + -DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true" ``` {{% notice Note %}} -If your NDK toolchain isn't set up correctly, you might run into issues with the above script. Make a note of where the NDK was installed - this will be a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running above commands: +If your NDK toolchain isn't set up correctly, you might run into issues with the above script. Make a note of where the NDK was installed - this will be a directory named after the version you downloaded earlier. 
Try exporting the following environment variables before re-running `build_64.sh`: ```bash export ANDROID_NDK_HOME=/ndk/28.0.12916984 @@ -55,7 +61,7 @@ export ANDROID_NDK=$ANDROID_NDK_HOME Push the required files to your Android device, then enter a shell on the device using ADB: ```bash -adb push build/bin/vision_llm build/lib/*.so /data/local/tmp +adb push *so llm_demo tools/cv/*so /data/local/tmp/ adb shell ``` @@ -63,7 +69,7 @@ Run the following commands in the ADB shell. Navigate to the directory you pushe ```bash cd /data/local/tmp/ -chmod +x vision_llm +chmod +x llm_demo export LD_LIBRARY_PATH=$PWD echo "./example.pngDescribe the content of the image." > prompt ``` @@ -71,7 +77,7 @@ echo "./example.pngDescribe the content of the image." > prompt Finally, run an inference on the model with the following command: ```bash -./vision_llm models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt +./llm_demo models/Qwen2.5-VL-3B-Instruct-MNN/config.json prompt ``` If the launch is successful, you should see the following output, with the performance benchmark at the end: @@ -96,28 +102,22 @@ prefill speed = 192.28 tok/s ## Enable KleidiAI and Re-run Inference -The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the flag `-DMNN_KLEIDIAI` to `TRUE`. +The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the flag `-DMNN_KLEIDIAI` to `TRUE`. -From the `build` directory, run: +From the `build_64` directory, run: ```bash -cmake ./vit/ -B build \ --DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \ --DCMAKE_BUILD_TYPE=Release \ --DANDROID_ABI="arm64-v8a" \ --DANDROID_STL=c++_static \ --DANDROID_NATIVE_API_LEVEL=android-21 \ --DMNN_BUILD_OPENCV=true \ --DMNN_IMGCODECS=true \ --DMNN_KLEIDIAI=false -cmake --build ./build +../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=TRUE \ +-DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \ +-DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \ +-DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true" ``` ## Update Files on the Device First, remove existing binaries from your Android device, then push the updated files: ```bash -adb shell "cd /data/local/tmp; rm -rf *so vision_llm" -adb push build/bin/vision_llm build/lib/*.so /data/local/tmp +adb shell "cd /data/local/tmp; rm -rf *so llm_demo tools/cv/*so" +adb push *so llm_demo tools/cv/*so /data/local/tmp/ adb shell ``` @@ -127,7 +127,7 @@ With the new ADB shell, run the following commands: cd /data/local/tmp/ chmod +x llm_demo export LD_LIBRARY_PATH=$PWD -./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt +./llm_demo models/Qwen2.5-VL-3B-Instruct-MNN/config.json prompt ``` ## Benchmark Results @@ -154,7 +154,7 @@ This time, you should see an improvement in the benchmark. Below is an example t | Prefill Speed | 192.28 tok/s | 266.13 tok/s | | Decode Speed | 34.73 tok/s | 44.96 tok/s | -**Prefill speed** describes how fast the model processes the input prompt. +**Prefill speed** describes how fast the model processes the input prompt. **Decode Speed** indicates how quickly the model generates new tokens after the input is processed. 
diff --git a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-generate-apk.md b/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-generate-apk.md deleted file mode 100644 index 3663e412c0..0000000000 --- a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-generate-apk.md +++ /dev/null @@ -1,53 +0,0 @@ ---- -title: Benchmark the Vision Transformer performance with KleidiAI -weight: 4 - -### FIXED, DO NOT MODIFY -layout: learningpathall ---- - -## Clone Vision Language Models repo - -In this section, you will run the Qwen model in action using a demo application using a Android Package Kit (APK). - -A repository is set up to enable building the app as an Android Studio project. It uses MNN as an inference engine which is integrated with KleidiAI. Run the following commands to clone the repository and checkout the source tree: - -```bash -cd $HOME -git clone https://gitlab.arm.com/kleidi/kleidi-examples/vision-language-models -cd vision-language-models -``` - -## Build the App Using Android Studio - -You can use Android Studio to build the app and create an APK. - -### Open project and build - -Open Android Studio. - -Go to **File > Open**. - -Navigate into the cloned `vision-language-models`, and click `Open`. - -This triggers a build of the project, and you should see output similar to the following on completion: - -```output -BUILD SUCCESSFUL in 1m 42s -``` - -### Generate and Run the APK - -Navigate to **Build > Generate App Bundles or APKs**. Select **Generate APKs**. - -The build will be executed, and then the app will be copied and installed on the Android device. - -After opening the app, you will see the splash screen: - -![Loading screenshot](Loading_page.png) - -Finally, you can use the UI to chat with the app. Try uploading an image and ask a question on it. - -![Loading screenshot](chat2.png) - -The final step is to examine how KleidiAI can improve the performance of the model. Continue to the next section to find out. \ No newline at end of file