diff --git a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/1-devenv-and-model.md b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/1-devenv-and-model.md index fc3a588cb0..04d20f61de 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/1-devenv-and-model.md +++ b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/1-devenv-and-model.md @@ -5,27 +5,30 @@ weight: 3 ### FIXED, DO NOT MODIFY layout: learningpathall --- +## Install Required Software -In this section, you will set up a development environment by installing dependencies and preparing the Qwen vision model. +In this section, you'll set up your development environment by installing dependencies and preparing the Qwen vision model. -## Install required software +Install the Android NDK (Native Development Kit) and git-lfs. This Learning Path was tested with NDK version `28.0.12916984` and CMake version `4.0.0-rc1`. -Install the Android NDK (Native Development Kit) and git-lfs. This learning path was tested with NDK version `28.0.12916984` and CMake version `4.0.0-rc1`. - -For Ubuntu or Debian systems, you can install CMake and git-lfs with the following command: +For Ubuntu or Debian systems, install CMake and git-lfs with the following commands: ```bash sudo apt update sudo apt install cmake git-lfs -y ``` -You can use Android Studio to obtain the NDK. Click **Tools > SDK Manager**, and navigate to the the SDK Tools tab. Select the NDK (Side by side) and CMake checkboxes, as shown below: +Alternatively, you can use Android Studio to obtain the NDK. + +Click **Tools > SDK Manager** and navigate to the **SDK Tools** tab. + +Select the **NDK (Side by side)** and **CMake** checkboxes, as shown below: ![Install NDK](./install_ndk.png) -Refer to [Install NDK and CMake](https://developer.android.com/studio/projects/install-ndk) for other installation methods. +See [Install NDK and CMake](https://developer.android.com/studio/projects/install-ndk) for other installation methods. -Make sure Python and pip is installed by verifying a version is printed on running this command: +Ensure that Python and pip are installed by verifying the version with these commands: ```bash python --version @@ -33,25 +36,26 @@ pip --version ``` {{% notice Note %}} -The above commands may fail when Python is installed if Python 3.x is not the default version. You can try running `python3 --version` and `pip3 --version` to be sure. +If Python 3.x is not the default version, try running `python3 --version` and `pip3 --version`. {{% /notice %}} -## Set up phone connection +## Set up Phone Connection -You will need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included in Android Studio, comes with Android Debug Bridge (ADB). You will use this tool to transfer files later on. +You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files. -Connect your phone to the computer using a USB cable. You will need to activate USB debugging on your phone. Find the **Build Number** in your **Settings** app and tap it 7 times. Then, enable **USB debugging** in **Developer Options**. +Connect your phone to your computer using a USB cable, and enable USB debugging on your phone. 
To do this, tap the **Build Number** in your **Settings** app 7 times, then enable **USB debugging** in **Developer Options**. -You should now see your device listed upon running `adb devices`: +Verify the connection by running `adb devices`: ```output List of devices attached device ``` +You should see your device listed. -## Download and convert the model +## Download and Convert the Model -The following commands download the model from Hugging Face, and clones a tool for exporting LLM model to the MNN framework. +The following commands download the model from Hugging Face, and clone a tool for exporting the LLM model to the MNN framework. ```bash cd $HOME @@ -60,8 +64,7 @@ huggingface-cli download Qwen/Qwen2-VL-2B-Instruct --local-dir ./Qwen2-VL-2B-Ins git clone https://github.com/wangzhaode/llm-export cd llm-export && pip install . ``` - -You can use the `llm-export` repository to quantize the model with the following options: +Use the `llm-export` repository to quantize the model with these options: ```bash llmexport --path ../Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 \ @@ -72,13 +75,13 @@ The table below gives you an explanation of the different arguments: | Parameter | Description | Explanation | |------------------|-------------|--------------| -| `--quant_bit` | mnn quant bit, 4 or 8, default is 4 | `4` represents q4 quantization. | -| `--quant_block` | mnn quant block, default is 0 | `0` represents per-channel quantization, `128` represents 128 per-block quantization. | -| `--sym` | symmetric quantization (without zeropoint), defualt is False. | The quantization parameter that enables symmetrical quantization. | +| `--quant_bit` | MNN quant bit, 4 or 8, default is 4. | `4` represents q4 quantization. | +| `--quant_block` | MNN quant block, default is 0. | `0` represents per-channel quantization; `128` represents 128 per-block quantization. | +| `--sym` | Symmetric quantization (without zeropoint); default is False. | The quantization parameter that enables symmetrical quantization. | -To learn more about the parameters, refer to the [transformers README.md](https://github.com/alibaba/MNN/tree/master/transformers). +To learn more about the parameters, see the [transformers README.md](https://github.com/alibaba/MNN/tree/master/transformers). -Verify the model is built correct by checking the size of the resulting model. The `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory should be at least 1 GB in size. +Verify that the model was built correctly by checking that the `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory is at least 1 GB in size. Push the model onto the device: @@ -87,4 +90,4 @@ adb shell mkdir /data/local/tmp/models/ adb push Qwen2-VL-2B-Instruct-convert-4bit-per_channel /data/local/tmp/models ``` -With the model set up, it's time to use Android Studio to build and run an example application. +With the model set up, you're ready to use Android Studio to build and run an example application. 
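+
+Optionally, before opening Android Studio, you can sanity-check the converted model and the files on the device. The commands below are a minimal, optional example that assumes the directory name and device path used above; adjust them if you chose different locations:
+
+```bash
+# On the development machine: the converted model directory should be at least 1 GB
+du -sh Qwen2-VL-2B-Instruct-convert-4bit-per_channel
+
+# On the device: confirm the model files were pushed to the expected location
+adb shell ls /data/local/tmp/models/Qwen2-VL-2B-Instruct-convert-4bit-per_channel
+```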
diff --git a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/2-generate-apk.md b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/2-generate-apk.md
index 7bce012711..2fb3358634 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/2-generate-apk.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/2-generate-apk.md
@@ -6,11 +6,13 @@ weight: 4
layout: learningpathall
---

-In this section, you will try the Qwen model in action using a demo application using a Android Package Kit (APK)
-
## Clone MNN repo

-A fork of the upstream MNN repository is set up to enable building the app as an Android Studio project. Run the following to clone the repository and checkout the source tree:
+In this section, you will see the Qwen model in action by building and running a demo application packaged as an Android Package Kit (APK).
+
+A fork of the upstream MNN repository is set up to enable building the app as an Android Studio project.
+
+Run the following commands to clone the repository and check out the source tree:

```bash
cd $HOME
@@ -19,19 +21,23 @@ cd MNN
git checkout origin/llm_android_demo
```

-## Build the app using Android Studio
+## Build the App Using Android Studio

### Open project and build

-Open Android Studio. Go to **File > Open**. Navigate to the MNN repository you just cloned. Expand the `transformers/llm/engine/` directories, select the `android` one and click `Open`.
+Open Android Studio.
+
+Go to **File > Open**.
+
+Navigate to the cloned MNN repository, expand the `transformers/llm/engine/` directories, select the `android` directory, and click **Open**.

-This will trigger a build of the project, and you should see a similar output on completion:
+This triggers a build of the project, and you should see output similar to the following on completion:

```output
BUILD SUCCESSFUL in 1m 42s
```

-### Generate and run the APK
+### Generate and Run the APK

Navigate to **Build > Generate App Bundles or APKs**. Select **Generate APKs**.

diff --git a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/3-benchmark.md b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/3-benchmark.md
index 075157b721..399f673f94 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/3-benchmark.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/3-benchmark.md
@@ -5,12 +5,13 @@ weight: 5
### FIXED, DO NOT MODIFY
layout: learningpathall
---
+## Prepare an Example Image

-In this section, you will use the model to benchmark performance with and without KleidiAI kernels. You will need to compile library files to run the optimized inference.
+In this section, you'll benchmark model performance with and without KleidiAI kernels. To run optimized inference, you'll first need to compile the required library files. You'll also need an example image to run command-line prompts.

-## Prepare an example image
+You can use the image of the tiger below, which this Learning Path uses as an example, or choose your own.

-You will use an image to run a command-line prompt. In this learning path, the tiger below will be used as an example. You can save this image or provide one of your own. 
Re-name the image to `example.png` in order to use the commands in the following sections. +Whichever you select, rename the image to `example.png` to use the commands in the following sections. ![example image](example.png) @@ -20,9 +21,13 @@ Use ADB to load the image onto your phone: adb push example.png /data/local/tmp/ ``` -## Build binaries for command-line inference +## Build Binaries for Command-line Inference -Navigate to the MNN project you cloned in the previous section. Create a build directory and run the script. The first time, you will build the binaries with the `-DMNN_KLEIDIAI` flag set to `FALSE`. +Navigate to the MNN project that you cloned in the previous section. + +Create a build directory and run the build script. + +The first time that you do this, build the binaries with the `-DMNN_KLEIDIAI` flag set to `FALSE`. ```bash cd $HOME/MNN/project/android @@ -34,7 +39,7 @@ mkdir build_64 && cd build_64 -DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true" ``` {{% notice Note %}} -If your NDK toolchain isn't set up correctly, you may run into issues with the above script. Make note of where the NDK was installed - this will be a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running `build_64.sh`. +If your NDK toolchain isn't set up correctly, you might run into issues with the above script. Make a note of where the NDK was installed - this will be a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running `build_64.sh`: ```bash export ANDROID_NDK_HOME=/ndk/28.0.12916984 @@ -44,14 +49,16 @@ export ANDROID_NDK=$ANDROID_NDK_HOME ``` {{% /notice %}} -Push the files to your mobile device. Then, enter a shell on the phone using ADB. +## Push Files and Run Inference via ADB + +Push the required files to your Android device, then enter a shell on the device using ADB: ```bash adb push *so llm_demo tools/cv/*so /data/local/tmp/ adb shell ``` -The following commands should be run in the ADB shell. Navigate to the directory you pushed the files to, add executable permissions to the `llm_demo` file and export an environment variable for it to run properly. After this, use the example image you transferred earlier to create a file containing the text content for the prompt. +Run the following commands in the ADB shell. Navigate to the directory you pushed the files to, add executable permissions to the `llm_demo` file and export an environment variable for it to run properly. After this, use the example image you transferred earlier to create a file containing the text content for the prompt. ```bash cd /data/local/tmp/ @@ -60,13 +67,13 @@ export LD_LIBRARY_PATH=$PWD echo "./example.pngDescribe the content of the image." > prompt ``` -Finally, run an inference on the model with the following command. +Finally, run an inference on the model with the following command: ```bash ./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt ``` -If the launch is successful, you should see the following output, with the performance benchmark at the end. 
+If the launch is successful, you should see the following output, with the performance benchmark at the end:

```output
config path is models/Qwen-VL-2B-convert-4bit-per_channel/config.json
@@ -86,17 +93,20 @@ prefill speed = 192.28 tok/s
##################################
```

-## Enable KleidiAI and re-run inference
+## Enable KleidiAI and Re-run Inference

-The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the flag `-DMNN_KLEIDIAI` to `TRUE`. From the `build_64` directory, run:
+The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the flag `-DMNN_KLEIDIAI` to `TRUE`.
+
+From the `build_64` directory, run:

```bash
../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=TRUE \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
-DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
-DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
```
+## Update Files on the Device

-The next step is to update the files on your phone. Start by removing the ones used in the previous step. Then, push the new ones with the same command as before.
+First, remove existing binaries from your Android device, then push the updated files:

```bash
adb shell "cd /data/local/tmp; rm -rf *so llm_demo tools/cv/*so"
@@ -104,7 +114,7 @@ adb push *so llm_demo tools/cv/*so /data/local/tmp/
adb shell
```

-In the new ADB shell, preform the same steps as in the previous section.
+In the new ADB shell, run the following commands:

```bash
cd /data/local/tmp/
@@ -112,8 +122,10 @@ chmod +x llm_demo
export LD_LIBRARY_PATH=$PWD
./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
```
+## Benchmark Results
+
+After running with KleidiAI enabled, you should see improved benchmarks. Example results:

-The same output should be displayed, with the benchmark printed at the end:
```output
#################################
prompt tokens num = 243
@@ -127,7 +139,7 @@ prefill speed = 266.13 tok/s
##################################
```

-This time, you should see an improvement in the benchmark. Below is an example table showing the uplift on three relevant metrics after enabling the KleidiAI kernels.
+The table below shows the uplift on three relevant metrics after enabling the KleidiAI kernels:

| Benchmark | Without KleidiAI | With KleidiAI |
|---------------------|------------------|---------------|
@@ -135,6 +147,8 @@ This time, you should see an improvement in the benchmark. Below is an example t
| Prefill Speed | 192.28 tok/s | 266.13 tok/s |
| Decode Speed | 34.73 tok/s | 44.96 tok/s |

-The prefill speed describes how fast the model processes the input prompt. The decode speed corresponds to the rate at which the model generates new tokens after the input is processed
+**Prefill speed** describes how fast the model processes the input prompt.
+
+**Decode speed** indicates how quickly the model generates new tokens after the input is processed.

-This shows the advantages of using Arm optimized kernels for your ViT use-cases.
\ No newline at end of file
+These benchmarks demonstrate the performance advantages of using Arm-optimized KleidiAI kernels for vision transformer (ViT) workloads. 
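+
+If you don't see a similar uplift, it is worth confirming that your device's CPU reports the Arm features listed in this Learning Path's prerequisites (`i8mm` and `dotprod`), which the optimized micro-kernels rely on. The following optional check assumes a Linux development machine; note that dotprod appears as `asimddp` in `/proc/cpuinfo`, and different cores can report different feature sets:
+
+```bash
+# Print the CPU feature flags reported by each core on the device
+adb shell cat /proc/cpuinfo | grep Features
+```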
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/_index.md b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/_index.md
index 9fcd800938..e6e6541ea7 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/_index.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/_index.md
@@ -3,18 +3,18 @@ title: Vision LLM inference on Android with KleidiAI and MNN

minutes_to_complete: 30

-who_is_this_for: This learning path is for developers who want to run Vision Transformers (ViT) efficiently on an Android device.
+who_is_this_for: This Learning Path is for developers who want to run Vision Transformers (ViT) efficiently on Android.

learning_objectives:
-    - Download the a Vision Large Language Model (LLM) from Hugging Face.
+    - Download a Vision Large Language Model (LLM) from Hugging Face.
    - Convert the model to the Mobile Neural Network (MNN) framework.
-    - Install an Android demo application with the model to run an inference.
-    - Compare model inference performance with and without KleidiAI Arm optimized micro-kernels.
+    - Install an Android demo application using the model to run an inference.
+    - Compare inference performance with and without KleidiAI Arm-optimized micro-kernels.

prerequisites:
    - A development machine with [Android Studio](https://developer.android.com/studio) installed.
-    - A 64-bit Arm powered smartphone running Android with `i8mm` and `dotprod` supported.
+    - A 64-bit Arm-powered smartphone running Android with support for `i8mm` and `dotprod`.

author:
    - Shuheng Deng
@@ -36,7 +36,7 @@ operatingsystems:

further_reading:
    - resource:
-        title: "MNN : A UNIVERSAL AND EFFICIENT INFERENCE ENGINE"
+        title: "MNN: A Universal and Efficient Inference Engine"
        link: https://arxiv.org/pdf/2002.12418
        type: documentation
    - resource:
diff --git a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/background.md b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/background.md
index 5f2641fd28..acea511628 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/background.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/background.md
@@ -8,20 +8,21 @@ layout: learningpathall

## Mobile Neural Network (MNN)

-MNN is a high-performance, lightweight deep learning framework designed for both inference and training. Optimized for on-device deployment, it delivers industry-leading efficiency across various applications. Currently, MNN is integrated into over 30 Alibaba Inc. apps, including Taobao, Tmall, Youku, DingTalk, and Xianyu. It powers more than 70 real-world scenarios, such as live streaming, short video processing, search recommendations, image-based product searches, and more.
+MNN is a high-performance, lightweight deep learning framework designed for both inference and training. Optimized for on-device deployment, it delivers industry-leading efficiency across various applications. Currently, MNN is integrated into more than 30 Alibaba Inc.
apps - including Taobao, Tmall, Youku, DingTalk, and Xianyu - and powers over 70 real-world scenarios such as live streaming, short video processing, search recommendations, and image-based product searches. -**MNN-LLM** is a large language model (LLM) runtime solution built on the MNN engine, designed to enable local deployment of LLMs across diverse platforms, including mobile devices, PCs, and IoT systems. It supports leading models such as Qianwen, Baichuan, Zhipu, and Llama, ensuring efficient and accessible AI-powered experiences. +**MNN-LLM** is a large language model (LLM) runtime solution built on the MNN engine. It enables local deployment of LLMs across diverse platforms, including mobile devices, PCs, and IoT systems, and supports leading models such as Qianwen, Baichuan, Zhipu, and Llama for efficient, accessible AI-powered experiences. -KleidiAI, a collection of optimized AI micro-kernels, is integrated into the MNN framework, enhancing the inference performance of large language models (LLMs) within MNN. The Android app in this learning path demonstrates Vision Transformer inference using the MNN framework. You will use KleidiAI to speed up inference for the [Qwen Vision 2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) model. +KleidiAI, a collection of optimized AI micro-kernels, is integrated into the MNN framework to enhance the inference performance of LLMs. In this Learning Path, the Android app demonstrates Vision Transformer inference using the MNN framework. You will use KleidiAI to speed up inference for the [Qwen Vision 2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) model. -## Vision Transformer(ViT) -The ViT is a deep learning model designed for image recognition tasks. Unlike traditional convolutional neural networks (CNNs), which process images using convolutional layers, ViT leverages the transformer architecture originally developed for natural language processing (NLP). -The ViT workflow integrates the following features: +## Vision Transformer (ViT) +The Vision Transformer (ViT) is a deep learning model designed for image recognition tasks. Unlike traditional convolutional neural networks (CNNs) that use convolutional layers, ViT leverages the transformer architecture originally developed for natural language processing (NLP). -- **Image Patching** - The input image is divided into fixed-size patches, similar to how text is tokenized in NLP tasks. -- **Linear Embedding** - Each image patch is flattened and linearly embedded into a vector. -- **Position Encoding** - Positional information is added to the patch embeddings to retain spatial information. -- **Transformer Encoder** - The embedded patches are fed into a standard transformer encoder, which uses self-attention mechanisms to process the patches and capture relationships between them. -- **Classification** - The output of the transformer encoder is used for image classification or other vision tasks. +Its workflow includes: -ViT has shown competitive performance on various image classification benchmarks and has been widely adopted in computer vision research. +- **Image Patching**: Dividing the input image into fixed-size patches, similar to tokenizing text in NLP. +- **Linear Embedding**: Flattening each image patch and linearly embedding it into a vector. +- **Position Encoding**: Adding positional information to the patch embeddings to preserve spatial details. 
+- **Transformer Encoder**: Processing the embedded patches using a standard transformer encoder with self-attention mechanisms to capture relationships. +- **Classification**: Using the encoder's output for image classification or other vision tasks. + +ViT has demonstrated competitive performance on various image classification benchmarks and is widely adopted in computer vision research.