diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/1-prerequisites.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/1-prerequisites.md
new file mode 100644
index 0000000000..79ce2a4211
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/1-prerequisites.md
@@ -0,0 +1,96 @@
+---
+title: Prerequisites
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Host machine requirements
+
+This Learning Path demonstrates the benefits of using KleidiCV and KleidiAI in applications running on Arm, so you will need an AArch64 machine. The instructions in this Learning Path assume an Ubuntu Linux distribution.
+
+## Install software required for this Learning Path
+
+You need to ensure you have the following tools:
+- `git`, the version control system, for cloning the AI Camera Pipelines codebase
+- `git lfs`, an extension to `git` that helps manage large files by storing references to the files in the repository instead of the actual files themselves
+- `docker`, an open-source containerization platform
+- `libomp`, LLVM's OpenMP runtime library
+
+### `git` and `git lfs`
+
+Install these tools by running the command that matches your machine's OS:
+
+{{< tabpane code=true >}}
+  {{< tab header="Linux/Ubuntu" language="bash">}}
+sudo apt install git git-lfs
+  {{< /tab >}}
+  {{< tab header="macOS" language="bash">}}
+brew install git git-lfs
+  {{< /tab >}}
+{{< /tabpane >}}
+
+### `docker`
+
+Start by checking whether `docker` is installed on your machine by typing the following command in a terminal:
+
+```BASH { output_lines="2" }
+docker --version
+Docker version 27.3.1, build ce12230
+```
+
+If the command fails with a message similar to "`docker: command not found`", follow the steps in the [Docker Install Guide](https://learn.arm.com/install-guides/docker/).
+
+{{% notice Note %}}
+You might need to log in again or restart your machine for the changes to take effect.
+{{% /notice %}}
+
+Once you have confirmed that Docker is installed on your machine, check that it is operating normally:
+
+```BASH { output_lines="2-27" }
+docker run hello-world
+Unable to find image 'hello-world:latest' locally
+latest: Pulling from library/hello-world
+478afc919002: Pull complete
+Digest: sha256:305243c734571da2d100c8c8b3c3167a098cab6049c9a5b066b6021a60fcb966
+Status: Downloaded newer image for hello-world:latest
+
+Hello from Docker!
+This message shows that your installation appears to be working correctly.
+
+To generate this message, Docker followed these steps:
+
+ 1. The Docker client contacted the Docker daemon.
+
+ 2. The Docker daemon pulled the "hello-world" image from Docker Hub.
+    (arm64v8)
+
+ 3. The Docker daemon created a new container from that image which runs the
+    executable that produces the output you are currently reading.
+
+ 4. The Docker daemon streamed that output to the Docker client, which sent it
+    to your terminal.
+
+To try something more ambitious, you can run an Ubuntu container with:
+ $ docker run -it ubuntu bash
+
+Share images, automate workflows, and more with a free Docker ID:
+ https://hub.docker.com/
+
+For more examples and ideas, visit:
+ https://docs.docker.com/get-started/
+```
+
+### `libomp`
+
+Install `libomp` by running the command that matches your machine's OS:
+
+{{< tabpane code=true >}}
+  {{< tab header="Linux/Ubuntu" language="bash">}}
+sudo apt install libomp-19-dev
+  {{< /tab >}}
+  {{< tab header="macOS" language="bash">}}
+brew install libomp
+  {{< /tab >}}
+{{< /tabpane >}}
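+
+Optionally, you can confirm that the OpenMP runtime is usable with a minimal test program. This is only a sanity check, not part of the build; the compile command in the comment assumes `clang++` is available and may need adjusting for your setup:
+
+```CPP
+// omp_check.cpp: optional sanity check that the OpenMP runtime links and runs.
+// Build and run (assuming clang++ is installed):
+//   clang++ -fopenmp omp_check.cpp -o omp_check && ./omp_check
+#include <cstdio>
+#include <omp.h>
+
+int main() {
+    #pragma omp parallel
+    {
+        // Only one thread prints, after the parallel team has been created.
+        #pragma omp single
+        std::printf("OpenMP runtime OK, %d threads available\n", omp_get_num_threads());
+    }
+    return 0;
+}
+```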
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/2-overview.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/2-overview.md
new file mode 100644
index 0000000000..d72946adc8
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/2-overview.md
@@ -0,0 +1,47 @@
+---
+
+title: Overview
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+
+---
+
+## KleidiAI
+
+[KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) is an open-source library that provides optimized, performance-critical routines, also known as micro-kernels, for artificial intelligence (AI) workloads tailored to Arm CPUs.
+
+These routines are tuned to exploit the capabilities of specific Arm hardware architectures and maximize performance. KleidiAI is designed for ease of adoption into C and C++ machine learning (ML) and AI frameworks, and a number of AI frameworks already take advantage of it to improve performance on Arm platforms.
+
+## KleidiCV
+
+The open-source [KleidiCV](https://gitlab.arm.com/kleidi/kleidicv) library provides high-performance image processing functions for AArch64. It is designed to be simple to integrate into a wide variety of projects, and some computer vision frameworks (such as OpenCV) take advantage of KleidiCV to improve performance on Arm platforms.
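+
+Because KleidiCV is integrated underneath frameworks like OpenCV, applications do not call it directly. The sketch below is ordinary OpenCV C++ code; the point is that, in an OpenCV build with KleidiCV enabled, supported operations can be dispatched to KleidiCV's optimized kernels with no source changes (the file names are placeholders):
+
+```CPP
+// Plain OpenCV code: nothing KleidiCV-specific appears at the API level.
+// With a KleidiCV-enabled OpenCV build on AArch64, operations such as
+// resizing and blurring may be served by KleidiCV's optimized kernels.
+#include <opencv2/opencv.hpp>
+
+int main() {
+    cv::Mat src = cv::imread("input.ppm");  // placeholder input image
+    if (src.empty()) return 1;
+
+    cv::Mat small, blurred;
+    cv::resize(src, small, cv::Size(), 0.5, 0.5);            // downscale by 2x
+    cv::GaussianBlur(small, blurred, cv::Size(15, 15), 0.0); // heavy blur
+    cv::imwrite("output.ppm", blurred);
+    return 0;
+}
+```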
+
+## The AI camera pipelines
+
+The AI camera pipelines are two example applications, each implemented as a combination of AI and computer vision (CV) computations:
+- Background Blur
+- Low Light Enhancement
+
+For both applications:
+- The input and output images are stored in `ppm` (portable pixmap) format, with 3 channels (Red, Green, and Blue) of 256 color levels each (also known as `RGB8`).
+- The images are first converted to the `YUV420` color space, where the background blur or low-light enhancement takes place. Once the processing is done, the images are converted back to `RGB8` and saved in `ppm` format (a sketch of this round trip follows the list).
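+
+The sketch below illustrates that color-space round trip with OpenCV calls; it is not the pipelines' actual code, the file names are placeholders, and it assumes even image dimensions, which `YUV420` subsampling requires:
+
+```CPP
+// Illustrative RGB8 -> YUV420 -> RGB8 round trip using OpenCV.
+// The real pipelines run their processing between the two conversions.
+#include <opencv2/opencv.hpp>
+
+int main() {
+    cv::Mat rgb = cv::imread("input.ppm");  // placeholder; loaded as 8-bit BGR
+    if (rgb.empty()) return 1;
+
+    // Convert to planar YUV 4:2:0 (I420): chroma is subsampled 2x2, so the
+    // result is a single-channel Mat with 3/2 times the original height.
+    cv::Mat yuv420;
+    cv::cvtColor(rgb, yuv420, cv::COLOR_BGR2YUV_I420);
+
+    // ... background blur or low-light enhancement would happen here ...
+
+    cv::Mat restored;
+    cv::cvtColor(yuv420, restored, cv::COLOR_YUV2BGR_I420);
+    cv::imwrite("output.ppm", restored);
+    return 0;
+}
+```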
+
+### Background Blur
+
+The pipeline implemented for background blur looks like this:
+
+![example image alt-text#center](blur_pipeline.png "Figure 1: Background Blur Pipeline Diagram")
+
+### Low Light Enhancement
+
+The pipeline implemented for low-light enhancement is adapted from the LiveHDR+ pipeline originally proposed by Google Research in 2017:
+
+![example image alt-text#center](lle_pipeline.png "Figure 2: Low Light Enhancement Pipeline Diagram")
+
+The Low-Resolution Coefficient Prediction Network (implemented with TFLite) performs computations such as:
+- strided convolutions
+- local feature extraction with convolutional layers
+- global feature extraction with convolutional and fully connected layers
+- add, convolve, and reshape operations
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/3-build.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/3-build.md
new file mode 100644
index 0000000000..2bc4ff2fdf
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/3-build.md
@@ -0,0 +1,82 @@
+---
+title: Build the Pipelines
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Download the AI Camera Pipelines Project
+
+```BASH
+git clone https://git.gitlab.arm.com/kleidi/kleidi-examples/ai-camera-pipelines.git ai-camera-pipelines.git
+```
+
+Check out the data files tracked with Git LFS:
+
+```BASH
+cd ai-camera-pipelines.git
+git lfs install
+git lfs pull
+```
+
+## Create a Build Container
+
+The pipelines are built from a container, so you first need to build the container image:
+
+```BASH
+docker build -t ai-camera-pipelines -f docker/Dockerfile --build-arg DOCKERHUB_MIRROR=docker.io --build-arg CI_UID=$(id -u) .
+```
+
+## Build the AI Camera Pipelines
+
+Start a shell in the container you just built:
+
+```BASH
+docker run --rm --volume $PWD:/home/cv-examples/example -it ai-camera-pipelines
+```
+
+Execute the following commands, then leave the container:
+
+```BASH
+ENABLE_SME2=0
+TENSORFLOW_GIT_TAG=ddceb963c1599f803b5c4beca42b802de5134b44
+
+# Build flatbuffers
+git clone https://github.com/google/flatbuffers.git
+cd flatbuffers
+git checkout v24.3.25
+mkdir build
+cd build
+cmake .. -DCMAKE_INSTALL_PREFIX=../install
+cmake --build . -j16
+cmake --install .
+cd ../..
+
+# Build the pipelines
+mkdir build
+cd build
+cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DARMNN_TFLITE_PARSER=0 -DTENSORFLOW_GIT_TAG=$TENSORFLOW_GIT_TAG -DTFLITE_HOST_TOOLS_DIR=../flatbuffers/install/bin -DENABLE_SME2=$ENABLE_SME2 -DENABLE_KLEIDICV:BOOL=ON -DXNNPACK_ENABLE_KLEIDIAI:BOOL=ON -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake -S ../example -B .
+cmake --build . -j16
+cmake --install .
+
+# Package and export the pipelines.
+cd ..
+tar cfz example/install.tar.gz install
+
+# Leave the container (ctrl+D)
+```
+
+Note the following options on the `cmake` configuration command line:
+- `-DENABLE_SME2=$ENABLE_SME2` with `ENABLE_SME2=0`: SME2 is not yet enabled, but stay tuned!
+- `-DARMNN_TFLITE_PARSER=0`: configures the `ai-camera-pipelines` repository to use TFLite (with XNNPack) instead of Arm NN
+- `-DENABLE_KLEIDICV:BOOL=ON`: KleidiCV is enabled
+- `-DXNNPACK_ENABLE_KLEIDIAI:BOOL=ON`: TFLite+XNNPack will use KleidiAI
+
+## Install the Pipelines
+
+```BASH
+cd $HOME
+tar xfz ai-camera-pipelines.git/install.tar.gz
+mv install ai-camera-pipelines
+```
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/4-run.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/4-run.md
new file mode 100644
index 0000000000..ed9bfce566
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/4-run.md
@@ -0,0 +1,29 @@
+---
+
+title: Run the Pipelines
+weight: 6
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+In the previous section, you built the AI Camera Pipelines. In this section, you will run them to transform an input image.
+
+## Background Blur
+
+```BASH
+cd $HOME/ai-camera-pipelines
+bin/cinematic_mode resources/test_input2.ppm test_output2.ppm resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
+```
+
+![example image alt-text#center](test_input2.png "Figure 3: Original picture")
+![example image alt-text#center](test_output2.png "Figure 4: Picture with blur applied")
+
+## Low Light Enhancement
+
+```BASH
+cd $HOME/ai-camera-pipelines
+bin/low_light_image_enhancement resources/test_input2.ppm test_output2_lime.ppm resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+```
+
+![example image alt-text#center](test_output2_lime.png "Figure 5: Picture with low light enhancement applied")
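+
+The figures in this section are `png` conversions of the `ppm` files the pipelines read and write. If your image viewer does not support `ppm`, you can convert the outputs with a tool such as ImageMagick (assuming it is installed), for example `magick test_output2.ppm test_output2.png`.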
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/5-performances.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/5-performances.md
new file mode 100644
index 0000000000..6346dbace0
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/5-performances.md
@@ -0,0 +1,63 @@
+---
+title: Performance
+weight: 7
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+The applications built in the previous sections have *benchmark* variants that run each pipeline's core function multiple times in a hot loop (a sketch of this pattern follows the list):
+
+- `ai-camera-pipelines/bin/cinematic_mode_benchmark`
+- `ai-camera-pipelines/bin/low_light_image_enhancement_benchmark`
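+
+Running the core function in a hot loop amortizes one-time setup costs and makes the timing more stable. Below is a minimal sketch of this pattern; `run_pipeline_once` is a hypothetical stand-in, not a function from the repository:
+
+```CPP
+// Sketch of a hot-loop benchmark harness (illustrative only).
+#include <chrono>
+#include <cstdio>
+
+static void run_pipeline_once() {
+    // Hypothetical stand-in for the pipeline's core processing function.
+}
+
+int main() {
+    constexpr int kIterations = 20;
+    const auto start = std::chrono::steady_clock::now();
+    for (int i = 0; i < kIterations; ++i) {
+        run_pipeline_once();  // the hot loop being timed
+    }
+    const auto end = std::chrono::steady_clock::now();
+    const std::chrono::duration<double, std::milli> elapsed = end - start;
+    std::printf("Total run time over %d iterations: %.2f ms\n",
+                kIterations, elapsed.count());
+    return 0;
+}
+```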
+
+The performance of the camera pipelines has been improved by using KleidiCV and KleidiAI:
+- KleidiCV improves the performance of OpenCV with computation kernels optimized for Arm processors.
+- KleidiAI improves the performance of TFLite+XNNPack with computation kernels dedicated to AI tasks on Arm processors.
+
+## Performance with KleidiCV and KleidiAI
+
+By default, the OpenCV library is built with KleidiCV support and TFLite+XNNPack is built with KleidiAI support, so start by measuring the performance of the applications you have already built:
+
+```BASH
+$ bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
+INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
+Total run time over 20 iterations: 2023.39 ms
+
+$ bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
+Total run time over 20 iterations: 54.3546 ms
+```
+
+The output shows that:
+- `cinematic_mode_benchmark` performed 20 iterations in 2023.39 ms,
+- `low_light_image_enhancement_benchmark` performed 20 iterations in 54.3546 ms.
+
+## Performance without KleidiCV and KleidiAI
+
+Now re-run the build steps from the previous section, changing the CMake invocation to use `-DENABLE_KLEIDICV:BOOL=OFF -DXNNPACK_ENABLE_KLEIDIAI:BOOL=OFF` so that KleidiCV and KleidiAI are *not* used.
+
+You can then run the benchmarks again:
+
+```BASH
+$ bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
+INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
+Total run time over 20 iterations: 2029.25 ms
+
+$ bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
+Total run time over 20 iterations: 79.431 ms
+```
+
+Let's put all those numbers together in a simple table to compare them easily:
+
+| Benchmark                                 | Without KleidiCV+KleidiAI | With KleidiCV+KleidiAI |
+|-------------------------------------------|---------------------------|------------------------|
+| `cinematic_mode_benchmark`                | 2029.25 ms                | 2023.39 ms             |
+| `low_light_image_enhancement_benchmark`   | 79.431 ms                 | 54.3546 ms             |
+
+As can be seen, the blur pipeline (`cinematic_mode_benchmark`) benefits only marginally from KleidiCV and KleidiAI, whereas the low-light enhancement pipeline's run time drops from 79.431 ms to 54.3546 ms: a reduction of roughly 32%, or about a 1.46x speedup.
+
+## Future Performance Uplift with SME2
+
+A nice benefit of using KleidiCV and KleidiAI is that whenever the hardware adds support for new, more powerful instructions, applications get a performance uplift without requiring complex software changes: KleidiCV and KleidiAI act as abstraction layers that can build on hardware improvements to boost future performance. An example of such a performance boost *for free* will take place in a couple of months, when processors implementing SME2 become available.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_index.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_index.md
new file mode 100644
index 0000000000..c00a35593c
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_index.md
@@ -0,0 +1,58 @@
+---
+title: AI Camera Pipelines
+
+minutes_to_complete: 30
+
+who_is_this_for: This is an introductory topic for developers who want to improve the performance of camera pipelines using KleidiAI and KleidiCV.
+
+learning_objectives:
+    - Compile and run camera pipeline applications
+    - Use KleidiCV and KleidiAI to boost the performance of camera pipelines
+
+prerequisites:
+    - CMake
+    - Git + Git LFS
+    - Docker
+
+author: Arnaud de Grandmaison
+
+test_images:
+    - ubuntu:latest
+test_link: null
+test_maintenance: false
+
+### Tags
+skilllevels: Introductory
+subjects: Performance and Architecture
+armips:
+    - Cortex-A
+tools_software_languages:
+    - C++
+operatingsystems:
+    - Linux
+    - macOS
+    - Windows
+
+further_reading:
+
+    - resource:
+        title: Accelerate Generative AI Workloads Using KleidiAI
+        link: https://learn.arm.com/learning-paths/cross-platform/kleidiai-explainer
+        type: website
+
+    - resource:
+        title: LLM Inference on Android with KleidiAI, MediaPipe, and XNNPACK
+        link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/kleidiai-on-android-with-mediapipe-and-xnnpack/
+        type: website
+
+    - resource:
+        title: Vision LLM Inference on Android with KleidiAI and MNN
+        link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/
+        type: website
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1                       # _index.md always has a weight of 1 to order correctly
+layout: "learningpathall"       # All files under learning paths have this same wrapper
+learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_next-steps.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_next-steps.md
new file mode 100644
index 0000000000..921f569dd7
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 21                  # set to always be larger than the content in this path, and one more than 'review'
+title: "Next Steps"         # Always the same
+layout: "learningpathall"   # All files under learning paths have this same wrapper
+---
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_review.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_review.md
new file mode 100644
index 0000000000..4cc37284fd
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_review.md
@@ -0,0 +1,35 @@
+---
+review:
+    - questions:
+        question: >
+            What is KleidiAI?
+        answers:
+            - An anime about a little AI lost in a giant world.
+            - A software library
+        correct_answer: 2
+        explanation: >
+            KleidiAI is an open-source software library that provides optimized,
+            performance-critical micro-kernels for artificial intelligence (AI)
+            workloads tailored for Arm processors.
+
+    - questions:
+        question: >
+            How does KleidiAI optimize performance?
+        answers:
+            - Lots of magic, and let's be honest, a bit of hard work.
+            - It takes advantage of various available Arm processor architectural features.
+        correct_answer: 2
+        explanation: >
+            Architectural features, such as `FEAT_DotProd`, enable the software to
+            use instructions dedicated to performing specific tasks or computations
+            efficiently. For example, `FEAT_DotProd` adds the `UDOT` and `SDOT`
+            8-bit dot product instructions, which speed up the dot-product
+            computations at the core of many AI workloads.
+
+# ================================================================================
+# FIXED, DO NOT MODIFY
+# ================================================================================
+title: "Review"                # Always the same title
+weight: 20                     # Set to always be larger than the content in this path
+layout: "learningpathall"      # All files under learning paths have this same wrapper
+---
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/blur_pipeline.png b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/blur_pipeline.png
new file mode 100644
index 0000000000..6d1de206a9
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/blur_pipeline.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/lle_pipeline.png b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/lle_pipeline.png
new file mode 100644
index 0000000000..dcc515e46d
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/lle_pipeline.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_input2.png b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_input2.png
new file mode 100644
index 0000000000..42d323adc4
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_input2.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_output2.png b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_output2.png
new file mode 100644
index 0000000000..d834ce721d
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_output2.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_output2_lime.png b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_output2_lime.png
new file mode 100644
index 0000000000..18042cafea
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/test_output2_lime.png differ