---
title: Prerequisites
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Host machine requirements

This Learning Path demonstrates the benefits of using KleidiCV and KleidiAI in applications running on Arm, so you will need an AArch64 machine. The instructions in this Learning Path assume an Ubuntu distribution.

## Install software required for this Learning Path

You need to ensure you have the following tools:
- `git`, the version control system, for cloning the Voice Assistant codebase
- `git lfs`, an extension to `git` that manages large files by storing lightweight references in the repository instead of the files themselves
- `docker`, an open-source containerization platform
- `libomp`, LLVM's OpenMP runtime library

### `git` and `git lfs`

Install these tools by running the command for your machine's OS:

{{< tabpane code=true >}}
{{< tab header="Linux/Ubuntu" language="bash">}}
sudo apt install git git-lfs
{{< /tab >}}
{{< tab header="macOS" language="bash">}}
brew install git git-lfs
{{< /tab >}}
{{< /tabpane >}}
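
You can confirm that both tools are available by checking their versions. The output below is illustrative; your version numbers will differ:

```BASH { output_lines="2, 4" }
git --version
git version 2.43.0
git lfs version
git-lfs/3.4.1 (GitHub; linux arm64; go 1.21.8)
```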

### `docker`

Start by checking that `docker` is installed on your machine by running the following command in a terminal:

```BASH { output_lines="2" }
docker --version
Docker version 27.3.1, build ce12230
```

If the above command fails with a message similar to `docker: command not found`, follow the steps from the [Docker Install Guide](https://learn.arm.com/install-guides/docker/).

{{% notice Note %}}
You might need to log in again or restart your machine for the changes to take effect.
{{% /notice %}}

Once you have confirmed that Docker is installed on your machine, you can check that it is operating normally with the following command:

```BASH { output_lines="2-27" }
docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
478afc919002: Pull complete
Digest: sha256:305243c734571da2d100c8c8b3c3167a098cab6049c9a5b066b6021a60fcb966
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker followed these steps:

1. The Docker client contacted the Docker daemon.

2. The Docker daemon pulled the "hello-world" image from Docker Hub.
(arm64v8)

3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.

4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/get-started/
```

### `libomp`

Install `libomp` by running the command for your machine's OS:

{{< tabpane code=true >}}
{{< tab header="Linux/Ubuntu" language="bash">}}
sudo apt install libomp-19-dev
{{< /tab >}}
{{< tab header="macOS" language="bash">}}
brew install libomp
{{< /tab >}}
{{< /tabpane >}}
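
To verify that the OpenMP runtime is usable, you can compile and run a minimal test program. This is a sketch, assuming a `clang++` compiler is installed (on Ubuntu, a versioned binary such as `clang++-19` pairs with `libomp-19-dev`); the file name `omp_check.cpp` is just an example:

```BASH
# Write a tiny OpenMP program that prints one line per thread.
cat > omp_check.cpp <<'EOF'
#include <omp.h>
#include <cstdio>
int main() {
  #pragma omp parallel
  printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
  return 0;
}
EOF
# Compile with OpenMP enabled and run it; expect one line per core.
clang++ -fopenmp omp_check.cpp -o omp_check
./omp_check
```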
---

title: Overview
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall

---

## KleidiAI

[KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) is an open-source library that provides optimized performance-critical routines, also known as micro-kernels, for artificial intelligence (AI) workloads tailored for Arm CPUs.

These routines are tuned to exploit the capabilities of specific Arm hardware architectures, aiming to maximize performance. The KleidiAI library is designed for ease of adoption into C and C++ machine learning (ML) and AI frameworks. A number of AI frameworks already take advantage of KleidiAI to improve performance on Arm platforms.

## KleidiCV

The open-source [KleidiCV](https://gitlab.arm.com/kleidi/kleidicv) library provides high-performance image processing functions for AArch64. It is designed to be simple to integrate into a wide variety of projects, and some computer vision frameworks (such as OpenCV) take advantage of KleidiCV to improve performance on Arm platforms.

## The AI camera pipelines

The AI camera pipelines are two example applications, implemented with a combination of AI and computer vision (CV) computations:
- Background Blur
- Low Light Enhancement

For both applications:
- The input and output images are stored in `ppm` (portable pixmap) format, with 3 channels (Red, Green, and Blue) at 256 color levels each (also known as `RGB8`).
- The images are first converted to the `YUV420` color space, where the background blur or low-light enhancement operations take place. After processing, the images are converted back to `RGB8` and saved in `ppm` format. The sketch after this list illustrates the round trip.
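
As an illustration of this color-space round trip (this is not part of the pipelines themselves), `ffmpeg`, if installed, can perform an equivalent `RGB8` to `YUV420` to `RGB8` conversion. The file names and the 640x480 image size are examples only:

```BASH
# RGB8 ppm -> raw YUV420 planar data (the pipelines' working color space)
ffmpeg -i input.ppm -pix_fmt yuv420p -f rawvideo frame.yuv
# Raw YUV420 -> RGB8 ppm again; the image dimensions must be supplied
ffmpeg -f rawvideo -pix_fmt yuv420p -video_size 640x480 -i frame.yuv output.ppm
```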

### Background Blur

The pipeline implemented for background blur looks like this:

![example image alt-text#center](blur_pipeline.png "Figure 1: Background Blur Pipeline Diagram")

### Low Light Enhancement

The pipeline implemented for low-light enhancement is adapted from the LiveHDR+ pipeline, as originally proposed by Google Research in 2017, and looks like this:

![example image alt-text#center](lle_pipeline.png "Figure 2: Low Light Enhancement Pipeline Diagram")

The Low-Resolution Coefficient Prediction Network (implemented with TFLite) includes computations such as:
- strided convolutions
- local feature extraction with convolutional layers
- global feature extraction with convolutional + fully connected layers
- add, convolve, and reshape
---
title: Build the Pipelines
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Download the AI Camera Pipelines Project

```BASH
git clone https://git.gitlab.arm.com/kleidi/kleidi-examples/ai-camera-pipelines.git ai-camera-pipelines.git
```

Check out the data files tracked with Git LFS:

```BASH
cd ai-camera-pipelines.git
git lfs install
git lfs pull
```
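
You can confirm that the large files were fetched by listing the files managed by Git LFS; you should see entries such as the `.tflite` model files used later in this Learning Path:

```BASH
git lfs ls-files
```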

## Create a Build Container

The pipelines are built inside a container, so you first need to build the container image:

```BASH
docker build -t ai-camera-pipelines -f docker/Dockerfile --build-arg DOCKERHUB_MIRROR=docker.io --build-arg CI_UID=$(id -u) .
```
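
You can check that the image was created; the image ID, age, and size shown below are placeholders and will differ on your machine:

```BASH { output_lines="2-3" }
docker image ls ai-camera-pipelines
REPOSITORY            TAG       IMAGE ID       CREATED          SIZE
ai-camera-pipelines   latest    0123456789ab   10 seconds ago   2.5GB
```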

## Build the AI Camera Pipelines

Start a shell in the container you just built with:

```BASH
docker run --rm --volume $PWD:/home/cv-examples/example -it ai-camera-pipelines
```

Then execute the following commands inside the container, and leave the container once they complete:

```BASH
ENABLE_SME2=0
TENSORFLOW_GIT_TAG=ddceb963c1599f803b5c4beca42b802de5134b44

# Build flatbuffers
git clone https://github.com/google/flatbuffers.git
cd flatbuffers
git checkout v24.3.25
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
cmake --build . -j16
cmake --install .
cd ../..

# Build the pipelines
mkdir build
cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DARMNN_TFLITE_PARSER=0 -DTENSORFLOW_GIT_TAG=$TENSORFLOW_GIT_TAG -DTFLITE_HOST_TOOLS_DIR=../flatbuffers/install/bin -DENABLE_SME2=$ENABLE_SME2 -DENABLE_KLEIDICV:BOOL=ON -DXNNPACK_ENABLE_KLEIDIAI:BOOL=ON -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake -S ../example -B .
cmake --build . -j16
cmake --install .

# Package and export the pipelines.
cd ..
tar cfz example/install.tar.gz install

# Leave the container (ctrl+D)
```

Note the following options on the `cmake` configuration command line:
- `-DENABLE_SME2=$ENABLE_SME2` with `ENABLE_SME2=0`: SME2 is not enabled yet, but stay tuned!
- `-DARMNN_TFLITE_PARSER=0`: configures the `ai-camera-pipelines` repository to use TFLite (with XNNPack) instead of Arm NN
- `-DENABLE_KLEIDICV:BOOL=ON`: KleidiCV is enabled
- `-DXNNPACK_ENABLE_KLEIDIAI:BOOL=ON`: TFLite+XNNPack will use KleidiAI

## Install the Pipelines

```BASH
cd $HOME
tar xfz ai-camera-pipelines.git/install.tar.gz
mv install ai-camera-pipelines
```
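
Assuming the build and installation succeeded, the `bin` directory now contains the pipeline executables used in the next sections:

```BASH { output_lines="2" }
ls ai-camera-pipelines/bin
cinematic_mode  cinematic_mode_benchmark  low_light_image_enhancement  low_light_image_enhancement_benchmark
```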
---

title: Run the Pipelines
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In the previous section, you built the AI Camera Pipelines. In this section, you will run them to transform an image.

## Background Blur

```BASH
cd $HOME/ai-camera-pipelines
bin/cinematic_mode resources/test_input2.ppm test_output2.ppm resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
```

![example image alt-text#center](test_input2.png "Figure 3: Original picture")
![example image alt-text#center](test_output2.png "Figure 4: Picture with blur applied")
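
The `ppm` outputs can be converted to a more widely supported format such as `png` for viewing, for example with ImageMagick, assuming it is installed:

```BASH
convert test_output2.ppm test_output2.png
```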

## Low Light Enhancement

```BASH
cd $HOME/ai-camera-pipelines
bin/low_light_image_enhancement resources/test_input2.ppm test_output2_lime.ppm resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
```

![example image alt-text#center](test_output2_lime.png "Figure 5: Picture with low light enhancement applied")
---
title: Performance
weight: 7

### FIXED, DO NOT MODIFY
layout: learningpathall
---

The applications built previously come with a *benchmark* variant that runs the core function multiple times in a hot loop:

- `ai-camera-pipelines/bin/cinematic_mode_benchmark`
- `ai-camera-pipelines/bin/low_light_image_enhancement_benchmark`

The performance of the camera pipelines has been improved by using KleidiCV and KleidiAI:
- KleidiCV improves the performance of OpenCV with computation kernels optimized for Arm processors.
- KleidiAI improves the performance of TFLite+XNNPack with computation kernels dedicated to AI tasks on Arm processors.

## Performance with KleidiCV and KleidiAI

By default, the OpenCV library is built with KleidiCV support and TFLite+XNNPack is built with KleidiAI support, so let's measure the performance of the applications you have already built:

```BASH
$ bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Total run time over 20 iterations: 2023.39 ms

$ bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Total run time over 20 iterations: 54.3546 ms
```

The output above shows that:
- `cinematic_mode_benchmark` performed 20 iterations in 2023.39 ms,
- `low_light_image_enhancement_benchmark` performed 20 iterations in 54.3546 ms.

## Performance without KleidiCV and KleidiAI

Now re-run the build steps from the previous section, changing the `cmake` configuration to use `-DENABLE_KLEIDICV:BOOL=OFF -DXNNPACK_ENABLE_KLEIDIAI:BOOL=OFF` so that KleidiCV and KleidiAI are *not* used, as shown below.
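
For reference, this is the configuration step from the build section with the two options flipped to `OFF` and everything else unchanged:

```BASH
cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DARMNN_TFLITE_PARSER=0 -DTENSORFLOW_GIT_TAG=$TENSORFLOW_GIT_TAG -DTFLITE_HOST_TOOLS_DIR=../flatbuffers/install/bin -DENABLE_SME2=$ENABLE_SME2 -DENABLE_KLEIDICV:BOOL=OFF -DXNNPACK_ENABLE_KLEIDIAI:BOOL=OFF -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake -S ../example -B .
```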

You can run the benchmarks again:

```BASH
$ bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Total run time over 20 iterations: 2029.25 ms

$ bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Total run time over 20 iterations: 79.431 ms
```

Let's put all those numbers together in a simple table to compare them easily:

| Benchmark | Without KleidiCV+KleidiAI | With KleidiCV+KleidiAI |
|-------------------------------------------|---------------------------|------------------------|
| `cinematic_mode_benchmark` | 2029.25 ms | 2023.39 ms |
| `low_light_image_enhancement_benchmark` | 79.431 ms | 54.3546 ms |

As can be seen, the blur pipeline (`cinematic_mode_benchmark`) benefits only marginally from KleidiCV+KleidiAI, whereas the low light enhancement pipeline's run time drops by about 32% (from 79.431 ms to 54.3546 ms).

## Future Performance Uplift with SME2

A nice benefit of using KleidiCV and KleidiAI is that whenever the hardware adds support for new, more powerful instructions, applications get a performance uplift without requiring complex software changes: KleidiCV and KleidiAI act as abstraction layers that build on hardware improvements to boost future performance. An example of such a performance boost *for free* will take place in a couple of months, when processors implementing SME2 become available.
---
title: AI Camera Pipelines

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for developers who want to improve the performance of camera pipelines using KleidiAI and KleidiCV.

learning_objectives:
- Compile and run camera pipeline applications
- Use KleidiCV and KleidiAI to boost the performance of camera pipelines

prerequisites:
- CMake
- Git + Git LFS
- Docker

author: Arnaud de Grandmaison

test_images:
- ubuntu:latest
test_link: null
test_maintenance: false

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
- Cortex-A
tools_software_languages:
- C++
operatingsystems:
- Linux
- macOS
- Windows

further_reading:

- resource:
title: Accelerate Generative AI Workloads Using KleidiAI
link: https://learn.arm.com/learning-paths/cross-platform/kleidiai-explainer
type: website

- resource:
title: LLM Inference on Android with KleidiAI, MediaPipe, and XNNPACK
link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/kleidiai-on-android-with-mediapipe-and-xnnpack/
type: website

- resource:
title: Vision LLM Inference on Android with KleidiAI and MNN
link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/
type: website

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has a weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
---
# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---