diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/2-overview.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/2-overview.md
index 4287f39064..6906788253 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/2-overview.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/2-overview.md
@@ -12,27 +12,30 @@ layout: learningpathall
 
 [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) is an open-source library that provides optimized, performance-critical routines - also known as micro-kernels - for artificial intelligence (AI) workloads on Arm CPUs.
 
-These routines are tuned to take full advantage of specific Arm hardware architectures to maximize performance. The [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) library is designed for easy integration into C or C++ machine learning (ML) and AI frameworks.
+These routines are tuned to take full advantage of specific Arm hardware architectures to maximize performance. The [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) library is designed for easy integration into C or C++ machine learning (ML) and AI frameworks. Several popular AI frameworks already take advantage of [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) to improve performance on Arm platforms.
 
 ## KleidiCV
 
-[KleidiCV](https://gitlab.arm.com/kleidi/kleidicv) is an open-source library that provides high-performance image processing functions for AArch64.
+[KleidiCV](https://gitlab.arm.com/kleidi/kleidicv) is an open-source library that provides high-performance image processing functions for AArch64. It is designed to be lightweight and simple to integrate into a wide variety of projects. Some computer vision frameworks, such as OpenCV, leverage [KleidiCV](https://gitlab.arm.com/kleidi/kleidicv) to accelerate image processing on Arm devices.
 
 ## AI camera pipelines
 
-This Learning Path provides two example applications that combine AI and computer vision (CV) techniques:
-- Background Blur.
-- Low-Light Enhancement.
+This Learning Path provides three example applications that combine AI and computer vision (CV) techniques:
+- Background Blur.
+- Low-Light Enhancement.
+- Neural Denoising.
+
+## Background Blur and Low-Light Enhancement
 
 Both applications:
-- Use input and output images that are stored in `ppm` (Portable Pixmap format), with three RGB channels (Red, Green, and Blue). Each channel supports 256 intensity levels (0-255) commonly referred to as `RGB8`.
+- Use input and output images that are stored in `.png` format, with three RGB channels (Red, Green, and Blue). Each channel supports 256 intensity levels (0-255), commonly referred to as `RGB8`.
 - Convert the images to the `YUV420` color space for processing.
 - Apply the relevant effect (background blur or low-light enhancement).
-- Convert the processed images back to `RGB8` and save them as `ppm` files.
+- Convert the processed images back to `RGB8` and save them as `.png` files.
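+
+To make the round trip concrete, the sketch below shows the same `RGB8` to `YUV420` conversion and back using OpenCV in Python. It is an illustration only, not the pipelines' actual code (they are C++ applications), and the file names are placeholders:
+
+```python
+import cv2  # assumes the opencv-python package is installed
+
+# Read an RGB8 image (OpenCV loads it as 8-bit BGR).
+img = cv2.imread("input.png")
+
+# Convert to planar YUV420: a (height * 3 / 2, width) uint8 array.
+yuv = cv2.cvtColor(img, cv2.COLOR_BGR2YUV_I420)
+
+# ... the effect (blur or low-light enhancement) would be applied here ...
+
+# Convert back to 8-bit BGR and save the result.
+out = cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR_I420)
+cv2.imwrite("output.png", out)
+```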
 
 ### Background Blur
 
@@ -50,4 +53,32 @@ The Low-Resolution Coefficient Prediction Network (implemented with LiteRT) perf
 - Strided convolutions.
 - Local feature extraction using convolutional layers.
 - Global feature extraction using convolutional and fully connected layers.
-- Add, convolve, and reshape operations.
\ No newline at end of file
+- Add, convolve, and reshape operations.
+
+## Neural Denoising
+
+Every smartphone photographer has seen it: images that look sharp in daylight
+but fall apart in dim lighting. This is because the _signal-to-noise ratio (SNR)_
+drops dramatically when sensors capture fewer photons. At 1000 lux, the signal
+dominates and images look clean; at 1 lux, readout noise becomes visible as
+grain, color speckles, and loss of fine detail.
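+
+To see why, consider a simple shot-noise-plus-read-noise model, where the shot
+noise variance equals the photon count (Poisson statistics). The photon counts
+and read-noise figure below are illustrative assumptions, not measured sensor
+data:
+
+```python
+import numpy as np
+
+read_noise = 3.0  # sensor read noise in electrons RMS (illustrative value)
+
+# SNR = signal / total noise, with shot noise variance equal to the signal.
+for photons in (10_000, 10):  # roughly a bright-light vs a low-light exposure
+    snr = photons / np.sqrt(photons + read_noise**2)
+    print(f"{photons:>6} photons -> SNR ~ {snr:.1f}")
+
+# Output:
+#  10000 photons -> SNR ~ 100.0
+#     10 photons -> SNR ~ 2.3
+```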
+
+That’s why _neural camera denoising_ is one of the most critical - and
+computationally demanding - steps in a camera pipeline. Done well, it
+transforms noisy frames into sharp, vibrant captures. Done poorly, it leaves
+smudges and artifacts that ruin the shot.
+
+As depicted in the diagram below, the Neural Denoising pipeline uses two
+algorithms to process the frames:
+- temporally, with an algorithm named `ultralite` in the code repository,
+- spatially, with an algorithm named `collapsenet` in the code repository,
+- or with both combined.
+
+Temporal denoising uses previous frames as history when processing the current frame.
+
+![example image alt-text#center](denoising_pipeline.png "Neural Denoising Pipeline Diagram")
+
+The Neural Denoising application works on frames as emitted by a camera sensor in Bayer format:
+- the input frames are in RGGB 1080x1920x4 format,
+- the output frames are in YGGV 4x1080x1920 format.
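+
+A 1080x1920x4 RGGB frame packs each 2x2 quad of a 4K (2160x3840) Bayer mosaic
+into four channels. The sketch below illustrates one way to produce such a
+packing in Python; the exact channel ordering used by the pipeline is an
+assumption here:
+
+```python
+import numpy as np
+
+# A synthetic 4K Bayer mosaic with 10-bit sample values.
+mosaic = np.random.randint(0, 1024, size=(2160, 3840), dtype=np.uint16)
+
+# Pack each 2x2 RGGB quad into four channels: shape (1080, 1920, 4).
+packed = np.stack(
+    (
+        mosaic[0::2, 0::2],  # R
+        mosaic[0::2, 1::2],  # G, on the red row
+        mosaic[1::2, 0::2],  # G, on the blue row
+        mosaic[1::2, 1::2],  # B
+    ),
+    axis=-1,
+)
+print(packed.shape)  # (1080, 1920, 4)
+```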
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/3-build.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/3-build.md
index f96e92d4fc..7ac81efd76 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/3-build.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/3-build.md
@@ -30,7 +30,7 @@ Build the Docker container used to compile the pipelines:
 docker build -t ai-camera-pipelines -f docker/Dockerfile \
   --build-arg DOCKERHUB_MIRROR=docker.io \
   --build-arg CI_UID=$(id -u) \
-  docker
+  docker/
 ```
 
 ## Build the AI Camera Pipelines
@@ -45,7 +45,7 @@ Inside the container, run the following commands:
 
 ```bash
 ENABLE_SME2=0
-TENSORFLOW_GIT_TAG=ddceb963c1599f803b5c4beca42b802de5134b44
+TENSORFLOW_GIT_TAG="v2.19.0"
 
 # Build flatbuffers
 git clone https://github.com/google/flatbuffers.git
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/4-run.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/4-run.md
index 828852f459..ebecb088d5 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/4-run.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/4-run.md
@@ -9,26 +9,55 @@ layout: learningpathall
 
 ## Apply transformations
 
-In the previous section, you built the AI Camera Pipelines. In this section, you'll run them to apply transformations to an input image.
+In the previous section, you built the AI Camera Pipelines. In this section, you'll run them to apply transformations to an input image or input frames.
+
+First, create a Python virtual environment and install the required packages:
+
+```bash
+cd $HOME/ai-camera-pipelines
+python3 -m venv venv
+. venv/bin/activate
+pip install -r ai-camera-pipelines.git/docker/python-requirements.txt
+```
 
 ### Background Blur
 
-Run the background blur pipeline:
+Run the Background Blur pipeline, using `resources/test_input.png` as the input image and writing the transformed image to `test_output.png`:
 
 ```bash
 cd $HOME/ai-camera-pipelines
-bin/cinematic_mode resources/test_input2.ppm test_output2.ppm resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
+bin/cinematic_mode resources/test_input.png test_output.png resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
 ```
 
-![example image alt-text#center](test_input2.png "Original picture")
-![example image alt-text#center](test_output2.png "Picture with blur applied")
+![example image alt-text#center](test_input2.png "Input image")
+![example image alt-text#center](test_output2.png "Image with blur applied")
 
 ### Low-Light Enhancement
 
+Run the Low-Light Enhancement pipeline, using `resources/test_input.png` as the input image and writing the transformed image to `test_output2_lime.png`:
+
 ```bash
 cd $HOME/ai-camera-pipelines
-bin/low_light_image_enhancement resources/test_input2.ppm test_output2_lime.ppm resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+bin/low_light_image_enhancement resources/test_input.png test_output2_lime.png resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l1_loss_float32.tflite
 ```
 
-![example image alt-text#center](test_input2.png "Original picture")
-![example image alt-text#center](test_output2_lime.png "Picture with low-light enhancement applied")
\ No newline at end of file
+![example image alt-text#center](test_input2.png "Input image")
+![example image alt-text#center](test_output2_lime.png "Image with low-light enhancement applied")
+
+### Neural Denoising
+
+When the SME extension is not available, only temporal neural denoising is
+available, so this is what you will run for now - but stay tuned, as the SME
+extension will become available very soon:
+
+```bash
+./scripts/run_neural_denoiser_temporal.sh
+```
+
+This script performs the following steps:
+- The input frames are first converted from `.png` files in the `resources/test-lab-sequence/` directory to the sensor format (RGGB Bayer) and stored as `neural_denoiser_io/input_noisy*`.
+- These frames are then processed by the Neural Denoiser, which writes its output to `neural_denoiser_io/output_denoised*`.
+- Finally, the denoised frames are converted back to `.png` files in the `test-lab-sequence-out` directory for easy visualization.
+
+![example image alt-text#center](denoising_input_0010.png "Original frame")
+![example image alt-text#center](denoising_output_0010.png "Frame with temporal denoising applied")
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/5-performances.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/5-performances.md
index 1372ae1b5e..bb0743e21b 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/5-performances.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/5-performances.md
@@ -12,19 +12,19 @@ The application you built earlier includes a *benchmark mode* that runs the core
 
 - `ai-camera-pipelines/bin/cinematic_mode_benchmark`
 - `ai-camera-pipelines/bin/low_light_image_enhancement_benchmark`
+- `ai-camera-pipelines/bin/neural_denoiser_temporal_benchmark_4K`
 
 These benchmarks demonstrate the performance improvements enabled by KleidiCV and KleidiAI:
 - KleidiCV enhances OpenCV performance with computation kernels optimized for Arm processors.
-
 - KleidiAI accelerates LiteRT + XNNPack inference using AI-optimized micro-kernels tailored for Arm CPUs.
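+
+Conceptually, each benchmark binary times the pipeline's core processing loop over the requested number of iterations and reports the total. The Python sketch below illustrates the technique; the real benchmarks are C++ binaries, so this is an illustration, not their actual code:
+
+```python
+import time
+
+def benchmark(fn, iterations=20):
+    # Total wall-clock time of fn() over N iterations, reported like the
+    # "Total run time over N iterations: ... ms" lines shown below.
+    start = time.perf_counter()
+    for _ in range(iterations):
+        fn()  # one run of the pipeline's core processing loop
+    elapsed_ms = (time.perf_counter() - start) * 1000.0
+    print(f"Total run time over {iterations} iterations: {elapsed_ms:.4f} ms")
+```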
 
 ## Performances with KleidiCV and KleidiAI
 
-By default, the OpenCV library is built with KleidiCV support, and LiteRT+xnnpack is built with KleidiAI support.
+By default, the OpenCV library is built with KleidiCV support, and LiteRT+XNNPack is built with KleidiAI support. You can run the benchmarks using the applications you built earlier.
 
-Run the first benchmark:
+Run the Background Blur benchmark:
 
 ```bash
 bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
@@ -34,25 +34,38 @@ The output is similar to:
 
 ```output
 INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
-Total run time over 20 iterations: 2023.39 ms
+Total run time over 20 iterations: 2028.745 ms
 ```
 
-Run the second benchmark:
+Run the Low-Light Enhancement benchmark:
 
 ```bash
-bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l1_loss_float32.tflite
 ```
 
 The output is similar to:
 
 ```output
 INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
-Total run time over 20 iterations: 54.3546 ms
+Total run time over 20 iterations: 58.2126 ms
+```
+
+Finally, run the Neural Denoising benchmark:
+
+```bash
+bin/neural_denoiser_temporal_benchmark_4K 20
+```
+
+The output is similar to:
+
+```output
+Total run time over 20 iterations: 37.6839 ms
 ```
 
 From these results, you can see that:
-- `cinematic_mode_benchmark` performed 20 iterations in 1985.99 ms.
-- `low_light_image_enhancement_benchmark` performed 20 iterations in 52.3448 ms.
+- `cinematic_mode_benchmark` performed 20 iterations in 2028.745 ms.
+- `low_light_image_enhancement_benchmark` performed 20 iterations in 58.2126 ms.
+- `neural_denoiser_temporal_benchmark_4K` performed 20 iterations in 37.6839 ms.
 
 ## Benchmark results without KleidiCV and KleidiAI
 
@@ -61,7 +74,7 @@ To measure the performance without these optimizations, recompile the pipelines
 
 -DENABLE_KLEIDICV:BOOL=OFF -DXNNPACK_ENABLE_KLEIDIAI:BOOL=OFF
 ```
 
-Re-run the first benchmark:
+Re-run the Background Blur benchmark:
 
 ```bash
 bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
@@ -71,35 +84,52 @@ The new output is similar to:
 
 ```output
 INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
-Total run time over 20 iterations: 2029.25 ms
+Total run time over 20 iterations: 2030.5525 ms
 ```
 
-Re-run the second benchmark:
+Re-run the Low-Light Enhancement benchmark:
 
 ```bash
-bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l1_loss_float32.tflite
 ```
 
 The new output is similar to:
 
 ```output
 INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
-Total run time over 20 iterations: 79.431 ms
+Total run time over 20 iterations: 58.0613 ms
 ```
 
-### Comparison table
+Re-run the Neural Denoising benchmark:
 
-| Benchmark                                  | Without KleidiCV+KleidiAI | With KleidiCV+KleidiAI |
-|--------------------------------------------|---------------------------|------------------------|
-| `cinematic_mode_benchmark`                 | 2029.25 ms                | 2023.39 ms             |
-| `low_light_image_enhancement_benchmark`    | 79.431 ms                 | 54.3546 ms             |
-
-As shown, the background blur pipeline (`cinematic_mode_benchmark`) gains only a small improvement, while the low-light enhancement pipeline sees a significant ~30% performance uplift when KleidiCV and KleidiAI are enabled.
-
-## Future performance uplift with SME2
+```bash
+bin/neural_denoiser_temporal_benchmark_4K 20
+```
 
-A major benefit of using KleidiCV and KleidiAI is that they can automatically leverage new Arm architecture features - such as SME2 (Scalable Matrix Extension v2) - without requiring changes to your application code.
+The new output is similar to:
 
-As KleidiCV and KleidiAI operate as performance abstraction layers, any future hardware instruction support can be utilized by simply rebuilding the application. This enables better performance on newer processors without additional engineering effort.
+```output
+Total run time over 20 iterations: 38.0813 ms
+```
+
+### Comparison table and future performance uplift with SME2
+
+| Benchmark                                | Without KleidiCV+KleidiAI | With KleidiCV+KleidiAI |
+|------------------------------------------|---------------------------|------------------------|
+| `cinematic_mode_benchmark`               | 2030.5525 ms              | 2028.745 ms (-0.09%)   |
+| `low_light_image_enhancement_benchmark`  | 58.0613 ms                | 58.2126 ms (+0.26%)    |
+| `neural_denoiser_temporal_benchmark_4K`  | 38.0813 ms                | 37.6839 ms (-1.04%)    |
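+
+The percentages in the table are the run-time change of the KleidiCV+KleidiAI build relative to the baseline build. A quick sketch to reproduce them from the timings above:
+
+```python
+# Timings in ms: (without KleidiCV+KleidiAI, with KleidiCV+KleidiAI).
+timings = {
+    "cinematic_mode_benchmark": (2030.5525, 2028.745),
+    "low_light_image_enhancement_benchmark": (58.0613, 58.2126),
+    "neural_denoiser_temporal_benchmark_4K": (38.0813, 37.6839),
+}
+for name, (without, with_) in timings.items():
+    print(f"{name}: {(with_ - without) / without * 100.0:+.2f}%")
+
+# cinematic_mode_benchmark: -0.09%
+# low_light_image_enhancement_benchmark: +0.26%
+# neural_denoiser_temporal_benchmark_4K: -1.04%
+```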
+
+As shown, the Background Blur (`cinematic_mode_benchmark`) and Neural Denoising
+pipelines gain only a minor improvement, while the Low-Light Enhancement pipeline
+sees a minor performance degradation (+0.26%) when KleidiCV and KleidiAI are
+enabled.
+
+A major benefit of using KleidiCV and KleidiAI, though, is that they can
+automatically leverage new Arm architecture features - such as SME2 (Scalable
+Matrix Extension v2) - without requiring changes to your application code.
+
+As KleidiCV and KleidiAI operate as performance abstraction layers, any future
+hardware instruction support can be utilized by simply rebuilding the
+application. This enables better performance on newer processors without
+additional engineering effort.
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_index.md b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_index.md
index 1b39ea65e5..40022b42d5 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_index.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_index.md
@@ -1,5 +1,5 @@
 ---
-title: Accelerate Background Blur and Low-Light Camera Effects
+title: Accelerate Denoising, Background Blur, and Low-Light Camera Effects
 
 minutes_to_complete: 30
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_input_0010.png b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_input_0010.png
new file mode 100644
index 0000000000..c4484ac60f
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_input_0010.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_output_0010.png b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_output_0010.png
new file mode 100644
index 0000000000..30bea1096c
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_output_0010.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_pipeline.png b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_pipeline.png
new file mode 100644
index 0000000000..906ae9b625
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/denoising_pipeline.png differ