LocalAI supported firmware analysis

EMBA has supported AI services since 2023, when Benedikt introduced the Q02 module in PR602. This was a big step at the time and made AI-enhanced firmware analysis available to everyone quite early, but we were never fully satisfied with the situation. Back then, GPT rate limiting was in place and the free account was not allowed to use the OpenAI API fully. Although we had several countermeasures in place, the situation got worse: the free account was stripped down further and, after a while, could no longer use the API in any useful way. The result was a mostly unused and unmaintained EMBA module. On top of that, we usually deal with systems in critical environments and are therefore not authorized to use cloud or online services for any kind of vulnerability analysis.

Fast forward to PR1988 …

We switched the complete AI approach in EMBA to a self-hosted AI infrastructure. For you this means that you only need a medium-sized GPU and you are ready to go. The following documentation guides you through a local setup on a system with an NVIDIA GeForce RTX 4070 GPU with 8 GB of dedicated GPU memory. This is not the ideal setup, but it is sufficient for basic tasks and integrates well with EMBA. In our setup, EMBA is already installed and running in a virtual machine in VMware Workstation. From a networking perspective, the virtual machine uses the VMware NAT mechanism. The EMBA installation has no further special requirements at this point.

Next, we set up an Ubuntu machine in the local WSL environment of our Windows host:

PS C:\Windows\System32> wsl --install
Downloading: Windows Subsystem for Linux 2.6.3
Installing: Windows Subsystem for Linux 2.6.3
Windows Subsystem for Linux 2.6.3 has been installed.
Installing Windows optional component: VirtualMachinePlatform
 
Deployment Image Servicing and Management tool
Version: 10.0.26100.5074
 
Image Version: 10.0.26100.8037
 
Enabling feature(s)
[==========================100.0%==========================]
The operation completed successfully.
The requested operation is successful. Changes will not be effective until the system is rebooted.

Reboot your system and start PowerShell again to list the available Linux distributions:

PS C:\Windows\System32> wsl --list --online
The following is a list of valid distributions that can be installed.
Install using 'wsl.exe --install <Distro>'.
 
NAME                            FRIENDLY NAME
Ubuntu                          Ubuntu
Ubuntu-24.04                    Ubuntu 24.04 LTS
Ubuntu-22.04                    Ubuntu 22.04 LTS
Ubuntu-20.04                    Ubuntu 20.04 LTS

The next step installs the Ubuntu distribution that will host the LocalAI environment:

PS C:\Users\asdf> wsl --install Ubuntu-24.04
Downloading: Ubuntu 24.04 LTS
Installing: Ubuntu 24.04 LTS
Distribution successfully installed. It can be launched via 'wsl.exe -d Ubuntu-24.04'
Launching Ubuntu-24.04...
Provisioning the new WSL instance Ubuntu-24.04
This might take a while...
Create a default Unix user account: m1k3
New password:
Retype new password:
passwd: password updated successfully
To run a command as administrator (user "root"), use "sudo <command>".

With a further command in your Windows terminal, you can get some basic information about your newly installed Linux system:

PS C:\Users\asdf> wsl --list --verbose
  NAME            STATE           VERSION
* Ubuntu-24.04    Running         2

Everything is now up and running, and we can connect to the new Linux system via the CLI (make sure to use your own user name):

PS C:\Users\asdf> wsl -u m1k3

Install and start the Docker service in your new Linux system:

m1k3@asdf:/mnt/c/Users/asdf$ curl https://get.docker.com/ | sh
m1k3@asdf:/mnt/c/Users/asdf$ sudo service docker status
m1k3@asdf:/mnt/c/Users/asdf$ sudo service docker start

Next, install the NVIDIA environment. The commands below follow the original NVIDIA Container Toolkit installation documentation:

sudo apt-get update && sudo apt-get install -y --no-install-recommends ca-certificates curl gnupg2
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.19.0-1
sudo apt-get install -y \
      nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}

With the above steps completed, you can now configure Docker and get a first impression of your graphics card from within your new Linux system:

m1k3@asdf:/mnt/c/Users/asdf$ sudo nvidia-ctk runtime configure --runtime=docker
INFO[0000] Config file does not exist; using empty config
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.

m1k3@asdf:/mnt/c/Users/asdf$ sudo systemctl restart docker
m1k3@asdf:/mnt/c/Users/asdf$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Finally, start the LocalAI container with GPU support:

sudo docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

As soon as the container is up and running, the LocalAI web interface is accessible on localhost, port 8080.
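The first start of the container can take a while, so a script that continues automatically needs to wait for the service. The following is a minimal sketch of a generic retry helper; the name `retry` is hypothetical, and the commented example assumes that LocalAI exposes a `/readyz` health endpoint on port 8080, which you should verify for your LocalAI version.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$(( attempt + 1 ))
  done
  return 1
}

# Example (assumption: LocalAI answers on /readyz once it is ready):
# retry 30 2 curl -fsS -o /dev/null http://localhost:8080/readyz
```

The helper returns 0 as soon as the command succeeds and 1 after the last failed attempt, so it can be used directly in an `if` statement.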

As we do not have any model installed yet, we start by importing one. For a first test we suggest the following model: “huggingface://Qwen/Qwen2.5-Coder-7B-Instruct-GGUF”. This model works well on a low-performance system like ours and is optimized for code analysis tasks. Further details are available on the model's Hugging Face page.

After importing the model, it can be used for a simple conversation via the chat interface. The Windows Task Manager shows that the memory of the graphics card is now fully used and that our environment falls back to shared memory, so further optimization is needed.

As the model needs too much memory for our system, we need to tweak it a bit. This can be done in “System -> Models -> Edit configuration”.

Adjust the YAML config as follows:

parameters:
    model: llama-cpp/models/Qwen2.5-Coder-7B-Instruct-GGUF/qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf
    # --- NEW: LLAMA.CPP OPTIMIZATIONS ---
    # Enables Flash Attention (saves VRAM and speeds up long contexts)
    flash_attention: true

    # KV cache quantization (reduces the VRAM needed for the context)
    cache_type_k: "q4_0"
    cache_type_v: "q4_0"

    # GPU setting: all layers on the GPU (RTX 4070)
    gpu_layers: 33
    # ------------------------------------
    min_p: 0.1
    repeat_penalty: 1
    temperature: 0.7  # Recommendation: lower for code analysis (original was 1.5)
    top_k: 40         # A fixed value is often more stable than -1
    top_p: 0.95
repeat_penalty: 1
temperature: 1.5
template:
    use_tokenizer_template: true
top_k: -1
top_p: 0.95
context_size: 24000
f16_kv: false       # Saves VRAM when storing the key-value cache
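To get a feeling for why the KV cache quantization matters at a context size of 24000, the cache size can be estimated with simple arithmetic. The following is a rough sketch, not a measurement. It assumes the architecture values of the Qwen2.5 7B model family (28 layers, 4 KV heads, head dimension 128; please verify them against the model card) and the llama.cpp storage sizes of 2 bytes per value for f16 and 18 bytes per block of 32 values for q4_0. The function name `kv_cache_bytes` is hypothetical.

```shell
# kv_cache_bytes LAYERS KV_HEADS HEAD_DIM CONTEXT BLOCK_BYTES BLOCK_ELEMS
# K and V each store LAYERS * KV_HEADS * HEAD_DIM values per token,
# hence the leading factor 2.
kv_cache_bytes() {
  local layers="$1" kv_heads="$2" head_dim="$3" ctx="$4"
  local block_bytes="$5" block_elems="$6"
  echo $(( 2 * layers * kv_heads * head_dim * ctx * block_bytes / block_elems ))
}

# Assumed model values: 28 layers, 4 KV heads, head dimension 128.
f16=$(kv_cache_bytes 28 4 128 24000 2 1)    # f16: 2 bytes per value
q4=$(kv_cache_bytes 28 4 128 24000 18 32)   # q4_0: 18 bytes per 32 values

echo "f16 KV cache:  $(( f16 / 1024 / 1024 )) MiB"   # prints 1312 MiB
echo "q4_0 KV cache: $(( q4 / 1024 / 1024 )) MiB"    # prints 369 MiB
```

Model weights and compute buffers come on top of these numbers, so the estimate only covers the part of the VRAM that scales with context_size.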

Afterwards we can reload the model and perform some simple tests. This time, shared memory is no longer used.

With these modifications in place, we can move on to our Kali Linux system with the EMBA installation. For some initial tests of the LocalAI API, follow the next steps:

sudo apt-get install jq
LOCALAI_SERVER_IP="192.168.111.1"

If your setup is similar to my test machine, the LocalAI environment is available on localhost and via the host-side network interface of the virtual machine network (run ipconfig on your host to find the address). Set LOCALAI_SERVER_IP accordingly.

First, check which environment is available by requesting the system endpoint:

└─$ curl http://${LOCALAI_SERVER_IP}:8080/system | jq .
{"backends":["cuda13-llama-cpp","llama-cpp"],"loaded_models":[{"id":"Qwen2.5-Coder-7B-Instruct-GGUF"}]}

With the v1/models endpoint it is possible to request the available models:

└─$ curl http://${LOCALAI_SERVER_IP}:8080/v1/models
{"object":"list","data":[{"id":"Qwen2.5-Coder-7B-Instruct-GGUF","object":"model"}]}
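A script can use the same endpoint to verify that the expected model is present before it sends any requests. The following is a minimal sketch; the function name `model_available` is hypothetical, and the sketch relies only on the response format shown above and on jq.

```shell
# model_available MODEL_ID
# Reads a /v1/models JSON response on stdin and succeeds
# (exit status 0) only if MODEL_ID is listed in .data[].id.
model_available() {
  local model="$1"
  jq -e --arg m "$model" 'any(.data[]; .id == $m)' > /dev/null
}

# Example against a running LocalAI instance:
# curl -fsS "http://${LOCALAI_SERVER_IP}:8080/v1/models" \
#   | model_available "Qwen2.5-Coder-7B-Instruct-GGUF" && echo "model found"
```

Passing the model name via `--arg` avoids splicing shell variables into the jq filter.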

And finally say hello to your LocalAI:

└─$ MODEL_LOCALAI="Qwen2.5-Coder-7B-Instruct-GGUF"
└─$ curl http://${LOCALAI_SERVER_IP}:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"$MODEL_LOCALAI"'",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' | jq -r '.choices[].message.content'
Hello! How can I help you today?
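Splicing a shell variable into a hand-written JSON string works for a simple "Hello!", but it breaks as soon as the prompt contains quotes or newlines, which is common for code snippets. A minimal sketch that lets jq build the request body instead; the function name `build_chat_payload` is hypothetical.

```shell
# build_chat_payload MODEL PROMPT
# Prints a chat completion request body with MODEL and PROMPT
# safely escaped as JSON strings.
build_chat_payload() {
  local model="$1" prompt="$2"
  jq -cn --arg model "$model" --arg prompt "$prompt" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}

# Example against a running LocalAI instance:
# build_chat_payload "Qwen2.5-Coder-7B-Instruct-GGUF" 'What does printf("%s") do?' \
#   | curl -fsS "http://${LOCALAI_SERVER_IP}:8080/v1/chat/completions" \
#       -H "Content-Type: application/json" -d @- \
#   | jq -r '.choices[].message.content'
```

With `-d @-`, curl reads the request body from standard input, so the payload never has to pass through shell quoting.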

Or run another quick test with a streamed response:

curl http://${LOCALAI_SERVER_IP}:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "Qwen2.5-Coder-7B-Instruct-GGUF", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
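The streamed response arrives as a sequence of "data: " lines that each carry a small JSON chunk. The following is a minimal sketch that turns such a stream back into plain text. It assumes the OpenAI-style chunk format, with the text in `.choices[0].delta.content` and a final `data: [DONE]` line; the function name `stream_to_text` is hypothetical.

```shell
# stream_to_text
# Reads an OpenAI-style event stream on stdin and prints the
# concatenated text of all chunks, followed by a newline.
stream_to_text() {
  local line cr
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}              # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;      # end-of-stream marker
      "data: "*) ;;                 # payload line, handled below
      *) continue ;;                # skip blank lines and anything else
    esac
    # Chunks without text (e.g. the initial role chunk) print nothing.
    printf '%s\n' "${line#data: }" | jq -j '.choices[0].delta.content // empty'
  done
  echo
}

# Example against a running LocalAI instance (-N disables curl's buffering):
# curl -sN "http://${LOCALAI_SERVER_IP}:8080/v1/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d '{"model": "Qwen2.5-Coder-7B-Instruct-GGUF", "messages": [{"role": "user", "content": "Hello"}], "stream": true}' \
#   | stream_to_text
```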

Now that the LocalAI API is available and ready to use, the next step is to make EMBA aware of it. For this, your EMBA installation includes the following configuration template in config/ai_config.env.template:

# Copy this template to config/ai_config.env and adjust the following variables:
LOCAL_AI_IP="192.168.111.1"
# EMBA checks if this model is available for chatting
# if this model is not available EMBA can't proceed
LOCAL_AI_MODEL="Qwen2.5-Coder-7B-Instruct-GGUF"
# if EMBA will finish earlier with the testing phase she will wait for Q03 until AI requests are done or the timeperiod is over
AI_MIN_RUNTIME="12h"
# maximal code size for AI analysis
AI_MAX_CHARS_TO_ANALYSE=5000

For a first run, LOCAL_AI_IP needs to be adjusted. Additionally, EMBA needs to know which model to use via the LOCAL_AI_MODEL parameter.

Finally, the AI_MIN_RUNTIME variable ensures that the test runs for at least 12h, so that we collect as much data from the AI as possible in this timeframe. If the EMBA run finishes earlier, we request further details about the firmware from LocalAI, which extends the overall scan runtime to 12h or more. This is quite a nice feature for tests that run overnight or over the weekend. Just adjust this parameter to the runtime you need; if you do not need that many AI responses, set it to 30m or 1h, for example.

Adjust the parameters and copy the template to config/ai_config.env. Afterwards, the enhanced AI scanning mechanism can be enabled in your scanning profile with the parameter export AI_OPTION=3. In the following AI-ready profiles it is already enabled by default:

scan-profiles/default-scan-AI.emba
scan-profiles/quick-scan-AI.emba
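Before starting a long scan, it can help to sanity-check the AI_MIN_RUNTIME value from config/ai_config.env. The following is a minimal sketch of a converter from values such as 12h, 30m or 1h to seconds. How EMBA itself evaluates the value is not shown here, so the accepted suffixes (s, m, h, d, as known from the GNU sleep and timeout tools) and the function name `duration_to_seconds` are assumptions.

```shell
# duration_to_seconds VALUE
# Converts e.g. "12h", "30m", "1h" or "45" to seconds.
# Returns 1 and prints a message on stderr for anything else.
duration_to_seconds() {
  local value="$1" unit n factor
  unit=${value##*[0-9]}    # everything after the last digit
  n=${value%"$unit"}       # the numeric part in front of it
  case "$n" in
    ''|*[!0-9]*) echo "invalid duration: $value" >&2; return 1 ;;
  esac
  # Strip leading zeros so that the shell does not read the number as octal.
  while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do n=${n#0}; done
  case "$unit" in
    ''|s) factor=1 ;;
    m)    factor=60 ;;
    h)    factor=3600 ;;
    d)    factor=86400 ;;
    *)    echo "invalid duration: $value" >&2; return 1 ;;
  esac
  echo $(( n * factor ))
}

duration_to_seconds 12h   # prints 43200
duration_to_seconds 30m   # prints 1800
```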

After a firmware test, the AI results are accessible via the EMBA web reporter. The main dashboard now has an entry for the AI results. From there, the analysis continues in the Q03 module view, where the modules that are included in the AI analysis are linked to the AI results and the details of each AI analysis are shown in a structured form.
