🦾 - IRON: Unlocking the Full Potential of NPUs - 🦾

IRON is an open-source & close-to-metal Python API enabling fast and efficient execution on AMD Ryzen™ AI NPUs. It relies on language bindings around the MLIR-AIE dialect.

The IRON Python API for Ryzen™ AI NPUs is described in the following paper:

E. Hunhoff, J. Melber, K. Denolf, A. Bisca, S. Bayliss, S. Neuendorffer, J. Fifield, J. Lo, P. Vasireddy, P. James-Roxby, E. Keller. "Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface". In 33rd IEEE International Symposium On Field-Programmable Custom Computing Machines, May 2025.

🎯 Operator Dashboard

| Section | Description | Datatype | AIE2 | AIE2P | Status | Design Example |
|---|---|---|---|---|---|---|
| Element-wise Add | Element-wise addition kernel | bfloat16 | ✓ | ✓ | 🟢 | example/elementwise_add/ |
| Element-wise Mul | Element-wise multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | example/elementwise_mul/ |
| GEMM | General Matrix Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | example/gemm/ |
| GEMV | General Matrix-Vector Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | example/matrix_vector_mul/ |
| GQA | Grouped Query Attention kernel (Single pipeline) | bfloat16 | | ✓ | 🟢 | example/mha/ |
| MHA | Multi-Head Attention kernel & Grouped Query Attention | bfloat16 | | ✓ | 🟢 | example/mha/ |
| RMSNorm | RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | example/rms_norm/ |
| RoPE | Rotary Positional Embedding kernel | bfloat16 | ✓ | ✓ | 🟢 | example/rope/ |
| SiLU | Sigmoid Linear Unit activation kernel | bfloat16 | ✓ | ✓ | 🟢 | example/silu/ |
| Softmax | Softmax kernel | bfloat16 | ✓ | ✓ | 🟢 | example/softmax/ |
| Weighted RMSNorm | Weighted RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | example/rms_norm/ |
| Copy | Copy | bfloat16 | ✓ | ✓ | 🟢 | example/mem_copy/ |
| Transpose | Transpose | bfloat16 | ✓ | ✓ | 🟢 | example/transpose/ |
| AXPY | AXPY | bfloat16 | ✓ | ✓ | 🟢 | example/axpy/ |
| Reduction | Reduction | bfloat16 | | | 🟡 | |
| Dequant | Dequant Q4NX from AWQ to bfloat16 | bfloat16 | ✓ | ✓ | 🟢 | example/dequant/ |
| RELU | RELU | bfloat16 | ✓ | ✓ | 🟢 | example/relu/ |
| Leaky RELU (WIP) | Leaky RELU kernel | bfloat16 | | ✓ | ⚪ | example/leaky_relu/ |
| GELU | GELU | bfloat16 | ✓ | ✓ | 🟢 | example/gelu/ |
| LayerNorm | LayerNorm | bfloat16 | ✓ | ✓ | 🟢 | example/layer_norm/ |
| Convolution | Convolution | bfloat16 | | | 🟡 | |
| MaxPool | MaxPool | bfloat16 | | | ⚪ | |
| AveragePool | AveragePool | bfloat16 | | | ⚪ | |
| Tanh | Tanh kernel | bfloat16 | ✓ | ✓ | 🟢 | example/tanh/ |
| Sigmoid | Sigmoid kernel | bfloat16 | ✓ | ✓ | 🟢 | example/sigmoid/ |

Use this dashboard to quickly check the status of each kernel and locate relevant setup, build, and usage information.

📌 Legend

| Status | Meaning |
|---|---|
| 🟢 | Done |
| 🟡 | In Development |
| ⚪ | Not Assigned |

Installation (Linux)

These instructions will guide you through everything required for building and executing a program on the Ryzen™ AI NPU, starting from a fresh bare-bones Ubuntu 24.04 or Ubuntu 24.10 install.

Initial Setup

Be sure you have the latest BIOS on your laptop or mini-PC that enables the NPU. See here.

If you are starting from Ubuntu 24.04, you may need to update the Linux kernel to 6.11 or newer by installing the Hardware Enablement (HWE) stack:

sudo apt update
sudo apt install --install-recommends linux-generic-hwe-24.04
sudo reboot
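
After rebooting, it can be handy to confirm that the running kernel meets the 6.11 minimum. The following is a minimal sketch rather than part of IRON: the helper names `kernel_version` and `kernel_is_new_enough` are hypothetical, and the check only compares the major and minor numbers reported by the standard library's `platform.release()`.

```python
import platform
import re


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.11.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release: str, minimum: tuple[int, int] = (6, 11)) -> bool:
    """Return True if the release is at least the minimum (major, minor)."""
    return kernel_version(release) >= minimum


# platform.release() reports the running kernel, like `uname -r` on Linux.
current = platform.release()
try:
    ok = kernel_is_new_enough(current)
except ValueError:
    ok = False
status = "OK" if ok else "older than 6.11 or unrecognized; consider the HWE stack"
print(f"kernel {current}: {status}")
```
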
  1. Install XDNA™ Driver and XRT:

    Instructions from mlir-aie repository

  2. Install the packages needed for IRON and MLIR-AIE:

    # Python versions 3.10, 3.12 and 3.13 are currently supported by our wheels
    sudo apt install \
    build-essential clang clang-14 lld lld-14 cmake ninja-build python3-venv python3-pip
  3. Set up a virtual environment and activate it:

    python3 -m venv ironenv
    source ironenv/bin/activate
    python3 -m pip install --upgrade pip
  4. Source XRT (installed in step 1):

    source /opt/xilinx/xrt/setup.sh
  5. Install required Python packages (from requirements.txt):

    MLIR_PYTHON_EXTRAS_SET_VERSION="0.0.8.3" HOST_MLIR_PYTHON_PACKAGE_PREFIX="aie" pip install -r requirements.txt
  6. To test your installation, you can try to build and run the example below:

    cmake -B build
    cmake --build build --target silu_1_cols_1_channels_2048_tile_2048_run

Note: On a fresh install, if you get CMake Error: Could not find CMAKE_ROOT !!!, deactivate and reactivate your Python virtual environment.
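
If the test build fails, a quick preflight check of the environment described in the steps above may help narrow things down. This is a sketch under stated assumptions, not an IRON utility: the function name `preflight` is hypothetical, and the XRT check assumes that sourcing `/opt/xilinx/xrt/setup.sh` exports the `XILINX_XRT` environment variable.

```python
import os
import sys

# Minor versions the wheels are stated to support (3.10, 3.12 and 3.13).
SUPPORTED_PYTHONS = {(3, 10), (3, 12), (3, 13)}


def preflight(version, prefix, base_prefix, environ):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version[:2]) not in SUPPORTED_PYTHONS:
        problems.append(
            f"Python {version[0]}.{version[1]} is not among the supported versions"
        )
    # Inside a venv, sys.prefix points at the environment and differs from sys.base_prefix.
    if prefix == base_prefix:
        problems.append("no virtual environment is active")
    # Assumption: XRT's setup.sh exports XILINX_XRT when sourced.
    if not environ.get("XILINX_XRT"):
        problems.append("XRT does not appear to be sourced (XILINX_XRT is unset)")
    return problems


for problem in preflight(sys.version_info, sys.prefix, sys.base_prefix, os.environ):
    print("warning:", problem)
```

Passing the interpreter state in as arguments keeps the check easy to exercise with made-up values, for example `preflight((3, 11), "/usr", "/usr", {})`.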

Building & Testing

NOTE: Be sure the XRT setup script has been sourced: source /opt/xilinx/xrt/setup.sh

IRON is a CMake-based project. To configure the project, run:

cmake -B build

Note: By default, the project is built for AIE2P. To build for AIE2, set the target using: cmake -B build -DIRONCLAD_AIE_TARGET=aie2
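
For scripted builds, the configure step can be assembled programmatically. The sketch below is illustrative only: `configure_command` is a hypothetical helper, and it uses just the `-DIRONCLAD_AIE_TARGET=aie2` option mentioned above, omitting the flag entirely for the default AIE2P build.

```python
import shlex

# "aie2p" is used here only as a label for the default build; no flag is emitted for it.
VALID_TARGETS = ("aie2", "aie2p")


def configure_command(target: str = "aie2p", build_dir: str = "build") -> list[str]:
    """Build the argument list for the CMake configure step."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {VALID_TARGETS}")
    command = ["cmake", "-B", build_dir]
    # AIE2P is the documented default, so the option is only needed for AIE2.
    if target != "aie2p":
        command.append(f"-DIRONCLAD_AIE_TARGET={target}")
    return command


# Print the command line that would configure an AIE2 build.
print(shlex.join(configure_command("aie2")))
```

The returned list can be handed to `subprocess.run(..., check=True)` when the command should actually be executed.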

To build all designs, use:

cmake --build build

To test all the designs, use the following Python script:

./scripts/run_tests.py --iter 1

You can select a single test to run using the --select flag.

When you run cmake -B build, each available target is printed in the following form:

Registering Executable: <TARGET_NAME>
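
To collect the target names from a saved configure log, a small parser along these lines may be enough. It is a sketch, not an IRON tool: `list_targets` is a hypothetical helper, and the `-- ` prefix in the sample log is an assumption about how CMake status messages are printed; the pattern itself only relies on the `Registering Executable:` text shown above.

```python
import re

# Matches the "Registering Executable: <TARGET_NAME>" text anywhere in a line.
TARGET_LINE = re.compile(r"Registering Executable:\s*(\S+)")


def list_targets(configure_output: str) -> list[str]:
    """Return the target names in the order they appear in the configure output."""
    return [match.group(1) for match in TARGET_LINE.finditer(configure_output)]


# Sample log for illustration; the "-- " prefix is an assumption.
sample = """\
-- Registering Executable: silu_1_cols_1_channels_2048_tile_2048
-- Registering Executable: silu_4_cols_1_channels_2048_tile_512
-- Configuring done
"""

for name in list_targets(sample):
    print(name)
```
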

If you want to build only a specific design, run:

# Example: cmake --build build --target silu_4_cols_1_channels_2048_tile_512
cmake --build build --target <TARGET_NAME>

To run one or more specific tests, pass --select once per target to the same script:

./scripts/run_tests.py --select <TARGET_ONE> --select <TARGET_TWO>
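
When the list of tests comes from another script, the repeated `--select` flags can be generated rather than typed. This is a minimal sketch: `run_tests_command` is a hypothetical helper, and combining `--iter` with `--select` in one invocation is an assumption, since the two flags are only shown separately above.

```python
import shlex


def run_tests_command(targets, iterations=None, script="./scripts/run_tests.py"):
    """Build the argument list for the test script, one --select per target."""
    command = [script]
    if iterations is not None:
        # Assumption: --iter may be combined with --select.
        command += ["--iter", str(iterations)]
    for target in targets:
        command += ["--select", target]
    return command


# Target names taken from the examples in this README.
selected = [
    "silu_1_cols_1_channels_2048_tile_2048",
    "silu_4_cols_1_channels_2048_tile_512",
]
print(shlex.join(run_tests_command(selected)))
```
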

Additionally, each design has a corresponding <TARGET_NAME>_run target that builds the design and then runs it:

cmake --build build --target silu_4_cols_1_channels_2048_tile_512_run
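
The relationship between a design target and its `_run` counterpart can be captured in a small wrapper. The sketch below is illustrative: `build_command` and `build` are hypothetical names, and the `dry_run` default means nothing is executed unless explicitly requested.

```python
import subprocess


def build_command(target: str, run: bool = False, build_dir: str = "build") -> list[str]:
    """Return the cmake invocation for a design, or for its <TARGET_NAME>_run variant."""
    name = f"{target}_run" if run else target
    return ["cmake", "--build", build_dir, "--target", name]


def build(target: str, run: bool = False, dry_run: bool = True) -> list[str]:
    """Optionally execute the build; always return the command that was assembled."""
    command = build_command(target, run=run)
    if not dry_run:
        # check=True raises CalledProcessError if the build or the run fails.
        subprocess.run(command, check=True)
    return command


# Dry run: only prints the command that would build and run the design.
print(" ".join(build("silu_4_cols_1_channels_2048_tile_512", run=True)))
```
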

Git Hooks (Optional but Recommended)

To ensure your code passes CI linting checks before pushing, install the pre-push hook:

cp scripts/hooks/pre-push .git/hooks/pre-push
chmod +x .git/hooks/pre-push

The hook will run the same linting checks as CI:

  • License checks (reuse)
  • Python formatting (black)
  • C++ formatting (clang-format)

To bypass the hook if needed: git push --no-verify
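
The two hook-installation commands above can also be expressed in Python, which may be convenient in a setup script. This is a sketch, not part of the repository: `install_pre_push_hook` is a hypothetical helper that mirrors `cp` followed by `chmod +x`, and it expects the `.git/hooks` directory to exist already.

```python
import shutil
import stat
from pathlib import Path


def install_pre_push_hook(repo_root: Path) -> Path:
    """Copy scripts/hooks/pre-push into .git/hooks and mark it executable."""
    source = repo_root / "scripts" / "hooks" / "pre-push"
    destination = repo_root / ".git" / "hooks" / "pre-push"
    if not destination.parent.is_dir():
        raise FileNotFoundError(f"{destination.parent} does not exist; is this a Git checkout?")
    # copyfile raises FileNotFoundError if the source hook is missing.
    shutil.copyfile(source, destination)
    # Equivalent of `chmod +x`: add execute permission for user, group, and others.
    mode = destination.stat().st_mode
    destination.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return destination
```

Calling `install_pre_push_hook(Path("."))` from the repository root would install the hook.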


Copyright © 2025 Advanced Micro Devices, Inc.
