@@ -35,31 +35,31 @@ The IRON Python API for Ryzen™ AI NPUs is described in the following paper:
 
 | Section | Description | Datatype | AIE2 | AIE2P | Status | Design Example |
 | :--------| :------------| :---------| :-----| :------| :-------| :-------------|
-| [Element-wise Add](./aie_kernels/generic/add.cc) | Element-wise addition kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/elementwise_add/](./operators/elementwise_add/) |
-| [Element-wise Mul](./aie_kernels/generic/mul.cc) | Element-wise multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/elementwise_mul/](./operators/elementwise_mul/) |
-| [GEMM](./aie_kernels/aie2p/mm.cc) | General Matrix Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/gemm/](./operators/gemm/) |
-| [GEMV](./aie_kernels/generic/mv.cc) | General Matrix-Vector Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/gemv/](./operators/gemv/) |
-| [GQA](./aie_kernels/aie2p/mha.cc) | Grouped Query Attention kernel (Single pipeline) | bfloat16 | | ✓ | 🟢 | [operators/mha/](./operators/mha/) |
-| [MHA](./aie_kernels/aie2p/mha.cc) | Multi-Head Attention kernel & Grouped Query Attention | bfloat16 | | ✓ | 🟢 | [operators/mha/](./operators/mha/) |
-| [RMSNorm](./aie_kernels/aie2/rms_norm.cc) | RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/rms_norm/](./operators/rms_norm/) |
-| [RoPE](./aie_kernels/generic/rope.cc) | Rotary Positional Embedding kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/rope/](./operators/rope/) |
-| [SiLU](./aie_kernels/aie2/silu.cc) | Sigmoid Linear Unit activation kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/silu/](./operators/silu/) |
-| [Softmax](./aie_kernels/aie2/softmax.cc) | Softmax kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/softmax/](./operators/softmax/) |
-| [Weighted RMSNorm](./aie_kernels/aie2/rms_norm.cc) | Weighted RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/rms_norm/](./operators/rms_norm/) |
-| [Copy](./aie_kernels/generic/passThrough.cc) | Copy | bfloat16 | ✓ | ✓ | 🟢 | [operators/mem_copy/](./operators/mem_copy/) |
-| [Transpose](./aie_kernels/generic/transpose.cc) | Transpose | bfloat16 | ✓ | ✓ | 🟢 | [operators/transpose/](./operators/transpose/) |
-| [AXPY](./aie_kernels/generic/axpy.cc) | AXPY | bfloat16 | ✓ | ✓ | 🟢 | [operators/axpy/](./operators/axpy/) |
+| [Element-wise Add](./aie_kernels/generic/add.cc) | Element-wise addition kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/elementwise_add/](./iron/operators/elementwise_add/) |
+| [Element-wise Mul](./aie_kernels/generic/mul.cc) | Element-wise multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/elementwise_mul/](./iron/operators/elementwise_mul/) |
+| [GEMM](./aie_kernels/aie2p/mm.cc) | General Matrix Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/gemm/](./iron/operators/gemm/) |
+| [GEMV](./aie_kernels/generic/mv.cc) | General Matrix-Vector Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/gemv/](./iron/operators/gemv/) |
+| [GQA](./aie_kernels/aie2p/mha.cc) | Grouped Query Attention kernel (Single pipeline) | bfloat16 | | ✓ | 🟢 | [iron/operators/mha/](./iron/operators/mha/) |
+| [MHA](./aie_kernels/aie2p/mha.cc) | Multi-Head Attention kernel & Grouped Query Attention | bfloat16 | | ✓ | 🟢 | [iron/operators/mha/](./iron/operators/mha/) |
+| [RMSNorm](./aie_kernels/aie2/rms_norm.cc) | RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/rms_norm/](./iron/operators/rms_norm/) |
+| [RoPE](./aie_kernels/generic/rope.cc) | Rotary Positional Embedding kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/rope/](./iron/operators/rope/) |
+| [SiLU](./aie_kernels/aie2/silu.cc) | Sigmoid Linear Unit activation kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/silu/](./iron/operators/silu/) |
+| [Softmax](./aie_kernels/aie2/softmax.cc) | Softmax kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/softmax/](./iron/operators/softmax/) |
+| [Weighted RMSNorm](./aie_kernels/aie2/rms_norm.cc) | Weighted RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/rms_norm/](./iron/operators/rms_norm/) |
+| [Copy](./aie_kernels/generic/passThrough.cc) | Copy | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/mem_copy/](./iron/operators/mem_copy/) |
+| [Transpose](./aie_kernels/generic/transpose.cc) | Transpose | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/transpose/](./iron/operators/transpose/) |
+| [AXPY](./aie_kernels/generic/axpy.cc) | AXPY | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/axpy/](./iron/operators/axpy/) |
 | [Reduction]() | Reduction | bfloat16 | | | 🟡 | |
-| [Dequant](./aie_kernels/generic/expand.cc) | Dequant Q4NX from [AWQ](https://github.com/mit-han-lab/llm-awq) to bfloat16 | bfloat16 | ✓ | ✓ | 🟢 | [operators/dequant/](./operators/dequant/) |
-| [RELU](./aie_kernels/aie2/relu.cc) | RELU | bfloat16 | ✓ | ✓ | 🟢 | [operators/relu/](./operators/relu/) |
-| [Leaky RELU](./aie_kernels/aie2p/leaky_relu.cc) (WIP) | Leaky RELU kernel | bfloat16 | | ✓ | ⚪ | [operators/leaky_relu/](./operators/leaky_relu/) |
-| [GELU](./aie_kernels/aie2/gelu.cc) | GELU | bfloat16 | ✓ | ✓ | 🟢 | [operators/gelu/](./operators/gelu/) |
-| [LayerNorm](./aie_kernels/aie2/layer_norm.cc) | LayerNorm | bfloat16 | ✓ | ✓ | 🟢 | [operators/layer_norm/](./operators/layer_norm/) |
+| [Dequant](./aie_kernels/generic/expand.cc) | Dequant Q4NX from [AWQ](https://github.com/mit-han-lab/llm-awq) to bfloat16 | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/dequant/](./iron/operators/dequant/) |
+| [RELU](./aie_kernels/aie2/relu.cc) | RELU | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/relu/](./iron/operators/relu/) |
+| [Leaky RELU](./aie_kernels/aie2p/leaky_relu.cc) (WIP) | Leaky RELU kernel | bfloat16 | | ✓ | ⚪ | [iron/operators/leaky_relu/](./iron/operators/leaky_relu/) |
+| [GELU](./aie_kernels/aie2/gelu.cc) | GELU | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/gelu/](./iron/operators/gelu/) |
+| [LayerNorm](./aie_kernels/aie2/layer_norm.cc) | LayerNorm | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/layer_norm/](./iron/operators/layer_norm/) |
 | [Convolution]() | Convolution | bfloat16 | | | 🟡 | |
 | [MaxPool]() | MaxPool | bfloat16 | | | ⚪ | |
 | [AveragePool]() | AveragePool | bfloat16 | | | ⚪ | |
-| [Tanh](./aie_kernels/aie2/tanh.cc) | Tanh kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/tanh/](./operators/tanh/) |
-| [Sigmoid](./aie_kernels/aie2/sigmoid.cc) | Sigmoid kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/sigmoid/](./operators/sigmoid/) |
+| [Tanh](./aie_kernels/aie2/tanh.cc) | Tanh kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/tanh/](./iron/operators/tanh/) |
+| [Sigmoid](./aie_kernels/aie2/sigmoid.cc) | Sigmoid kernel | bfloat16 | ✓ | ✓ | 🟢 | [iron/operators/sigmoid/](./iron/operators/sigmoid/) |
 
 > Use this dashboard to quickly check the status of each kernel and locate relevant setup, build, and usage information.
 
@@ -114,17 +114,17 @@ If starting from `Ubuntu 24.04` you may need to update the Linux kernel to 6.11+
 
 1. Install required Python packages (from requirements.txt):
    ```bash
-   MLIR_PYTHON_EXTRAS_SET_VERSION="0.0.8.3" HOST_MLIR_PYTHON_PACKAGE_PREFIX="aie" pip install -r requirements.txt
+   pip install -r requirements.txt
    ```
 
 1. To test your installation, you can try to build and run the example below:
    ```bash
-   ./operators/axpy/test.py
+   ./iron/operators/axpy/test.py
    ```
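As a quick sanity check for this step: the AXPY operator computes `y = a*x + y` element-wise. A minimal host-side NumPy sketch of the expected result (a reference only, not the NPU kernel; float32 stands in for bfloat16 and the vector length is arbitrary) might look like:

```python
# Host-side golden model for AXPY: y = a*x + y (element-wise).
# NumPy reference for checking NPU output, not the kernel implementation itself.
import numpy as np

a = np.float32(2.0)                          # scalar coefficient
x = np.random.rand(1024).astype(np.float32)  # input vector
y = np.random.rand(1024).astype(np.float32)  # input/output vector

expected = a * x + y                         # reference result to compare against
print(expected[:4])
```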
 
 ### Building/Using & Testing Operators
 
-All available operators can be found in `operators`. These each contain:
+All available operators can be found in `iron/operators`. These each contain:
 
 * `op.py`: The Python operator interface -- an easy access point to integrate operators into your project that prescribes how to compile the operator (build artifacts) and how to call it at runtime (buffer sizes, etc.)
 * `design.py`: The implementation of the operator's NPU code. Often references a kernel in `aie_kernels` for the compute core code and describes the data movement using ObjectFIFOs.
@@ -137,17 +137,17 @@ All available operators can be found in `operators`. These each contain:
 
 To build and test all the operators:
 ```bash
-pytest operators/ -m "not extensive"
+pytest iron/operators/ -m "not extensive"
 ```
 
 To run the extensive test suite:
 ```bash
-pytest operators/
+pytest iron/operators/
 ```
 
 To run a specific operator's tests:
 ```bash
-pytest operators/axpy/
+pytest iron/operators/axpy/
 ```
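The same selections can also be driven from Python through pytest's standard entry point; this is plain pytest behavior rather than an interface specific to this repository, and the path below assumes you run from the repository root:

```python
# Run the quick (non-extensive) tests for a single operator via pytest's API.
import sys
import pytest

exit_code = pytest.main(["iron/operators/axpy/", "-m", "not extensive"])
sys.exit(int(exit_code))
```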
 
 ### Git Hooks (Optional but Recommended)