Examples and Tutorials
Nallani Bhaskar edited this page Jun 15, 2026 · 4 revisions
AOCL-DLP ships with example programs in the examples/classic/ directory. Build them with:
```bash
cd aocl-dlp
mkdir build && cd build
cmake -DBUILD_EXAMPLES=ON ..
make -j$(nproc)
```

Compiled examples are in `build/examples/classic/`.

| Example | Description | Key concepts |
|---|---|---|
| `simple_gemm_f32.c` | Float32 matrix multiplication | Basic GEMM call, row-major layout |
| `simple_gemm_bf16.c` | BFloat16 GEMM | BF16 input type, f32 accumulation |
| `simple_gemm_s8.c` | Signed int8 GEMM | Integer quantized GEMM |
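
To sanity-check what `simple_gemm_f32.c` and `simple_gemm_bf16.c` compute, a naive plain-C reference can be handy. The sketch below is not the AOCL-DLP API: it assumes the BLAS-style definition C = alpha * A * B + beta * C with row-major storage and row strides `lda`, `ldb`, `ldc`, and it models BF16 as the upper 16 bits of an IEEE float32 with f32 accumulation. All function names (`ref_gemm_f32`, `ref_gemm_bf16`, `bf16_to_f32`, `f32_to_bf16`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical naive reference: C = alpha * A * B + beta * C, all row-major.
 * A is m x k (row stride lda), B is k x n (row stride ldb), C is m x n (row stride ldc). */
static void ref_gemm_f32(int m, int n, int k, float alpha,
                         const float *a, int lda, const float *b, int ldb,
                         float beta, float *c, int ldc)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f; /* f32 accumulator */
            for (int p = 0; p < k; ++p)
                acc += a[i * lda + p] * b[p * ldb + j];
            /* BLAS convention: beta == 0 means C is not read (it may be uninitialized). */
            c[i * ldc + j] = (beta == 0.0f) ? alpha * acc
                                            : alpha * acc + beta * c[i * ldc + j];
        }
    }
}

/* BF16 keeps the sign, the 8-bit exponent and the top 7 mantissa bits of a float32. */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* float32 -> BF16 with round-to-nearest-even (NaN/overflow handling omitted). */
static uint16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* BF16 inputs, f32 accumulation and output; contiguous row-major, alpha = 1, beta = 0. */
static void ref_gemm_bf16(int m, int n, int k,
                          const uint16_t *a, const uint16_t *b, float *c)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += bf16_to_f32(a[i * k + p]) * bf16_to_f32(b[p * n + j]);
            c[i * n + j] = acc;
        }
}
```

When comparing library output against such a reference, a small tolerance is usually more appropriate than exact equality, since optimized kernels may accumulate in a different order.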

| Example | Description | Key concepts |
|---|---|---|
| `simple_gemm_bf16s8.c` | BF16 activations with int8 weights | Mixed-precision, on-the-fly quantization |
| `simple_gemm_f32s8.c` | F32 activations with int8 weights | Mixed-precision quantized inference |
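
Numerically, a GEMM with float activations and int8 weights treats each weight as a scaled integer. The naive reference below is only a way to check results, not a description of how AOCL-DLP computes them internally; it assumes symmetric quantization (zero point 0) with one scale per output column of B, held in a hypothetical `scale_b` array, and the function name `ref_gemm_f32s8` is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference for F32 activations x S8 weights -> F32 output.
 * A: m x k float, B: k x n int8, C: m x n float, all contiguous row-major.
 * Each weight is dequantized as w = b * scale_b[j] (assumed symmetric, per column). */
static void ref_gemm_f32s8(int m, int n, int k,
                           const float *a, const int8_t *b,
                           const float *scale_b, float *c)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[i * k + p] * ((float)b[p * n + j] * scale_b[j]);
            c[i * n + j] = acc;
        }
}
```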

| Example | Description | Key concepts |
|---|---|---|
| `simple_gemm_with_bias.c` | GEMM with fused bias addition | `dlp_metadata_t`, BIAS post-op |
| `simple_gemm_with_relu.c` | GEMM with fused ReLU activation | ELTWISE post-op, RELU |
| `simple_gemm_with_mish.c` | F32 GEMM with fused Mish activation | `aocl_gemm_f32f32f32of32`, ELTWISE post-op, MISH `algo_type` |
| `post_ops_combinations.c` | Multiple chained post-operations | Chaining BIAS + ELTWISE, `seq_vector` |

| Example | Description | Key concepts |
|---|---|---|
| `quantization.c` | Symmetric quantization workflow | `DLP_SYMM_STAT_QUANT`, sym_quant APIs |
| `simple_gemm_s8_sym_quant.c` | s8 x s8 -> f32 GEMM with symmetric static quantization | `aocl_gemm_s8s8s32of32_sym_quant`, `post_op_grp` scales, `group_size` |
| `simple_gemm_per_token_quant.c` | W8A8 s8 x s8 GEMM with per-token (PerM) dequantization of A, including the n = 1 decoder path | `aocl_gemm_s8s8s32of32`, SCALE post-op, `DLP_PARAM_DIM_PER_TOKEN` |
| `simple_gemm_bf16s4.c` | BF16 activations x s4 weights, symmetric weight-only quantization (WOQ) | `aocl_gemm_bf16s4f32of32`, `aocl_reorder_bf16s4f32of32`, `pre_ops->b_scl` |
| `simple_gemm_bf16u4.c` | BF16 activations x u4 weights, asymmetric WOQ with B zero-point | `aocl_gemm_bf16u4f32of32`, `pre_ops` `b_scl` + `b_zp` |
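
The arithmetic behind symmetric int8 quantization with per-token dequantization can be sketched in a few lines: scale = max|x| / 127, q = round(x / scale) clamped to [-127, 127], int32 accumulation of s8 x s8 products, then dequantization by scale_a[i] * scale_b[j], where `scale_a` has one entry per row (token) of A and `scale_b` one per column of B. Whether the library's `DLP_SYMM_STAT_QUANT` and `DLP_PARAM_DIM_PER_TOKEN` paths use exactly this rounding and clamping range is an assumption here, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical symmetric int8 quantization of n values: writes q, returns the scale.
 * scale = max|x| / 127, so the largest magnitude maps to +/-127. */
static float sym_quantize_s8(const float *x, int n, int8_t *q)
{
    assert(n > 0);
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        float v = x[i] < 0.0f ? -x[i] : x[i];
        if (v > amax) amax = v;
    }
    float scale = amax > 0.0f ? amax / 127.0f : 1.0f; /* all-zero input: any scale works */
    for (int i = 0; i < n; ++i) {
        float r = x[i] / scale;
        int v = (int)(r + (r >= 0.0f ? 0.5f : -0.5f)); /* round half away from zero */
        if (v > 127) v = 127;
        if (v < -127) v = -127;
        q[i] = (int8_t)v;
    }
    return scale;
}

/* s8 x s8 -> s32 accumulation, then f32 dequantization with a per-token (per-row)
 * scale for A and a per-column scale for B. Contiguous row-major, no padding. */
static void ref_gemm_s8s8_dequant(int m, int n, int k,
                                  const int8_t *a, const float *scale_a,
                                  const int8_t *b, const float *scale_b,
                                  float *c)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (int p = 0; p < k; ++p)
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            c[i * n + j] = (float)acc * scale_a[i] * scale_b[j];
        }
}
```

Per-token scales let each activation row use its own dynamic range, which matters for decoder-style workloads where row magnitudes vary widely.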

| Example | Description | Key concepts |
|---|---|---|
| `batch_gemm.c` | Batch GEMM for multiple matrices | `aocl_batch_gemm_*`, `group_count` |
| `matrix_reorder.c` | Pre-reorder weights for repeated use | `aocl_reorder_*`, `mem_format_b = 'R'` |
| `eltwise_ops.c` | Standalone element-wise operations | `aocl_gemm_eltwise_ops_*` |
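
`matrix_reorder.c` pays a one-time cost to rearrange the weight matrix B into a kernel-friendly layout (`mem_format_b = 'R'`) and then reuses it across calls. AOCL-DLP's actual reordered format is internal to the library; the sketch below only illustrates the general idea with a classic column-panel packing, where the panel width `NR` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR 4 /* hypothetical panel width */

/* Packed size in floats: n rounded up to a multiple of NR, times k. */
static size_t packed_b_size(int k, int n)
{
    return (size_t)((n + NR - 1) / NR) * NR * (size_t)k;
}

/* Pack row-major B (k x n) into column panels of width NR. Within a panel, the NR
 * values of row p are contiguous, so a kernel can stream B linearly. */
static void pack_b(int k, int n, const float *b, float *packed)
{
    size_t idx = 0;
    for (int j0 = 0; j0 < n; j0 += NR)
        for (int p = 0; p < k; ++p)
            for (int jj = 0; jj < NR; ++jj) {
                int j = j0 + jj;
                packed[idx++] = (j < n) ? b[p * n + j] : 0.0f; /* zero-pad last panel */
            }
}

/* C = A * B, reading B from its packed form. A is m x k, C is m x n, row-major. */
static void gemm_packed_b(int m, int n, int k, const float *a,
                          const float *packed, float *c)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            const float *panel = packed + (size_t)(j / NR) * NR * (size_t)k;
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[i * k + p] * panel[p * NR + (j % NR)];
            c[i * n + j] = acc;
        }
}
```

The trade-off mirrors the example: `pack_b` runs once per weight matrix, while `gemm_packed_b` can be called many times with fixed weights, which is the inference pattern reordering targets.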

| Example | Description | Key concepts |
|---|---|---|
| `multi_instance_gemm_f32.c` | Multiple GEMM instances in parallel | Thread-local settings, concurrent calls |
| `multi_instance_gemm_u8s8.c` | Multi-instance quantized GEMM | Parallel quantized inference |
| `version.c` | Query library version | `dlp_version_query()` |

If you are new to AOCL-DLP, work through the examples in this order:
1. Quick Start -- Build and run your first program (inline example)
2. `simple_gemm_f32.c` -- Understand basic GEMM parameters
3. `simple_gemm_with_bias.c` -- Learn how post-ops work
4. `matrix_reorder.c` -- Optimize for repeated inference
5. `batch_gemm.c` -- Process multiple matrices efficiently
6. `quantization.c` -- Use integer quantization for inference
Then explore the guides for deeper understanding:
- GEMM Guide -- All data types, parameters, and reordering
- Post-Ops Guide -- Full post-operations reference
- Performance Guide -- Threading and optimization
If AOCL-DLP is already installed on your system, you can build examples standalone:
```bash
# Using shared library
gcc -o simple_gemm_f32 simple_gemm_f32.c -I/usr/local/include -L/usr/local/lib -laocl-dlp -lm

# Using static library
gcc -o simple_gemm_f32 simple_gemm_f32.c -I/usr/local/include -L/usr/local/lib \
    -Wl,--whole-archive -laocl-dlp_static -Wl,--no-whole-archive -lstdc++ -lm -fopenmp
```

See the Integration Guide for CMake-based builds and troubleshooting.