Examples and Tutorials

Nallani Bhaskar edited this page Jun 15, 2026 · 4 revisions

AOCL-DLP ships with example programs in the examples/classic/ directory. Build them with:

cd aocl-dlp
mkdir build && cd build
cmake -DBUILD_EXAMPLES=ON ..
make -j$(nproc)

The compiled example binaries are placed in build/examples/classic/.

Example Catalog

Basic GEMM

| Example | Description | Key concepts |
| --- | --- | --- |
| simple_gemm_f32.c | Float32 matrix multiplication | Basic GEMM call, row-major layout |
| simple_gemm_bf16.c | BFloat16 GEMM | BF16 input type, f32 accumulation |
| simple_gemm_s8.c | Signed int8 GEMM | Integer quantized GEMM |
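
For readers who want to check the output of the basic examples, the following is a minimal reference sketch of the row-major computation a float32 GEMM performs, C = alpha * A * B + beta * C. It is plain C and does not call AOCL-DLP; the name ref_gemm_f32 and its parameter list are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference row-major GEMM: C = alpha * A * B + beta * C.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
 * ref_gemm_f32 is a hypothetical helper, not an AOCL-DLP function. */
void ref_gemm_f32(size_t m, size_t n, size_t k,
                  float alpha, const float *a, size_t lda,
                  const float *b, size_t ldb,
                  float beta, float *c, size_t ldc)
{
    /* In row-major storage the leading dimension is the row stride. */
    assert(lda >= k && ldb >= n && ldc >= n);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * lda + p] * b[p * ldb + j];

            /* Follow the usual BLAS convention: when beta is zero,
             * the old contents of C are not read. */
            if (beta == 0.0f)
                c[i * ldc + j] = alpha * acc;
            else
                c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

A naive triple loop like this is only useful for validating results on small matrices; it makes no attempt at blocking or vectorization.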

Mixed Precision

| Example | Description | Key concepts |
| --- | --- | --- |
| simple_gemm_bf16s8.c | BF16 activations with int8 weights | Mixed-precision, on-the-fly quantization |
| simple_gemm_f32s8.c | F32 activations with int8 weights | Mixed-precision quantized inference |
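
The BF16 examples take bfloat16 inputs, and preparing test data for them usually means converting from float32. The sketch below shows one common way to do that conversion, using round-to-nearest-even on the upper 16 bits. The helper names are hypothetical, and this page does not state how AOCL-DLP itself converts or which BF16 typedef it exposes, so treat the uint16_t representation as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert float32 to bfloat16 (stored in a uint16_t) with
 * round-to-nearest-even. Hypothetical helper, not an AOCL-DLP function. */
uint16_t f32_to_bf16(float x)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);

    /* NaN: keep it a NaN by forcing a mantissa bit in the upper half. */
    if ((bits & 0x7fffffffu) > 0x7f800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Add 0x7fff plus the lowest kept bit so that exact ties round to even. */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* Widen bfloat16 back to float32; this direction is exact. */
float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because bfloat16 keeps only 8 bits of significand precision, comparisons against a float32 reference need a looser tolerance than for the pure float32 examples.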

Post-Operations

| Example | Description | Key concepts |
| --- | --- | --- |
| simple_gemm_with_bias.c | GEMM with fused bias addition | dlp_metadata_t, BIAS post-op |
| simple_gemm_with_relu.c | GEMM with fused ReLU activation | ELTWISE post-op, RELU |
| simple_gemm_with_mish.c | F32 GEMM with fused Mish activation | aocl_gemm_f32f32f32of32, ELTWISE post-op, MISH algo_type |
| post_ops_combinations.c | Multiple chained post-operations | Chaining BIAS + ELTWISE, seq_vector |
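
To make the effect of a chained BIAS + ELTWISE (ReLU) post-op concrete, here is a small unfused reference in plain C that applies the same two steps to an already computed output matrix. It is a sketch for checking results, not the library's implementation; the function name is hypothetical, and the per-column bias vector of length n is an assumption about the bias layout.

```c
#include <assert.h>
#include <stddef.h>

/* Apply bias then ReLU, in place, to a row-major m x n matrix:
 *   c[i][j] = max(c[i][j] + bias[j], 0)
 * Hypothetical reference helper; assumes one bias value per output column. */
void ref_bias_relu_f32(size_t m, size_t n, float *c, size_t ldc,
                       const float *bias)
{
    assert(ldc >= n);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float v = c[i * ldc + j] + bias[j];   /* BIAS step */
            c[i * ldc + j] = (v > 0.0f) ? v : 0.0f; /* ReLU step */
        }
    }
}
```

Running a plain GEMM followed by this helper should give results close to those of a fused call. The point of fusing is to avoid this second pass over the output.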

Quantization

| Example | Description | Key concepts |
| --- | --- | --- |
| quantization.c | Symmetric quantization workflow | DLP_SYMM_STAT_QUANT, sym_quant APIs |
| simple_gemm_s8_sym_quant.c | s8 x s8 -> f32 GEMM with symmetric static quantization | aocl_gemm_s8s8s32of32_sym_quant, post_op_grp scales, group_size |
| simple_gemm_per_token_quant.c | W8A8 s8 x s8 GEMM with per-token (PerM) A dequant, incl. n=1 decoder path | aocl_gemm_s8s8s32of32, SCALE post-op, DLP_PARAM_DIM_PER_TOKEN |
| simple_gemm_bf16s4.c | BF16 activations x s4 weights, symmetric weight-only quantization (WOQ) | aocl_gemm_bf16s4f32of32, aocl_reorder_bf16s4f32of32, pre_ops->b_scl |
| simple_gemm_bf16u4.c | BF16 activations x u4 weights, asymmetric WOQ with B zero-point | aocl_gemm_bf16u4f32of32, pre_ops b_scl + b_zp |
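
The arithmetic behind symmetric int8 quantization can be sketched in a few lines of plain C. The example below uses a single per-tensor scale (scale = max|x| / 127, zero point 0) and an s8 x s8 dot product accumulated in int32 and then dequantized to float. All names are hypothetical, and the per-group and per-token scale layouts used by the AOCL-DLP examples are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize n floats to int8 with one symmetric per-tensor scale.
 * Returns the scale so that x[i] is approximately q[i] * scale.
 * Hypothetical helper, not an AOCL-DLP function. */
float ref_sym_quantize_s8(const float *x, size_t n, int8_t *q)
{
    assert(x != NULL && q != NULL);

    float amax = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = (x[i] < 0.0f) ? -x[i] : x[i];
        if (a > amax)
            amax = a;
    }
    /* An all-zero tensor gets scale 1 to avoid dividing by zero. */
    float scale = (amax > 0.0f) ? amax / 127.0f : 1.0f;

    for (size_t i = 0; i < n; ++i) {
        float v = x[i] / scale;
        /* Round half away from zero, then clamp to the symmetric range. */
        int r = (int)(v + ((v >= 0.0f) ? 0.5f : -0.5f));
        if (r > 127) r = 127;
        if (r < -127) r = -127;
        q[i] = (int8_t)r;
    }
    return scale;
}

/* s8 x s8 dot product with int32 accumulation, dequantized to float
 * by multiplying with both scales. */
float ref_dot_s8_dequant(const int8_t *a, const int8_t *b, size_t k,
                         float scale_a, float scale_b)
{
    int32_t acc = 0;
    for (size_t p = 0; p < k; ++p)
        acc += (int32_t)a[p] * (int32_t)b[p];
    return (float)acc * scale_a * scale_b;
}
```

Clamping to [-127, 127] rather than [-128, 127] keeps the quantized range symmetric around zero, which is what allows the zero point to be dropped.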

Batch & Advanced

| Example | Description | Key concepts |
| --- | --- | --- |
| batch_gemm.c | Batch GEMM for multiple matrices | aocl_batch_gemm_*, group_count |
| matrix_reorder.c | Pre-reorder weights for repeated use | aocl_reorder_*, mem_format_b = 'R' |
| eltwise_ops.c | Standalone element-wise operations | aocl_gemm_eltwise_ops_* |

Multi-Instance & Utilities

| Example | Description | Key concepts |
| --- | --- | --- |
| multi_instance_gemm_f32.c | Multiple GEMM instances in parallel | Thread-local settings, concurrent calls |
| multi_instance_gemm_u8s8.c | Multi-instance quantized GEMM | Parallel quantized inference |
| version.c | Query library version | dlp_version_query() |

Suggested Learning Path

If you are new to AOCL-DLP, work through the examples in this order:

  1. Quick Start -- Build and run your first program (inline example)
  2. simple_gemm_f32.c -- Understand basic GEMM parameters
  3. simple_gemm_with_bias.c -- Learn how post-ops work
  4. matrix_reorder.c -- Optimize for repeated inference
  5. batch_gemm.c -- Process multiple matrices efficiently
  6. quantization.c -- Use integer quantization for inference

Then explore the other guides in this wiki for a deeper understanding.

Building Examples Against an Installed Library

If AOCL-DLP is already installed on your system, you can build examples standalone:

# Using shared library
gcc -o simple_gemm_f32 simple_gemm_f32.c -I/usr/local/include -L/usr/local/lib -laocl-dlp -lm

# Using static library
gcc -o simple_gemm_f32 simple_gemm_f32.c -I/usr/local/include -L/usr/local/lib \
    -Wl,--whole-archive -laocl-dlp_static -Wl,--no-whole-archive -lstdc++ -lm -fopenmp

See the Integration Guide for CMake-based builds and troubleshooting.
