Home

AOCL-DLP Documentation Hub

Welcome to the AOCL-DLP (AMD Optimizing CPU Libraries - Deep Learning Primitives) documentation hub! This wiki serves as your central navigation point to all AOCL-DLP documentation resources.

About AOCL-DLP

AOCL-DLP is a high-performance library designed to provide optimized deep learning primitives for AMD processors. It implements Low Precision GEMM (LPGEMM) operations for machine learning applications, supporting multiple data types, pre-operations, and post-operations. The library is specifically optimized to leverage AMD hardware capabilities including AVX2, AVX512, AVX512_VNNI, and AVX512_BF16 instruction sets.

Key Features

Highly Optimized GEMM Operations - High-performance matrix multiplication targeting AMD CPUs
Multiple Data Type Support - Various precision formats (float, bfloat16, int8, uint8, int4, int32)
Comprehensive Post-Operations - SUM, ELTWISE, BIAS, SCALE, MATRIX_ADD, MATRIX_MUL
Batch GEMM Support - Optimized for handling multiple GEMM operations
Symmetric Quantization - Specialized routines for quantized operations
Thread Optimization - Parallel execution via OpenMP

📚 Documentation Resources

Project Information

🏠 Project README - Overview and feature summary
🔧 Build Instructions - Compilation guide
📦 Installation Guide - Installation steps
📄 License - BSD 3-Clause License
📚 Examples - Examples of how to use the library

Community & Support

Development Tools & Coverage

📊 Code Coverage Tools - Comprehensive coverage analysis and HTML report generation

🔧 API Documentation

Core APIs (C Interface)

GEMM API

High-performance General Matrix Multiplication operations supporting various data type combinations:

Float Operations: f32f32f32of32
Bfloat16 Operations: bf16bf16f32of32, bf16bf16f32obf16, bf16s4f32of32, bf16s4f32obf16
Integer Operations (u8 x s8): u8s8s32os32, u8s8s32os8, u8s8s32ou8, u8s8s32of32, u8s8s32obf16
Integer Operations (s8 x s8): s8s8s32os32, s8s8s32os8, s8s8s32ou8, s8s8s32of32, s8s8s32obf16
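
The type strings above appear to follow the pattern of A type, B type, accumulation type, then `o` and the output type. This page does not spell that out, so treat it as an assumption. Under that reading, `u8s8s32os32` multiplies an unsigned 8-bit matrix by a signed 8-bit matrix, accumulates in 32-bit integers, and stores 32-bit results. The naive reference below sketches only that arithmetic. `ref_gemm_u8s8s32os32` is a hypothetical helper for illustration, not an AOCL-DLP function, and it ignores the reordering, post-operations, and vectorization the library provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive row-major reference: C (m x n, int32) = A (m x k, uint8) * B (k x n, int8).
 * Hypothetical helper for illustration only; not part of the AOCL-DLP API. */
void ref_gemm_u8s8s32os32(size_t m, size_t n, size_t k,
                          const uint8_t *a, const int8_t *b, int32_t *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Accumulate in 32 bits so sums of 8-bit products do not wrap. */
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p) {
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

A reference like this is mainly useful for checking results from an optimized kernel on small inputs.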

📖 Detailed API Reference: GEMM API Documentation

Batch GEMM API

Optimized batch processing for multiple GEMM operations with support for all data type combinations.

📖 Detailed API Reference: Batch GEMM Documentation

Post-Operations API

Comprehensive post-processing operations that can be chained with GEMM:

SUM: Element-wise addition with scaling and zero point
ELTWISE: Activation functions (RELU, PRELU, GELU_TANH, GELU_ERF, CLIP, SWISH, TANH, SIGMOID)
BIAS: Bias addition to results
SCALE: Scaling operations
MATRIX_ADD, MATRIX_MUL: Matrix addition and multiplication with scaling
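
To make the idea of chaining concrete, the sketch below applies a BIAS step followed by an ELTWISE RELU step to an already computed float result matrix. It is a plain C illustration of the arithmetic only. `ref_bias_relu_f32` is a hypothetical name, the per-column bias layout is an assumption, and the library's actual post-operation structures are described in the Post-Operations Documentation rather than here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add a per-column bias, then apply RELU, in place
 * on a row-major m x n float matrix. Not part of the AOCL-DLP API. */
void ref_bias_relu_f32(size_t m, size_t n, float *c, const float *bias)
{
    assert(c != NULL && bias != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float v = c[i * n + j] + bias[j];   /* BIAS step */
            c[i * n + j] = v > 0.0f ? v : 0.0f; /* ELTWISE (RELU) step */
        }
    }
}
```

Fusing such steps into the GEMM avoids a second pass over the output matrix, which is the usual motivation for post-operations.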

📖 Detailed API Reference: Post-Operations Documentation

Quantization API

Specialized symmetric quantization routines:

s8s8s32of32_sym_quant
s8s8s32obf16_sym_quant
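
For readers new to the term, symmetric quantization maps floats to integers with a single scale and a zero point fixed at 0. The sketch below shows a per-tensor int8 version of that general technique. `ref_sym_quant_s8` is a hypothetical helper, and the choice of one scale per tensor and the [-127, 127] range are assumptions made for illustration. The grouping and scale handling of the routines listed above may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: per-tensor symmetric int8 quantization.
 * Writes q[i] and returns scale so that x[i] is approximately scale * q[i]. */
float ref_sym_quant_s8(size_t n, const float *x, int8_t *q)
{
    assert(n > 0 && x != NULL && q != NULL);
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > max_abs) max_abs = a;
    }
    /* Symmetric scheme: zero point is 0 and the range is [-127, 127]. */
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; ++i) {
        float r = x[i] / scale;
        /* Round half away from zero, then clamp to the int8 range used. */
        int32_t v = (int32_t)(r < 0.0f ? r - 0.5f : r + 0.5f);
        if (v > 127) v = 127;
        if (v < -127) v = -127;
        q[i] = (int8_t)v;
    }
    return scale;
}
```

Dequantization is then a single multiply, `scale * q[i]`, which is what makes the symmetric scheme cheap to fold into a GEMM output stage.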

📖 Detailed API Reference: Quantization Documentation

Utility Functions API

Standalone mathematical operations:

gelu_tanh_f32 - GELU activation with tanh approximation
gelu_erf_f32 - GELU activation using the erf-based formulation
softmax_f32 - Softmax function for float
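
For reference, the standard textbook definitions behind these three names are given below. This page does not document how the library evaluates them numerically, so the max-subtraction form of softmax shown here is only the commonly used stable formulation, not a statement about the implementation.

```latex
\mathrm{GELU}_{\tanh}(x) = \tfrac{1}{2}\, x \left( 1 + \tanh\!\left( \sqrt{\tfrac{2}{\pi}} \left( x + 0.044715\, x^{3} \right) \right) \right)

\mathrm{GELU}_{\mathrm{erf}}(x) = \tfrac{1}{2}\, x \left( 1 + \operatorname{erf}\!\left( \tfrac{x}{\sqrt{2}} \right) \right)

\operatorname{softmax}(x)_i = \frac{\exp\!\left( x_i - \max_j x_j \right)}{\sum_{k} \exp\!\left( x_k - \max_j x_j \right)}
```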

📖 Detailed API Reference: Utility Functions Documentation

Eltwise Operations API

Specialized element-wise operations supporting various input/output type combinations:

bf16of32, bf16obf16, f32of32, f32obf16, f32os32, f32os8
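
Reading these names as input type, then `o` and the output type (an assumption, as above), several combinations involve converting between bfloat16 and float. The sketch below shows that conversion, relying on the fact that bfloat16 is the upper 16 bits of an IEEE-754 binary32 value. The `ref_` helpers are hypothetical, NaN handling is omitted, and the rounding mode the library uses is not stated on this page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen bfloat16 to float: place the 16 bits in the top half of a binary32. */
float ref_bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow float to bfloat16 with round-to-nearest-even.
 * NaN inputs are not handled specially in this sketch. */
uint16_t ref_f32_to_bf16(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    /* Add 0x7FFF plus the lowest kept bit so ties round to even. */
    uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding_bias) >> 16);
}
```

Using `memcpy` for the bit reinterpretation avoids the undefined behavior of pointer casts between `float` and `uint32_t`.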

📖 Detailed API Reference: Eltwise Operations Documentation

Library Management API

Essential functions for library initialization, cleanup, and configuration.

📖 Detailed API Reference: Library Interface Documentation

📋 Complete API Reference

Comprehensive Documentation

🔗 Complete API Documentation - Full API reference with detailed function descriptions
🔗 API Overview - User guide and tutorials

🚀 Getting Started

Quick Start

⚡ Quick Start Guide - NEW USERS START HERE! Get up and running in 5 minutes
- Installation
- Your first AOCL-DLP program
- Build and run examples
- Common first-time issues

Integration Guide

📖 Integration Guide - COMPREHENSIVE REFERENCE for integrating AOCL-DLP into your application
- CMake package integration (recommended)
- Manual linking instructions
- Static vs dynamic linking (including the critical --whole-archive flag)
- Complete working examples
- Troubleshooting & FAQ

💡 Examples and Tutorials

Code Examples

Explore practical implementations and usage patterns:

Tutorial Documentation

🧪 Testing and Validation

Testing Framework

Comprehensive testing suite for validation and performance benchmarking:

Integration Tests - End-to-end workflow testing
Performance Benchmarks - Speed and accuracy measurements
Correctness Validation - Numerical accuracy verification

📖 Testing Documentation: Testing Framework Guide

🏗️ Development Resources

Build System

CMake Configuration - Modern CMake-based build system
Cross-platform Support - Linux and Windows compatibility
Dependency Management - Automated dependency resolution
Build Options - Customizable build configurations

Development Tools

Code Formatting - Consistent code style with clang-format
Pre-commit Hooks - Automated code quality checks
Continuous Integration - Automated testing and validation

Advanced Development

🔧 JIT Code Generation Guide - Just-In-Time compilation system for runtime optimization and kernel debugging

⚡ Performance and Hardware Support

Optimized for AMD Hardware

Zen1+ Architecture - AVX2/FMA3 support
Zen4+ Architecture - AVX512, AVX512_VNNI, AVX512_BF16 support
x86_64 Compatibility - Runs on any compatible x86_64 CPU

Performance Features

NUMA Awareness - Optimized memory access patterns
Thread Scaling - Efficient parallel execution
Memory Management - Optimized data layouts and caching

This documentation hub is continuously updated. For the latest information, please refer to the linked resources above.

