Home

AOCL-DLP Documentation Hub

Welcome to the AOCL-DLP (AMD Optimizing CPU Libraries - Deep Learning Primitives) documentation hub! This wiki serves as your central navigation point to all AOCL-DLP documentation resources.

About AOCL-DLP

AOCL-DLP is a high-performance library designed to provide optimized deep learning primitives for AMD processors. It implements Low Precision GEMM (LPGEMM) operations for machine learning applications, supporting multiple data types, pre-operations, and post-operations. The library is specifically optimized to leverage AMD hardware capabilities including AVX2, AVX512, AVX512_VNNI, and AVX512_BF16 instruction sets.

Key Features

Highly Optimized GEMM Operations - High-performance matrix multiplication targeting AMD CPUs
Multiple Data Type Support - Various precision formats (float, bfloat16, int8, uint8, int32)
Comprehensive Post-Operations - SUM, ELTWISE, BIAS, SCALE, MATRIX_ADD, MATRIX_MUL
Batch GEMM Support - Optimized for handling multiple GEMM operations
Symmetric Quantization - Specialized routines for quantized operations
Thread Optimization - Parallel execution via OpenMP

📚 Documentation Resources

Project Information

🏠 Project README - Overview and feature summary
🔧 Build Instructions - Compilation guide
📦 Installation Guide - Installation steps
📄 License - BSD 3-Clause License
📚 Examples - Examples of how to use the library

Community & Support

Development Tools & Coverage

📊 Code Coverage Tools - Comprehensive coverage analysis and HTML report generation

🔧 API Documentation

Core APIs (C Interface)

GEMM API

High-performance General Matrix Multiplication operations supporting various data type combinations:

Float Operations: f32f32f32of32
Bfloat16 Operations: bf16bf16f32of32, bf16bf16f32obf16, bf16s4f32of32, bf16s4f32obf16
Unsigned-Signed Integer Operations (u8s8): u8s8s32os32, u8s8s32os8, u8s8s32ou8, u8s8s32of32, u8s8s32obf16
Signed Integer Operations (s8s8): s8s8s32os32, s8s8s32os8, s8s8s32ou8, s8s8s32of32, s8s8s32obf16
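
The type-combination names above appear to encode, in order, the element types of A and B, the accumulator type, and (after the `o`) the output type. That reading is an assumption; this page does not spell it out. Under that assumption, a naive reference for the `u8s8s32os32` case could look like the sketch below. The function `ref_gemm_u8s8s32os32`, its signature, and the row-major layout are hypothetical and are not part of the AOCL-DLP API; the sketch only illustrates the arithmetic, not the optimized kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference kernel, NOT an AOCL-DLP function.
 * Computes C = A * B for row-major matrices, where
 *   A is m x k (uint8_t), B is k x n (int8_t), C is m x n (int32_t).
 * Each product is widened to 32 bits before accumulation. */
void ref_gemm_u8s8s32os32(size_t m, size_t n, size_t k,
                          const uint8_t *a, const int8_t *b, int32_t *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p) {
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

A reference like this is mainly useful for checking the numerical output of an optimized call on small matrices.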

📖 Detailed API Reference: Doxygen API Documentation

Batch GEMM API

Optimized batch processing for multiple GEMM operations with support for all data type combinations.

📖 Detailed API Reference: Doxygen Batch GEMM Documentation

Post-Operations API

Comprehensive post-processing operations that can be chained with GEMM:

SUM: Element-wise addition with scaling and zero point
ELTWISE: Activation functions (RELU, PRELU, GELU_TANH, GELU_ERF, CLIP, SWISH, TANH, SIGMOID)
BIAS: Bias addition to results
SCALE: Scaling operations
MATRIX_ADD/MUL: Matrix operations with scaling
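
To make the idea of chaining concrete, the sketch below applies a BIAS step followed by an ELTWISE RELU step to an already computed GEMM result. It is a plain C illustration only: `ref_post_bias_relu_f32` is a hypothetical name, the per-column bias layout is an assumption, and the code does not use or mirror the AOCL-DLP post-operation structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, NOT the AOCL-DLP post-op interface.
 * Applies, in place, to a row-major m x n matrix c:
 *   1. BIAS: add bias[j] to every element in column j (assumed layout)
 *   2. RELU: clamp negative values to zero */
void ref_post_bias_relu_f32(size_t m, size_t n, float *c, const float *bias)
{
    assert(c != NULL && bias != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float v = c[i * n + j] + bias[j];   /* BIAS step */
            c[i * n + j] = (v > 0.0f) ? v : 0.0f; /* RELU step */
        }
    }
}
```

Fusing such steps into the GEMM avoids a second pass over the output matrix, which is the usual motivation for post-operations.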

📖 Detailed API Reference: Doxygen Post-Operations Documentation

Quantization API

Specialized symmetric quantization routines:

s8s8s32of32_sym_quant
s8s8s32obf16_sym_quant
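
For readers new to the term, symmetric quantization maps floating-point values to integers using a single scale factor and a zero point fixed at zero. The sketch below shows one common per-tensor variant for int8. How AOCL-DLP derives or groups its scale factors is not stated on this page, so `ref_sym_quant_s8` and its max-abs scaling rule are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT an AOCL-DLP function.
 * Quantizes n floats to int8 symmetrically:
 *   scale = max|x| / 127, q = round(x / scale), clamped to [-127, 127].
 * Returns the scale so that x is approximately q * scale.
 * An all-zero input yields scale = 1 and all-zero output. */
float ref_sym_quant_s8(const float *x, size_t n, int8_t *q)
{
    assert(x != NULL && q != NULL);
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = (x[i] < 0.0f) ? -x[i] : x[i];
        if (a > max_abs) max_abs = a;
    }
    float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; ++i) {
        float v = x[i] / scale;
        /* Round half away from zero without relying on libm. */
        int r = (int)(v + ((v >= 0.0f) ? 0.5f : -0.5f));
        if (r > 127) r = 127;
        if (r < -127) r = -127;
        q[i] = (int8_t)r;
    }
    return scale;
}
```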

📖 Detailed API Reference: Doxygen Quantization Documentation

Utility Functions API

Standalone mathematical operations:

gelu_tanh_f32 - GELU activation with tanh approximation
gelu_erf_f32 - GELU activation with erf approximation
softmax_f32 - Softmax function for float

📖 Detailed API Reference: Doxygen Utility Functions Documentation

Eltwise Operations API

Specialized element-wise operations supporting various input/output type combinations:

bf16of32, bf16obf16, f32of32, f32obf16, f32os32, f32os8

📖 Detailed API Reference: Doxygen Eltwise Operations Documentation

Library Management API

Essential functions for library initialization, cleanup, and configuration.

📖 Detailed API Reference: Doxygen Library Interface Documentation

📋 Complete API Reference

Comprehensive Documentation

🔗 Complete Doxygen API Documentation - Full API reference with detailed function descriptions
🔗 Sphinx Documentation - User guide and tutorials

💡 Examples and Tutorials

Code Examples

Explore practical implementations and usage patterns:

Tutorial Documentation

🧪 Testing and Validation

Testing Framework

Comprehensive testing suite for validation and performance benchmarking:

Integration Tests - End-to-end workflow testing
Performance Benchmarks - Speed and accuracy measurements
Correctness Validation - Numerical accuracy verification

📖 Testing Documentation: Testing Framework Guide

🏗️ Development Resources

Build System

CMake Configuration - Modern CMake-based build system
Cross-platform Support - Linux and Windows compatibility
Dependency Management - Automated dependency resolution
Build Options - Customizable build configurations

Development Tools

Code Formatting - Consistent code style with clang-format
Pre-commit Hooks - Automated code quality checks
Continuous Integration - Automated testing and validation

⚡ Performance and Hardware Support

Optimized for AMD Hardware

Zen1+ Architecture - AVX2/FMA3 support
Zen4+ Architecture - AVX512, AVX512_VNNI, AVX512_BF16 support
x86_64 Compatibility - Runs on any compatible x86_64 CPU
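
To check which of the instruction sets listed above a given machine offers, an application can query the CPU at run time. The sketch below uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; the `cpu_features` struct and `detect_cpu_features` function are hypothetical names for this example, and the code is independent of whatever dispatch mechanism AOCL-DLP uses internally. It assumes a reasonably recent compiler (the `avx512bf16` feature name needs GCC 10 or later).

```c
#include <stdbool.h>

/* Hypothetical helper type, not part of AOCL-DLP. */
typedef struct {
    bool avx2;
    bool fma;
    bool avx512f;
    bool avx512_vnni;
    bool avx512_bf16;
} cpu_features;

/* Queries the running CPU. On non-x86 targets or other compilers
 * every field is reported as false. */
cpu_features detect_cpu_features(void)
{
    cpu_features f = { false, false, false, false, false };
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx2        = __builtin_cpu_supports("avx2") != 0;
    f.fma         = __builtin_cpu_supports("fma") != 0;
    f.avx512f     = __builtin_cpu_supports("avx512f") != 0;
    f.avx512_vnni = __builtin_cpu_supports("avx512vnni") != 0;
    f.avx512_bf16 = __builtin_cpu_supports("avx512bf16") != 0;
#endif
    return f;
}
```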

Performance Features

NUMA Awareness - Optimized memory access patterns
Thread Scaling - Efficient parallel execution
Memory Management - Optimized data layouts and caching

This documentation hub is continuously updated. For the latest information, please refer to the linked resources above.

