Home
Welcome to the AOCL-DLP (AMD Optimizing CPU Libraries - Deep Learning Primitives) documentation hub! This wiki serves as your central navigation point to all AOCL-DLP documentation resources.
AOCL-DLP is a high-performance library designed to provide optimized deep learning primitives for AMD processors. It implements Low Precision GEMM (LPGEMM) operations for machine learning applications, supporting multiple data types, pre-operations, and post-operations. The library is specifically optimized to leverage AMD hardware capabilities including AVX2, AVX512, AVX512_VNNI, and AVX512_BF16 instruction sets.
- Highly Optimized GEMM Operations - High-performance matrix multiplication targeting AMD CPUs
- Multiple Data Type Support - Various precision formats (float, bfloat16, int8, uint8, int32)
- Comprehensive Post-Operations - SUM, ELTWISE, BIAS, SCALE, MATRIX_ADD, MATRIX_MUL
- Batch GEMM Support - Optimized for handling multiple GEMM operations
- Symmetric Quantization - Specialized routines for quantized operations
- Thread Optimization - Parallel execution via OpenMP
- 🏠 Project README - Overview and feature summary
- 🔧 Build Instructions - Compilation guide
- 📦 Installation Guide - Installation steps
- 📄 License - BSD 3-Clause License
- 📚 Examples - Examples of how to use the library
- 📊 Code Coverage Tools - Comprehensive coverage analysis and HTML report generation
High-performance General Matrix Multiplication operations supporting various data type combinations:
- Float Operations: f32f32f32of32
- Bfloat16 Operations: bf16bf16f32of32, bf16bf16f32obf16, bf16s4f32of32, bf16s4f32obf16
- Integer Operations: u8s8s32os32, u8s8s32os8, u8s8s32ou8, u8s8s32of32, u8s8s32obf16
- Signed Integer: s8s8s32os32, s8s8s32os8, s8s8s32ou8, s8s8s32of32, s8s8s32obf16
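The type strings above appear to follow the pattern `<A type><B type><accumulator type>o<output type>`, so u8s8s32os32 would read as uint8 A, int8 B, int32 accumulation, and int32 output. As a minimal sketch of those semantics under that reading, a naive reference loop might look like the following. The function name `ref_gemm_u8s8s32os32` is hypothetical and is not the AOCL-DLP API; row-major storage and the absence of alpha/beta scaling are assumptions made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference GEMM, NOT the AOCL-DLP API.
 * Computes C = A * B with A (m x k, uint8), B (k x n, int8),
 * int32 accumulation and int32 output, all row-major. */
void ref_gemm_u8s8s32os32(size_t m, size_t n, size_t k,
                          const uint8_t *a, const int8_t *b, int32_t *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0; /* the "s32" accumulator */
            for (size_t p = 0; p < k; ++p) {
                /* Widen both operands before multiplying. */
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = acc; /* the "os32" output */
        }
    }
}
```

A loop like this is mainly useful as a correctness oracle when validating optimized kernels; the linked Doxygen documentation is the authority on the actual function signatures.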
📖 Detailed API Reference: Doxygen API Documentation
Optimized batch processing for multiple GEMM operations with support for all data type combinations.
📖 Detailed API Reference: Doxygen Batch GEMM Documentation
Comprehensive post-processing operations that can be chained with GEMM:
- SUM: Element-wise addition with scaling and zero point
- ELTWISE: Activation functions (RELU, PRELU, GELU_TANH, GELU_ERF, CLIP, SWISH, TANH, SIGMOID)
- BIAS: Bias addition to results
- SCALE: Scaling operations
- MATRIX_ADD/MUL: Matrix operations with scaling
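To illustrate what chaining means here, the sketch below applies an ordered list of post-operations to an f32 result matrix after the multiplication has finished. The `post_op` struct, the `post_op_kind` enum, and `apply_post_ops` are hypothetical names invented for this example and do not reflect the library's data structures; only three of the operations listed above are modelled, and a per-column bias is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical post-op description, NOT the AOCL-DLP structures. */
typedef enum { POST_OP_BIAS, POST_OP_RELU, POST_OP_SCALE } post_op_kind;

typedef struct {
    post_op_kind kind;
    const float *bias; /* length n, read only by POST_OP_BIAS */
    float scale;       /* read only by POST_OP_SCALE */
} post_op;

/* Apply the post-ops to the m x n row-major matrix c, in list order. */
void apply_post_ops(float *c, size_t m, size_t n,
                    const post_op *ops, size_t num_ops)
{
    assert(c != NULL && (ops != NULL || num_ops == 0));
    for (size_t s = 0; s < num_ops; ++s) {
        for (size_t i = 0; i < m; ++i) {
            for (size_t j = 0; j < n; ++j) {
                float *x = &c[i * n + j];
                switch (ops[s].kind) {
                case POST_OP_BIAS:  *x += ops[s].bias[j];        break;
                case POST_OP_RELU:  if (*x < 0.0f) *x = 0.0f;    break;
                case POST_OP_SCALE: *x *= ops[s].scale;          break;
                }
            }
        }
    }
}
```

Order matters: bias followed by RELU generally gives a different result from RELU followed by bias, which is why the chain is modelled as an ordered list.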
📖 Detailed API Reference: Doxygen Post-Operations Documentation
Specialized symmetric quantization routines:
- s8s8s32of32_sym_quant
- s8s8s32obf16_sym_quant
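For readers new to the term, symmetric quantization maps real values to integers with a scale factor only, keeping the zero point fixed at 0. The sketch below shows that general idea for int8 with a single per-tensor scale. All function names are hypothetical, and the per-tensor granularity, the [-127, 127] range, and the rounding mode are assumptions for illustration; they are not taken from the library's `_sym_quant` routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Choose a scale so the largest magnitude in x maps to 127.
 * Returns 1.0f for an all-zero input to avoid dividing by zero later. */
float sym_quant_scale(const float *x, size_t n)
{
    assert(x != NULL);
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > max_abs) max_abs = a;
    }
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
}

/* Quantize: q = clamp(round(x / scale), -127, 127); zero point is 0. */
void sym_quantize_s8(const float *x, size_t n, float scale, int8_t *q)
{
    assert(x != NULL && q != NULL && scale > 0.0f);
    for (size_t i = 0; i < n; ++i) {
        float v = x[i] / scale;
        if (v > 127.0f) v = 127.0f;   /* clamp before the integer cast */
        if (v < -127.0f) v = -127.0f;
        /* Round half away from zero without needing libm. */
        q[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}

/* Dequantize a single value: x is approximately q * scale. */
float sym_dequantize_s8(int8_t q, float scale)
{
    return (float)q * scale;
}
```

Because the zero point is 0, an int8 by int8 product accumulated in int32 can be converted back to real values with a single multiplication by the product of the two scales, with no zero-point correction terms.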
📖 Detailed API Reference: Doxygen Quantization Documentation
Standalone mathematical operations:
- gelu_tanh_f32 - GELU activation using the tanh approximation
- gelu_erf_f32 - GELU activation using the erf-based (exact) formulation
- softmax_f32 - Softmax function for float
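For reference, the standard definitions that these function names suggest are given below. They are the textbook formulas, stated as an assumption about what the routines compute; this page does not describe the library's numerical treatment (for example, any polynomial approximation of tanh, erf, or exp).

```math
\begin{aligned}
\mathrm{GELU}_{\mathrm{erf}}(x) &= \frac{x}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right] \\
\mathrm{GELU}_{\tanh}(x) &= \frac{x}{2}\left[1 + \tanh\!\left(\sqrt{\tfrac{2}{\pi}}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right] \\
\mathrm{softmax}(x)_i &= \frac{e^{\,x_i - \max_j x_j}}{\sum_{k} e^{\,x_k - \max_j x_j}}
\end{aligned}
```

Subtracting the maximum in the softmax leaves the result unchanged and is the usual way to keep the exponentials from overflowing in float.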
📖 Detailed API Reference: Doxygen Utility Functions Documentation
Specialized element-wise operations supporting various input/output type combinations:
- bf16of32, bf16obf16, f32of32, f32obf16, f32os32, f32os8
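By analogy with the GEMM names, a string such as bf16of32 presumably means bfloat16 input with float32 output. The sketch below shows what such a type pair involves: bfloat16 is the upper 16 bits of an IEEE-754 float32, so widening is a shift and narrowing is a rounding step. The function names are hypothetical and are not the AOCL-DLP API; RELU is used only as a stand-in element-wise operation, and NaN handling in the narrowing conversion is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen bfloat16 (stored as uint16_t) to float32: exact, just a shift. */
float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f); /* bit copy avoids aliasing issues */
    return f;
}

/* Narrow float32 to bfloat16 with round-to-nearest-even.
 * NaN inputs are not special-cased in this sketch. */
uint16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* Hypothetical "bf16of32"-style element-wise op: bf16 in, f32 out. */
void eltwise_relu_bf16of32(const uint16_t *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        float v = bf16_to_f32(in[i]);
        out[i] = v > 0.0f ? v : 0.0f;
    }
}
```

An obf16 variant would follow the same shape, calling `f32_to_bf16` on each result before storing it.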
📖 Detailed API Reference: Doxygen Eltwise Operations Documentation
Essential functions for library initialization, cleanup, and configuration.
📖 Detailed API Reference: Doxygen Library Interface Documentation
- 🔗 Complete Doxygen API Documentation - Full API reference with detailed function descriptions
- 🔗 Sphinx Documentation - User guide and tutorials
Explore practical implementations and usage patterns in the Examples resource listed above.
Comprehensive testing suite for validation and performance benchmarking:
- Integration Tests - End-to-end workflow testing
- Performance Benchmarks - Speed and accuracy measurements
- Correctness Validation - Numerical accuracy verification
📖 Testing Documentation: Testing Framework Guide
- CMake Configuration - Modern CMake-based build system
- Cross-platform Support - Linux, Windows compatibility
- Dependency Management - Automated dependency resolution
- Build Options - Customizable build configurations
- Code Formatting - Consistent code style with clang-format
- Pre-commit Hooks - Automated code quality checks
- Continuous Integration - Automated testing and validation
- Zen1+ Architecture - AVX2/FMA3 support
- Zen4+ Architecture - AVX512, AVX512_VNNI, AVX512_BF16 support
- x86_64 Compatibility - Runs on any compatible x86_64 CPU
- NUMA Awareness - Optimized memory access patterns
- Thread Scaling - Efficient parallel execution
- Memory Management - Optimized data layouts and caching
This documentation hub is continuously updated. For the latest information, please refer to the linked resources above.