Micky774/QoLA

QoLA — Quality of Life AITER

Manifest-driven ahead-of-time (AOT) builder for AITER kernels. QoLA wraps AITER's build_module() JIT compilation system with a declarative TOML manifest, producing either:

  • pybind11 Python modules — standard .so files importable from Python (requires PyTorch)
  • torch-free C-linkable shared libraries (cpp_itfs mode) — plain .so files linked via HIP/ROCm with no PyTorch dependency

QoLA is designed for Transformer Engine to pre-build AITER attention (MHA) kernels at package install time, replacing hours-long JIT compilation with a structured, reproducible build.

Why QoLA?

  • Declarative manifests — a single TOML file pins the AITER commit, target architectures, kernel modules, and MHA variant matrix
  • torch-free builds: cpp_itfs mode eliminates the PyTorch build dependency for C-linkable libraries
  • Symbol isolation — linker version scripts and C++ namespace wrapping prevent symbol collisions when multiple AITER-backed .so files coexist in one process
  • No AITER modifications — QoLA reconstructs AITER's build namespace without importing aiter, and compiles AITER sources unmodified

Requirements

  • Python >= 3.10
  • ROCm / HIP toolchain (hipcc)
  • AITER source tree (included as a git submodule at 3rdparty/aiter/)
  • PyTorch (pybind mode only)

Installation

pip install -e .

Quick Start

# Build all modules declared in a manifest (pybind mode)
qola build \
  --manifest example/te-manifest.toml \
  --aiter-root 3rdparty/aiter \
  --output-dir /tmp/qola-out

# Build in cpp_itfs mode (no PyTorch dependency)
qola build \
  --manifest example/te-manifest.toml \
  --aiter-root 3rdparty/aiter \
  --output-dir /tmp/qola-out \
  --mode cpp_itfs

CLI Options

Option        Description
--manifest    Path to the TOML manifest file
--aiter-root  Path to the AITER source tree
--output-dir  Directory for build artifacts
--arch        Target GPU architecture (repeatable, e.g. --arch gfx950)
--mode        Build mode: pybind (default) or cpp_itfs
--verbose     Enable verbose build output

Manifest Format

The manifest is a TOML file that declares what to build. See example/te-manifest.toml for a full example.

[qola]
aiter_commit = "33f2e6a..."   # Pinned AITER commit
namespace = "te"               # C++ namespace and .so prefix
rocm_versions = ["7.2"]

[build]
architectures = ["gfx950"]

# Static modules from AITER's optCompilerConfig.json
[[modules]]
name = "libmha_fwd"
mode = "cpp_itfs"
drop_srcs = ["mha_fwd_split.cu", "mha_fwd_batch_prefill.cu"]
drop_directions = ["fwd_splitkv", "batch_prefill"]

[[modules]]
name = "libmha_bwd"
mode = "cpp_itfs"

# MHA variant matrix — Cartesian expansion of CK codegen filters
[[mha_fwd_variants]]
dtype = ["bf16", "fp16"]
has_lse = true
has_skip = false

[[mha_bwd_variants]]
dtype = ["bf16", "fp16"]
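As a rough illustration of how a manifest like the one above can be consumed, the following sketch parses a trimmed copy with Python's standard TOML parser and derives the expected library names from the namespace prefix. The helper planned_outputs is hypothetical and is not part of QoLA; tomllib requires Python 3.11, so the sketch falls back to the third-party tomli package on 3.10.

```python
try:
    import tomllib  # standard library since Python 3.11
except ImportError:  # Python 3.10: assumes the tomli backport is installed
    import tomli as tomllib

# Trimmed copy of the example manifest; only the keys used below are kept.
MANIFEST_TEXT = """
[qola]
namespace = "te"

[build]
architectures = ["gfx950"]

[[modules]]
name = "libmha_fwd"
mode = "cpp_itfs"

[[modules]]
name = "libmha_bwd"
mode = "cpp_itfs"
"""


def planned_outputs(manifest: dict) -> list[str]:
    """Hypothetical helper: .so names implied by the namespace prefix."""
    prefix = manifest["qola"]["namespace"]
    return [f"{prefix}_{module['name']}.so" for module in manifest.get("modules", [])]


manifest = tomllib.loads(MANIFEST_TEXT)
outputs = planned_outputs(manifest)
print(outputs)
```

The derived names follow the "namespace as .so prefix" convention shown in the Build Output section; how QoLA itself computes them is not shown here.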

Build Modes

pybind (default)

Produces pybind11 .so modules importable from Python. Requires PyTorch at both build time and runtime.

cpp_itfs

Produces torch-free C-linkable shared libraries. Each module exposes a C++ API under the configured namespace:

#include "qola_mha_fwd.h"

// With namespace = "te":
float ret = qola::te::mha_fwd(args, stream_config);

Source replacement is driven by cpp_itfs/registry.toml: pybind entry points are swapped for thin C wrappers that expose a namespace-guarded C++ API.

Build Output

output-dir/
  lib/                    # Compiled .so files
    te_libmha_fwd.so
    te_libmha_bwd.so
  configs/                # AITER tuning CSVs
  manifest.json           # Build metadata and per-module results

Available Kernel Modules

Module      Description                    cpp_itfs API
libmha_fwd  Multi-head attention forward   qola::te::mha_fwd()
libmha_bwd  Multi-head attention backward  qola::te::mha_bwd()

Architecture

Namespace Resolution

QoLA reconstructs AITER's build-time eval namespace from a source tree path alone, without ever running import aiter. This avoids AITER's __init__.py side effects and torch import requirements. See resolver.py.
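The general idea can be sketched as follows: build a plain dictionary of path variables from the source tree location, then evaluate configuration expressions against it. This is only an illustration under assumptions. The variable names (AITER_ROOT_DIR, AITER_CSRC_DIR), the csrc layout, and the sample expression are placeholders, and the real logic in resolver.py may differ substantially.

```python
from pathlib import PurePosixPath


def build_eval_namespace(aiter_root: str) -> dict:
    """Hypothetical: derive path variables from the tree location only."""
    root = PurePosixPath(aiter_root)
    return {
        "__builtins__": {},  # keep evaluated expressions away from builtins
        "AITER_ROOT_DIR": str(root),         # assumed variable name
        "AITER_CSRC_DIR": str(root / "csrc"),  # assumed variable name and layout
    }


def resolve(expression: str, namespace: dict):
    """Evaluate one config expression against the reconstructed namespace."""
    return eval(expression, namespace)


namespace = build_eval_namespace("3rdparty/aiter")
# Placeholder expression in the style of an f-string source list.
sources = resolve("[f'{AITER_CSRC_DIR}/example_kernel.cu']", namespace)
print(sources)
```

Because nothing here imports the aiter package, no __init__.py code runs and torch is never loaded, which is the property the paragraph above describes.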

Symbol Collision Prevention

Two layers prevent symbol leaks when multiple .so files coexist:

  1. C++ namespace wrapping: QOLA_NS_BEGIN/QOLA_NS_END macros place all public symbols under qola::<namespace>::
  2. Linker version script: qola_exports.lds forces all non-qola::* symbols local, including AITER symbols with explicit visibility("default")
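To make the second layer concrete, here is a minimal sketch of a generator for a GNU ld version script that keeps only qola::* symbols global. The function make_version_script is hypothetical, and the actual contents of qola_exports.lds are not shown in this README, so treat the emitted text as one plausible shape rather than the real file.

```python
def make_version_script(root_namespace: str = "qola") -> str:
    """Hypothetical: emit an anonymous version script exporting one C++ namespace."""
    return (
        "{\n"
        "  global:\n"
        '    extern "C++" {\n'          # match against demangled C++ names
        f"      {root_namespace}::*;\n"
        "    };\n"
        "  local:\n"
        "    *;\n"                      # everything else becomes local
        "};\n"
    )


script = make_version_script()
print(script)
```

A script of this form is typically passed to the linker with -Wl,--version-script=<file>. The local: * rule applies regardless of a symbol's visibility attribute, which is why it can also hide symbols marked visibility("default").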

MHA Variant Matrix

The manifest's [[mha_fwd_variants]] / [[mha_bwd_variants]] sections declare option dimensions (dtype, has_bias, has_mask, etc.) that are expanded into CK codegen filter patterns. This controls which of the ~34K possible kernel instances are actually compiled. See variant_matrix.py. Variant filtering is currently supported only for pybind output.
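The Cartesian expansion step can be sketched as below, assuming that list-valued keys are treated as dimensions and scalar keys as fixed settings. The function expand_variants is hypothetical; the translation of each combination into a CK codegen filter pattern is done by variant_matrix.py and is not reproduced here.

```python
from itertools import product


def expand_variants(entry: dict) -> list[dict]:
    """Hypothetical: expand one variant table into concrete combinations."""
    keys = list(entry)
    # Scalars are wrapped so every key contributes exactly one dimension.
    value_lists = [v if isinstance(v, list) else [v] for v in entry.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Mirrors the [[mha_fwd_variants]] table from the example manifest.
fwd_entry = {"dtype": ["bf16", "fp16"], "has_lse": True, "has_skip": False}
variants = expand_variants(fwd_entry)
for variant in variants:
    print(variant)
```

With one two-valued dimension the entry yields two combinations; adding a second list-valued key would multiply the count, which is how a small manifest can select a bounded slice of the full instance space.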

HSA Blob Embedding

generate_embedded_hsa.py converts binary .co ASM blobs into a C++ header with compile-time byte arrays, enabling kernel distribution without a runtime AITER_ASM_DIR.
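A minimal sketch of the embedding technique follows, assuming a non-empty blob and a C++17 consumer. The function blob_to_header, the symbol naming, and the header layout are illustrative only and do not reflect the actual output of generate_embedded_hsa.py.

```python
def blob_to_header(symbol: str, blob: bytes, per_line: int = 12) -> str:
    """Hypothetical: render bytes as a constexpr C++ array plus a size constant."""
    rows = []
    for start in range(0, len(blob), per_line):
        chunk = blob[start:start + per_line]
        rows.append("    " + ", ".join(f"0x{byte:02x}" for byte in chunk) + ",")
    body = "\n".join(rows)
    return (
        "#pragma once\n"
        "#include <cstddef>\n\n"
        f"inline constexpr unsigned char {symbol}[] = {{\n{body}\n}};\n"
        f"inline constexpr std::size_t {symbol}_size = {len(blob)};\n"
    )


# Four sample bytes stand in for the contents of a .co file.
header = blob_to_header("example_blob", b"\x7fELF")
print(header)
```

Embedding the bytes at compile time means the library can hand the blob straight to the HIP module loader, so no directory lookup through AITER_ASM_DIR is needed at runtime.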

Roadmap

  • CI support for building and publishing pre-built libraries from manifests
  • Kernel filtering for libmha — prune CK codegen instances based on manifest variant declarations in cpp_itfs mode (currently pybind-only)
  • C-level JIT for libmha — compile MHA variant .so files on first use at the C layer, avoiding ahead-of-time compilation of the full variant matrix

License

See the parent repository for license terms.
