AMD ROCm™ LLMExt (ROCm-LLMExt) is an open-source software toolkit built on the ROCm platform for large language model (LLM) extensions, integrations, and performance enablement on AMD GPUs. It brings together training, post-training, inference, and orchestration components to make modern LLM stacks practical and reproducible on AMD hardware.
**Training**
- Large-scale transformer training
- Distributed parallelism (data, tensor, pipeline)
- Mixed precision and performance tuning
- Mixture-of-Experts (MoE) enablement

**Post-training**
- Reinforcement learning and post-training workflows
- Scalable experimentation
- Reproducible configurations

**Inference**
- High-throughput decoding and low-latency serving
- Optimized attention and inference operators
- Lightweight and edge-friendly inference paths

**Orchestration**
- Multi-node orchestration
- Cluster bring-up and scheduling
- Batch and online inference pipelines
ROCm-LLMExt provides reference integrations, build instructions, patches when required, benchmarks, and examples for the following projects:
- Verl: reinforcement learning and post-training workflows for LLMs
- Ray: distributed execution framework for training, inference, and serving
- FlashInfer: optimized inference operators such as attention and decoding kernels
- MegaBlocks: high-performance Mixture-of-Experts building blocks
- Stanford Megatron-LM: large-scale transformer training using Megatron-style parallelism
- Llama.cpp: lightweight and portable LLM inference for servers, desktops, edge devices, and HPC environments
Refer to the individual component pages for system requirements, installation instructions, and examples.