LLVM AMDGPU Assembler Helper Tools

This repository contains the following items related to the AMDGPU ISA assembler:

  • amdphdrs: utility to convert ELF produced by llvm-mc into AMD Code Object (v1)
  • examples/asm-kernel: example of AMDGPU kernel code
  • examples/gfx8/ds_bpermute: transfer data between lanes in a wavefront with ds_bpermute_b32
  • examples/gfx8/dpp_reduce: calculate prefix sum in a wavefront with DPP instructions
  • examples/gfx8/s_memrealtime: use s_memrealtime instruction to create a delay
  • examples/gfx8/s_memrealtime_inline: inline assembly in OpenCL kernel version of s_memrealtime
  • examples/api/assemble: use LLVM API to assemble a kernel
  • examples/api/disassemble: use LLVM API to disassemble a stream of instructions
  • bin/sp3_to_mc.pl: script to convert some AMD sp3 legacy assembler syntax into LLVM MC
  • examples/sp3: examples of sp3 code that the script can convert

At the time of this writing (February 2016), an LLVM trunk build and the latest ROCR runtime are needed.

LLVM trunk (May or later) now uses lld as the linker and produces AMD Code Object (v2).


A top-level CMakeLists.txt is provided to build everything in this repository. The following CMake variables should be set:

  • HSA_DIR (default /opt/hsa/bin): path to ROCR Runtime
  • LLVM_DIR: path to LLVM build directory

To build everything, create a build directory, then run cmake and make:

mkdir build
cd build
cmake -DLLVM_DIR=/srv/git/llvm.git/build ..
make

Examples that require clang are built only if clang was built as part of LLVM.

Use cases

Assembling to code object with llvm-mc from command line

The following llvm-mc command line produces ELF object asm.o from assembly source asm.s:

llvm-mc -arch=amdgcn -mcpu=fiji -filetype=obj -o asm.o asm.s

Assembling to raw instruction stream with llvm-mc from command line

After assembling to a code object, the contents of the .text section can be extracted into a raw file:

llvm-mc -arch=amdgcn -mcpu=fiji -filetype=obj -o asm.o asm.s
objdump -h asm.o | grep .text | awk '{print "dd if='asm.o' of='asm' bs=1 count=$((0x" $3 ")) skip=$((0x" $6 "))"}' | bash
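
The awk step above relies on the column layout of objdump -h, where the third field is the section size and the sixth is its file offset, both in hexadecimal. A minimal sketch of the same conversion as a shell function follows. The name section_dd_command is hypothetical and chosen only for illustration; the function prints the dd command instead of running it.

```shell
# Print (do not run) the dd command that copies one section out of an ELF file.
#   $1: input ELF file, $2: output file
#   $3: one section row from `objdump -h`, with the fields
#       Idx Name Size VMA LMA File-off Algn (Size and File-off are hex).
section_dd_command() {
    elf_in=$1
    raw_out=$2
    # read splits the row on whitespace without glob-expanding "2**8".
    read -r _idx _name size_hex _vma _lma offset_hex _rest <<EOF
$3
EOF
    # $((0x...)) converts the hex fields to the decimal values dd expects.
    printf 'dd if=%s of=%s bs=1 skip=%d count=%d\n' \
        "$elf_in" "$raw_out" "$((0x$offset_hex))" "$((0x$size_hex))"
}
```

For a row such as "0 .text 00000120 0000000000000000 0000000000000000 00000040 2**8" (made-up values), section_dd_command asm.o asm "$row" prints "dd if=asm.o of=asm bs=1 skip=64 count=288", which can be reviewed before it is piped to a shell.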

Disassembling code object from command line

The following command line dumps the contents of a code object:

llvm-objdump --disassemble --mcpu=fiji asm.o

The output includes a text disassembly of the .text section.

Disassembling raw instruction stream from command line

The following command line disassembles a raw instruction stream (one without ELF structure):

hexdump -v -e '/1 "0x%02X "' asm | llvm-mc -arch=amdgcn -mcpu=fiji -disassemble

Here, hexdump prints each byte of the file in hexadecimal (0x.. form), and llvm-mc consumes that list as its input.
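
Where hexdump is not installed, the same 0x.. byte list can be produced with the POSIX od utility. The function below is a sketch of that substitution and is not part of this repository; the name bytes_to_hex_list is hypothetical.

```shell
# Print every byte of file $1 as a space-separated list of 0xNN tokens,
# the same form that the hexdump format string '/1 "0x%02X "' produces.
bytes_to_hex_list() {
    # od: one two-digit hex token per byte, no offset column, no line folding.
    # tr: join the lines, squeeze repeated blanks, upper-case the digits.
    # sed: trim the ends and prefix every token with 0x.
    od -An -v -tx1 "$1" |
        tr '\n' ' ' |
        tr -s ' ' |
        tr 'abcdef' 'ABCDEF' |
        sed -e 's/^ //' -e 's/ $//' -e 's/[0-9A-F][0-9A-F]/0x&/g'
}
```

It can stand in for the hexdump stage of the pipeline: bytes_to_hex_list asm | llvm-mc -arch=amdgcn -mcpu=fiji -disassemble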

Assembling source into code object using LLVM API

Refer to examples/api/assemble.

Disassembling instruction stream using LLVM API

Refer to examples/api/disassemble.

Using amdphdrs

Note that the standard lld linker and Code Object version 2, which is closer to the standard ELF format, should normally be used instead.

amdphdrs (now obsolete) is a complementary utility that can be used to produce AMDGPU Code Object version 1.
For example, given assembly source in asm.s, the following will assemble it and link using amdphdrs:

llvm-mc -arch=amdgcn -mcpu=fiji -filetype=obj -o asm.o asm.s
amdphdrs asm.o asm.co

Differences between LLVM AMDGPU Assembler and AMD SP3 assembler

Macro support

SP3 supports a proprietary set of macros and tools. The sp3_to_mc.pl script attempts to translate them into the GAS syntax understood by llvm-mc.

flat_atomic_cmpswap instruction has 32-bit destination


LLVM AMDGPU: flat_atomic_cmpswap v7, v[9:10], v[7:8]

SP3: flat_atomic_cmpswap v[7:8], v[9:10], v[7:8]

Atomic instructions that return a value must set the glc flag explicitly

LLVM AMDGPU: flat_atomic_swap_x2 v[0:1], v[0:1], v[2:3] glc

SP3: flat_atomic_swap_x2 v[0:1], v[0:1], v[2:3]
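
When porting SP3 sources, returning atomics that lack glc can be located mechanically. The following is a rough sketch, assuming (as in the examples above) that the returning form is the one written with three operands and that operands contain no commas of their own. The function name find_missing_glc is hypothetical, and the check is a heuristic rather than a parser.

```shell
# List flat_atomic_* lines that have three operands (destination, address,
# data) but no glc modifier. Assumption: three operands means the
# instruction returns a value; this is a heuristic, not an assembler check.
find_missing_glc() {
    awk '
        /^[ \t]*flat_atomic_/ {
            line = $0
            sub(/\/\/.*/, "", line)        # ignore a trailing // comment
            commas = gsub(/,/, ",", line)  # two commas = three operands
            if (commas == 2 && line !~ /[ \t]glc([ \t]|$)/)
                print FILENAME ":" FNR ": " $0
        }
    ' "$@"
}
```

Running find_missing_glc asm.s prints each suspect line with its file name and line number; an empty result means no three-operand flat atomic without glc was found.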