This repository has been archived by the owner on Jan 26, 2024. It is now read-only.

ROCm/LLVM-AMDGPU-Assembler-Extra

LLVM AMDGPU Assembler Helper Tools

Overview

This repository contains the following useful items related to the AMDGPU ISA assembler:

  • amdphdrs: utility to convert ELF produced by llvm-mc into AMD Code Object (v1)
  • examples/asm-kernel: example of AMDGPU kernel code
  • examples/gfx8/ds_bpermute: transfer data between lanes in a wavefront with ds_bpermute_b32
  • examples/gfx8/dpp_reduce: calculate prefix sum in a wavefront with DPP instructions
  • examples/gfx8/s_memrealtime: use s_memrealtime instruction to create a delay
  • examples/gfx8/s_memrealtime_inline: inline assembly in OpenCL kernel version of s_memrealtime
  • examples/api/assemble: use LLVM API to assemble a kernel
  • examples/api/disassemble: use LLVM API to disassemble a stream of instructions
  • bin/sp3_to_mc.pl: script to convert some AMD sp3 legacy assembler syntax into LLVM MC
  • examples/sp3: examples of convertible sp3 code

At the time of this writing (February 2016), an LLVM trunk build and the latest ROCR runtime are needed.

LLVM trunk (May or later) now uses lld as the linker and produces AMD Code Object (v2).

Building

A top-level CMakeLists.txt is provided to build everything included. The following CMake variables should be set:

  • HSA_DIR (default /opt/hsa/bin): path to ROCR Runtime
  • LLVM_DIR: path to LLVM build directory

To build everything, create a build directory and run cmake and make:

mkdir build
cd build
cmake -DLLVM_DIR=/srv/git/llvm.git/build ..
make

Examples that require clang are built only if clang is built as part of LLVM.
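
As a sketch of passing both variables explicitly, the helper below only prints the cmake invocation so it can be inspected before running. The function name print_cmake_cmd is hypothetical, and the paths are the ones mentioned in this README rather than requirements.

```shell
# Hypothetical helper: print the configure command for an out-of-tree build.
# Arguments: $1 = LLVM build directory, $2 = ROCR Runtime path (optional).
print_cmake_cmd() {
  llvm_dir=$1
  hsa_dir=${2:-/opt/hsa/bin}   # default stated in this README
  printf 'cmake -DLLVM_DIR=%s -DHSA_DIR=%s ..\n' "$llvm_dir" "$hsa_dir"
}

# Example: show the command for the LLVM build path used above.
print_cmake_cmd /srv/git/llvm.git/build
# prints: cmake -DLLVM_DIR=/srv/git/llvm.git/build -DHSA_DIR=/opt/hsa/bin ..
```

The printed command is meant to be run from inside the build directory, followed by make.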

Use cases

Assembling to code object with llvm-mc from command line

The following llvm-mc command line produces the ELF object asm.o from the assembly source asm.s:

llvm-mc -arch=amdgcn -mcpu=fiji -filetype=obj -o asm.o asm.s

Assembling to raw instruction stream with llvm-mc from command line

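It is possible to extract the contents of the .text section after assembling to a code object. In the objdump -h output, the third and sixth columns hold the section size and file offset in hexadecimal, which the pipeline below passes to dd: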

llvm-mc -arch=amdgcn -mcpu=fiji -filetype=obj -o asm.o asm.s
objdump -h asm.o | grep '\.text' | awk '{print "dd if=asm.o of=asm bs=1 count=$((0x" $3 ")) skip=$((0x" $6 "))"}' | bash

Disassembling code object from command line

The following command line may be used to dump the contents of a code object:

llvm-objdump -disassemble -mcpu=fiji asm.o

The output includes a textual disassembly of the .text section.

Disassembling raw instruction stream from command line

The following command line may be used to disassemble a raw instruction stream (without ELF structure):

hexdump -v -e '/1 "0x%02X "' asm | llvm-mc -arch=amdgcn -mcpu=fiji -disassemble

Here, hexdump prints the contents of the file as hexadecimal bytes (in 0x.. form), which llvm-mc then consumes.
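
To make the byte formatting concrete, the sketch below produces the same kind of 0x.. token stream using od and awk, which may help on systems without hexdump. The function name to_mc_bytes and the four sample bytes are illustrative assumptions, not part of this repository.

```shell
# Hypothetical helper: print every byte of a file as an upper-case 0xNN token
# followed by a space, similar to the hexdump format string used above.
to_mc_bytes() {
  od -An -v -tx1 "$1" |
    awk '{ for (i = 1; i <= NF; i++) printf "0x%s ", toupper($i) }'
}

# Demo on four sample bytes written to a temporary file.
sample=$(mktemp)
printf '\000\000\201\277' > "$sample"
to_mc_bytes "$sample"; echo
# prints: 0x00 0x00 0x81 0xBF
rm -f "$sample"
```

Assuming llvm-mc accepts this token stream the same way it accepts the hexdump output, to_mc_bytes asm could be piped into llvm-mc -arch=amdgcn -mcpu=fiji -disassemble in place of the hexdump command.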

Assembling source into code object using LLVM API

Refer to examples/api/assemble.

Disassembling instruction stream using LLVM API

Refer to examples/api/disassemble.

Using amdphdrs

Note that normally the standard lld linker and Code Object version 2 should be used; that format is closer to standard ELF.

amdphdrs (now obsolete) is a complementary utility that can be used to produce AMDGPU Code Object version 1.
For example, given assembly source in asm.s, the following will assemble it and link it using amdphdrs:

llvm-mc -arch=amdgcn -mcpu=fiji -filetype=obj -o asm.o asm.s
amdphdrs asm.o asm.co

Differences between LLVM AMDGPU Assembler and AMD SP3 assembler

Macro support

SP3 supports a proprietary set of macros and tools. The sp3_to_mc.pl script attempts to translate them into GAS syntax understood by llvm-mc.

flat_atomic_cmpswap instruction has 32-bit destination

LLVM AMDGPU:

flat_atomic_cmpswap v7, v[9:10], v[7:8]

SP3:

flat_atomic_cmpswap v[7:8], v[9:10], v[7:8]

Atomic instructions that return a value must specify the glc flag explicitly

LLVM AMDGPU:

flat_atomic_swap_x2 v[0:1], v[0:1], v[2:3] glc

SP3:

flat_atomic_swap_x2 v[0:1], v[0:1], v[2:3]
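
The glc difference can be illustrated with a small text filter. This is a sketch under stated assumptions, not what sp3_to_mc.pl does: the function name add_glc is hypothetical, and it assumes that a flat atomic with three comma-separated operands (destination, address, data) is the value-returning form and that lines carry no trailing comments.

```shell
# Hypothetical filter: append "glc" to three-operand flat atomics that lack it.
# Assumption: three comma-separated operands means the value-returning form.
add_glc() {
  awk '
    /^[ \t]*flat_atomic_/ && $0 !~ /[ \t]glc[ \t]*$/ {
      n = split($0, parts, ",")                # count comma-separated operands
      if (n == 3) sub(/[ \t]*$/, " glc")       # add the flag at end of line
    }
    { print }                                  # pass every line through
  '
}

# Example: the SP3 form from above becomes the LLVM AMDGPU form.
printf '%s\n' 'flat_atomic_swap_x2 v[0:1], v[0:1], v[2:3]' | add_glc
# prints: flat_atomic_swap_x2 v[0:1], v[0:1], v[2:3] glc
```

Lines that already end in glc, and lines that are not flat atomics, pass through unchanged.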

References