Commit 96a56bd

Nikolay Haustov authored and committed
AMDGPU: Improve documentation.
Summary:
Add links to ISA manuals and ABI.
Add text about assembler syntax.
Add info about instructions operands.
Add instruction examples for each encoding.
Update directives section, add missing .amdgpu_hsa_kernel.

Reviewers: tstellarAMD, SamWot, vpykhtin

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, artem.tamazov, llvm-commits

Differential Revision: https://reviews.llvm.org/D24724

llvm-svn: 281962
1 parent 02efef0 commit 96a56bd

2 files changed

+218
-68
lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 216 additions & 68 deletions
@@ -8,6 +8,8 @@ Introduction
88
The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
99
the R600 family up until the current Volcanic Islands (GCN Gen 3).
1010

11+
Refer to `AMDGPU section in Architecture & Platform Information for Compiler Writers <CompilerWriterInfo.html#amdgpu>`_
12+
for additional documentation.
1113

1214
Conventions
1315
===========
@@ -35,96 +37,241 @@ OpenCL standard.
3537
Assembler
3638
=========
3739

38-
The assembler is currently considered experimental.
40+
The AMDGPU backend has an LLVM-MC based assembler, which is currently in development.
41+
It supports the Southern Islands, Sea Islands, and Volcanic Islands ISAs.
3942

40-
For syntax examples look in test/MC/AMDGPU.
43+
This document describes the general syntax for instructions and operands. For more
44+
information about instructions, their semantics, and supported combinations
45+
of operands, refer to one of the Instruction Set Architecture manuals.
4146

42-
Below some of the currently supported features (modulo bugs). These
43-
all apply to the Southern Islands ISA, Sea Islands and Volcanic Islands
44-
are also supported but may be missing some instructions and have more bugs:
47+
An instruction has the following syntax (register operands are
48+
normally comma-separated while extra operands are space-separated):
4549

46-
DS Instructions
47-
---------------
48-
All DS instructions are supported.
50+
*<opcode> <register_operand0>, ... <extra_operand0> ...*
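
As a sketch of how this pattern reads in practice, the MUBUF store used in the examples of this document splits into an opcode, four comma-separated register operands, and four space-separated extra operands. The column alignment is only for illustration:

.. code-block:: nasm

  ; opcode              register operands           extra operands
  buffer_store_dwordx4  v[1:4], v2, ttmp[4:7], s1   offen offset:4 glc tfe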
4951

50-
FLAT Instructions
51-
------------------
52-
These instructions are only present in the Sea Islands and Volcanic Islands
53-
instruction set. All FLAT instructions are supported for these architectures
5452

55-
MUBUF Instructions
56-
------------------
57-
All non-atomic MUBUF instructions are supported.
53+
Operands
54+
--------
5855

59-
SMRD Instructions
60-
-----------------
61-
Only the s_load_dword* SMRD instructions are supported.
56+
The following syntax for register operands is supported:
6257

63-
SOP1 Instructions
64-
-----------------
65-
All SOP1 instructions are supported.
58+
* SGPR registers: s0, ... or s[0], ...
59+
* VGPR registers: v0, ... or v[0], ...
60+
* TTMP registers: ttmp0, ... or ttmp[0], ...
61+
* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
62+
* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
63+
* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
64+
* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
65+
* Register index expressions: v[2*2], s[1-1:2-1]
66+
* 'off' indicates that an operand is not enabled
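
The alternative spellings above are expected to name the same registers. The following sketch reuses opcodes from the examples in this document; the register choices are arbitrary, and the equivalences in the comments assume that index expressions are evaluated as ordinary integer arithmetic:

.. code-block:: nasm

  s_mov_b32 s1, s[2]            ; s[2] names the same register as s2
  v_mov_b32 v1, v[2*2]          ; index expression: v[2*2] names v4
  s_mov_b64 s[1-1:2-1], s[2:3]  ; s[1-1:2-1] names the pair s[0:1]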
6667

67-
SOP2 Instructions
68-
-----------------
69-
All SOP2 instructions are supported.
68+
The following extra operands are supported:
7069

71-
SOPC Instructions
72-
-----------------
73-
All SOPC instructions are supported.
70+
* offset, offset0, offset1
71+
* idxen, offen bits
72+
* glc, slc, tfe bits
73+
* waitcnt: integer or combination of counter values
74+
* VOP3 modifiers:
7475

75-
SOPP Instructions
76-
-----------------
76+
- abs (\| \|), neg (\-)
7777

78-
Unless otherwise mentioned, all SOPP instructions that have one or more
79-
operands accept integer operands only. No verification is performed
80-
on the operands, so it is up to the programmer to be familiar with the
81-
range or acceptable values.
78+
* DPP modifiers:
79+
80+
- row_shl, row_shr, row_ror, row_rol
81+
- row_mirror, row_half_mirror, row_bcast
82+
- wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
83+
- row_mask, bank_mask, bound_ctrl
8284

83-
s_waitcnt
84-
^^^^^^^^^
85+
* SDWA modifiers:
8586

86-
s_waitcnt accepts named arguments to specify which memory counter(s) to
87-
wait for.
87+
- dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
88+
- dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
89+
- abs, neg, sext
90+
91+
DS Instruction Examples
92+
------------------------
8893

8994
.. code-block:: nasm
9095
91-
; Wait for all counters to be 0
92-
s_waitcnt 0
96+
ds_add_u32 v2, v4 offset:16
97+
ds_write_src2_b64 v2 offset0:4 offset1:8
98+
ds_cmpst_f32 v2, v4, v6
99+
ds_min_rtn_f64 v[8:9], v2, v[4:5]
93100
94-
; Equivalent to s_waitcnt 0. Counter names can also be delimited by
95-
; '&' or ','.
96-
s_waitcnt vmcnt(0) expcnt(0) lgkcmt(0)
97101
98-
; Wait for vmcnt counter to be 1.
99-
s_waitcnt vmcnt(1)
102+
For the full list of supported instructions, refer to "LDS/GDS instructions" in the ISA manual.
100103

101-
VOP1, VOP2, VOP3, VOPC Instructions
102-
-----------------------------------
104+
FLAT Instruction Examples
105+
--------------------------
103106

104-
All 32-bit and 64-bit encodings should work.
107+
.. code-block:: nasm
105108
106-
The assembler will automatically detect which encoding size to use for
107-
VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
108-
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
109-
_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all
110-
instructions support an explicit suffix. These are all valid assembly
111-
strings:
109+
flat_load_dword v1, v[3:4]
110+
flat_store_dwordx3 v[3:4], v[5:7]
111+
flat_atomic_swap v1, v[3:4], v5 glc
112+
flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
113+
flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc
114+
115+
For the full list of supported instructions, refer to "FLAT instructions" in the ISA manual.
116+
117+
MUBUF Instruction Examples
118+
---------------------------
112119

113120
.. code-block:: nasm
114121
115-
v_mul_i32_i24 v1, v2, v3
116-
v_mul_i32_i24_e32 v1, v2, v3
117-
v_mul_i32_i24_e64 v1, v2, v3
122+
buffer_load_dword v1, off, s[4:7], s1
123+
buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
124+
buffer_store_format_xy v[1:2], off, s[4:7], s1
125+
buffer_wbinvl1
126+
buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc
127+
128+
For the full list of supported instructions, refer to "MUBUF Instructions" in the ISA manual.
129+
130+
SMRD/SMEM Instruction Examples
131+
-------------------------------
132+
133+
.. code-block:: nasm
134+
135+
s_load_dword s1, s[2:3], 0xfc
136+
s_load_dwordx8 s[8:15], s[2:3], s4
137+
s_load_dwordx16 s[88:103], s[2:3], s4
138+
s_dcache_inv_vol
139+
s_memtime s[4:5]
140+
141+
For the full list of supported instructions, refer to "Scalar Memory Operations" in the ISA manual.
142+
143+
SOP1 Instruction Examples
144+
--------------------------
145+
146+
.. code-block:: nasm
147+
148+
s_mov_b32 s1, s2
149+
s_mov_b64 s[0:1], 0x80000000
150+
s_cmov_b32 s1, 200
151+
s_wqm_b64 s[2:3], s[4:5]
152+
s_bcnt0_i32_b64 s1, s[2:3]
153+
s_swappc_b64 s[2:3], s[4:5]
154+
s_cbranch_join s[4:5]
155+
156+
For the full list of supported instructions, refer to "SOP1 Instructions" in the ISA manual.
157+
158+
SOP2 Instruction Examples
159+
-------------------------
160+
161+
.. code-block:: nasm
162+
163+
s_add_u32 s1, s2, s3
164+
s_and_b64 s[2:3], s[4:5], s[6:7]
165+
s_cselect_b32 s1, s2, s3
166+
s_andn2_b32 s2, s4, s6
167+
s_lshr_b64 s[2:3], s[4:5], s6
168+
s_ashr_i32 s2, s4, s6
169+
s_bfm_b64 s[2:3], s4, s6
170+
s_bfe_i64 s[2:3], s[4:5], s6
171+
s_cbranch_g_fork s[4:5], s[6:7]
172+
173+
For the full list of supported instructions, refer to "SOP2 Instructions" in the ISA manual.
174+
175+
SOPC Instruction Examples
176+
--------------------------
177+
178+
.. code-block:: nasm
118179
119-
Assembler Directives
120-
--------------------
180+
s_cmp_eq_i32 s1, s2
181+
s_bitcmp1_b32 s1, s2
182+
s_bitcmp0_b64 s[2:3], s4
183+
s_setvskip s3, s5
184+
185+
For the full list of supported instructions, refer to "SOPC Instructions" in the ISA manual.
186+
187+
SOPP Instruction Examples
188+
--------------------------
189+
190+
.. code-block:: nasm
191+
192+
s_barrier
193+
s_nop 2
194+
s_endpgm
195+
s_waitcnt 0 ; Wait for all counters to be 0
196+
s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above
197+
s_waitcnt vmcnt(1) ; Wait for vmcnt counter to be 1.
198+
s_sethalt 9
199+
s_sleep 10
200+
s_sendmsg 0x1
201+
s_sendmsg sendmsg(MSG_INTERRUPT)
202+
s_trap 1
203+
204+
For the full list of supported instructions, refer to "SOPP Instructions" in the ISA manual.
205+
206+
Unless otherwise mentioned, little verification is performed on the operands
207+
of SOPP instructions, so it is up to the programmer to be familiar with the
208+
range of acceptable values.
209+
210+
Vector ALU Instruction Examples
211+
-------------------------------
212+
213+
For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
214+
the assembler will automatically use the optimal encoding based on the operands.
215+
To force a specific encoding, add a suffix to the instruction opcode:
216+
217+
* _e32 for 32-bit VOP1/VOP2/VOPC
218+
* _e64 for 64-bit VOP3
219+
* _dpp for VOP_DPP
220+
* _sdwa for VOP_SDWA
221+
222+
VOP1/VOP2/VOP3/VOPC examples:
223+
224+
.. code-block:: nasm
225+
226+
v_mov_b32 v1, v2
227+
v_mov_b32_e32 v1, v2
228+
v_nop
229+
v_cvt_f64_i32_e32 v[1:2], v2
230+
v_floor_f32_e32 v1, v2
231+
v_bfrev_b32_e32 v1, v2
232+
v_add_f32_e32 v1, v2, v3
233+
v_mul_i32_i24_e64 v1, v2, 3
234+
v_mul_i32_i24_e32 v1, -3, v3
235+
v_mul_i32_i24_e32 v1, -100, v3
236+
v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
237+
v_max_f16_e32 v1, v2, v3
238+
239+
VOP_DPP examples:
240+
241+
.. code-block:: nasm
242+
243+
v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
244+
v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
245+
v_mov_b32 v0, v0 wave_shl:1
246+
v_mov_b32 v0, v0 row_mirror
247+
v_mov_b32 v0, v0 row_bcast:31
248+
v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
249+
v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
250+
v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
251+
252+
VOP_SDWA examples:
253+
254+
.. code-block:: nasm
255+
256+
v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
257+
v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
258+
v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
259+
v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
260+
v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0
261+
262+
For the full list of supported instructions, refer to "Vector ALU instructions" in the ISA manual.
263+
264+
HSA Code Object Directives
265+
--------------------------
266+
267+
The AMDGPU ABI defines auxiliary data in the output code object. In assembly source,
268+
this data can be specified with assembler directives.
121269

122270
.hsa_code_object_version major, minor
123271
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
124272

125273
*major* and *minor* are integers that specify the version of the HSA code
126-
object that will be generated by the assembler. This value will be stored
127-
in an entry of the .note section.
274+
object that will be generated by the assembler.
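
A minimal usage sketch follows. The version numbers are illustrative assumptions, not a recommendation for any particular target:

.. code-block:: nasm

  ; Request HSA code object version 1.0 (major = 1, minor = 0).
  .hsa_code_object_version 1,0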
128275

129276
.hsa_code_object_isa [major, minor, stepping, vendor, arch]
130277
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -135,12 +282,14 @@ set architecture (ISA) version of the assembly program.
135282
*vendor* and *arch* are quoted strings. *vendor* should always be equal to
136283
"AMD" and *arch* should always be equal to "AMDGPU".
137284

138-
If no arguments are specified, then the assembler will derive the ISA version,
139-
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
140-
assembler.
285+
By default, the assembler will derive the ISA version, *vendor*, and *arch*
286+
from the value of the -mcpu option that is passed to the assembler.
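
The two forms might look as follows. The explicit version numbers are illustrative assumptions and should match the actual target:

.. code-block:: nasm

  ; Let the assembler derive everything from -mcpu.
  .hsa_code_object_isa

  ; Spell out major, minor, stepping, vendor and arch explicitly.
  .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"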
287+
288+
.amdgpu_hsa_kernel (name)
289+
^^^^^^^^^^^^^^^^^^^^^^^^^
141290

142-
ISA version, *vendor*, and *arch* will all be stored in a single entry of the
143-
.note section.
291+
This directive specifies that the symbol with the given name is a kernel entry point
292+
(label) and that the object should contain a corresponding symbol of type STT_AMDGPU_HSA_KERNEL.
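
As a sketch, the directive is typically paired with the label it names. The kernel name ``my_kernel`` is hypothetical, and the kernel body is omitted:

.. code-block:: nasm

  ; Mark my_kernel as an HSA kernel entry point.
  .amdgpu_hsa_kernel my_kernel

  my_kernel:
    ; .amd_kernel_code_t block and instructions would follow here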
144293

145294
.amd_kernel_code_t
146295
^^^^^^^^^^^^^^^^^^
@@ -165,9 +314,8 @@ used. The default value for all keys is 0, with the following exceptions:
165314
The *.amd_kernel_code_t* directive must be placed immediately after the
166315
function label and before any instructions.
167316

168-
For a full list of amd_kernel_code_t keys, see the examples in
169-
test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
170-
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h
317+
For a full list of amd_kernel_code_t keys, refer to the AMDGPU ABI document,
318+
the comments in lib/Target/AMDGPU/AmdKernelCodeT.h, and test/CodeGen/AMDGPU/hsa.s.
171319

172320
Here is an example of a minimal amd_kernel_code_t specification:
173321

llvm/docs/CompilerWriterInfo.rst

Lines changed: 2 additions & 0 deletions
@@ -78,8 +78,10 @@ AMDGPU
7878
* `AMD Cayman/Trinity shader ISA <http://developer.amd.com/wordpress/media/2012/10/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf>`_
7979
* `AMD Southern Islands Series ISA <http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf>`_
8080
* `AMD Sea Islands Series ISA <http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf>`_
81+
* `AMD GCN3 Instruction Set Architecture <http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf>`__
8182
* `AMD GPU Programming Guide <http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf>`_
8283
* `AMD Compute Resources <http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/documentation/>`_
84+
* `AMDGPU Compute Application Binary Interface <https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc/blob/master/AMDGPU-ABI.md>`__
8385

8486
SPARC
8587
-----
