@@ -8,6 +8,8 @@ Introduction
The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).

+ Refer to `AMDGPU section in Architecture & Platform Information for Compiler Writers <CompilerWriterInfo.html#amdgpu>`_
+ for additional documentation.

Conventions
===========
@@ -35,96 +37,241 @@ OpenCL standard.
Assembler
=========

- The assembler is currently considered experimental.
+ The AMDGPU backend has an LLVM-MC based assembler, which is currently in development.
+ It supports the Southern Islands, Sea Islands, and Volcanic Islands ISAs.

- For syntax examples look in test/MC/AMDGPU.
+ This document describes general syntax for instructions and operands. For more
+ information about instructions, their semantics and supported combinations
+ of operands, refer to one of the Instruction Set Architecture manuals.

- Below some of the currently supported features (modulo bugs). These
- all apply to the Southern Islands ISA, Sea Islands and Volcanic Islands
- are also supported but may be missing some instructions and have more bugs:
+ An instruction has the following syntax (register operands are
+ normally comma-separated while extra operands are space-separated):

- DS Instructions
- ---------------
- All DS instructions are supported.
+ *<opcode> <register_operand0>, ... <extra_operand0> ...*

- FLAT Instructions
- ------------------
- These instructions are only present in the Sea Islands and Volcanic Islands
- instruction set. All FLAT instructions are supported for these architectures

- MUBUF Instructions
- ------------------
- All non-atomic MUBUF instructions are supported.
+ Operands
+ --------

- SMRD Instructions
- -----------------
- Only the s_load_dword* SMRD instructions are supported.
+ The following syntax for register operands is supported:

- SOP1 Instructions
- -----------------
- All SOP1 instructions are supported.
+ * SGPR registers: s0, ... or s[0], ...
+ * VGPR registers: v0, ... or v[0], ...
+ * TTMP registers: ttmp0, ... or ttmp[0], ...
+ * Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
+ * Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
+ * Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
+ * Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
+ * Register index expressions: v[2*2], s[1-1:2-1]
+ * 'off' indicates that an operand is not enabled
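The simpler register forms in the list above can be illustrated with a small parser. This is only a sketch for readers, not the assembler's implementation: `parse_register` is a hypothetical helper name, and it deliberately ignores special registers, register lists, and index expressions such as `v[2*2]`.

```python
import re

# Hypothetical helper (not part of LLVM): recognizes only the plain forms
# s0 / s[0] / s[2:3] for the s, v and ttmp register files.
_REG = re.compile(r"^(s|v|ttmp)(?:(\d+)|\[(\d+)(?::(\d+))?\])$")


def parse_register(text):
    """Return (kind, first_index, register_count) for a register operand."""
    m = _REG.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported register operand: {text!r}")
    kind, plain, lo, hi = m.groups()
    first = int(plain if plain is not None else lo)
    # A single register such as s[0] has no upper bound; treat it as a range of one.
    last = int(hi) if hi is not None else first
    if last < first:
        raise ValueError(f"descending range in {text!r}")
    return kind, first, last - first + 1


print(parse_register("s[2:3]"))
print(parse_register("v12"))
print(parse_register("ttmp[4:7]"))
```

Under these assumptions a pair such as `s[2:3]` is reported as two registers starting at index 2, and a quad such as `ttmp[4:7]` as four registers starting at index 4.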

- SOP2 Instructions
- -----------------
- All SOP2 instructions are supported.
+ The following extra operands are supported:

- SOPC Instructions
- -----------------
- All SOPC instructions are supported.
+ * offset, offset0, offset1
+ * idxen, offen bits
+ * glc, slc, tfe bits
+ * waitcnt: integer or combination of counter values
+ * VOP3 modifiers:

- SOPP Instructions
- -----------------
+   - abs (\|\|), neg (\-)

- Unless otherwise mentioned, all SOPP instructions that have one or more
- operands accept integer operands only. No verification is performed
- on the operands, so it is up to the programmer to be familiar with the
- range or acceptable values.
+ * DPP modifiers:
+
+   - row_shl, row_shr, row_ror, row_rol
+   - row_mirror, row_half_mirror, row_bcast
+   - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
+   - row_mask, bank_mask, bound_ctrl

- s_waitcnt
- ^^^^^^^^^
+ * SDWA modifiers:

- s_waitcnt accepts named arguments to specify which memory counter(s) to
- wait for.
+   - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
+   - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
+   - abs, neg, sext
+
+ DS Instruction Examples
+ -----------------------

.. code-block:: nasm

-   ; Wait for all counters to be 0
-   s_waitcnt 0
+   ds_add_u32 v2, v4 offset:16
+   ds_write_src2_b64 v2 offset0:4 offset1:8
+   ds_cmpst_f32 v2, v4, v6
+   ds_min_rtn_f64 v[8:9], v2, v[4:5]

-   ; Equivalent to s_waitcnt 0. Counter names can also be delimited by
-   ; '&' or ','.
-   s_waitcnt vmcnt(0) expcnt(0) lgkcmt(0)

-   ; Wait for vmcnt counter to be 1.
-   s_waitcnt vmcnt(1)
+ For the full list of supported instructions, refer to "LDS/GDS instructions" in the ISA Manual.

- VOP1, VOP2, VOP3, VOPC Instructions
- -----------------------------------
+ FLAT Instruction Examples
+ --------------------------

- All 32-bit and 64-bit encodings should work.
+ .. code-block:: nasm

- The assembler will automatically detect which encoding size to use for
- VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
- a specific encoding size, you can add an _e32 (for 32-bit encoding) or
- _e64 (for 64-bit encoding) suffix to the instruction. Most, but not all
- instructions support an explicit suffix. These are all valid assembly
- strings:
+   flat_load_dword v1, v[3:4]
+   flat_store_dwordx3 v[3:4], v[5:7]
+   flat_atomic_swap v1, v[3:4], v5 glc
+   flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
+   flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc
+
+ For the full list of supported instructions, refer to "FLAT instructions" in the ISA Manual.
+
+ MUBUF Instruction Examples
+ ---------------------------

.. code-block:: nasm

-   v_mul_i32_i24 v1, v2, v3
-   v_mul_i32_i24_e32 v1, v2, v3
-   v_mul_i32_i24_e64 v1, v2, v3
+   buffer_load_dword v1, off, s[4:7], s1
+   buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
+   buffer_store_format_xy v[1:2], off, s[4:7], s1
+   buffer_wbinvl1
+   buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc
+
+ For the full list of supported instructions, refer to "MUBUF Instructions" in the ISA Manual.
+
+ SMRD/SMEM Instruction Examples
+ -------------------------------
+
+ .. code-block:: nasm
+
+   s_load_dword s1, s[2:3], 0xfc
+   s_load_dwordx8 s[8:15], s[2:3], s4
+   s_load_dwordx16 s[88:103], s[2:3], s4
+   s_dcache_inv_vol
+   s_memtime s[4:5]
+
+ For the full list of supported instructions, refer to "Scalar Memory Operations" in the ISA Manual.
+
+ SOP1 Instruction Examples
+ --------------------------
+
+ .. code-block:: nasm
+
+   s_mov_b32 s1, s2
+   s_mov_b64 s[0:1], 0x80000000
+   s_cmov_b32 s1, 200
+   s_wqm_b64 s[2:3], s[4:5]
+   s_bcnt0_i32_b64 s1, s[2:3]
+   s_swappc_b64 s[2:3], s[4:5]
+   s_cbranch_join s[4:5]
+
+ For the full list of supported instructions, refer to "SOP1 Instructions" in the ISA Manual.
+
+ SOP2 Instruction Examples
+ -------------------------
+
+ .. code-block:: nasm
+
+   s_add_u32 s1, s2, s3
+   s_and_b64 s[2:3], s[4:5], s[6:7]
+   s_cselect_b32 s1, s2, s3
+   s_andn2_b32 s2, s4, s6
+   s_lshr_b64 s[2:3], s[4:5], s6
+   s_ashr_i32 s2, s4, s6
+   s_bfm_b64 s[2:3], s4, s6
+   s_bfe_i64 s[2:3], s[4:5], s6
+   s_cbranch_g_fork s[4:5], s[6:7]
+
+ For the full list of supported instructions, refer to "SOP2 Instructions" in the ISA Manual.
+
+ SOPC Instruction Examples
+ --------------------------
+
+ .. code-block:: nasm

- Assembler Directives
- --------------------
+   s_cmp_eq_i32 s1, s2
+   s_bitcmp1_b32 s1, s2
+   s_bitcmp0_b64 s[2:3], s4
+   s_setvskip s3, s5
+
+ For the full list of supported instructions, refer to "SOPC Instructions" in the ISA Manual.
+
+ SOPP Instruction Examples
+ --------------------------
+
+ .. code-block:: nasm
+
+   s_barrier
+   s_nop 2
+   s_endpgm
+   s_waitcnt 0 ; Wait for all counters to be 0
+   s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above
+   s_waitcnt vmcnt(1) ; Wait for vmcnt counter to be 1.
+   s_sethalt 9
+   s_sleep 10
+   s_sendmsg 0x1
+   s_sendmsg sendmsg(MSG_INTERRUPT)
+   s_trap 1
+
+ For the full list of supported instructions, refer to "SOPP Instructions" in the ISA Manual.
+
+ Unless otherwise mentioned, little verification is performed on the operands
+ of SOPP instructions, so it is up to the programmer to be familiar with the
+ range of acceptable values.
+
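Because the assembler does little checking of SOPP operands, it can help to validate the named `s_waitcnt` form before assembling. The following is a minimal sketch, not the assembler's own parser: `parse_waitcnt` is a hypothetical helper, and accepting both `&` and `,` as separators is an assumption of this sketch. It only extracts counter names and values and makes no claim about how they are encoded.

```python
import re

# Counter names used in the s_waitcnt examples above.
_COUNTER = re.compile(r"(vmcnt|expcnt|lgkmcnt)\((\d+)\)")


def parse_waitcnt(operand):
    """Parse e.g. 'vmcnt(0) & expcnt(0)' into {'vmcnt': 0, 'expcnt': 0}."""
    # Assumption: '&' and ',' are optional separators between counter terms.
    text = re.sub(r"[&,]", " ", operand)
    counters = {}
    for term in text.split():
        m = _COUNTER.fullmatch(term)
        if m is None:
            raise ValueError(f"unrecognized counter term: {term!r}")
        counters[m.group(1)] = int(m.group(2))
    return counters


print(parse_waitcnt("vmcnt(0) & expcnt(0) & lgkmcnt(0)"))
print(parse_waitcnt("vmcnt(1)"))
```

A misspelled counter name is rejected with a `ValueError` instead of being passed through silently.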
+ Vector ALU Instruction Examples
+ -------------------------------
+
+ For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
+ the assembler automatically selects the optimal encoding based on the operands.
+ To force a specific encoding, add a suffix to the instruction opcode:
+
+ * _e32 for 32-bit VOP1/VOP2/VOPC
+ * _e64 for 64-bit VOP3
+ * _dpp for VOP_DPP
+ * _sdwa for VOP_SDWA
+
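The suffix convention above can be sketched as a lookup that splits an opcode into its base name and the encoding it forces. `split_encoding_suffix` and the description strings are hypothetical and exist only for illustration; the suffix names themselves come from the list above.

```python
# Suffix-to-encoding table taken from the list above; the description
# strings are illustrative labels, not identifiers used by the assembler.
SUFFIXES = {
    "_e32": "32-bit VOP1/VOP2/VOPC",
    "_e64": "64-bit VOP3",
    "_dpp": "VOP_DPP",
    "_sdwa": "VOP_SDWA",
}


def split_encoding_suffix(opcode):
    """Return (base_opcode, forced_encoding or None)."""
    for suffix, encoding in SUFFIXES.items():
        if opcode.endswith(suffix):
            return opcode[: -len(suffix)], encoding
    # No suffix: the choice of encoding is left to the assembler.
    return opcode, None


print(split_encoding_suffix("v_mul_i32_i24_e64"))
print(split_encoding_suffix("v_mov_b32"))
```

An opcode without a recognized suffix is returned unchanged with `None`, which corresponds to letting the assembler choose the encoding.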
+ VOP1/VOP2/VOP3/VOPC examples:
+
+ .. code-block:: nasm
+
+   v_mov_b32 v1, v2
+   v_mov_b32_e32 v1, v2
+   v_nop
+   v_cvt_f64_i32_e32 v[1:2], v2
+   v_floor_f32_e32 v1, v2
+   v_bfrev_b32_e32 v1, v2
+   v_add_f32_e32 v1, v2, v3
+   v_mul_i32_i24_e64 v1, v2, 3
+   v_mul_i32_i24_e32 v1, -3, v3
+   v_mul_i32_i24_e32 v1, -100, v3
+   v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
+   v_max_f16_e32 v1, v2, v3
+
+ VOP_DPP examples:
+
+ .. code-block:: nasm
+
+   v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
+   v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
+   v_mov_b32 v0, v0 wave_shl:1
+   v_mov_b32 v0, v0 row_mirror
+   v_mov_b32 v0, v0 row_bcast:31
+   v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
+   v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
+   v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
+
+ VOP_SDWA examples:
+
+ .. code-block:: nasm
+
+   v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
+   v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+   v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
+   v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
+   v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0
+
+ For the full list of supported instructions, refer to "Vector ALU instructions" in the ISA Manual.
+
+ HSA Code Object Directives
+ --------------------------
+
+ The AMDGPU ABI defines auxiliary data in the output code object. In assembly source,
+ this data can be specified with assembler directives.

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
- object that will be generated by the assembler. This value will be stored
- in an entry of the .note section.
+ object that will be generated by the assembler.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -135,12 +282,14 @@ set architecture (ISA) version of the assembly program.
*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

- If no arguments are specified, then the assembler will derive the ISA version,
- *vendor*, and *arch* from the value of the -mcpu option that is passed to the
- assembler.
+ By default, the assembler will derive the ISA version, *vendor*, and *arch*
+ from the value of the -mcpu option that is passed to the assembler.
+
+ .amdgpu_hsa_kernel (name)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^

- ISA version, *vendor*, and *arch* will all be stored in a single entry of the
- .note section.
+ This directive specifies that the symbol with the given name is a kernel entry point
+ (label) and that the object should contain a corresponding symbol of type STT_AMDGPU_HSA_KERNEL.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^
@@ -165,9 +314,8 @@ used. The default value for all keys is 0, with the following exceptions:
The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

- For a full list of amd_kernel_code_t keys, see the examples in
- test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
- keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h
+ For a full list of amd_kernel_code_t keys, refer to the AMDGPU ABI document,
+ the comments in lib/Target/AMDGPU/AmdKernelCodeT.h, and test/CodeGen/AMDGPU/hsa.s.

Here is an example of a minimal amd_kernel_code_t specification: