@@ -1123,8 +1123,119 @@ transformed and loaded back into the JIT pipeline via
LLVM/OpenMP Target Host Runtime Plugins (``libomptarget.rtl.XXXX``)
-------------------------------------------------------------------

- .. _device_runtime:
+ The LLVM/OpenMP target host runtime plugins were recently re-implemented,
+ temporarily renamed as the NextGen plugins, and set as the default and only
+ plugin implementation. Currently, these plugins support NVIDIA and AMDGPU
+ devices as well as the GenericELF64bit host-simulated device.
+
+ The source code of the common infrastructure and the vendor-specific plugins is
+ in the ``openmp/libomptarget/nextgen-plugins`` directory in the LLVM project
+ repository. The plugin infrastructure aims to unify the plugin code and logic
+ into a generic interface using object-oriented C++. The plugin interface is
+ composed of multiple generic C++ classes that implement the common logic
+ every vendor-specific plugin should provide. In turn, the specific plugins
+ inherit from those generic classes and implement the required functions that
+ depend on the specific vendor API. For example, the plugin interface defines
+ generic classes that represent a device, a device image, an efficient
+ resource manager, etc.
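The split between generic classes and vendor-specific subclasses can be pictured with a minimal sketch. The names ``GenericDevice``, ``HostSimDevice``, ``allocate`` and ``allocateImpl`` are hypothetical and chosen only for illustration; they are not taken from the plugin sources, and the real interface may look different.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical generic class: holds the logic shared by every plugin.
class GenericDevice {
public:
  virtual ~GenericDevice() = default;

  // Common logic lives here: reject empty requests, then defer to the
  // vendor-specific hook.
  void *allocate(std::size_t Size) {
    if (Size == 0)
      return nullptr;
    return allocateImpl(Size);
  }

  virtual std::string name() const = 0;

protected:
  // Vendor-specific hook. A real plugin would call the vendor API here.
  virtual void *allocateImpl(std::size_t Size) = 0;
};

// Hypothetical host-simulated device: "device" memory is plain host memory.
class HostSimDevice final : public GenericDevice {
public:
  std::string name() const override { return "host-sim"; }

protected:
  void *allocateImpl(std::size_t Size) override {
    // The device owns its buffers and frees them on destruction.
    Buffers.push_back(std::make_unique<char[]>(Size));
    return Buffers.back().get();
  }

private:
  std::vector<std::unique_ptr<char[]>> Buffers;
};
```

In this sketch a caller only holds a ``GenericDevice`` reference, so a generic feature added to ``allocate`` would apply to every subclass without touching vendor code.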
+
+ With this common plugin infrastructure, several tasks have been simplified:
+ adding a new vendor-specific plugin, adding generic features or optimizations
+ to all plugins, debugging plugins, etc.

+ Environment Variables
+ ^^^^^^^^^^^^^^^^^^^^^
+
+ There are several environment variables to change the behavior of the plugins:
+
+ * ``LIBOMPTARGET_SHARED_MEMORY_SIZE``
+ * ``LIBOMPTARGET_STACK_SIZE``
+ * ``LIBOMPTARGET_HEAP_SIZE``
+ * ``LIBOMPTARGET_NUM_INITIAL_STREAMS``
+ * ``LIBOMPTARGET_NUM_INITIAL_EVENTS``
+ * ``LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS``
+ * ``LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES``
+ * ``LIBOMPTARGET_AMDGPU_HSA_QUEUE_SIZE``
+ * ``LIBOMPTARGET_AMDGPU_TEAMS_PER_CU``
+ * ``LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES``
+ * ``LIBOMPTARGET_AMDGPU_NUM_INITIAL_HSA_SIGNALS``
+
+ The environment variables ``LIBOMPTARGET_SHARED_MEMORY_SIZE``,
+ ``LIBOMPTARGET_STACK_SIZE`` and ``LIBOMPTARGET_HEAP_SIZE`` are described in
+ :ref:`libopenmptarget_environment_vars`.
+
+ LIBOMPTARGET_NUM_INITIAL_STREAMS
+ """"""""""""""""""""""""""""""""
+
+ This environment variable sets the number of pre-created streams in the plugin
+ (if supported) at initialization. More streams will be created dynamically
+ throughout the execution if needed. A stream is a queue of asynchronous
+ operations (e.g., kernel launches and memory copies) that are executed
+ sequentially. Parallelism is achieved by using multiple streams.
+ ``libomptarget`` leverages streams to exploit parallelism between plugin
+ operations. The default value is ``32``.
+
+ LIBOMPTARGET_NUM_INITIAL_EVENTS
+ """""""""""""""""""""""""""""""
+
+ This environment variable sets the number of pre-created events in the
+ plugin (if supported) at initialization. More events will be created
+ dynamically throughout the execution if needed. An event is used to
+ efficiently synchronize one stream with another. The default value is ``32``.
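The "pre-create some resources, create more on demand" behavior described for streams and events can be sketched as a small pool. ``ResourcePool`` and its member names are hypothetical, and ``Resource`` stands in for whatever handle type a plugin uses; this is an illustration of the idea under those assumptions, not the plugins' actual resource manager.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pool. Resource stands in for a stream or event handle and
// must be default-constructible and copyable in this sketch.
template <typename Resource> class ResourcePool {
public:
  // Pre-create NumInitial resources at initialization.
  explicit ResourcePool(std::size_t NumInitial)
      : Free(NumInitial), Created(NumInitial) {}

  // Hand out a free resource; create a new one only when none is left.
  Resource acquire() {
    if (Free.empty()) {
      ++Created;
      return Resource{};
    }
    Resource R = Free.back();
    Free.pop_back();
    return R;
  }

  // Return a resource to the pool so that it can be reused.
  void release(Resource R) { Free.push_back(R); }

  std::size_t numCreated() const { return Created; }
  std::size_t numFree() const { return Free.size(); }

private:
  std::vector<Resource> Free;
  std::size_t Created;
};
```

With such a pool, a larger initial count trades start-up cost and memory for fewer creations on the critical path, which is the kind of trade-off these variables expose.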
+
+ LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS
+ """""""""""""""""""""""""""""""""""""
+
+ This environment variable indicates whether the host buffers mapped by the user
+ should be automatically locked/pinned by the plugin. Pinned host buffers allow
+ true asynchronous copies between the host and devices. Enabling this feature can
+ increase the performance of applications that perform intensive host-device
+ memory transfers. The default value is ``false``.
+
+ LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES
+ """"""""""""""""""""""""""""""""""
+
+ This environment variable controls the number of HSA queues per device in the
+ AMDGPU plugin. An HSA queue is a runtime-allocated resource that contains an
+ AQL (Architected Queuing Language) packet buffer and is associated with an AQL
+ packet processor. HSA queues are used for inserting kernel packets to launch
+ kernel executions. A high number of HSA queues may degrade performance. The
+ default value is ``4``.
+
+ LIBOMPTARGET_AMDGPU_HSA_QUEUE_SIZE
+ """"""""""""""""""""""""""""""""""
+
+ This environment variable controls the size of each HSA queue in the AMDGPU
+ plugin. The size is the number of AQL packets an HSA queue is expected to hold.
+ It is also the number of AQL packets that can be pushed into each queue without
+ waiting for the driver to process them. The default value is ``512``.
+
+ LIBOMPTARGET_AMDGPU_TEAMS_PER_CU
+ """"""""""""""""""""""""""""""""
+
+ This environment variable controls the default number of teams relative to the
+ number of compute units (CUs) of the AMDGPU device. The default number of teams
+ is ``#default_teams = #teams_per_CU * #CUs``. The default value of teams per CU
+ is ``4``.
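The formula above is simple enough to work through in code. The helpers ``envOrDefault`` and ``defaultNumTeams`` are hypothetical names used only for this sketch, and the fallback rule for malformed values is an assumption; the plugin's own parsing may differ.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: read an unsigned integer from the environment.
// Falls back to Default when the variable is unset, empty, or not a plain
// decimal number (an assumption of this sketch).
std::uint32_t envOrDefault(const char *Name, std::uint32_t Default) {
  const char *Value = std::getenv(Name);
  if (!Value || *Value == '\0')
    return Default;
  char *End = nullptr;
  unsigned long Parsed = std::strtoul(Value, &End, 10);
  if (*End != '\0')
    return Default;
  return static_cast<std::uint32_t>(Parsed);
}

// #default_teams = #teams_per_CU * #CUs
std::uint32_t defaultNumTeams(std::uint32_t TeamsPerCU, std::uint32_t NumCUs) {
  return TeamsPerCU * NumCUs;
}
```

For instance, calling ``defaultNumTeams(envOrDefault("LIBOMPTARGET_AMDGPU_TEAMS_PER_CU", 4), 60)`` with the variable unset and a device that has 60 CUs (an arbitrary figure for illustration) yields 240 teams.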
+
+ LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES
+ """"""""""""""""""""""""""""""""""""""""
+
+ This environment variable specifies the maximum size, in bytes, up to which
+ memory copies are asynchronous operations in the AMDGPU plugin. Memory copies
+ up to this transfer size are asynchronous operations pushed to the
+ corresponding stream. Larger transfers are synchronous. Memory copies
+ involving already locked/pinned host buffers are always asynchronous. The default
+ value is ``1*1024*1024`` bytes (1 MB).
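The rule described above reduces to a single predicate. The function ``isAsyncCopy`` and the constant ``DefaultMaxAsyncCopyBytes`` are hypothetical names, and treating the threshold as inclusive is an assumption drawn from the phrase "up to this transfer size"; the plugin may handle the boundary differently.

```cpp
#include <cstddef>

// Default threshold from the text: 1*1024*1024 bytes.
constexpr std::size_t DefaultMaxAsyncCopyBytes = 1 * 1024 * 1024;

// Hypothetical predicate: decide whether a copy is issued asynchronously.
// Copies involving pinned host buffers are always asynchronous; otherwise
// only copies up to the threshold are (inclusive bound is an assumption).
bool isAsyncCopy(std::size_t Size, bool HostBufferPinned,
                 std::size_t MaxAsyncCopyBytes = DefaultMaxAsyncCopyBytes) {
  return HostBufferPinned || Size <= MaxAsyncCopyBytes;
}
```

Under these assumptions a 4 KB copy from an unpinned buffer is asynchronous, a 2 MB copy from an unpinned buffer is synchronous, and the same 2 MB copy from a pinned buffer is asynchronous again.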
+
+ LIBOMPTARGET_AMDGPU_NUM_INITIAL_HSA_SIGNALS
+ """""""""""""""""""""""""""""""""""""""""""
+
+ This environment variable controls the initial number of HSA signals per device
+ in the AMDGPU plugin. There is one resource manager of signals per device
+ managing several pre-created signals. These signals are mainly used by AMDGPU
+ streams. More HSA signals will be created dynamically throughout the execution
+ if needed. The default value is ``64``.

.. _remote_offloading_plugin: