Mi250x: pointer is unknown by backend (error code = HIP:1) #1047
I think I have figured out what the culprit is. The application I am working on consists of two explicitly coupled solvers with fairly complex mesh partitioning. It can happen that one partition of the second solver does not get any elements to process (this becomes relevant during strong scaling, when we substantially increase the number of MPI processes). Therefore, I guess

Does anybody know what the SYCL standard says about it? In my opinion, it is essential to establish the same behavior for all backends. In theory, it can be easily done. For example, when a user calls |
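To make the empty-partition case concrete, here is a minimal sketch of an application-side guard. It assumes the failure is tied to ranks whose partition holds zero elements, which is only a guess and has not been confirmed. The helper names `allocate_elements` and `release_elements` are hypothetical, and `std::malloc` / `std::free` stand in for `sycl::malloc_device` / `sycl::free` so that the snippet builds without a SYCL toolchain.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helpers (not part of SYCL or hipSYCL).
// std::malloc / std::free are stand-ins for sycl::malloc_device / sycl::free.

// Allocate storage for `count` elements; an empty partition gets nullptr
// and the backend allocator is never asked for zero bytes.
template <typename T>
T* allocate_elements(std::size_t count) {
  if (count == 0) {
    return nullptr;
  }
  return static_cast<T*>(std::malloc(count * sizeof(T)));
}

// Release storage obtained from allocate_elements; a nullptr (empty
// partition) is skipped, so the backend never sees a pointer it does not own.
template <typename T>
void release_elements(T* ptr) {
  if (ptr == nullptr) {
    return;
  }
  std::free(ptr);
}
```

With such a wrapper, a rank with an empty partition would call `allocate_elements<double>(0)`, receive `nullptr`, and later pass it to `release_elements` without ever reaching the backend's free path. Whether the backends themselves should behave this way is exactly the open question about the standard.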
Hi all,
I am running my application on multiple nodes of LUMI (4 MI250X per node). The program terminates with a bunch of hipSYCL errors, i.e.,
The problem occurs only on a multi-node setup, which is weird: there are no issues when running the app on a single node with 8 MPI processes.
My current OpenSYCL configuration:
$ opensycl-info
=================Backend information===================
Loaded backend 0: HIP
Found device:
Loaded backend 1: OpenMP
Found device: hipSYCL OpenMP host device
=================Device information===================
***************** Devices for backend HIP *****************
Device 0:
General device information:
Name:
Backend: HIP
Vendor: AMD
Arch: gfx90a:sramecc+:xnack+
Driver version: 50422804
Is CPU: 0
Is GPU: 1
Default executor information:
Is in-order queue: 1
Is out-of-order queue: 0
Is task graph: 0
Device support queries:
images: 0
error_correction: 0
host_unified_memory: 0
little_endian: 1
global_mem_cache: 1
global_mem_cache_read_only: 0
global_mem_cache_read_write: 1
emulated_local_memory: 0
sub_group_independent_forward_progress: 1
usm_device_allocations: 1
usm_host_allocations: 1
usm_atomic_host_allocations: 0
usm_shared_allocations: 1
usm_atomic_shared_allocations: 0
usm_system_allocations: 0
execution_timestamps: 1
sscp_kernels: 0
Device properties:
max_compute_units: 110
max_global_size0: 2199023254528
max_global_size1: 2199023254528
max_global_size2: 2199023254528
max_group_size: 1024
max_num_sub_groups: 16
preferred_vector_width_char: 4
preferred_vector_width_double: 1
preferred_vector_width_float: 1
preferred_vector_width_half: 2
preferred_vector_width_int: 1
preferred_vector_width_long: 1
preferred_vector_width_short: 2
native_vector_width_char: 4
native_vector_width_double: 1
native_vector_width_float: 1
native_vector_width_half: 2
native_vector_width_int: 1
native_vector_width_long: 1
native_vector_width_short: 2
max_clock_speed: 1700
max_malloc_size: 68702699520
address_bits: 64
max_read_image_args: 0
max_write_image_args: 0
image2d_max_width: 0
image2d_max_height: 0
image3d_max_width: 0
image3d_max_height: 0
image3d_max_depth: 0
image_max_buffer_size: 0
image_max_array_size: 0
max_samplers: 0
max_parameter_size: 18446744073709551615
mem_base_addr_align: 8
global_mem_cache_line_size: 128
global_mem_cache_size: 8388608
global_mem_size: 68702699520
max_constant_buffer_size: 2147483647
max_constant_args: 18446744073709551615
local_mem_size: 65536
printf_buffer_size: 18446744073709551615
partition_max_sub_devices: 0
vendor_id: 1022
sub_group_sizes: 64
***************** Devices for backend OpenMP *****************
Device 0:
General device information:
Name: hipSYCL OpenMP host device
Backend: OpenMP
Vendor: the hipSYCL project
Arch:
Driver version: 1.2
Is CPU: 1
Is GPU: 0
Default executor information:
Is in-order queue: 1
Is out-of-order queue: 0
Is task graph: 0
Device support queries:
images: 0
error_correction: 0
host_unified_memory: 1
little_endian: 1
global_mem_cache: 1
global_mem_cache_read_only: 0
global_mem_cache_read_write: 1
emulated_local_memory: 1
sub_group_independent_forward_progress: 0
usm_device_allocations: 1
usm_host_allocations: 1
usm_atomic_host_allocations: 1
usm_shared_allocations: 1
usm_atomic_shared_allocations: 1
usm_system_allocations: 1
execution_timestamps: 1
sscp_kernels: 0
Device properties:
max_compute_units: 8
max_global_size0: 18446744073709551615
max_global_size1: 18446744073709551615
max_global_size2: 18446744073709551615
max_group_size: 1024
max_num_sub_groups: 18446744073709551615
preferred_vector_width_char: 4
preferred_vector_width_double: 1
preferred_vector_width_float: 1
preferred_vector_width_half: 2
preferred_vector_width_int: 1
preferred_vector_width_long: 1
preferred_vector_width_short: 2
native_vector_width_char: 4
native_vector_width_double: 1
native_vector_width_float: 1
native_vector_width_half: 2
native_vector_width_int: 1
native_vector_width_long: 1
native_vector_width_short: 2
max_clock_speed: 0
max_malloc_size: 18446744073709551615
address_bits: 64
max_read_image_args: 0
max_write_image_args: 0
image2d_max_width: 0
image2d_max_height: 0
image3d_max_width: 0
image3d_max_height: 0
image3d_max_depth: 0
image_max_buffer_size: 0
image_max_array_size: 0
max_samplers: 0
max_parameter_size: 18446744073709551615
mem_base_addr_align: 8
global_mem_cache_line_size: 64
global_mem_cache_size: 1
global_mem_size: 18446744073709551615
max_constant_buffer_size: 18446744073709551615
max_constant_args: 18446744073709551615
local_mem_size: 18446744073709551615
printf_buffer_size: 18446744073709551615
partition_max_sub_devices: 0
vendor_id: 18446744073709551615
sub_group_sizes: 1
openSYCL version
CMake Configuration
SLURM Batch script
I haven't found a way to write a small reproducible example yet, so for now I just want to start a discussion.
Here is my `hipcc` version
The following code is executed when calling `hip_allocator::free(void *mem)` (regardless of whether the memory was allocated with `malloc_device` or `malloc_shared`):
https://github.com/OpenSYCL/OpenSYCL/blob/12fdcaedfa990ab58ddf8bce304fa8cf917e6182/src/runtime/hip/hip_allocator.cpp#L122-L156
Does anybody have an idea why it is happening?
Another problem occurs with `allocate_usm`. `hipMallocManaged` does not return `hipSuccess`.
https://github.com/OpenSYCL/OpenSYCL/blob/12fdcaedfa990ab58ddf8bce304fa8cf917e6182/src/runtime/hip/hip_allocator.cpp#L98-L113