Summary
opticalFlow-hip crashes at runtime on AMD ROCm with hipErrorOutOfMemory (error 2)
when the GPU computation starts. The error originates from hipCreateTextureObject
calls and is caused by using hipResourceTypePitch2D with normalizedCoords = true
and hipFilterModeLinear, a combination that is not supported on AMD hardware.
Environment
| Item |
Value |
| GPU |
AMD Instinct MI250X |
| Architecture |
gfx90a |
| ROCm version |
TheRock distribution (April 2026) |
| Compiler |
hipcc (clang-based) |
| OS |
Linux |
Affected Files
src/opticalFlow-hip/downscaleKernel.cuh — Downscale() host wrapper
src/opticalFlow-hip/warpingKernel.cuh — WarpImage() host wrapper
src/opticalFlow-hip/upscaleKernel.cuh — Upscale() host wrapper
src/opticalFlow-hip/derivativesKernel.cuh — ComputeDerivatives() host wrapper
Steps to Reproduce
cd src/opticalFlow-hip
make
./main ../opticalFlow-cuda/data/frame10.ppm ../opticalFlow-cuda/data/frame11.ppm
Observed Output
HSOpticalFlow Starting...
Loading "../opticalFlow-cuda/data/frame10.ppm" ...
Loading "../opticalFlow-cuda/data/frame11.ppm" ...
Computing optical flow on CPU...
Processing time on CPU: 2326.059326 (ms)
Computing optical flow on GPU...
HIP Error = 2: out of memory from file ./downscaleKernel.cuh, line 94
HIP Error = 2: out of memory from file ./downscaleKernel.cuh, line 94
... (repeated across all texture-using kernels)
Memory access fault by GPU node-1 on address (nil). Reason: Unknown.
make: *** [Makefile:62: run] Broken pipe
Expected Output
HSOpticalFlow Starting...
Loading "../opticalFlow-cuda/data/frame10.ppm" ...
Loading "../opticalFlow-cuda/data/frame11.ppm" ...
Computing optical flow on CPU...
Processing time on CPU: 2298.088867 (ms)
Computing optical flow on GPU...
Processing time on Device: 384.415527 (ms)
L1 error : 0.044196
Root Cause
All four kernel wrapper functions set up hipTextureObject_t using a
hipResourceTypePitch2D resource descriptor with the following texture descriptor:
texRes.resType = hipResourceTypePitch2D;
texRes.res.pitch2D.devPtr = (void *)src;
texRes.res.pitch2D.desc = hipCreateChannelDesc<float>();
texRes.res.pitch2D.width = width;
texRes.res.pitch2D.height = height;
texRes.res.pitch2D.pitchInBytes = stride * sizeof(float);
texDescr.normalizedCoords = true;
texDescr.filterMode = hipFilterModeLinear;
texDescr.addressMode[0] = hipAddressModeMirror;
texDescr.addressMode[1] = hipAddressModeMirror;
On AMD ROCm, this combination fails. hipResourceTypePitch2D resources are backed by
linear device memory (HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR), and the HSA runtime rejects
creating a linear-layout image when the requested sampler requires normalized coordinates
or linear filtering. No single source document states this as a named hardware rule;
the constraint is implicit and emerges from the ROCr image implementation described
in the failure chain below.
Failure chain (traced through CLR and ROCr source)
hipCreateTextureObject calls ihipCreateTextureObject
(clr/hipamd/src/hip_texture.cpp:51).
- No validation check catches the unsupported combination at this level — the Pitch2D
validation block (hip_texture.cpp:130–138) only checks pointer alignment and
dimension limits, not sampler compatibility.
ihipCreateTextureObject creates an amd::Sampler with normalizedCoords=true and
filterMode=LINEAR — this succeeds without error (hip_texture.cpp:189–202).
- It then calls
ihipImageCreate for the Pitch2D resource, which maps to
CL_MEM_OBJECT_IMAGE2D (clr/hipamd/src/hip_conversions.hpp:140–141) backed by the
raw device buffer.
image->create() internally calls Hsa::image_create_with_layout(..., HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR, ...) (clr/rocclr/device/rocm/rocmemory.cpp:1391–1393).
- Inside the HSA runtime (
rocr-runtime), ImageRuntime::CreateImageHandle
(image/image_runtime.cpp:485–486) assigns tile_mode = Image::TileMode::LINEAR
for any HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR image. Before populating the image SRD it
calls ImageManagerAi::CalculateImageSizeAndAlignment
(image/image_manager_ai.cpp:95–126), which uses addrlib to compute the required row
pitch for an ADDR_SW_LINEAR surface. When the addrlib-computed pitch does not match
the caller-supplied image_data_row_pitch (i.e. stride * sizeof(float) from CLR),
CalculateImageSizeAndAlignment returns
HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED (0x3002):
if (desc.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB &&
image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR &&
((image_data_row_pitch && (rowPitch != image_data_row_pitch)) ||
(image_data_slice_pitch && (slicePitch != image_data_slice_pitch)))) {
return static_cast<hsa_status_t>(HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED);
}
CreateImageHandle propagates this status, causing hsa_ext_image_create_with_layout
to return 0x3002 to CLR. GDB-confirmed: a live debugging session observed
hsa_ext_sampler_create_v2 (sampler creation) return HSA_STATUS_SUCCESS (0) and
hsa_ext_image_create_with_layout return 0x3002, with the call stack showing
ImageManagerAi::CalculateImageSizeAndAlignment as the origin.
ihipImageCreate catches the create() failure (which internally routed through
Image::createView() for the Pitch2D resource, clr/rocclr/device/rocm/rocmemory.cpp:1469–1472)
and returns nullptr with status = hipErrorOutOfMemory — the error code is
hardcoded regardless of the actual failure reason
(clr/hipamd/src/hip_memory.cpp:1116–1120):
if (!image->create(nullptr)) {
status = hipErrorOutOfMemory; // ← wrong: should be hipErrorNotSupported
delete image;
return nullptr;
}
- Back in
ihipCreateTextureObject, image == nullptr causes the function to return
hipErrorOutOfMemory to the caller.
Because checkCudaErrors only prints the error without aborting, execution continues
with an invalid texture object, causing a GPU memory access fault on address NULL.
This code works on NVIDIA CUDA because NVIDIA hardware supports normalized-coordinate
sampling and linear filtering on linearly-laid-out (pitch2D) memory directly.
Fix
Replace hipResourceTypePitch2D with hipResourceTypeArray in each wrapper.
Stage the pitch2D device data into a hipArray_t before creating the texture object.
CUDA arrays fully support normalized coordinates and linear filtering on AMD hardware.
The pattern to apply in each affected wrapper:
// Before (broken on ROCm):
hipResourceDesc texRes;
texRes.resType = hipResourceTypePitch2D;
texRes.res.pitch2D.devPtr = (void *)src;
texRes.res.pitch2D.desc = hipCreateChannelDesc<float>();
texRes.res.pitch2D.width = width;
texRes.res.pitch2D.height = height;
texRes.res.pitch2D.pitchInBytes = stride * sizeof(float);
// ... set texDescr with normalizedCoords=true, filterMode=Linear ...
checkCudaErrors(hipCreateTextureObject(&tex, &texRes, &texDescr, NULL));
MyKernel<<<blocks, threads>>>(..., tex);
// After (works on ROCm):
hipArray_t srcArray;
hipChannelFormatDesc channelDesc = hipCreateChannelDesc<float>();
checkCudaErrors(hipMallocArray(&srcArray, &channelDesc, width, height));
checkCudaErrors(hipMemcpy2DToArray(srcArray, 0, 0, src,
stride * sizeof(float),
width * sizeof(float), height,
hipMemcpyDeviceToDevice));
hipResourceDesc texRes;
memset(&texRes, 0, sizeof(hipResourceDesc));
texRes.resType = hipResourceTypeArray;
texRes.res.array.array = srcArray;
// ... set texDescr with normalizedCoords=true, filterMode=Linear (unchanged) ...
hipTextureObject_t tex;
checkCudaErrors(hipCreateTextureObject(&tex, &texRes, &texDescr, NULL));
MyKernel<<<blocks, threads>>>(..., tex);
checkCudaErrors(hipDestroyTextureObject(tex));
checkCudaErrors(hipFreeArray(srcArray));
Note: hipMemcpy2DToArray takes the actual image width in bytes (width * sizeof(float))
as the transfer width, not the padded stride, so padding columns are not copied into
the array (which has no padding).
Additional Notes
- The
checkCudaErrors macro in common.h does not abort on error, so the program
continues past the failed hipCreateTextureObject and subsequently crashes inside
a GPU kernel when it dereferences the invalid texture object.
Summary
opticalFlow-hipcrashes at runtime on AMD ROCm withhipErrorOutOfMemory(error 2)when the GPU computation starts. The error originates from
hipCreateTextureObjectcalls and is caused by using
hipResourceTypePitch2DwithnormalizedCoords = trueand
hipFilterModeLinear, a combination that is not supported on AMD hardware.Environment
Affected Files
src/opticalFlow-hip/downscaleKernel.cuh—Downscale()host wrappersrc/opticalFlow-hip/warpingKernel.cuh—WarpImage()host wrappersrc/opticalFlow-hip/upscaleKernel.cuh—Upscale()host wrappersrc/opticalFlow-hip/derivativesKernel.cuh—ComputeDerivatives()host wrapperSteps to Reproduce
cd src/opticalFlow-hip make ./main ../opticalFlow-cuda/data/frame10.ppm ../opticalFlow-cuda/data/frame11.ppmObserved Output
Expected Output
Root Cause
All four kernel wrapper functions set up
hipTextureObject_tusing ahipResourceTypePitch2Dresource descriptor with the following texture descriptor:On AMD ROCm, this combination fails.
hipResourceTypePitch2Dresources are backed bylinear device memory (
HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR), and the HSA runtime rejectscreating a linear-layout image when the requested sampler requires normalized coordinates
or linear filtering. No single source document states this as a named hardware rule;
the constraint is implicit and emerges from the ROCr image implementation described
in the failure chain below.
Failure chain (traced through CLR and ROCr source)
hipCreateTextureObjectcallsihipCreateTextureObject(
clr/hipamd/src/hip_texture.cpp:51).validation block (
hip_texture.cpp:130–138) only checks pointer alignment anddimension limits, not sampler compatibility.
ihipCreateTextureObjectcreates anamd::SamplerwithnormalizedCoords=trueandfilterMode=LINEAR— this succeeds without error (hip_texture.cpp:189–202).ihipImageCreatefor the Pitch2D resource, which maps toCL_MEM_OBJECT_IMAGE2D(clr/hipamd/src/hip_conversions.hpp:140–141) backed by theraw device buffer.
image->create()internally callsHsa::image_create_with_layout(..., HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR, ...)(clr/rocclr/device/rocm/rocmemory.cpp:1391–1393).rocr-runtime),ImageRuntime::CreateImageHandle(
image/image_runtime.cpp:485–486) assignstile_mode = Image::TileMode::LINEARfor any
HSA_EXT_IMAGE_DATA_LAYOUT_LINEARimage. Before populating the image SRD itcalls
ImageManagerAi::CalculateImageSizeAndAlignment(
image/image_manager_ai.cpp:95–126), which uses addrlib to compute the required rowpitch for an
ADDR_SW_LINEARsurface. When the addrlib-computed pitch does not matchthe caller-supplied
image_data_row_pitch(i.e.stride * sizeof(float)from CLR),CalculateImageSizeAndAlignmentreturnsHSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED(0x3002):CreateImageHandlepropagates this status, causinghsa_ext_image_create_with_layoutto return
0x3002to CLR. GDB-confirmed: a live debugging session observedhsa_ext_sampler_create_v2(sampler creation) returnHSA_STATUS_SUCCESS (0)andhsa_ext_image_create_with_layoutreturn0x3002, with the call stack showingImageManagerAi::CalculateImageSizeAndAlignmentas the origin.ihipImageCreatecatches thecreate()failure (which internally routed throughImage::createView()for the Pitch2D resource,clr/rocclr/device/rocm/rocmemory.cpp:1469–1472)and returns
nullptrwithstatus = hipErrorOutOfMemory— the error code ishardcoded regardless of the actual failure reason
(
clr/hipamd/src/hip_memory.cpp:1116–1120):ihipCreateTextureObject,image == nullptrcauses the function to returnhipErrorOutOfMemoryto the caller.Because
checkCudaErrorsonly prints the error without aborting, execution continueswith an invalid texture object, causing a GPU memory access fault on address NULL.
This code works on NVIDIA CUDA because NVIDIA hardware supports normalized-coordinate
sampling and linear filtering on linearly-laid-out (pitch2D) memory directly.
Fix
Replace
hipResourceTypePitch2DwithhipResourceTypeArrayin each wrapper.Stage the pitch2D device data into a
hipArray_tbefore creating the texture object.CUDA arrays fully support normalized coordinates and linear filtering on AMD hardware.
The pattern to apply in each affected wrapper:
Note:
hipMemcpy2DToArraytakes the actual image width in bytes (width * sizeof(float))as the transfer width, not the padded stride, so padding columns are not copied into
the array (which has no padding).
Additional Notes
checkCudaErrorsmacro incommon.hdoes not abort on error, so the programcontinues past the failed
hipCreateTextureObjectand subsequently crashes insidea GPU kernel when it dereferences the invalid texture object.