Skip to content

[opticalFlow-hip] Unsupported arg combination in hipCreateTextureObject #190

@maarquitos14

Description

@maarquitos14

Summary

opticalFlow-hip crashes at runtime on AMD ROCm with hipErrorOutOfMemory (error 2)
when the GPU computation starts. The error originates from hipCreateTextureObject
calls and is caused by using hipResourceTypePitch2D with normalizedCoords = true
and hipFilterModeLinear, a combination that is not supported on AMD hardware.

Environment

Item Value
GPU AMD Instinct MI250X
Architecture gfx90a
ROCm version TheRock distribution (April 2026)
Compiler hipcc (clang-based)
OS Linux

Affected Files

  • src/opticalFlow-hip/downscaleKernel.cuhDownscale() host wrapper
  • src/opticalFlow-hip/warpingKernel.cuhWarpImage() host wrapper
  • src/opticalFlow-hip/upscaleKernel.cuhUpscale() host wrapper
  • src/opticalFlow-hip/derivativesKernel.cuhComputeDerivatives() host wrapper

Steps to Reproduce

cd src/opticalFlow-hip
make
./main ../opticalFlow-cuda/data/frame10.ppm ../opticalFlow-cuda/data/frame11.ppm

Observed Output

HSOpticalFlow Starting...

Loading "../opticalFlow-cuda/data/frame10.ppm" ...
Loading "../opticalFlow-cuda/data/frame11.ppm" ...
Computing optical flow on CPU...
Processing time on CPU: 2326.059326 (ms)
Computing optical flow on GPU...
HIP Error = 2: out of memory from file ./downscaleKernel.cuh, line 94
HIP Error = 2: out of memory from file ./downscaleKernel.cuh, line 94
... (repeated across all texture-using kernels)
Memory access fault by GPU node-1 on address (nil). Reason: Unknown.
make: *** [Makefile:62: run] Broken pipe

Expected Output

HSOpticalFlow Starting...

Loading "../opticalFlow-cuda/data/frame10.ppm" ...
Loading "../opticalFlow-cuda/data/frame11.ppm" ...
Computing optical flow on CPU...
Processing time on CPU: 2298.088867 (ms)
Computing optical flow on GPU...
Processing time on Device: 384.415527 (ms)
L1 error : 0.044196

Root Cause

All four kernel wrapper functions set up hipTextureObject_t using a
hipResourceTypePitch2D resource descriptor with the following texture descriptor:

texRes.resType = hipResourceTypePitch2D;
texRes.res.pitch2D.devPtr = (void *)src;
texRes.res.pitch2D.desc = hipCreateChannelDesc<float>();
texRes.res.pitch2D.width = width;
texRes.res.pitch2D.height = height;
texRes.res.pitch2D.pitchInBytes = stride * sizeof(float);

texDescr.normalizedCoords = true;
texDescr.filterMode = hipFilterModeLinear;
texDescr.addressMode[0] = hipAddressModeMirror;
texDescr.addressMode[1] = hipAddressModeMirror;

On AMD ROCm, this combination fails. hipResourceTypePitch2D resources are backed by
linear device memory (HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR), and the HSA runtime rejects
creating a linear-layout image when the requested sampler requires normalized coordinates
or linear filtering. No single source document states this as a named hardware rule;
the constraint is implicit and emerges from the ROCr image implementation described
in the failure chain below.

Failure chain (traced through CLR and ROCr source)

  1. hipCreateTextureObject calls ihipCreateTextureObject
    (clr/hipamd/src/hip_texture.cpp:51).
  2. No validation check catches the unsupported combination at this level — the Pitch2D
    validation block (hip_texture.cpp:130–138) only checks pointer alignment and
    dimension limits, not sampler compatibility.
  3. ihipCreateTextureObject creates an amd::Sampler with normalizedCoords=true and
    filterMode=LINEAR — this succeeds without error (hip_texture.cpp:189–202).
  4. It then calls ihipImageCreate for the Pitch2D resource, which maps to
    CL_MEM_OBJECT_IMAGE2D (clr/hipamd/src/hip_conversions.hpp:140–141) backed by the
    raw device buffer.
  5. image->create() internally calls Hsa::image_create_with_layout(..., HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR, ...) (clr/rocclr/device/rocm/rocmemory.cpp:1391–1393).
  6. Inside the HSA runtime (rocr-runtime), ImageRuntime::CreateImageHandle
    (image/image_runtime.cpp:485–486) assigns tile_mode = Image::TileMode::LINEAR
    for any HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR image. Before populating the image SRD it
    calls ImageManagerAi::CalculateImageSizeAndAlignment
    (image/image_manager_ai.cpp:95–126), which uses addrlib to compute the required row
    pitch for an ADDR_SW_LINEAR surface. When the addrlib-computed pitch does not match
    the caller-supplied image_data_row_pitch (i.e. stride * sizeof(float) from CLR),
    CalculateImageSizeAndAlignment returns
    HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED (0x3002):
    if (desc.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB &&
        image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR &&
        ((image_data_row_pitch && (rowPitch != image_data_row_pitch)) ||
         (image_data_slice_pitch && (slicePitch != image_data_slice_pitch)))) {
      return static_cast<hsa_status_t>(HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED);
    }
    CreateImageHandle propagates this status, causing hsa_ext_image_create_with_layout
    to return 0x3002 to CLR. GDB-confirmed: a live debugging session observed
    hsa_ext_sampler_create_v2 (sampler creation) return HSA_STATUS_SUCCESS (0) and
    hsa_ext_image_create_with_layout return 0x3002, with the call stack showing
    ImageManagerAi::CalculateImageSizeAndAlignment as the origin.
  7. ihipImageCreate catches the create() failure (which internally routed through
    Image::createView() for the Pitch2D resource, clr/rocclr/device/rocm/rocmemory.cpp:1469–1472)
    and returns nullptr with status = hipErrorOutOfMemory — the error code is
    hardcoded regardless of the actual failure reason
    (clr/hipamd/src/hip_memory.cpp:1116–1120):
    if (!image->create(nullptr)) {
        status = hipErrorOutOfMemory;   // ← wrong: should be hipErrorNotSupported
        delete image;
        return nullptr;
    }
  8. Back in ihipCreateTextureObject, image == nullptr causes the function to return
    hipErrorOutOfMemory to the caller.

Because checkCudaErrors only prints the error without aborting, execution continues
with an invalid texture object, causing a GPU memory access fault on address NULL.

This code works on NVIDIA CUDA because NVIDIA hardware supports normalized-coordinate
sampling and linear filtering on linearly-laid-out (pitch2D) memory directly.

Fix

Replace hipResourceTypePitch2D with hipResourceTypeArray in each wrapper.
Stage the pitch2D device data into a hipArray_t before creating the texture object.
CUDA arrays fully support normalized coordinates and linear filtering on AMD hardware.

The pattern to apply in each affected wrapper:

// Before (broken on ROCm):
hipResourceDesc texRes;
texRes.resType = hipResourceTypePitch2D;
texRes.res.pitch2D.devPtr = (void *)src;
texRes.res.pitch2D.desc = hipCreateChannelDesc<float>();
texRes.res.pitch2D.width = width;
texRes.res.pitch2D.height = height;
texRes.res.pitch2D.pitchInBytes = stride * sizeof(float);
// ... set texDescr with normalizedCoords=true, filterMode=Linear ...
checkCudaErrors(hipCreateTextureObject(&tex, &texRes, &texDescr, NULL));
MyKernel<<<blocks, threads>>>(..., tex);

// After (works on ROCm):
hipArray_t srcArray;
hipChannelFormatDesc channelDesc = hipCreateChannelDesc<float>();
checkCudaErrors(hipMallocArray(&srcArray, &channelDesc, width, height));
checkCudaErrors(hipMemcpy2DToArray(srcArray, 0, 0, src,
                                   stride * sizeof(float),
                                   width * sizeof(float), height,
                                   hipMemcpyDeviceToDevice));

hipResourceDesc texRes;
memset(&texRes, 0, sizeof(hipResourceDesc));
texRes.resType = hipResourceTypeArray;
texRes.res.array.array = srcArray;
// ... set texDescr with normalizedCoords=true, filterMode=Linear (unchanged) ...
hipTextureObject_t tex;
checkCudaErrors(hipCreateTextureObject(&tex, &texRes, &texDescr, NULL));
MyKernel<<<blocks, threads>>>(..., tex);

checkCudaErrors(hipDestroyTextureObject(tex));
checkCudaErrors(hipFreeArray(srcArray));

Note: hipMemcpy2DToArray takes the actual image width in bytes (width * sizeof(float))
as the transfer width, not the padded stride, so padding columns are not copied into
the array (which has no padding).

Additional Notes

  • The checkCudaErrors macro in common.h does not abort on error, so the program
    continues past the failed hipCreateTextureObject and subsequently crashes inside
    a GPU kernel when it dereferences the invalid texture object.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions