[opticalFlow-hip] Unsupported arg combination in `hipCreateTextureObject`

## Summary

`opticalFlow-hip` crashes at runtime on AMD ROCm with `hipErrorOutOfMemory` (error 2)
when the GPU computation starts. The error originates from `hipCreateTextureObject`
calls and is caused by using `hipResourceTypePitch2D` with `normalizedCoords = true`
and `hipFilterModeLinear`, a combination that is not supported on AMD hardware.

## Environment

| Item | Value |
|------|-------|
| GPU | AMD Instinct MI250X |
| Architecture | gfx90a |
| ROCm version | TheRock distribution (April 2026) |
| Compiler | hipcc (clang-based) |
| OS | Linux |

## Affected Files

- `src/opticalFlow-hip/downscaleKernel.cuh` — `Downscale()` host wrapper
- `src/opticalFlow-hip/warpingKernel.cuh` — `WarpImage()` host wrapper
- `src/opticalFlow-hip/upscaleKernel.cuh` — `Upscale()` host wrapper
- `src/opticalFlow-hip/derivativesKernel.cuh` — `ComputeDerivatives()` host wrapper

## Steps to Reproduce

```bash
cd src/opticalFlow-hip
make
./main ../opticalFlow-cuda/data/frame10.ppm ../opticalFlow-cuda/data/frame11.ppm
```

## Observed Output

```
HSOpticalFlow Starting...

Loading "../opticalFlow-cuda/data/frame10.ppm" ...
Loading "../opticalFlow-cuda/data/frame11.ppm" ...
Computing optical flow on CPU...
Processing time on CPU: 2326.059326 (ms)
Computing optical flow on GPU...
HIP Error = 2: out of memory from file ./downscaleKernel.cuh, line 94
HIP Error = 2: out of memory from file ./downscaleKernel.cuh, line 94
... (repeated across all texture-using kernels)
Memory access fault by GPU node-1 on address (nil). Reason: Unknown.
make: *** [Makefile:62: run] Broken pipe
```

## Expected Output

```
HSOpticalFlow Starting...

Loading "../opticalFlow-cuda/data/frame10.ppm" ...
Loading "../opticalFlow-cuda/data/frame11.ppm" ...
Computing optical flow on CPU...
Processing time on CPU: 2298.088867 (ms)
Computing optical flow on GPU...
Processing time on Device: 384.415527 (ms)
L1 error : 0.044196
```

## Root Cause

All four kernel wrapper functions set up `hipTextureObject_t` using a
`hipResourceTypePitch2D` resource descriptor with the following texture descriptor:

```cpp
texRes.resType = hipResourceTypePitch2D;
texRes.res.pitch2D.devPtr = (void *)src;
texRes.res.pitch2D.desc = hipCreateChannelDesc<float>();
texRes.res.pitch2D.width = width;
texRes.res.pitch2D.height = height;
texRes.res.pitch2D.pitchInBytes = stride * sizeof(float);

texDescr.normalizedCoords = true;
texDescr.filterMode = hipFilterModeLinear;
texDescr.addressMode[0] = hipAddressModeMirror;
texDescr.addressMode[1] = hipAddressModeMirror;
```

On AMD ROCm, this combination fails. `hipResourceTypePitch2D` resources are backed by
linear device memory (`HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR`), and the HSA runtime rejects
creating a linear-layout image when the requested sampler requires normalized coordinates
or linear filtering. No single source document states this as a named hardware rule;
the constraint is implicit and emerges from the ROCr image implementation described
in the failure chain below.

### Failure chain (traced through CLR and ROCr source)

1. `hipCreateTextureObject` calls `ihipCreateTextureObject`
   (`clr/hipamd/src/hip_texture.cpp:51`).
2. No validation check catches the unsupported combination at this level — the Pitch2D
   validation block (`hip_texture.cpp:130–138`) only checks pointer alignment and
   dimension limits, not sampler compatibility.
3. `ihipCreateTextureObject` creates an `amd::Sampler` with `normalizedCoords=true` and
   `filterMode=LINEAR` — this succeeds without error (`hip_texture.cpp:189–202`).
4. It then calls `ihipImageCreate` for the Pitch2D resource, which maps to
   `CL_MEM_OBJECT_IMAGE2D` (`clr/hipamd/src/hip_conversions.hpp:140–141`) backed by the
   raw device buffer.
5. `image->create()` internally calls `Hsa::image_create_with_layout(...,
   HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR, ...)` (`clr/rocclr/device/rocm/rocmemory.cpp:1391–1393`).
6. Inside the HSA runtime (`rocr-runtime`), `ImageRuntime::CreateImageHandle`
   (`image/image_runtime.cpp:485–486`) assigns `tile_mode = Image::TileMode::LINEAR`
   for any `HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR` image. Before populating the image SRD it
   calls `ImageManagerAi::CalculateImageSizeAndAlignment`
   (`image/image_manager_ai.cpp:95–126`), which uses addrlib to compute the required row
   pitch for an `ADDR_SW_LINEAR` surface. When the addrlib-computed pitch does not match
   the caller-supplied `image_data_row_pitch` (i.e. `stride * sizeof(float)` from CLR),
   `CalculateImageSizeAndAlignment` returns
   **`HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED` (0x3002)**:
   ```cpp
   if (desc.geometry != HSA_EXT_IMAGE_GEOMETRY_1DB &&
       image_data_layout == HSA_EXT_IMAGE_DATA_LAYOUT_LINEAR &&
       ((image_data_row_pitch && (rowPitch != image_data_row_pitch)) ||
        (image_data_slice_pitch && (slicePitch != image_data_slice_pitch)))) {
     return static_cast<hsa_status_t>(HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED);
   }
   ```
   `CreateImageHandle` propagates this status, causing `hsa_ext_image_create_with_layout`
   to return `0x3002` to CLR. **GDB-confirmed**: a live debugging session observed
   `hsa_ext_sampler_create_v2` (sampler creation) return `HSA_STATUS_SUCCESS (0)` and
   `hsa_ext_image_create_with_layout` return `0x3002`, with the call stack showing
   `ImageManagerAi::CalculateImageSizeAndAlignment` as the origin.
7. `ihipImageCreate` catches the `create()` failure (which internally routed through
   `Image::createView()` for the Pitch2D resource, `clr/rocclr/device/rocm/rocmemory.cpp:1469–1472`)
   and returns `nullptr` with `status = hipErrorOutOfMemory` — the error code is
   **hardcoded** regardless of the actual failure reason
   (`clr/hipamd/src/hip_memory.cpp:1116–1120`):
   ```cpp
   if (!image->create(nullptr)) {
       status = hipErrorOutOfMemory;   // ← wrong: should be hipErrorNotSupported
       delete image;
       return nullptr;
   }
   ```
8. Back in `ihipCreateTextureObject`, `image == nullptr` causes the function to return
   `hipErrorOutOfMemory` to the caller.

Because `checkCudaErrors` only prints the error without aborting, execution continues
with an invalid texture object, causing a GPU memory access fault on address NULL.

This code works on NVIDIA CUDA because NVIDIA hardware supports normalized-coordinate
sampling and linear filtering on linearly-laid-out (pitch2D) memory directly.

## Fix

Replace `hipResourceTypePitch2D` with `hipResourceTypeArray` in each wrapper.
Stage the pitch2D device data into a `hipArray_t` before creating the texture object.
CUDA arrays fully support normalized coordinates and linear filtering on AMD hardware.

The pattern to apply in each affected wrapper:

```cpp
// Before (broken on ROCm):
hipResourceDesc texRes;
texRes.resType = hipResourceTypePitch2D;
texRes.res.pitch2D.devPtr = (void *)src;
texRes.res.pitch2D.desc = hipCreateChannelDesc<float>();
texRes.res.pitch2D.width = width;
texRes.res.pitch2D.height = height;
texRes.res.pitch2D.pitchInBytes = stride * sizeof(float);
// ... set texDescr with normalizedCoords=true, filterMode=Linear ...
checkCudaErrors(hipCreateTextureObject(&tex, &texRes, &texDescr, NULL));
MyKernel<<<blocks, threads>>>(..., tex);

// After (works on ROCm):
hipArray_t srcArray;
hipChannelFormatDesc channelDesc = hipCreateChannelDesc<float>();
checkCudaErrors(hipMallocArray(&srcArray, &channelDesc, width, height));
checkCudaErrors(hipMemcpy2DToArray(srcArray, 0, 0, src,
                                   stride * sizeof(float),
                                   width * sizeof(float), height,
                                   hipMemcpyDeviceToDevice));

hipResourceDesc texRes;
memset(&texRes, 0, sizeof(hipResourceDesc));
texRes.resType = hipResourceTypeArray;
texRes.res.array.array = srcArray;
// ... set texDescr with normalizedCoords=true, filterMode=Linear (unchanged) ...
hipTextureObject_t tex;
checkCudaErrors(hipCreateTextureObject(&tex, &texRes, &texDescr, NULL));
MyKernel<<<blocks, threads>>>(..., tex);

checkCudaErrors(hipDestroyTextureObject(tex));
checkCudaErrors(hipFreeArray(srcArray));
```

Note: `hipMemcpy2DToArray` takes the actual image width in bytes (`width * sizeof(float)`)
as the transfer width, not the padded stride, so padding columns are not copied into
the array (which has no padding).


## Additional Notes
- The `checkCudaErrors` macro in `common.h` does not abort on error, so the program
  continues past the failed `hipCreateTextureObject` and subsequently crashes inside
  a GPU kernel when it dereferences the invalid texture object.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[opticalFlow-hip] Unsupported arg combination in `hipCreateTextureObject` #190

Summary

Environment

Affected Files

Steps to Reproduce

Observed Output

Expected Output

Root Cause

Failure chain (traced through CLR and ROCr source)

Fix

Additional Notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Item	Value
GPU	AMD Instinct MI250X
Architecture	gfx90a
ROCm version	TheRock distribution (April 2026)
Compiler	hipcc (clang-based)
OS	Linux

[opticalFlow-hip] Unsupported arg combination in hipCreateTextureObject #190

Description

Summary

Environment

Affected Files

Steps to Reproduce

Observed Output

Expected Output

Root Cause

Failure chain (traced through CLR and ROCr source)

Fix

Additional Notes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

[opticalFlow-hip] Unsupported arg combination in `hipCreateTextureObject` #190