Texture/image support #1253

Open
bernhardmgruber opened this issue Feb 5, 2021 · 10 comments

Comments

@bernhardmgruber
Member

Alpaka currently lacks support for the texture/image capabilities that some backends offer. This concerns the CUDA backend and the SYCL backend, which is under development.
Texture/image support was also requested in: #1065
The discussion also came up during the prototyping of kernel side accessors to buffers: #38 and #1249

Since backend support for this feature is scarce, we have two options to implement such a facility:

  1. emulation on backends without texture/image support, e.g. via a wrapper on alpaka::Buf
  2. do not provide the feature and fail to compile

While option 1 is certainly doable, given that only CUDA supports this feature, we might end up with a feature that performs suboptimally on non-CUDA backends, because we might not pick the right emulation approach for everyone. E.g. is Z-order storage really the best memory layout? How do we handle unusual texture formats (see: https://sycl.readthedocs.io/en/latest/iface/image.html#sycl-image-channel-order)? Bilinear/trilinear interpolation on access? Edge behavior? Normalized texture coordinates? There is a lot we could get wrong, or at least implement poorly.
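To make the Z-order question concrete, here is a minimal sketch of a 2D Z-order (Morton) index computation, as an emulation layer might use it to place neighbouring texels close together in memory. The names `part1By1` and `mortonIndex2D` are hypothetical and not part of alpaka; the sketch assumes coordinates below 2^16 and says nothing about whether this layout is actually the best choice on a given backend.

```cpp
#include <cassert>
#include <cstdint>

// Spreads the lower 16 bits of v so that a zero bit sits between
// each pair of neighbouring bits (bit k moves to bit 2k).
inline std::uint32_t part1By1(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Interleaves the bits of x (even positions) and y (odd positions).
// Texels that are close in 2D end up close in the linear index.
inline std::uint32_t mortonIndex2D(std::uint32_t x, std::uint32_t y) {
    assert(x < 65536u && y < 65536u); // assumption: 16 bits per coordinate
    return part1By1(x) | (part1By1(y) << 1);
}
```

With this mapping, the 2x2 block of texels at (0,0), (1,0), (0,1), (1,1) occupies linear indices 0 to 3, instead of being split across two rows as in a row-major layout.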

Option 2 is safe from our perspective, but locks users into CUDA (and later SYCL) when they use the feature. As it stands now, they could just as well use CUDA directly.

We could also mix the options and just provide a very limited texture/image support that we are confident we can emulate.

What is the strategy to go forward wrt. texture/image support?

@bernhardmgruber
Member Author

Although the HIP documentation does not mention texture support, the functionality seems to be there: https://github.com/ROCm-Developer-Tools/HIP/blob/main/include/hip/hcc_detail/texture_functions.h

@sbastrakov
Member

sbastrakov commented Feb 10, 2021

I agree with your assessment. I do not think textures are that widely used in computational applications nowadays: GPUs have had caches for a long time (the lack of which was one of the reasons to use textures for computations in the early CUDA days), and texture operations like interpolation have limited accuracy. However, while I think an emulation would not be that difficult to get working if there are no performance requirements, it would still require continuous maintenance.

@bernhardmgruber
Member Author

We opened a GSoC position for this feature: https://www.casus.science/news-events/events/google-summer-of-code-2021/#anchor-6

@bussmann

Dear all, I firmly believe this is a side quest. I think there is more important stuff to do.

@PrometheusPi

While this might be a less important task for the overall goal of alpaka, ISAAC would definitely benefit from that.

@psychocoderHPC
Member

psychocoderHPC commented Mar 25, 2021

> While this might be a less important task for the overall goal of alpaka, ISAAC would definitely benefit from that.

To give a little more context: in ISAAC we can have the case that we visualize multiple data sources with different resolutions within the same kernel. Accessing the data in a texture-like way, with normalized indices and automatic interpolation, simplifies the ray casting kernel.
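As an illustration of why normalized coordinates help with sources of different resolution, here is a minimal 1D sketch of a linearly interpolated lookup. The function name `sampleLinearNormalized` is hypothetical. The sketch assumes the convention described for linear filtering in the CUDA programming guide (texel centers at half-integer positions, i.e. the unnormalized coordinate is shifted by 0.5 before interpolating) and clamps at the edges; it is not ISAAC's or alpaka's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolated 1D lookup with a normalized coordinate u in [0, 1].
// The same u addresses the same relative position regardless of resolution.
inline float sampleLinearNormalized(std::vector<float> const& texels, float u) {
    assert(!texels.empty());
    int const n = static_cast<int>(texels.size());
    // Scale to texel units and shift so that texel i is centered at i + 0.5.
    float const xB = u * static_cast<float>(n) - 0.5f;
    float const fl = std::floor(xB);
    float const alpha = xB - fl; // interpolation weight in [0, 1)
    // Clamp-to-edge addressing (an assumption; other modes are possible).
    auto clampIdx = [n](int i) {
        return static_cast<std::size_t>(i < 0 ? 0 : (i >= n ? n - 1 : i));
    };
    float const a = texels[clampIdx(static_cast<int>(fl))];
    float const b = texels[clampIdx(static_cast<int>(fl) + 1)];
    return (1.0f - alpha) * a + alpha * b;
}
```

A kernel can then sample a 4-texel source and a 2-texel source with the same `u`, without carrying per-source index arithmetic.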

Maybe we can propagate work at some point from ISAAC back into alpaka.

@FelixTUD

ISAAC would greatly benefit from textures.
The addressing is not really a problem, as it can easily be emulated with minimal overhead.
The bigger problems, which proper native texture support would solve, are:

  1. Caching: currently the data for 3D buffers is stored in a normal array and is therefore cached linearly along the array. This results in many cache misses, because accesses most frequently hit neighbouring voxels, which are likely far apart in memory in at least 2 of the 3 dimensions and therefore not cached. Textures would solve this, as they cache locally in all dimensions of the buffer.
  2. Interpolation: currently the trilinear interpolation is emulated with 8 buffer reads on neighbouring voxels. These are most likely not cached due to problem 1 and therefore have a very high performance cost. With texture support the interpolation would be done automatically on access and would be much cheaper.
  3. Buffer boundary handling: currently every read of a 3D buffer needs to be boundary checked, and different behaviors such as texture repeat, clamp and constant color are emulated when a boundary is reached. A native texture implementation would do this much more efficiently.
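For readers unfamiliar with what such a software emulation involves, here is a minimal host-side sketch of trilinear interpolation with 8 reads and the three boundary behaviors mentioned above. All names (`Volume`, `AddressMode`, `resolveIndex`, `fetch`, `sampleTrilinear`) are hypothetical and this is not ISAAC's code. As a simplification, voxel i is assumed to be centered at integer coordinate i; CUDA's linear filtering additionally shifts coordinates by 0.5.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class AddressMode { Clamp, Repeat, Border };

struct Volume {
    int nx, ny, nz;          // extents in voxels
    std::vector<float> data; // row-major storage, x varies fastest
    AddressMode mode;
    float borderValue;       // returned outside the volume in Border mode
};

// Maps i into [0, n) according to the address mode; -1 means "use border value".
inline int resolveIndex(int i, int n, AddressMode mode) {
    assert(n > 0);
    switch (mode) {
    case AddressMode::Clamp:  return i < 0 ? 0 : (i >= n ? n - 1 : i);
    case AddressMode::Repeat: return ((i % n) + n) % n;
    case AddressMode::Border: return (i < 0 || i >= n) ? -1 : i;
    }
    return -1;
}

// One boundary-checked voxel read.
inline float fetch(Volume const& v, int x, int y, int z) {
    int const ix = resolveIndex(x, v.nx, v.mode);
    int const iy = resolveIndex(y, v.ny, v.mode);
    int const iz = resolveIndex(z, v.nz, v.mode);
    if (ix < 0 || iy < 0 || iz < 0)
        return v.borderValue;
    auto const idx = (static_cast<std::size_t>(iz) * static_cast<std::size_t>(v.ny)
                      + static_cast<std::size_t>(iy)) * static_cast<std::size_t>(v.nx)
                     + static_cast<std::size_t>(ix);
    return v.data[idx];
}

// Trilinear interpolation: 8 voxel reads and 7 linear blends per sample.
inline float sampleTrilinear(Volume const& v, float x, float y, float z) {
    float const fx = std::floor(x), fy = std::floor(y), fz = std::floor(z);
    int const x0 = static_cast<int>(fx), y0 = static_cast<int>(fy), z0 = static_cast<int>(fz);
    float const tx = x - fx, ty = y - fy, tz = z - fz;
    auto mix = [](float a, float b, float t) { return a + t * (b - a); };
    float const c00 = mix(fetch(v, x0, y0, z0), fetch(v, x0 + 1, y0, z0), tx);
    float const c10 = mix(fetch(v, x0, y0 + 1, z0), fetch(v, x0 + 1, y0 + 1, z0), tx);
    float const c01 = mix(fetch(v, x0, y0, z0 + 1), fetch(v, x0 + 1, y0, z0 + 1), tx);
    float const c11 = mix(fetch(v, x0, y0 + 1, z0 + 1), fetch(v, x0 + 1, y0 + 1, z0 + 1), tx);
    return mix(mix(c00, c10, ty), mix(c01, c11, ty), tz);
}
```

The 8 reads touch two z-slices and two rows per slice, so in a row-major array only the x-neighbours are adjacent in memory, which is the caching problem described in point 1.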

@bussmann

How long would a texture implementation in Alpaka take? Can we test the performance gain by trying it in a CUDA-only branch of ISAAC?

@FelixTUD

Right now I'm trying to integrate native CUDA textures into ISAAC, so that I can hopefully include some performance numbers in my master's thesis. And as @psychocoderHPC said, maybe we can propagate some of the work to alpaka, as I need to implement a software emulation for all non-CUDA-capable architectures anyway.

@bernhardmgruber
Member Author

Here is how I envisioned the design of an image accessor:

    using Image = cudaTextureObject_t; // we likely need an Image type

    template<typename TElem, typename TBufferIdx, typename TAccessModes>
    struct Accessor<Image, TElem, TBufferIdx, 2, TAccessModes> {
        // Vec subscript to be compatible with buffer accessor
        ALPAKA_FN_HOST_ACC auto operator[](Vec<DimInt<2>, TBufferIdx> i) const -> TElem { 
            return (*this)(i[0], i[1]);
        }

        // integral call operator to be compatible with buffer accessor, does texel fetch
        ALPAKA_FN_HOST_ACC auto operator()(TBufferIdx y, TBufferIdx x) const -> TElem { 
            return tex1Dfetch<TElem>(texObj, y * rowPitchInValues + x);
        }

        // floating-point call operator for interpolated access
        ALPAKA_FN_HOST_ACC auto operator()(float y, float x) const -> TElem { 
            return tex2D<TElem>(texObj, x, y);
        }

        Image texObj;
        TBufferIdx rowPitchInValues; // for texel fetch
        Vec<DimInt<2>, TBufferIdx> extents; // compatibility with buffer accessor
    };

TAccessModes probably just allows alpaka::ReadOnly.
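For comparison, a software emulation with the same call-operator interface could look roughly like the following host-only sketch. `EmulatedImageAccessor2D`, `clampToExtent` and `mix` are hypothetical names, not alpaka API. The sketch assumes clamp-to-edge addressing and the 0.5 texel-center shift that CUDA documents for linear filtering, and it omits the alpaka function attributes and the `Vec` subscript operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Clamps a signed index into [0, n - 1] (clamp-to-edge addressing).
template<typename TIdx>
inline TIdx clampToExtent(long i, TIdx n) {
    return static_cast<TIdx>(std::max(0L, std::min(i, static_cast<long>(n) - 1)));
}

inline float mix(float a, float b, float t) { return a + t * (b - a); }

template<typename TElem, typename TIdx>
struct EmulatedImageAccessor2D {
    // Integral call operator: plain texel fetch, (y, x) argument order.
    auto operator()(TIdx y, TIdx x) const -> TElem {
        assert(y >= 0 && y < height && x >= 0 && x < width);
        return data[static_cast<std::size_t>(y) * static_cast<std::size_t>(rowPitchInValues)
                    + static_cast<std::size_t>(x)];
    }

    // Floating-point call operator: bilinear interpolation in texel units.
    auto operator()(float y, float x) const -> TElem {
        float const yB = y - 0.5f, xB = x - 0.5f; // texel centers at i + 0.5
        float const fy = std::floor(yB), fx = std::floor(xB);
        float const ty = yB - fy, tx = xB - fx;
        TIdx const y0 = clampToExtent(static_cast<long>(fy), height);
        TIdx const y1 = clampToExtent(static_cast<long>(fy) + 1, height);
        TIdx const x0 = clampToExtent(static_cast<long>(fx), width);
        TIdx const x1 = clampToExtent(static_cast<long>(fx) + 1, width);
        float const top = mix((*this)(y0, x0), (*this)(y0, x1), tx);
        float const bottom = mix((*this)(y1, x0), (*this)(y1, x1), tx);
        return static_cast<TElem>(mix(top, bottom, ty));
    }

    TElem const* data;     // non-owning pointer to the first texel
    TIdx rowPitchInValues; // distance between rows, in elements
    TIdx height, width;    // extents in texels
};
```

One consequence of overloading on the argument type: with an unsigned index type such as `std::size_t`, a call like `acc(1, 2)` with `int` literals is ambiguous, because `int` converts equally well to `std::size_t` and to `float`. Callers would need to pass arguments of exactly the index type or `float`.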
