Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Size and alignment of 3-component built-in vector types are incompatible with CUDA #706

Open
ex-rzr opened this issue Oct 11, 2018 · 7 comments

Comments

@ex-rzr
Copy link
Contributor

ex-rzr commented Oct 11, 2018

HIP 3-component vectors have in fact same sizes and alignments as 4-component vectors.

CUDA 3-component vectors are packed:
sizeof(T3) = 3 * sizeof(T) and alignof(T3) = alignof(T)

See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#built-in-vector-types

    std::cout << "sizeof(float1): " << sizeof(float1) << '\n';
    std::cout << "sizeof(float2): " << sizeof(float2) << '\n';
    std::cout << "sizeof(float3): " << sizeof(float3) << '\n';
    std::cout << "sizeof(float4): " << sizeof(float4) << '\n';

    std::cout << "alignof(float1): " << alignof(float1) << '\n';
    std::cout << "alignof(float2): " << alignof(float2) << '\n';
    std::cout << "alignof(float3): " << alignof(float3) << '\n';
    std::cout << "alignof(float4): " << alignof(float4) << '\n';

HIP (hcc):

sizeof(float1): 4
sizeof(float2): 8
sizeof(float3): 16 <<<<<<<<<<<<<<<
sizeof(float4): 16

alignof(float1): 4
alignof(float2): 8
alignof(float3): 16 <<<<<<<<<<<<<<<
alignof(float4): 16

CUDA

sizeof(float1): 4
sizeof(float2): 8
sizeof(float3): 12 <<<<<<<<<<<<<<<
sizeof(float4): 16

alignof(float1): 4
alignof(float2): 8
alignof(float3): 4 <<<<<<<<<<<<<<<
alignof(float4): 16

If this is correct behavior then it should be mentioned in the docs (https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#short-vector-types).

@JosephGeoBenjamin
Copy link

Hi,
This issue causes serious bug in OpenCV library; Since CV uses short vectors of uchar for 3-channel images, the results are not as expected, on moving data from one data structure to other on HIP-backend on AMD.

Can anyone point out workarounds or fixes related to this issue ?

@ex-rzr
Copy link
Contributor Author

ex-rzr commented Feb 8, 2019

I use this workaround:

template <> __forceinline__ __device__
void gather_force_store<float3>(const float fx, const float fy, const float fz,
        const int stride, const int pos,
        float3* force) {
  // Store into non-strided "float3" array
#if defined(__HIP_PLATFORM_HCC__)
  // Workaround: unlike CUDA, HIP-hcc has sizeof(float3) != sizeof(CudaForce) (and == sizeof(float4))
  // TODO-HIP: Remove when https://github.com/ROCm-Developer-Tools/HIP/issues/706 is fixed
  reinterpret_cast<float*>(force)[pos * 3 + 0] = fx;
  reinterpret_cast<float*>(force)[pos * 3 + 1] = fy;
  reinterpret_cast<float*>(force)[pos * 3 + 2] = fz;
#else
  force[pos].x = fx;
  force[pos].y = fy;
  force[pos].z = fz;
#endif
}

(not sure that it can be applied to your situation)

@JosephGeoBenjamin
Copy link

JosephGeoBenjamin commented Feb 8, 2019

@ex-rzr Thanks for your response but I guess I don't have access to that repo. (gives 404 error)

@ex-rzr
Copy link
Contributor Author

ex-rzr commented Feb 8, 2019

oh, sorry. The previous message has been updated.

@NEELMCW
Copy link

NEELMCW commented Feb 8, 2019

@ex-rzr Hi Anton

could you provide us access to view NAMD repository ?

@ex-rzr
Copy link
Contributor Author

ex-rzr commented Feb 8, 2019

@NEELMCW I don't have permissions to change the repo's settings.
But I posted the code with the workaround I use there.

@xuhuisheng
Copy link

This issue had solved on ROCm-5.1.0.
ROCm-Developer-Tools/hipamd@e27f645

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants