Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avx512_int32.hpp uses _mm512_mask_i64scatter_epi32 incorrectly #1294

Closed
jeffhammond opened this issue Jul 12, 2022 · 4 comments
Closed

avx512_int32.hpp uses _mm512_mask_i64scatter_epi32 incorrectly #1294

jeffhammond opened this issue Jul 12, 2022 · 4 comments

Comments

@jeffhammond
Copy link
Contributor

You are passing a 512-bit register when a 256-bit register is required.

void _mm512_mask_i64scatter_epi32 (void* base_addr, 
                                   __mmask8 k,
                                   __m512i vindex,
                                   __m256i a,        // <- NOTICE THIS ONE
                                   int scale)

(from docs)

It is used incorrectly in policy/tensor/arch/avx512/avx512_int32.hpp here:

      /*!
       * @brief Store partial register to consecutive memory locations
       *
       */
      RAJA_INLINE
      self_type const &store_strided_n(element_type *ptr, camp::idx_t stride, camp::idx_t N) const{
        // AVX512F
        _mm512_mask_i64scatter_epi32(ptr,
                                     createMask(N),
                                     createStridedOffsets(stride),
                                     m_value,        // <- NOTICE THIS ONE
                                     sizeof(element_type));
        return *this;
      }

You declare the argument here:

    public:
      using register_type = __m512i;
    private:
      register_type m_value;

This bug was found by NVC++, which refuses to compile this code.

@jeffhammond
Copy link
Contributor Author

Also, please expand tabs in this file. Not all of us use tab = 2 spaces, and this file is unreadable with the defaults of some editors.

@ajkunen
Copy link
Contributor

ajkunen commented Jul 12, 2022

@jeffhammond thanks for bug report! we'll take care of both those issues.

@eoseret
Copy link

eoseret commented Aug 10, 2022

Hi,
If I understand correctly, just do as for gathers (that compiles correctly): replace i64 with i32 (size of index elements) and then compilation will be fine.
4 occurrences: store_strided and store_strided_n in both avx512_float.hpp and avx512_int32.hpp
Git-patch enclosed, feel free to look at it.
raja_AVX512_scatter_error_git_patch.txt

@rchen20 rchen20 self-assigned this Aug 11, 2022
@rchen20
Copy link
Member

rchen20 commented May 16, 2023

Closing this issue because it is fixed here #1339.

@rchen20 rchen20 closed this as completed May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants