#BONUS intrinsics that might be useful #84

Open
8 of 33 tasks
p0nce opened this issue Oct 18, 2021 · 2 comments

p0nce commented Oct 18, 2021

Add one here every time you wish for one:

  • _mm_cvtpd_epi64: would convert 2x double to 2x 64-bit integers using the MXCSR rounding mode, which would speed things up on ARM and non-AVX x86 => actually an existing AVX512DQ + AVX512VL instruction
  • _mm_abs_ps
  • _mm_movemask_epi16
  • _mm_cmpge_epi8
  • _mm_cmpge_epi16 (twice)
  • _mm_cmple_epi8
  • _mm_cmple_epi16
  • _mm_not_si128
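
For reference, most of the wishes above can be approximated with plain SSE2. The following is a minimal sketch in C against `<emmintrin.h>`, not an existing API: the helper names (`abs_ps`, `not_si128`, `cmpge_epi16`, `cmple_epi16`, `movemask_epi16`) are hypothetical and deliberately lack the `_mm_` prefix so they are not mistaken for real intrinsics.

```c
#include <emmintrin.h> /* SSE2; also pulls in the SSE float intrinsics */

/* Hypothetical helper: absolute value of 4 floats by clearing the sign bit. */
static inline __m128 abs_ps(__m128 a)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f); /* only bit 31 set per lane */
    return _mm_andnot_ps(sign_mask, a);          /* (~sign_mask) & a */
}

/* Hypothetical helper: bitwise NOT, done as XOR with all ones. */
static inline __m128i not_si128(__m128i a)
{
    return _mm_xor_si128(a, _mm_set1_epi32(-1));
}

/* Hypothetical helper: signed a >= b, expressed as NOT(b > a). */
static inline __m128i cmpge_epi16(__m128i a, __m128i b)
{
    return not_si128(_mm_cmpgt_epi16(b, a));
}

/* Hypothetical helper: signed a <= b, expressed as NOT(a > b). */
static inline __m128i cmple_epi16(__m128i a, __m128i b)
{
    return not_si128(_mm_cmpgt_epi16(a, b));
}

/* Hypothetical helper: one mask bit per 16-bit lane (bits 0..7).
   Signed saturating pack keeps each lane's sign, so the byte-wise
   movemask of the packed result yields the 16-bit lane sign bits. */
static inline int movemask_epi16(__m128i a)
{
    return _mm_movemask_epi8(_mm_packs_epi16(a, _mm_setzero_si128()));
}
```

The 8-bit variants (`cmpge_epi8`, `cmple_epi8`) would follow the same pattern with `_mm_cmpgt_epi8`. Each helper costs an extra instruction over a native comparison, which is presumably why a dedicated intrinsic is attractive.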

Ideas from Alfred Klomp

  • mm_absdiff_epu16
  • mm_absdiff_epu8
  • mm_blendv_si128
  • mm_bswap_epi16
  • mm_bswap_epi32
  • mm_bswap_epi64
  • mm_bswap_si128
  • mm_cmpge_epu16
  • mm_cmpge_epu8
  • mm_cmpgt_epu16
  • mm_cmpgt_epu8
  • mm_cmple_epu16
  • mm_cmple_epu8
  • mm_cmplt_epu16
  • mm_cmplt_epu8
  • mm_div255_epu16
  • mm_div_epu8
  • mm_divfast_epu16
  • mm_divfast_epu8
  • mm_max_epu16
  • mm_min_epu16
  • mm_not_si128
  • mm_scale_epu8
  • _mm256_unpacklo_si128
  • _mm256_unpackhi_si128
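
Several of the unsigned helpers in this list can be sketched with SSE2 only. The code below is an illustrative C sketch under that assumption, not Alfred Klomp's implementation: the function names without the `_mm_` prefix are hypothetical, and the divide-by-255 uses the commonly cited multiply-high constant `0x8081` followed by a shift.

```c
#include <emmintrin.h> /* SSE2 */

/* Unsigned a >= b on 8-bit lanes: true exactly when max(a, b) == a. */
static inline __m128i cmpge_epu8(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(_mm_max_epu8(a, b), a);
}

/* Unsigned a > b on 8-bit lanes: flip the top bit of both operands,
   which maps unsigned order onto signed order, then compare signed. */
static inline __m128i cmpgt_epu8(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi8((char)0x80);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}

/* |a - b| on unsigned 8-bit lanes: one of the two saturating
   subtractions is always zero, so OR-ing them gives the difference. */
static inline __m128i absdiff_epu8(__m128i a, __m128i b)
{
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}

/* Unsigned 16-bit max without SSE4.1: saturating (a - b) is
   max(a - b, 0), and adding b back yields max(a, b). */
static inline __m128i max_epu16(__m128i a, __m128i b)
{
    return _mm_add_epi16(_mm_subs_epu16(a, b), b);
}

/* Unsigned 16-bit min without SSE4.1: a - max(a - b, 0). */
static inline __m128i min_epu16(__m128i a, __m128i b)
{
    return _mm_sub_epi16(a, _mm_subs_epu16(a, b));
}

/* Unsigned 16-bit x / 255: computes (x * 0x8081) >> 23 using the
   high half of the 16x16 product followed by a 7-bit shift. */
static inline __m128i div255_epu16(__m128i x)
{
    const __m128i magic = _mm_set1_epi16((short)0x8081);
    return _mm_srli_epi16(_mm_mulhi_epu16(x, magic), 7);
}
```

The `cmple`/`cmplt` variants are the same functions with the operands swapped. On SSE4.1 targets `_mm_max_epu16` and `_mm_min_epu16` exist natively, so the emulation only matters for an SSE2 baseline.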

p0nce commented Jan 24, 2022

p0nce commented Oct 2, 2022

complex multiply, complex add, complex sub, complex divide
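
As a rough illustration of what a complex multiply could look like, here is a sketch in C using only baseline SSE. It assumes an interleaved layout of two complex floats per `__m128` (`[re0, im0, re1, im1]`), which is my assumption and not something specified above; the name `complex_mul_ps` is hypothetical. With that layout, complex add and sub are simply `_mm_add_ps` and `_mm_sub_ps`; divide is not shown.

```c
#include <xmmintrin.h> /* SSE */

/* Hypothetical helper: multiplies two pairs of complex floats stored
   interleaved as [re0, im0, re1, im1].
   (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
static inline __m128 complex_mul_ps(__m128 x, __m128 y)
{
    __m128 y_re = _mm_shuffle_ps(y, y, _MM_SHUFFLE(2, 2, 0, 0)); /* c0 c0 c1 c1 */
    __m128 y_im = _mm_shuffle_ps(y, y, _MM_SHUFFLE(3, 3, 1, 1)); /* d0 d0 d1 d1 */
    __m128 x_sw = _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1)); /* b0 a0 b1 a1 */

    __m128 t0 = _mm_mul_ps(x, y_re);    /* a*c, b*c per complex */
    __m128 t1 = _mm_mul_ps(x_sw, y_im); /* b*d, a*d per complex */

    /* Flip the sign of the even (real) lanes of t1 so that a single
       add produces ac - bd in real lanes and bc + ad in imaginary lanes. */
    const __m128 neg_even = _mm_setr_ps(-0.0f, 0.0f, -0.0f, 0.0f);
    return _mm_add_ps(t0, _mm_xor_ps(t1, neg_even));
}
```

On SSE3 targets the final sign flip and add could be replaced by `_mm_addsub_ps`, which is one reason a dedicated wrapper might be worth having.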
