Translate x86_64 SSE to ppc64le VSX intrinsics #4807

Merged: 11 commits, Jul 6, 2023

Commits on Jul 3, 2023

  1. Translate x86_64 SSE to ppc64le VSX intrinsics

    Yields a quite large speedup on POWER9. See this article for background:
    
    https://www.talospace.com/2019/07/easier-power-vectorizing-for-fun-and.html
    Jeremy Rand committed Jul 3, 2023
    9b7ac8a
  2. Only translate SSE4.1 to VSX if _mm_packus_epi32 available

    _mm_packus_epi32 was added in GCC v12.1.
    Jeremy Rand committed Jul 3, 2023
    e7398ae
  3. Revert "Only translate SSE4.1 to VSX if _mm_packus_epi32 available"

    This reverts commit e7398ae.
    Jeremy Rand committed Jul 3, 2023
    0af89b2
  4. Revert "Translate x86_64 SSE to ppc64le VSX intrinsics"

    This reverts commit 9b7ac8a.
    Jeremy Rand committed Jul 3, 2023
    a7fa19c
  5. Add POWER9 VSX toolchains

    Translating x86_64 SSE to ppc64le VSX intrinsics yields a quite large
    speedup on POWER9. See this article for background:
    
    https://www.talospace.com/2019/07/easier-power-vectorizing-for-fun-and.html
    Jeremy Rand committed Jul 3, 2023
    6adef41
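
For readers unfamiliar with the technique these commits rely on, here is a minimal sketch (not code from this pull request). It assumes GCC on ppc64le, whose compatibility versions of the x86 intrinsics headers refuse to compile unless `NO_WARN_X86_INTRINSICS` is defined. The helper name `add4_sse` is hypothetical, and the same source also builds natively on x86_64.

```c
#include <assert.h>
#include <stddef.h>

/* GCC's ppc64le versions of the x86 intrinsics headers emit an #error
 * unless this macro is defined, as an acknowledgement that the SSE calls
 * are being translated to VSX rather than executed natively. */
#if defined(__powerpc64__)
#define NO_WARN_X86_INTRINSICS 1
#endif
#include <xmmintrin.h>

/* Hypothetical helper: adds four floats with SSE intrinsics.
 * On x86_64 this uses SSE directly; on ppc64le the compatibility
 * header maps the same calls onto VSX vector operations. */
void add4_sse(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    __m128 va = _mm_loadu_ps(a);            /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* lane-wise add, then store */
}
```

On a POWER9 machine this would typically be built with something like `gcc -O2 -mcpu=power9`. The flags actually used by the toolchain files in this pull request are not shown on this page.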

Commits on Jul 6, 2023

  1. power9le clang toolchain: Fix missing C++ include path

    Jeremy Rand committed Jul 6, 2023
    7335b6a
  2. Add power9le docs

    Jeremy Rand committed Jul 6, 2023
    8c9feca
  3. Rename NCNN_SSE4_1 to NCNN_SSE41

    Jeremy Rand committed Jul 6, 2023
    049b4cf
  4. power9le toolchains: Remove redundant NCNN_TARGET_ARCH

    Jeremy Rand committed Jul 6, 2023
    659d71e
  5. Remove linux-clang-power9le-vsx CI job

    Not sure why it was failing, will investigate later and try to fix and
    re-enable it.
    Jeremy Rand committed Jul 6, 2023
    ad5bf0e
  6. 46b7a2a