some activation opt on x86 #6604

Merged
nihui merged 1 commit into Tencent:master from futz12:some-activation-opt-on-x86
Mar 18, 2026

@futz12 futz12 commented Mar 17, 2026

support erf simd on x86
support gelu normal mode on x86

@github-actions github-actions bot added the x86 label Mar 17, 2026
codecov-commenter commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.47%. Comparing base (7237643) to head (b71ab52).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6604      +/-   ##
==========================================
+ Coverage   93.41%   93.47%   +0.05%     
==========================================
  Files         868      869       +1     
  Lines      275540   274712     -828     
==========================================
- Hits       257391   256776     -615     
+ Misses      18149    17936     -213     

Copilot AI left a comment

Pull request overview

This PR adds SIMD-optimized erf function implementations for x86 (SSE, AVX, AVX512) and uses them to support the normal (non-fast) GELU activation mode with SIMD packing on x86, removing the previous fallback to scalar-only processing for normal GELU.

Changes:

  • Added erf_ps, erf256_ps, and erf512_ps SIMD erf approximations in the respective mathfun headers
  • Added new Erf_x86 layer with SIMD-accelerated forward_inplace
  • Extended GELU_x86 to support normal mode (erf-based) with SIMD, removing the support_packing = false guard
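
For reference, the two GELU modes mentioned above can be written as scalar C++. This is a minimal sketch using the standard textbook definitions, not code taken from the PR: the function names `gelu_normal` and `gelu_fast` are hypothetical, and the constants are the usual rounded values of 1/sqrt(2) and sqrt(2/pi).

```cpp
#include <cmath>

// Normal (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
// Hypothetical helper name; 0.70710678f approximates 1 / sqrt(2).
static float gelu_normal(float x)
{
    return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
}

// Fast (tanh-based) GELU approximation:
// 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3))).
// Hypothetical helper name; 0.79788452f approximates sqrt(2 / pi).
static float gelu_fast(float x)
{
    return 0.5f * x * (1.0f + std::tanh(0.79788452f * (x + 0.044715f * x * x * x)));
}
```

The normal mode needs an `erf` evaluation per element, which is presumably why a packed `erf` routine is a prerequisite for running this mode on SIMD-packed data.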

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/layer/x86/sse_mathfun.h | Added SSE erf_ps polynomial approximation |
| src/layer/x86/avx_mathfun.h | Added AVX erf256_ps polynomial approximation |
| src/layer/x86/avx512_mathfun.h | Added AVX512 erf512_ps polynomial approximation |
| src/layer/x86/erf_x86.h | New Erf_x86 layer header |
| src/layer/x86/erf_x86.cpp | New Erf_x86 layer with SIMD forward_inplace |
| src/layer/x86/gelu_x86.cpp | Added normal GELU mode SIMD paths alongside existing fast GELU |
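
The polynomial used by `erf_ps` is not shown in this conversation, so the following is only an illustrative sketch of what a packed SSE `erf` and an erf-based GELU loop might look like. It assumes the Abramowitz and Stegun formula 7.1.28, erf(x) ≈ 1 - 1/(1 + a1·x + ... + a6·x^6)^16 for x ≥ 0, which avoids an `exp` call; the names `erf_ps_sketch` and `gelu_normal_sse_sketch` are hypothetical and are not the PR's functions. It requires an x86 target with SSE.

```cpp
#include <cmath>
#include <immintrin.h>

// Hypothetical sketch, NOT the PR's erf_ps: erf for 4 packed floats using
// Abramowitz-Stegun 7.1.28 on |x|, then restoring the sign (erf is odd).
static inline __m128 erf_ps_sketch(__m128 x)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f); // only the sign bit set
    const __m128 one = _mm_set1_ps(1.0f);

    __m128 sign = _mm_and_ps(x, sign_mask);  // sign bit of each lane
    __m128 ax = _mm_andnot_ps(sign_mask, x); // |x|

    // Horner evaluation of 1 + a1*|x| + ... + a6*|x|^6.
    __m128 p = _mm_set1_ps(0.0000430638f);
    p = _mm_add_ps(_mm_mul_ps(p, ax), _mm_set1_ps(0.0002765672f));
    p = _mm_add_ps(_mm_mul_ps(p, ax), _mm_set1_ps(0.0001520143f));
    p = _mm_add_ps(_mm_mul_ps(p, ax), _mm_set1_ps(0.0092705272f));
    p = _mm_add_ps(_mm_mul_ps(p, ax), _mm_set1_ps(0.0422820123f));
    p = _mm_add_ps(_mm_mul_ps(p, ax), _mm_set1_ps(0.0705230784f));
    p = _mm_add_ps(_mm_mul_ps(p, ax), one);

    // Raise to the 16th power with four squarings.
    p = _mm_mul_ps(p, p);
    p = _mm_mul_ps(p, p);
    p = _mm_mul_ps(p, p);
    p = _mm_mul_ps(p, p);

    // erf(|x|) ~= 1 - 1 / p^16; for large |x| p overflows to inf and r becomes 1.
    __m128 r = _mm_sub_ps(one, _mm_div_ps(one, p));
    return _mm_or_ps(r, sign);
}

// Hypothetical in-place normal GELU over a float buffer: 4 lanes at a time,
// with a scalar tail that falls back to std::erf.
static void gelu_normal_sse_sketch(float* ptr, int size)
{
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 one = _mm_set1_ps(1.0f);
    const __m128 inv_sqrt2 = _mm_set1_ps(0.70710678f);

    int i = 0;
    for (; i + 3 < size; i += 4)
    {
        __m128 x = _mm_loadu_ps(ptr + i);
        __m128 e = erf_ps_sketch(_mm_mul_ps(x, inv_sqrt2));
        __m128 y = _mm_mul_ps(_mm_mul_ps(half, x), _mm_add_ps(one, e));
        _mm_storeu_ps(ptr + i, y);
    }
    for (; i < size; i++)
    {
        ptr[i] = 0.5f * ptr[i] * (1.0f + std::erf(ptr[i] * 0.70710678f));
    }
}
```

The AVX and AVX512 variants listed in the table would follow the same shape with `__m256` or `__m512` registers and the corresponding intrinsics; the actual approximation and accuracy targets chosen in the PR may differ from this sketch.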

@nihui nihui merged commit 2eca3bc into Tencent:master Mar 18, 2026
87 of 88 checks passed
nihui commented Mar 18, 2026

Thanks for your contribution!

chenglimin pushed a commit to chenglimin/ncnn that referenced this pull request Apr 1, 2026