
fix: softmax x86 miss tail data; feat: use rcp to replace 1/x #6368

Merged
nihui merged 7 commits into Tencent:master from futz12:softmax
Oct 23, 2025

futz12 (Contributor) commented Oct 22, 2025

fix: softmax x86 miss tail
feat: use rcp to replace 1/x

feat: use rcp to replace 1/x
@github-actions github-actions bot added the x86 label Oct 22, 2025
@nihui nihui closed this Oct 22, 2025
@nihui nihui reopened this Oct 22, 2025
codecov-commenter commented Oct 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.89%. Comparing base (adee6a0) to head (ce52997).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6368      +/-   ##
==========================================
- Coverage   95.89%   95.89%   -0.01%     
==========================================
  Files         840      840              
  Lines      265942   265941       -1     
==========================================
- Hits       255022   255021       -1     
  Misses      10920    10920              


@nihui nihui requested a review from Copilot October 22, 2025 11:23
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR fixes a bug in the x86 softmax implementation where tail data was incorrectly processed, and improves performance by replacing division operations with optimized reciprocal calculations using Newton-Raphson refinement.
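
As a rough illustration of the tail-data bug described above, the sketch below contrasts a truncating block count with a ceiling block count. The parameter names `size` and `sizen` are borrowed from this review summary; the function names `block_count_floor`, `block_count_ceil`, and `sum_in_blocks` are hypothetical and are not taken from ncnn's source. It assumes the work is split into blocks of `sizen` elements with the last block clamped to the end of the data.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: truncating division drops a partial last block.
static inline int block_count_floor(int size, int sizen)
{
    assert(sizen > 0);
    return size / sizen;
}

// Hypothetical helper: ceiling division, (size + sizen - 1) / sizen,
// counts a partial last block as one more block.
static inline int block_count_ceil(int size, int sizen)
{
    assert(sizen > 0);
    return (size + sizen - 1) / sizen;
}

// Visit nn_size blocks of sizen elements and sum what was visited.
// The end of the last block is clamped so it never runs past the data.
static float sum_in_blocks(const std::vector<float>& v, int sizen, int nn_size)
{
    const int size = static_cast<int>(v.size());
    float sum = 0.f;
    for (int b = 0; b < nn_size; b++)
    {
        const int begin = b * sizen;
        const int end = begin + sizen < size ? begin + sizen : size;
        for (int i = begin; i < end; i++)
            sum += v[i];
    }
    return sum;
}
```

With 10 elements and blocks of 4, the truncating count visits only 2 blocks (8 elements), while the ceiling count visits 3 blocks and so covers the 2 tail elements as well.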

Key changes:

  • Fixed calculation of nn_size to properly handle tail elements using ceiling division: (size + sizen - 1) / sizen
  • Introduced optimized reciprocal functions (_mm_rcp_nr_ps, _mm256_rcp_nr_ps, _mm512_rcp_nr_ps) that use Newton-Raphson refinement for better accuracy than raw rcp approximations
  • Replaced all _mm*_div_ps division operations with the new reciprocal functions throughout the softmax implementations
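
The reciprocal-plus-refinement idea in the list above can be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's actual `_mm_rcp_nr_ps`: the names `refine_rcp` and `rcp_nr4_sketch` are hypothetical, a single Newton-Raphson step is assumed, and a scalar fallback is included so the example also compiles on non-x86 targets. Only `_mm_rcp_ps`, `_mm_mul_ps`, `_mm_sub_ps`, `_mm_set1_ps`, `_mm_loadu_ps`, and `_mm_storeu_ps` are real SSE intrinsics.

```cpp
#include <cassert>
#include <cmath>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical helper: one Newton-Raphson step for f(y) = 1/y - a,
// which gives y1 = y0 * (2 - a * y0). Each step roughly squares the
// relative error of the starting estimate y0.
static inline float refine_rcp(float a, float y0)
{
    return y0 * (2.0f - a * y0);
}

// Hypothetical helper: approximate 1/x for four floats.
// Inputs are assumed finite and nonzero; a zero or infinite input
// would produce NaN in the refinement step.
static inline void rcp_nr4_sketch(const float in[4], float out[4])
{
    assert(in != nullptr && out != nullptr);
#if SKETCH_HAVE_SSE
    const __m128 a = _mm_loadu_ps(in);
    const __m128 y0 = _mm_rcp_ps(a); // fast, low-precision estimate of 1/a
    const __m128 two = _mm_set1_ps(2.0f);
    const __m128 y1 = _mm_mul_ps(y0, _mm_sub_ps(two, _mm_mul_ps(a, y0)));
    _mm_storeu_ps(out, y1);
#else
    // Scalar fallback so the sketch builds without SSE; the starting
    // estimate here is already a full-precision division.
    for (int i = 0; i < 4; i++)
        out[i] = refine_rcp(in[i], 1.0f / in[i]);
#endif
}
```

The refined result is still not guaranteed to be bit-identical to a true division, so whether the trade-off is acceptable depends on the accuracy the caller needs.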


@nihui nihui closed this Oct 22, 2025
@nihui nihui reopened this Oct 22, 2025
github-actions commented

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size  | difference |
|--------------|-----------|----------|------------|
| x86_64       | 15190272  | 15190272 | 0 😘       |
| armhf        | 6196764   | 6196716  | -48 😘     |
| aarch64      | 9522968   | 9523440  | +472 ⚠️    |

@nihui nihui merged commit c18cc6d into Tencent:master Oct 23, 2025
103 checks passed
nihui (Member) commented Oct 23, 2025

Thanks for your contribution!
