fix asan error via x86 tmp buffer alignment #6703

Merged
nihui merged 2 commits into Tencent:master from nihui:fix-asan-align on May 8, 2026
Conversation

@nihui
Member

@nihui nihui commented May 8, 2026

No description provided.

@github-actions github-actions Bot added the x86 label May 8, 2026
@tencent-adm
Member

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@nihui nihui requested a review from Copilot May 8, 2026 09:19
@nihui
Member Author

nihui commented May 8, 2026

@codex review

@codecov-commenter

codecov-commenter commented May 8, 2026

Codecov Report

❌ Patch coverage is 99.09091% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.95%. Comparing base (d95679b) to head (b517514).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/layer/x86/gemm_bf16s.h | 96.15% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6703      +/-   ##
==========================================
- Coverage   93.96%   93.95%   -0.01%     
==========================================
  Files         933      933              
  Lines      299477   299613     +136     
==========================================
+ Hits       281396   281515     +119     
- Misses      18081    18098      +17     

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.

Contributor

Copilot AI left a comment

Pull request overview

This PR addresses x86 ASan/alignment-related faults by ensuring temporary stack buffers used with SIMD intrinsics are explicitly aligned, and by switching from unaligned (*_loadu_* / *_storeu_*) to aligned (*_load_* / *_store_*) load/store intrinsics where appropriate.
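As an illustration of the pattern described in the overview, here is a minimal sketch. It is not code from this PR. It assumes C++11 `alignas` for the alignment annotation (the ncnn sources may use a different macro), and the helper names `hsum_via_aligned_tmp` and `dot4` are hypothetical. It uses SSE only, so the alignment in question is 16 bytes.

```cpp
#include <xmmintrin.h> // SSE intrinsics: _mm_store_ps, _mm_loadu_ps, ...
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): horizontal sum of a __m128
// by spilling it to a stack temporary.
static float hsum_via_aligned_tmp(__m128 v)
{
    // A plain `float tmp[4]` is only guaranteed 4-byte alignment.
    // alignas(16) provides the 16-byte alignment _mm_store_ps requires.
    alignas(16) float tmp[4];
    assert(reinterpret_cast<std::uintptr_t>(tmp) % 16 == 0);

    // Aligned store. Without the annotation above, _mm_storeu_ps
    // would be the only safe choice.
    _mm_store_ps(tmp, v);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

// Hypothetical helper: dot product of two 4-float vectors.
static float dot4(const float* a, const float* b)
{
    // a and b come from the caller and may be unaligned,
    // so unaligned loads stay in place here.
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return hsum_via_aligned_tmp(prod);
}
```

The aligned intrinsic is only used on the buffer whose alignment the function itself controls; pointers of unknown origin keep the unaligned variants.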

Changes:

  • Add explicit 16/32/64-byte alignment annotations to stack temporary arrays across multiple x86 kernels.
  • Replace unaligned SIMD loads/stores with aligned variants once alignment is guaranteed.
  • Consolidate some multi-array temporaries into a single aligned buffer with pointer slices (e.g., tmpbuf/sumbuf) to ensure consistent alignment.
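The third change can be sketched as follows. This is an assumed illustration, not the PR's code: `store_two_sums` is a hypothetical function, `alignas` stands in for whatever alignment annotation the project uses, and only the buffer name `sumbuf` is borrowed from the summary above. The point is that a slice of an aligned buffer stays aligned as long as its byte offset is a multiple of the required alignment.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical kernel fragment (not from the PR): write two 4-float
// accumulators to `out`, which holds 8 floats at any alignment.
static void store_two_sums(__m128 sum0, __m128 sum1, float* out)
{
    // One aligned buffer replaces two separate arrays
    // (e.g. `float sum0buf[4]; float sum1buf[4];`), so a single
    // annotation covers both slices.
    alignas(16) float sumbuf[8];
    float* s0 = sumbuf;     // byte offset 0
    float* s1 = sumbuf + 4; // byte offset 16, still 16-byte aligned

    assert(reinterpret_cast<std::uintptr_t>(s0) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(s1) % 16 == 0);

    _mm_store_ps(s0, sum0); // aligned stores are valid on both slices
    _mm_store_ps(s1, sum1);

    // Scalar copy to the destination, which may be unaligned.
    for (int i = 0; i < 8; i++)
        out[i] = sumbuf[i];
}
```

With wider vectors the same reasoning applies with 32-byte (AVX) or 64-byte (AVX-512) alignment, provided each slice starts at a multiple of that size.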

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
|---|---|
| src/layer/x86/innerproduct_fp.h | Align sums buffers and use aligned AVX/SSE loads for accumulation initialization. |
| src/layer/x86/innerproduct_bf16s.h | Same alignment + aligned loads change for bf16s innerproduct path. |
| src/layer/x86/gemm_x86.cpp | Align sum buffers used with AVX512/AVX/SSE stores; switch to aligned stores. |
| src/layer/x86/gemm_int8.h | Align temporary output sum buffers; switch to aligned AVX/SSE stores. |
| src/layer/x86/gemm_bf16s.h | Align multiple tmp/sum buffers (float + bf16) and switch to aligned SIMD stores. |
| src/layer/x86/deconvolution_packed.h | Align sum/tmp buffers and switch to aligned AVX512/AVX/SSE loads/stores. |
| src/layer/x86/deconvolution_packed_bf16s.h | Same alignment + aligned load/store changes for bf16s deconvolution. |
| src/layer/x86/convolution1d_packed.h | Align sum buffers and use aligned stores for AVX512/AVX/SSE outputs. |
| src/layer/x86/convolution1d_packed_bf16s.h | Same alignment + aligned stores change for bf16s convolution1d. |
| src/layer/x86/convolution_packed.h | Align sum buffers and use aligned stores for AVX512/AVX/SSE outputs. |
| src/layer/x86/convolution_packed_int8.h | Align int sum buffers and use aligned integer SIMD stores. |
| src/layer/x86/convolution_packed_bf16s.h | Align sum buffers and use aligned stores for bf16s packed convolution outputs. |
| src/layer/x86/convolution_im2col_gemm.h | Align sum buffers and use aligned AVX512/AVX/SSE stores in im2col+gemm path. |
| src/layer/x86/convolution_im2col_gemm_int8.h | Align offset/sum buffers and switch to aligned SSE/AVX integer stores where used. |
| src/layer/x86/convolution_im2col_gemm_bf16s.h | Align sum buffers and use aligned stores for bf16s im2col+gemm output handling. |
| src/layer/x86/convolution_3x3_winograd.h | Align tmp buffers used for Winograd output transforms; switch to aligned stores. |
| src/layer/x86/convolution_3x3_winograd_int8.h | Align tmp int buffers used for Winograd int8 output transforms; switch to aligned stores. |
| src/layer/x86/convolution_3x3_winograd_bf16s.h | Align tmp buffers used for bf16s Winograd output transforms; switch to aligned stores. |

@nihui nihui merged commit 8775d9c into Tencent:master May 8, 2026
84 of 87 checks passed
4 participants