fix(types,callconv): register CUDA vector ABI alignment#321
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughCompute and apply explicit power-of-two alloca alignments (optionally using Numba type alignof_), thread them through return-slot, argument, and IntentPlan out-return allocas/stores/loads in callconv; add CUDA vector alignof registration in types and expand tests to assert lowered IR alignment annotations. ChangesCallconv alignment updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 111-119: Extract the duplicated CUDA-allocation alignment logic
into a helper function and use it in all four places: create a helper like
_set_cuda_alloca_align(alloca, ty, target_data) that sets alloca.align =
max(target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)); replace
the repeated expressions (currently using _dl = context.target_data and setting
retval_ptr.align or other alloca.align with max(_dl.abi_alignment(...),
min(_dl.abi_size(...), 16))) by calling this helper with the corresponding
alloca (e.g., retval_ptr), type (e.g., retval_ty), and context.target_data to
centralize the logic and avoid repetition.
- Line 4: Remove the development/tracking marker comment
"NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of
numbast/src/numbast/callconv.py; simply delete that standalone comment line so
the file contains only production code/comments, and ensure no leftover blank
line or stray artifact remains.
- Around line 116-119: Update calls to the target data API and remove
unsupported alloca alignment assignments: replace _dl.abi_alignment(...) and
_dl.abi_size(...) with _dl.get_abi_alignment(...) and _dl.get_abi_size(...)
wherever used (e.g., the calculations around retval_ptr, and the other
occurrences at the same sites later in the file), and remove assignments to
AllocaInstr.align (e.g., retval_ptr.align = ..., and the other .align
assignments) since llvmlite IR AllocaInstr does not support setting .align; if
explicit "align N" is required use the binding layer to emit raw LLVM IR
instead, otherwise rely on the target data defaults and omit the .align
statements.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 09ca3211-42c8-4afe-84bd-ab4a26b03bf8
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
CUDA vector types (float2, float4, etc.) carry __align__(N) attributes that require N-byte alignment (float2 → 8 B, float4 → 16 B). LLVM represents them as anonymous structs whose ABI alignment defaults to the element alignment (4 B for float), not the vector alignment. When FunctionCallConv._lower_impl allocates stack slots via builder.alloca / cgutils.alloca_once without an explicit alignment, LLVM emits 4-byte-aligned allocas. The NVRTC-compiled shim then tries to perform a vector load/store (e.g. ld.global.v2.f32 for float2) on that 4-byte-aligned pointer, which violates the 8-byte alignment requirement and raises cudaErrorMisalignedAddress at runtime. Fix: after every alloca in _lower_impl, set alloca.align = max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16)) The cap at 16 bytes covers float4/int4 (the widest standard CUDA vector types) without over-aligning large user-defined structs. The four sites fixed are: 1. retval_ptr — the function return-value slot 2. visible-arg ptrs (no-intent-plan path) 3. out_return ptrs (intent-plan path) 4. visible-arg ptrs (intent-plan path) Fixes optixGetTriangleBarycentrics() (float2 return) and any other Numbast binding that returns or accepts a CUDA vector type.
21c668b to
4fc0c38
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)
135-138: 🧹 Nitpick | 🔵 TrivialDeduplicate the repeated alignment formula into a helper.
The same expression is repeated four times; centralizing it will reduce drift risk and simplify future ABI updates.
Refactor sketch
+def _set_cuda_alloca_align(alloca, value_ty, target_data): + alloca.align = max( + target_data.abi_alignment(value_ty), + min(target_data.abi_size(value_ty), 16), + ) ... - _dl = context.target_data - retval_ptr.align = max( - _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16) - ) + _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data) ... - _dl = context.target_data - ptr.align = max( - _dl.abi_alignment(vty), min(_dl.abi_size(vty), 16) - ) + _set_cuda_alloca_align(ptr, vty, context.target_data)Also applies to: 186-189, 211-214, 234-237
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@numbast/src/numbast/callconv.py` around lines 135 - 138, Several places set retval_ptr.align using the same expression; extract that logic into a small helper (e.g., abi_retval_align(dl, ty) or _compute_retval_align) that takes the data layout/target_data and the type and returns max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16)); then replace each repeated expression (the assignments to retval_ptr.align found where context.target_data/_dl and retval_ty are used) with a call to this new helper (pass context.target_data or _dl and retval_ty). Ensure the helper is placed in the same module (callconv.py) and update all four locations (the occurrences setting retval_ptr.align) to use it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 135-138: Several places set retval_ptr.align using the same
expression; extract that logic into a small helper (e.g., abi_retval_align(dl,
ty) or _compute_retval_align) that takes the data layout/target_data and the
type and returns max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16)); then
replace each repeated expression (the assignments to retval_ptr.align found
where context.target_data/_dl and retval_ty are used) with a call to this new
helper (pass context.target_data or _dl and retval_ty). Ensure the helper is
placed in the same module (callconv.py) and update all four locations (the
occurrences setting retval_ptr.align) to use it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 7a63c268-ad04-4299-9645-7c500649845e
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
…uristic The previous fix used max(abi_alignment, min(sizeof, 16)) to guess the required alloca alignment for CUDA vector types. This heuristic works for power-of-2 sized types (float2, float4) but is incorrect for non-power-of-2 types like float3/uint3 (sizeof=12 → would produce alignment=12, which is not a valid power-of-2 LLVM alignment). Numbast already propagates alignof_ from ast_canopy onto user-defined bound structs, and already uses getattr(argty, "alignof_", None) for load/store instructions. Apply the same convention to alloca: check alignof_ on the Numba type and set it when present; when absent, leave LLVM's default ABI alignment (correct for scalars and structs without an explicit __align__ attribute). Callers registering built-in CUDA vector types in CTYPE_MAPS must set alignof_ on the Numba type to match the __align__(N) in the CUDA headers (e.g. float32x2.alignof_ = 8 for float2's __align__(8)). This mirrors how ast_canopy-derived struct types already work.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)
4-4:⚠️ Potential issue | 🟡 MinorRemove transient marker comment at Line 4.
# NUMBAST_RETVAL_ALIGN_FIX_APPLIEDlooks like a tracking artifact and should not remain in production source.Proposed fix
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED from numbast.args import prepare_ir_types🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@numbast/src/numbast/callconv.py` at line 4, Remove the transient tracking comment "# NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply delete that standalone marker so the source contains no leftover transient artifact (search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate it).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Line 4: Remove the transient tracking comment "#
NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply delete
that standalone marker so the source contains no leftover transient artifact
(search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate it).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 18c274b5-7749-4f9e-8607-5769df380ab4
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@numbast/tests/test_callconv.py`:
- Around line 58-68: The test currently depends on numba-cuda vector type
internals by asserting getattr(float2, "alignof_", None) is None and similarly
for float4; instead modify the test to exercise the fallback path of
_get_alloca_alignment directly by passing a sentinel numba_ty that lacks an
alignof_ attribute (e.g., None or a simple object without alignof_) into the
calls that use context.get_value_type(float2)/float4 as the first argument and
the sentinel as the numba_ty argument; update the two assertions to call
_get_alloca_alignment(context, context.get_value_type(float2), <sentinel>) and
_get_alloca_alignment(context, context.get_value_type(float4), <sentinel>) and
assert they return 8 and 16 respectively so the test no longer depends on
float2/float4 having no alignof_.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 8e365935-c899-4c4e-b15a-811e11b25b5c
📒 Files selected for processing (2)
numbast/src/numbast/callconv.pynumbast/tests/test_callconv.py
|
/ok to test |
@isVoid, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test 3e95cf6 |
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@numbast/src/numbast/types.py`:
- Around line 137-146: The code incorrectly mutates shared singleton numba_type
by assigning alignof_ on the object (via numba_type.alignof_), which causes
alias registrations to leak alignment; instead either attach alignment metadata
to the CTYPE_MAPS entry or shallow-copy the type before mutating. Concretely, in
the register path that runs _normalize_alignof and checks existing_alignof, if
alignof is not None make a new instance (e.g., copy.copy(numba_type)) and set
its alignof_ on the copy, then store that copy in CTYPE_MAPS[cxx_name] (or else
store a tuple/struct like (numba_type, alignof) in CTYPE_MAPS) so you do not
mutate the original singleton; keep the existing getattr(existing_alignof) check
against the copy/metadata to preserve behavior.
- Around line 107-113: The helper _normalize_alignof currently coerces arbitrary
numerics via int(...) which silently truncates values like 2.9; change it to
reject non-integral inputs instead: do not call int(...) on the raw
input—validate that alignof is an integer type (or explicitly integral) and
raise a TypeError for non-integer values, then perform the existing positive
power-of-two check (keep the ValueError for <=0 or non-power-of-two) and return
the integer alignof when valid.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 3b8421b5-4065-4644-b450-4da0e998146f
📒 Files selected for processing (5)
numbast/src/numbast/callconv.pynumbast/src/numbast/types.pynumbast/tests/test_callconv.pynumbast/tests/test_cuda_vector_types.pynumbast/tests/test_register_cxx_type.py
|
/ok to test 08bf7b5 |
|
/ok to test 7ff033a |
|
/ok to test 7c4be8e |
## Summary - update root `VERSION` to `0.10.0` - add `0.10.0` to `docs/versions.json` and `docs/nv-versions.json` for docs version picker support ## Changelog - Remove direct numba imports from MLIR backend (#365) - [codex] add numba-cuda-mlir support guide (#364) - Split CI tests for Numba-CUDA and MLIR (#363) - Migrate experimental MLIR backend (#346) - Bump test-summary/action from 2.4 to 2.6 in the actions-monthly group (#345) - [codex] Support POD struct array fields (#343) - fix(types,callconv): register CUDA vector ABI alignment (#321) ## Validation - `python -m json.tool docs/versions.json` - `python -m json.tool docs/nv-versions.json` - `git diff --check` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Version 0.10.0 released * Updated version references in documentation configuration files <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/numbast/pull/367?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
Summary
This PR makes CUDA vector type alignment an explicit part of Numbast's type
registry and uses that metadata when lowering call-convention temporaries.
register_cxx_type(...)with an optionalalignof=argument.type singletons.
register_cxx_type.numba.cuda.vector_types(char1throughdouble4) with explicit CUDA ABIalignments.
NUMBA_TO_CTYPE_MAPSentries for those vector types so templatededuction can round-trip vector arguments.
alignof_metadatafor struct types, when aligning return, argument, and out-return stack slots
in
FunctionCallConv.Motivation
CUDA vector types (
float2,float4, etc.) carry__align__(N)attributes.LLVM represents these values as anonymous structs, whose default ABI alignment
can be lower than the CUDA vector ABI requires. For example,
float2lowers as{float, float}but must be 8-byte aligned.When
FunctionCallConv._lower_implcreates under-aligned temporary slots, theNVRTC-compiled shim may issue vector load/store instructions through pointers
that do not meet the CUDA ABI requirement. That can surface as
cudaErrorMisalignedAddress.The concrete motivating case is
optixGetTriangleBarycentrics(), which returnsfloat2.Alignment Policy
honors it.
alignof_attribute directly, as generated structtypes already do, callconv lowering continues to honor it.
max(LLVM ABI alignment, explicit alignment).value.
registration.
because that would make alias-specific alignment leak to unrelated C++ names.
This keeps the path open for future/user-defined types with alignment larger
than 16 bytes without globally mutating Numba's shared scalar/vector objects.
The special CUDA
*_32avector aliases are intentionally not added here becausethey require distinct Numba type objects. Mapping both
double4anddouble4_32ato the same Numba type would require incompatible 16-byte and32-byte alignments for one Numba-level type.
Lowering Sites Updated
out_returntemporaries in the intent-plan path.Testing
PYTHONPATH=/home/wangm/numbast-optix/numbast/numbast/src /home/wangm/numbast-pixi-testing/.pixi/envs/test-cu13/bin/python -m pytest numbast/tests/test_callconv.py numbast/tests/test_register_cxx_type.py numbast/tests/test_cuda_vector_types.py/home/wangm/numbast-pixi-testing/.pixi/envs/test-cu13/bin/ruff check numbast/src/numbast/callconv.py numbast/src/numbast/types.py numbast/tests/test_callconv.py numbast/tests/test_register_cxx_type.py numbast/tests/test_cuda_vector_types.pygit diff --checkSummary by CodeRabbit
Release Notes
New Features
Tests