fix(types,callconv): register CUDA vector ABI alignment by isVoid · Pull Request #321 · NVIDIA/numbast

isVoid · 2026-04-15T18:49:58Z

Summary

This PR makes CUDA vector type alignment an explicit part of Numbast's type
registry and uses that metadata when lowering call-convention temporaries.

- Extend register_cxx_type(...) with an optional alignof= argument.
- Keep explicit alignment in registry metadata instead of mutating shared Numba
  type singletons.
- Register built-in C/CUDA type mappings through register_cxx_type.
- Register the full canonical CUDA vector type family backed by
  numba.cuda.vector_types (char1 through double4) with explicit CUDA ABI
  alignments.
- Add reverse NUMBA_TO_CTYPE_MAPS entries for those vector types so template
  deduction can round-trip vector arguments.
- Use registered alignment metadata, and existing type-owned alignof_ metadata
  for struct types, when aligning return, argument, and out-return stack slots
  in FunctionCallConv.

Motivation

CUDA vector types (float2, float4, etc.) carry __align__(N) attributes.
LLVM represents these values as anonymous structs, whose default ABI alignment
can be lower than the CUDA vector ABI requires. For example, float2 lowers as
{float, float} but must be 8-byte aligned.
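
To make the gap concrete, here is a minimal sketch in plain Python. It assumes the default rule described above, where a plain struct takes the largest alignment of its members, and compares that with the alignment the CUDA headers declare for a few vector types. The table and the helper names (`default_struct_alignment`, `under_aligned_types`) are illustrative only and are not part of Numbast.

```python
# Natural alignment of selected scalar element types, in bytes.
SCALAR_ALIGN = {"int": 4, "float": 4, "double": 8}

# Alignment declared by the CUDA headers for selected vector types, in bytes.
CUDA_VECTOR_ALIGN = {
    ("float", 2): 8,
    ("float", 3): 4,
    ("float", 4): 16,
    ("int", 4): 16,
    ("double", 2): 16,
}


def default_struct_alignment(element, count):
    """Alignment of a plain struct of `count` identical scalars.

    Assumption: the struct alignment is the largest member alignment.
    """
    return max([SCALAR_ALIGN[element]] * count)


def under_aligned_types():
    """Names of vector types whose default struct alignment is too small."""
    names = []
    for (element, count), required in CUDA_VECTOR_ALIGN.items():
        if default_struct_alignment(element, count) < required:
            names.append(f"{element}{count}")
    return names


if __name__ == "__main__":
    print(under_aligned_types())
```

Under these assumptions float3 is the only listed type whose default alignment already matches the CUDA requirement. The others need explicit alignment on any stack slot that a vector load or store will touch.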

When FunctionCallConv._lower_impl creates under-aligned temporary slots, the
NVRTC-compiled shim may issue vector load/store instructions through pointers
that do not meet the CUDA ABI requirement. That can surface as
cudaErrorMisalignedAddress.

The concrete motivating case is optixGetTriangleBarycentrics(), which returns
float2.

Alignment Policy

- If a Numba type has registered explicit alignment metadata, callconv lowering
  honors it.
- If a Numba type owns an alignof_ attribute directly, as generated struct
  types already do, callconv lowering continues to honor it.
- The chosen alignment is max(LLVM ABI alignment, explicit alignment).
- Alignment is no longer derived from object size or capped at an arbitrary
  value.
- Conflicting explicit alignments for the same Numba type are rejected during
  registration.
- Aligned aliases of already-registered unaligned singleton types are rejected,
  because that would make alias-specific alignment leak to unrelated C++ names.
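
The selection rule in this policy can be sketched as a small pure function. This is only an illustration of max(LLVM ABI alignment, explicit alignment) with no size-based cap. The name `effective_alignment` and its integer-only signature are assumptions made for the example and do not mirror the actual helpers in callconv.py.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... and False for everything else."""
    return n > 0 and (n & (n - 1)) == 0


def effective_alignment(abi_alignment, explicit_alignment=None):
    """Pick the alignment for a stack slot.

    abi_alignment: the alignment LLVM would use by default, in bytes.
    explicit_alignment: registered or type-owned alignment, or None.
    """
    if explicit_alignment is None:
        # No metadata: keep LLVM's default ABI alignment.
        return abi_alignment
    if not is_power_of_two(explicit_alignment):
        raise ValueError(f"alignment must be a power of two: {explicit_alignment}")
    # Never go below the ABI alignment, and never cap the explicit value.
    return max(abi_alignment, explicit_alignment)
```

For example, `effective_alignment(4, 8)` gives 8 for a float2-like slot, and `effective_alignment(4, 32)` gives 32, since nothing in the rule limits the result to 16.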

This keeps the path open for future/user-defined types with alignment larger
than 16 bytes without globally mutating Numba's shared scalar/vector objects.

The special CUDA *_32a vector aliases are intentionally not added here because
they require distinct Numba type objects. Mapping both double4 and
double4_32a to the same Numba type would require incompatible 16-byte and
32-byte alignments for one Numba-level type.
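
The alias problem can be illustrated with a toy registry that stores alignment beside the type instead of on it. Everything here is a stand-in: `AlignmentRegistry`, `StandInType`, and the choice to accept an unaligned alias of an already aligned type are assumptions for the sketch, not Numbast's actual register_cxx_type behavior.

```python
class StandInType:
    """Placeholder for a shared Numba type singleton."""

    def __init__(self, name):
        self.name = name


class AlignmentRegistry:
    def __init__(self):
        self.ctype_maps = {}  # C++ name -> type object
        self.alignof = {}     # type object -> explicit alignment in bytes

    def register(self, cxx_name, ty, alignof=None):
        seen = ty in self.ctype_maps.values()
        existing = self.alignof.get(ty)
        if alignof is not None:
            if existing is not None and existing != alignof:
                raise ValueError(
                    f"{cxx_name}: alignment {alignof} conflicts with {existing}"
                )
            if seen and existing is None:
                raise ValueError(
                    f"{cxx_name}: aligned alias of an unaligned registered type"
                )
            # Metadata lives in the registry; the type object is not mutated.
            self.alignof[ty] = alignof
        # Assumption: an unaligned alias of an aligned type is accepted.
        self.ctype_maps[cxx_name] = ty


registry = AlignmentRegistry()
double4 = StandInType("double4")
registry.register("double4", double4, alignof=16)
```

With this registry, `registry.register("double4_32a", double4, alignof=32)` raises ValueError, which is the situation the paragraph above describes: one type object cannot carry both 16-byte and 32-byte alignment.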

Lowering Sites Updated

- Return-value temporary slot.
- Visible argument temporaries in the no-intent-plan path.
- out_return temporaries in the intent-plan path.
- Visible argument temporaries in the intent-plan path.
- Loads and stores associated with those temporaries.

Testing

PYTHONPATH=/home/wangm/numbast-optix/numbast/numbast/src /home/wangm/numbast-pixi-testing/.pixi/envs/test-cu13/bin/python -m pytest numbast/tests/test_callconv.py numbast/tests/test_register_cxx_type.py numbast/tests/test_cuda_vector_types.py
/home/wangm/numbast-pixi-testing/.pixi/envs/test-cu13/bin/ruff check numbast/src/numbast/callconv.py numbast/src/numbast/types.py numbast/tests/test_callconv.py numbast/tests/test_register_cxx_type.py numbast/tests/test_cuda_vector_types.py
git diff --check

Summary by CodeRabbit

Release Notes

New Features
- Enhanced alignment support for CUDA vector types and function call conventions
- Type registration system now accepts explicit alignment specifications
Tests
- Added comprehensive alignment validation tests for CUDA vector types
- Added coverage for alignment in generated code bindings
- Improved test isolation for alignment-related registries

copy-pr-bot · 2026-04-15T18:50:03Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-15T18:50:14Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: c3c0b517-aa65-472b-81ce-b9d59fd24b4d

📥 Commits

Reviewing files that changed from the base of the PR and between 175383d and 7c4be8e.

📒 Files selected for processing (7)

numbast/src/numbast/callconv.py
numbast/src/numbast/static/callconv.py
numbast/src/numbast/static/tests/test_function_static_bindings.py
numbast/src/numbast/types.py
numbast/tests/test_callconv.py
numbast/tests/test_cuda_vector_types.py
numbast/tests/test_register_cxx_type.py

📝 Walkthrough

Compute and apply explicit power-of-two alloca alignments (optionally using Numba type alignof_), thread them through return-slot, argument, and IntentPlan out-return allocas/stores/loads in callconv; add CUDA vector alignof registration in types and expand tests to assert lowered IR alignment annotations.

Changes

Callconv alignment updates

Layer / File(s)	Summary
Type registry & alignof metadata `numbast/src/numbast/types.py`	Add CUDA vector specs and programmatic maps; add `_normalize_alignof`; extend `register_cxx_type(..., alignof=...)` to validate/attach `numba_type.alignof_`; register built-ins at import time.
Static generated bindings `numbast/src/numbast/static/callconv.py`	Render and embed a standalone `_NUMBA_TYPE_ALIGNOF_MAPS` snapshot and `get_numba_type_alignof` helper into `CALLCONV_SRC` so generated function.cuh is self-contained.
Alignment helpers `numbast/src/numbast/callconv.py`	Add `_get_alloca_alignment(...)` to compute ABI alignment and optional Numba `alignof_`, and `_set_alloca_alignment(...)` to apply/store it on an `alloca`.
Return-slot allocation `numbast/src/numbast/callconv.py`	When `cxx_return_type` is non-void, compute `retval_align` via `_set_alloca_alignment` on the C++ return slot (`retval_ptr`) and use that alignment for subsequent `load`.
Argument allocas (non-IntentPlan / IntentPlan normal args) `numbast/src/numbast/callconv.py`	After `cgutils.alloca_once` for non-passthrough arguments and for IntentPlan normal args, call `_set_alloca_alignment`, save `ptr_align`, and use it in `builder.store(..., align=ptr_align)` instead of inline `getattr(..., "alignof_", None)`.
IntentPlan out-return allocas `numbast/src/numbast/callconv.py`	Change `out_return_ptrs` from `(out_ty, out_ptr)` tuples to `_OutReturnPtr(numba_ty, ptr, align)` records. Each out-return alloca has its alignment computed/stored via `_set_alloca_alignment` and recorded.
Loads for returns and out-returns `numbast/src/numbast/callconv.py`	Use precomputed `retval_align` and each `out_return.align` in `builder.load(..., align=...)` when assembling return values, removing inline `alignof_` usage.
Tests / Validation `numbast/tests/`, `numbast/src/numbast/static/tests/`	Add `_ShimWriter` stub and `_lower_callconv_to_ir` helper; add tests for CUDA vector alignof mappings, explicit alignof validation, and lowered-IR assertions for aligned `alloca`/`store`/`load`; ensure generated callconv source contains a standalone alignof helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hop through LLVM fields at night,

I tuck alignments snug and tight,
Return, arg, out-return in line,
Power-of-two, capped and fine,
A rabbit hums: "stacks aligned!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.12% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and specifically describes the main change: registering CUDA vector ABI alignment in the type system and call convention handling.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 111-119: Extract the duplicated CUDA-allocation alignment logic
into a helper function and use it in all four places: create a helper like
_set_cuda_alloca_align(alloca, ty, target_data) that sets alloca.align =
max(target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)); replace
the repeated expressions (currently using _dl = context.target_data and setting
retval_ptr.align or other alloca.align with max(_dl.abi_alignment(...),
min(_dl.abi_size(...), 16))) by calling this helper with the corresponding
alloca (e.g., retval_ptr), type (e.g., retval_ty), and context.target_data to
centralize the logic and avoid repetition.
- Line 4: Remove the development/tracking marker comment
"NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of
numbast/src/numbast/callconv.py; simply delete that standalone comment line so
the file contains only production code/comments, and ensure no leftover blank
line or stray artifact remains.
- Around line 116-119: Update calls to the target data API and remove
unsupported alloca alignment assignments: replace _dl.abi_alignment(...) and
_dl.abi_size(...) with _dl.get_abi_alignment(...) and _dl.get_abi_size(...)
wherever used (e.g., the calculations around retval_ptr, and the other
occurrences at the same sites later in the file), and remove assignments to
AllocaInstr.align (e.g., retval_ptr.align = ..., and the other .align
assignments) since llvmlite IR AllocaInstr does not support setting .align; if
explicit "align N" is required use the binding layer to emit raw LLVM IR
instead, otherwise rely on the target data defaults and omit the .align
statements.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 09ca3211-42c8-4afe-84bd-ab4a26b03bf8

📥 Commits

Reviewing files that changed from the base of the PR and between f29a975 and 21c668b.

📒 Files selected for processing (1)

numbast/src/numbast/callconv.py

CUDA vector types (float2, float4, etc.) carry __align__(N) attributes that require N-byte alignment (float2 → 8 B, float4 → 16 B). LLVM represents them as anonymous structs whose ABI alignment defaults to the element alignment (4 B for float), not the vector alignment.

When FunctionCallConv._lower_impl allocates stack slots via builder.alloca / cgutils.alloca_once without an explicit alignment, LLVM emits 4-byte-aligned allocas. The NVRTC-compiled shim then tries to perform a vector load/store (e.g. ld.global.v2.f32 for float2) on that 4-byte-aligned pointer, which violates the 8-byte alignment requirement and raises cudaErrorMisalignedAddress at runtime.

Fix: after every alloca in _lower_impl, set

    alloca.align = max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16))

The cap at 16 bytes covers float4/int4 (the widest standard CUDA vector types) without over-aligning large user-defined structs.

The four sites fixed are:

1. retval_ptr — the function return-value slot
2. visible-arg ptrs (no-intent-plan path)
3. out_return ptrs (intent-plan path)
4. visible-arg ptrs (intent-plan path)

Fixes optixGetTriangleBarycentrics() (float2 return) and any other Numbast binding that returns or accepts a CUDA vector type.

coderabbitai

♻️ Duplicate comments (1)

numbast/src/numbast/callconv.py (1)

135-138: 🧹 Nitpick | 🔵 Trivial

Deduplicate the repeated alignment formula into a helper.

The same expression is repeated four times; centralizing it will reduce drift risk and simplify future ABI updates.

Refactor sketch

+def _set_cuda_alloca_align(alloca, value_ty, target_data):
+    alloca.align = max(
+        target_data.abi_alignment(value_ty),
+        min(target_data.abi_size(value_ty), 16),
+    )
...
-            _dl = context.target_data
-            retval_ptr.align = max(
-                _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
-            )
+            _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data)
...
-                    _dl = context.target_data
-                    ptr.align = max(
-                        _dl.abi_alignment(vty), min(_dl.abi_size(vty), 16)
-                    )
+                    _set_cuda_alloca_align(ptr, vty, context.target_data)

Also applies to: 186-189, 211-214, 234-237

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` around lines 135 - 138, Several places set
retval_ptr.align using the same expression; extract that logic into a small
helper (e.g., abi_retval_align(dl, ty) or _compute_retval_align) that takes the
data layout/target_data and the type and returns max(dl.abi_alignment(ty),
min(dl.abi_size(ty), 16)); then replace each repeated expression (the
assignments to retval_ptr.align found where context.target_data/_dl and
retval_ty are used) with a call to this new helper (pass context.target_data or
_dl and retval_ty). Ensure the helper is placed in the same module (callconv.py)
and update all four locations (the occurrences setting retval_ptr.align) to use
it.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 135-138: Several places set retval_ptr.align using the same
expression; extract that logic into a small helper (e.g., abi_retval_align(dl,
ty) or _compute_retval_align) that takes the data layout/target_data and the
type and returns max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16)); then
replace each repeated expression (the assignments to retval_ptr.align found
where context.target_data/_dl and retval_ty are used) with a call to this new
helper (pass context.target_data or _dl and retval_ty). Ensure the helper is
placed in the same module (callconv.py) and update all four locations (the
occurrences setting retval_ptr.align) to use it.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 7a63c268-ad04-4299-9645-7c500649845e

📥 Commits

Reviewing files that changed from the base of the PR and between 21c668b and 4fc0c38.

📒 Files selected for processing (1)

numbast/src/numbast/callconv.py

…uristic

The previous fix used max(abi_alignment, min(sizeof, 16)) to guess the required alloca alignment for CUDA vector types. This heuristic works for power-of-2 sized types (float2, float4) but is incorrect for non-power-of-2 types like float3/uint3 (sizeof=12 → would produce alignment=12, which is not a valid power-of-2 LLVM alignment).

Numbast already propagates alignof_ from ast_canopy onto user-defined bound structs, and already uses getattr(argty, "alignof_", None) for load/store instructions. Apply the same convention to alloca: check alignof_ on the Numba type and set it when present; when absent, leave LLVM's default ABI alignment (correct for scalars and structs without an explicit __align__ attribute).

Callers registering built-in CUDA vector types in CTYPE_MAPS must set alignof_ on the Numba type to match the __align__(N) in the CUDA headers (e.g. float32x2.alignof_ = 8 for float2's __align__(8)). This mirrors how ast_canopy-derived struct types already work.
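
The failure mode of the size-based heuristic is easy to reproduce in isolation. The sketch below re-implements the formula quoted in the commit message as a plain function. The function names and the (ABI alignment, size) pairs for the float vector types are assumptions for illustration, taken from a struct of 4-byte floats.

```python
def size_capped_alignment(abi_alignment, abi_size, cap=16):
    """The earlier heuristic: max(abi_alignment, min(sizeof, cap))."""
    return max(abi_alignment, min(abi_size, cap))


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


# (default ABI alignment, size in bytes) for structs of 4-byte floats.
CASES = {"float2": (4, 8), "float3": (4, 12), "float4": (4, 16)}


def heuristic_report():
    """Map each type name to (heuristic alignment, is it a valid alignment)."""
    report = {}
    for name, (align, size) in CASES.items():
        guess = size_capped_alignment(align, size)
        report[name] = (guess, is_power_of_two(guess))
    return report


if __name__ == "__main__":
    for name, (guess, valid) in heuristic_report().items():
        print(name, guess, "ok" if valid else "not a power of two")
```

For float3 the heuristic yields 12, which is rejected by the power-of-two check, while float2 and float4 happen to land on 8 and 16.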

coderabbitai

♻️ Duplicate comments (1)

numbast/src/numbast/callconv.py (1)
4-4: ⚠️ Potential issue | 🟡 Minor

Remove transient marker comment at Line 4.

# NUMBAST_RETVAL_ALIGN_FIX_APPLIED looks like a tracking artifact and should not remain in production source.
Proposed fix
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
 from numbast.args import prepare_ir_types
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` at line 4, Remove the transient tracking
comment "# NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply
delete that standalone marker so the source contains no leftover transient
artifact (search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate
it).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Line 4: Remove the transient tracking comment "#
NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply delete
that standalone marker so the source contains no leftover transient artifact
(search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate it).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 18c274b5-7749-4f9e-8607-5769df380ab4

📥 Commits

Reviewing files that changed from the base of the PR and between 4fc0c38 and f32141d.

📒 Files selected for processing (1)

numbast/src/numbast/callconv.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@numbast/tests/test_callconv.py`:
- Around line 58-68: The test currently depends on numba-cuda vector type
internals by asserting getattr(float2, "alignof_", None) is None and similarly
for float4; instead modify the test to exercise the fallback path of
_get_alloca_alignment directly by passing a sentinel numba_ty that lacks an
alignof_ attribute (e.g., None or a simple object without alignof_) into the
calls that use context.get_value_type(float2)/float4 as the first argument and
the sentinel as the numba_ty argument; update the two assertions to call
_get_alloca_alignment(context, context.get_value_type(float2), <sentinel>) and
_get_alloca_alignment(context, context.get_value_type(float4), <sentinel>) and
assert they return 8 and 16 respectively so the test no longer depends on
float2/float4 having no alignof_.
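
The sentinel approach suggested here can be sketched without any Numba or llvmlite dependency. The function below is a simplified stand-in for `_get_alloca_alignment` that takes the ABI alignment as a plain integer. Its signature, the sentinel classes, and the alignment values are assumptions chosen for the example and do not reproduce the real test.

```python
def get_alloca_alignment(abi_alignment, numba_ty):
    """Stand-in: honor a type-owned alignof_ if present, else use the ABI value."""
    explicit = getattr(numba_ty, "alignof_", None)
    if explicit is None:
        return abi_alignment
    return max(abi_alignment, explicit)


class NoAlignof:
    """Sentinel type with no alignof_ attribute, owned by the test itself."""


class WithAlignof:
    """Sentinel type carrying explicit alignment metadata."""

    alignof_ = 16


def test_fallback_uses_abi_alignment():
    # The outcome depends only on the sentinel, not on library internals.
    assert get_alloca_alignment(4, NoAlignof()) == 4
    assert get_alloca_alignment(8, None) == 8


def test_explicit_alignment_is_honored():
    assert get_alloca_alignment(4, WithAlignof()) == 16
```

Because the sentinel is defined inside the test module, the fallback assertion keeps passing even if a future numba-cuda release starts attaching alignof_ to its vector types.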

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 8e365935-c899-4c4e-b15a-811e11b25b5c

📥 Commits

Reviewing files that changed from the base of the PR and between f32141d and db5c2e4.

📒 Files selected for processing (2)

numbast/src/numbast/callconv.py
numbast/tests/test_callconv.py

isVoid · 2026-05-05T06:03:55Z

/ok to test

copy-pr-bot · 2026-05-05T06:03:58Z

/ok to test

@isVoid, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

isVoid · 2026-05-05T06:04:20Z

/ok to test 3e95cf6

github-actions · 2026-05-05T07:09:24Z

Doc Preview CI
🚀 View pre-built docs at
https://NVIDIA.github.io/numbast/pr-preview/pr-321/
Preview will be ready when GitHub Pages deployment finishes.

isVoid · 2026-05-07T20:56:03Z

@coderabbitai review

coderabbitai · 2026-05-07T20:56:10Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@numbast/src/numbast/types.py`:
- Around line 137-146: The code incorrectly mutates shared singleton numba_type
by assigning alignof_ on the object (via numba_type.alignof_), which causes
alias registrations to leak alignment; instead either attach alignment metadata
to the CTYPE_MAPS entry or shallow-copy the type before mutating. Concretely, in
the register path that runs _normalize_alignof and checks existing_alignof, if
alignof is not None make a new instance (e.g., copy.copy(numba_type)) and set
its alignof_ on the copy, then store that copy in CTYPE_MAPS[cxx_name] (or else
store a tuple/struct like (numba_type, alignof) in CTYPE_MAPS) so you do not
mutate the original singleton; keep the existing getattr(existing_alignof) check
against the copy/metadata to preserve behavior.
- Around line 107-113: The helper _normalize_alignof currently coerces arbitrary
numerics via int(...) which silently truncates values like 2.9; change it to
reject non-integral inputs instead: do not call int(...) on the raw
input—validate that alignof is an integer type (or explicitly integral) and
raise a TypeError for non-integer values, then perform the existing positive
power-of-two check (keep the ValueError for <=0 or non-power-of-two) and return
the integer alignof when valid.
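
A strict validator along the lines of the second comment might look like the sketch below. It uses `operator.index`, which accepts integer-like objects and raises TypeError for floats. The name `normalize_alignof`, the None pass-through, and the explicit rejection of bool are assumptions for illustration, not the code in types.py.

```python
import operator


def normalize_alignof(alignof):
    """Return alignof as an int, or None when no alignment was given.

    Raises TypeError for non-integral input and ValueError for values
    that are not positive powers of two.
    """
    if alignof is None:
        return None
    if isinstance(alignof, bool):
        # bool is an int subclass; treating True as 1 would hide mistakes.
        raise TypeError("alignof must be an integer, got bool")
    try:
        value = operator.index(alignof)  # rejects 2.9 instead of truncating
    except TypeError:
        raise TypeError(
            f"alignof must be an integer, got {type(alignof).__name__}"
        ) from None
    if value <= 0 or value & (value - 1):
        raise ValueError(f"alignof must be a positive power of two, got {value}")
    return value
```

With this shape, `normalize_alignof(2.9)` fails loudly with TypeError rather than silently becoming 2, and `normalize_alignof(12)` fails with ValueError.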

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3b8421b5-4065-4644-b450-4da0e998146f

📥 Commits

Reviewing files that changed from the base of the PR and between db5c2e4 and 175383d.

📒 Files selected for processing (5)

numbast/src/numbast/callconv.py
numbast/src/numbast/types.py
numbast/tests/test_callconv.py
numbast/tests/test_cuda_vector_types.py
numbast/tests/test_register_cxx_type.py

isVoid · 2026-05-11T16:38:26Z

/ok to test 08bf7b5

isVoid · 2026-05-11T16:42:35Z

/ok to test 7ff033a

isVoid · 2026-05-11T19:33:01Z

/ok to test 7c4be8e

## Summary

- update root `VERSION` to `0.10.0`
- add `0.10.0` to `docs/versions.json` and `docs/nv-versions.json` for docs version picker support

## Changelog

- Remove direct numba imports from MLIR backend (#365)
- [codex] add numba-cuda-mlir support guide (#364)
- Split CI tests for Numba-CUDA and MLIR (#363)
- Migrate experimental MLIR backend (#346)
- Bump test-summary/action from 2.4 to 2.6 in the actions-monthly group (#345)
- [codex] Support POD struct array fields (#343)
- fix(types,callconv): register CUDA vector ABI alignment (#321)

## Validation

- `python -m json.tool docs/versions.json`
- `python -m json.tool docs/nv-versions.json`
- `git diff --check`

## Summary by CodeRabbit

* **Chores**
  * Version 0.10.0 released
  * Updated version references in documentation configuration files

Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>

coderabbitai Bot reviewed Apr 15, 2026

Comment thread numbast/src/numbast/callconv.py Outdated

Comment thread numbast/src/numbast/callconv.py Outdated

Comment thread numbast/src/numbast/callconv.py Outdated

isVoid force-pushed the fix/callconv-alloca-alignment-cuda-vector-types branch from 21c668b to 4fc0c38 Compare April 15, 2026 18:57

coderabbitai Bot reviewed Apr 15, 2026

isVoid and others added 2 commits May 4, 2026 15:41

Merge branch 'main' into fix/callconv-alloca-alignment-cuda-vector-types

21ec0ef

fix: compute callconv alloca alignment

db5c2e4

coderabbitai Bot reviewed May 4, 2026

Comment thread numbast/tests/test_callconv.py Outdated

refactor: name out-return callconv fields

3e95cf6

isVoid added 3 commits May 7, 2026 11:00

refactor(callconv): use explicit type alignment

84e2ad1

refactor(types): register explicit type alignment

f4de346

feat(types): register CUDA vector type mappings

175383d

isVoid changed the title ~~fix(callconv): align alloca slots to CUDA vector type ABI requirements~~ fix(types,callconv): register CUDA vector ABI alignment May 7, 2026

coderabbitai Bot reviewed May 7, 2026

Comment thread numbast/src/numbast/types.py

Comment thread numbast/src/numbast/types.py

isVoid added 2 commits May 10, 2026 22:54

fix(types): keep explicit alignment in registry metadata

8a3b911

test(callconv): cover ABI alignment fallback explicitly

08bf7b5

isVoid and others added 2 commits May 11, 2026 09:41

style: apply pre-commit formatting

1a67928

Merge branch 'main' into fix/callconv-alloca-alignment-cuda-vector-types

7ff033a

isVoid mentioned this pull request May 11, 2026

[codex] Align C ABI allocas for CUDA vector types #336

Closed

isVoid mentioned this pull request May 11, 2026

[codex] Add CUDA vector type C mappings #337

Closed

fix(static): keep generated callconv independent of numbast

7c4be8e

isVoid merged commit 266b9df into NVIDIA:main May 12, 2026
32 checks passed

isVoid mentioned this pull request May 22, 2026

Bump Version to 0.10.0 #367

Merged
