Skip to content

fix(types,callconv): register CUDA vector ABI alignment#321

Merged
isVoid merged 13 commits into
NVIDIA:mainfrom
isVoid:fix/callconv-alloca-alignment-cuda-vector-types
May 12, 2026
Merged

fix(types,callconv): register CUDA vector ABI alignment#321
isVoid merged 13 commits into
NVIDIA:mainfrom
isVoid:fix/callconv-alloca-alignment-cuda-vector-types

Conversation

@isVoid
Copy link
Copy Markdown
Collaborator

@isVoid isVoid commented Apr 15, 2026

Summary

This PR makes CUDA vector type alignment an explicit part of Numbast's type
registry and uses that metadata when lowering call-convention temporaries.

  • Extend register_cxx_type(...) with an optional alignof= argument.
  • Keep explicit alignment in registry metadata instead of mutating shared Numba
    type singletons.
  • Register built-in C/CUDA type mappings through register_cxx_type.
  • Register the full canonical CUDA vector type family backed by
    numba.cuda.vector_types (char1 through double4) with explicit CUDA ABI
    alignments.
  • Add reverse NUMBA_TO_CTYPE_MAPS entries for those vector types so template
    deduction can round-trip vector arguments.
  • Use registered alignment metadata, and existing type-owned alignof_ metadata
    for struct types, when aligning return, argument, and out-return stack slots
    in FunctionCallConv.

Motivation

CUDA vector types (float2, float4, etc.) carry __align__(N) attributes.
LLVM represents these values as anonymous structs, whose default ABI alignment
can be lower than the CUDA vector ABI requires. For example, float2 lowers as
{float, float} but must be 8-byte aligned.

When FunctionCallConv._lower_impl creates under-aligned temporary slots, the
NVRTC-compiled shim may issue vector load/store instructions through pointers
that do not meet the CUDA ABI requirement. That can surface as
cudaErrorMisalignedAddress.

The concrete motivating case is optixGetTriangleBarycentrics(), which returns
float2.

Alignment Policy

  • If a Numba type has registered explicit alignment metadata, callconv lowering
    honors it.
  • If a Numba type owns an alignof_ attribute directly, as generated struct
    types already do, callconv lowering continues to honor it.
  • The chosen alignment is max(LLVM ABI alignment, explicit alignment).
  • Alignment is no longer derived from object size or capped at an arbitrary
    value.
  • Conflicting explicit alignments for the same Numba type are rejected during
    registration.
  • Aligned aliases of already-registered unaligned singleton types are rejected,
    because that would make alias-specific alignment leak to unrelated C++ names.

This keeps the path open for future/user-defined types with alignment larger
than 16 bytes without globally mutating Numba's shared scalar/vector objects.

The special CUDA *_32a vector aliases are intentionally not added here because
they require distinct Numba type objects. Mapping both double4 and
double4_32a to the same Numba type would require incompatible 16-byte and
32-byte alignments for one Numba-level type.

Lowering Sites Updated

  1. Return-value temporary slot.
  2. Visible argument temporaries in the no-intent-plan path.
  3. out_return temporaries in the intent-plan path.
  4. Visible argument temporaries in the intent-plan path.
  5. Loads and stores associated with those temporaries.

Testing

  • PYTHONPATH=/home/wangm/numbast-optix/numbast/numbast/src /home/wangm/numbast-pixi-testing/.pixi/envs/test-cu13/bin/python -m pytest numbast/tests/test_callconv.py numbast/tests/test_register_cxx_type.py numbast/tests/test_cuda_vector_types.py
  • /home/wangm/numbast-pixi-testing/.pixi/envs/test-cu13/bin/ruff check numbast/src/numbast/callconv.py numbast/src/numbast/types.py numbast/tests/test_callconv.py numbast/tests/test_register_cxx_type.py numbast/tests/test_cuda_vector_types.py
  • git diff --check

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced alignment support for CUDA vector types and function call conventions
    • Type registration system now accepts explicit alignment specifications
  • Tests

    • Added comprehensive alignment validation tests for CUDA vector types
    • Added coverage for alignment in generated code bindings
    • Improved test isolation for alignment-related registries

@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented Apr 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: c3c0b517-aa65-472b-81ce-b9d59fd24b4d

📥 Commits

Reviewing files that changed from the base of the PR and between 175383d and 7c4be8e.

📒 Files selected for processing (7)
  • numbast/src/numbast/callconv.py
  • numbast/src/numbast/static/callconv.py
  • numbast/src/numbast/static/tests/test_function_static_bindings.py
  • numbast/src/numbast/types.py
  • numbast/tests/test_callconv.py
  • numbast/tests/test_cuda_vector_types.py
  • numbast/tests/test_register_cxx_type.py

📝 Walkthrough

Walkthrough

Compute and apply explicit power-of-two alloca alignments (optionally using Numba type alignof_), thread them through return-slot, argument, and IntentPlan out-return allocas/stores/loads in callconv; add CUDA vector alignof registration in types and expand tests to assert lowered IR alignment annotations.

Changes

Callconv alignment updates

Layer / File(s) Summary
Type registry & alignof metadata
numbast/src/numbast/types.py
Add CUDA vector specs and programmatic maps; add _normalize_alignof; extend register_cxx_type(..., alignof=...) to validate/attach numba_type.alignof_; register built-ins at import time.
Static generated bindings
numbast/src/numbast/static/callconv.py
Render and embed a standalone _NUMBA_TYPE_ALIGNOF_MAPS snapshot and get_numba_type_alignof helper into CALLCONV_SRC so generated function.cuh is self-contained.
Alignment helpers
numbast/src/numbast/callconv.py
Add _get_alloca_alignment(...) to compute ABI alignment and optional Numba alignof_, and _set_alloca_alignment(...) to apply/store it on an alloca.
Return-slot allocation
numbast/src/numbast/callconv.py
When cxx_return_type is non-void, compute retval_align via _set_alloca_alignment on the C++ return slot (retval_ptr) and use that alignment for subsequent load.
Argument allocas (non-IntentPlan / IntentPlan normal args)
numbast/src/numbast/callconv.py
After cgutils.alloca_once for non-passthrough arguments and for IntentPlan normal args, call _set_alloca_alignment, save ptr_align, and use it in builder.store(..., align=ptr_align) instead of inline getattr(..., "alignof_", None).
IntentPlan out-return allocas
numbast/src/numbast/callconv.py
Change out_return_ptrs from (out_ty, out_ptr) tuples to _OutReturnPtr(numba_ty, ptr, align) records. Each out-return alloca has its alignment computed/stored via _set_alloca_alignment and recorded.
Loads for returns and out-returns
numbast/src/numbast/callconv.py
Use precomputed retval_align and each out_return.align in builder.load(..., align=...) when assembling return values, removing inline alignof_ usage.
Tests / Validation
numbast/tests/*, numbast/src/numbast/static/tests/*
Add _ShimWriter stub and _lower_callconv_to_ir helper; add tests for CUDA vector alignof mappings, explicit alignof validation, and lowered-IR assertions for aligned alloca/store/load; ensure generated callconv source contains a standalone alignof helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hop through LLVM fields at night,

I tuck alignments snug and tight,
Return, arg, out-return in line,
Power-of-two, capped and fine,
A rabbit hums: "stacks aligned!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.12% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and specifically describes the main change: registering CUDA vector ABI alignment in the type system and call convention handling.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 111-119: Extract the duplicated CUDA-allocation alignment logic
into a helper function and use it in all four places: create a helper like
_set_cuda_alloca_align(alloca, ty, target_data) that sets alloca.align =
max(target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)); replace
the repeated expressions (currently using _dl = context.target_data and setting
retval_ptr.align or other alloca.align with max(_dl.abi_alignment(...),
min(_dl.abi_size(...), 16))) by calling this helper with the corresponding
alloca (e.g., retval_ptr), type (e.g., retval_ty), and context.target_data to
centralize the logic and avoid repetition.
- Line 4: Remove the development/tracking marker comment
"NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of
numbast/src/numbast/callconv.py; simply delete that standalone comment line so
the file contains only production code/comments, and ensure no leftover blank
line or stray artifact remains.
- Around line 116-119: Update calls to the target data API and remove
unsupported alloca alignment assignments: replace _dl.abi_alignment(...) and
_dl.abi_size(...) with _dl.get_abi_alignment(...) and _dl.get_abi_size(...)
wherever used (e.g., the calculations around retval_ptr, and the other
occurrences at the same sites later in the file), and remove assignments to
AllocaInstr.align (e.g., retval_ptr.align = ..., and the other .align
assignments) since llvmlite IR AllocaInstr does not support setting .align; if
explicit "align N" is required use the binding layer to emit raw LLVM IR
instead, otherwise rely on the target data defaults and omit the .align
statements.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 09ca3211-42c8-4afe-84bd-ab4a26b03bf8

📥 Commits

Reviewing files that changed from the base of the PR and between f29a975 and 21c668b.

📒 Files selected for processing (1)
  • numbast/src/numbast/callconv.py

Comment thread numbast/src/numbast/callconv.py Outdated
Comment thread numbast/src/numbast/callconv.py Outdated
Comment thread numbast/src/numbast/callconv.py Outdated
CUDA vector types (float2, float4, etc.) carry __align__(N) attributes
that require N-byte alignment (float2 → 8 B, float4 → 16 B).  LLVM
represents them as anonymous structs whose ABI alignment defaults to
the element alignment (4 B for float), not the vector alignment.

When FunctionCallConv._lower_impl allocates stack slots via
builder.alloca / cgutils.alloca_once without an explicit alignment,
LLVM emits 4-byte-aligned allocas.  The NVRTC-compiled shim then tries
to perform a vector load/store (e.g. ld.global.v2.f32 for float2) on
that 4-byte-aligned pointer, which violates the 8-byte alignment
requirement and raises cudaErrorMisalignedAddress at runtime.

Fix: after every alloca in _lower_impl, set
  alloca.align = max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16))

The cap at 16 bytes covers float4/int4 (the widest standard CUDA
vector types) without over-aligning large user-defined structs.

The four sites fixed are:
  1. retval_ptr — the function return-value slot
  2. visible-arg ptrs (no-intent-plan path)
  3. out_return ptrs (intent-plan path)
  4. visible-arg ptrs (intent-plan path)

Fixes optixGetTriangleBarycentrics() (float2 return) and any other
Numbast binding that returns or accepts a CUDA vector type.
@isVoid isVoid force-pushed the fix/callconv-alloca-alignment-cuda-vector-types branch from 21c668b to 4fc0c38 Compare April 15, 2026 18:57
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)

135-138: 🧹 Nitpick | 🔵 Trivial

Deduplicate the repeated alignment formula into a helper.

The same expression is repeated four times; centralizing it will reduce drift risk and simplify future ABI updates.

Refactor sketch
+def _set_cuda_alloca_align(alloca, value_ty, target_data):
+    alloca.align = max(
+        target_data.abi_alignment(value_ty),
+        min(target_data.abi_size(value_ty), 16),
+    )
...
-            _dl = context.target_data
-            retval_ptr.align = max(
-                _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
-            )
+            _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data)
...
-                    _dl = context.target_data
-                    ptr.align = max(
-                        _dl.abi_alignment(vty), min(_dl.abi_size(vty), 16)
-                    )
+                    _set_cuda_alloca_align(ptr, vty, context.target_data)

Also applies to: 186-189, 211-214, 234-237

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` around lines 135 - 138, Several places set
retval_ptr.align using the same expression; extract that logic into a small
helper (e.g., abi_retval_align(dl, ty) or _compute_retval_align) that takes the
data layout/target_data and the type and returns max(dl.abi_alignment(ty),
min(dl.abi_size(ty), 16)); then replace each repeated expression (the
assignments to retval_ptr.align found where context.target_data/_dl and
retval_ty are used) with a call to this new helper (pass context.target_data or
_dl and retval_ty). Ensure the helper is placed in the same module (callconv.py)
and update all four locations (the occurrences setting retval_ptr.align) to use
it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 135-138: Several places set retval_ptr.align using the same
expression; extract that logic into a small helper (e.g., abi_retval_align(dl,
ty) or _compute_retval_align) that takes the data layout/target_data and the
type and returns max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16)); then
replace each repeated expression (the assignments to retval_ptr.align found
where context.target_data/_dl and retval_ty are used) with a call to this new
helper (pass context.target_data or _dl and retval_ty). Ensure the helper is
placed in the same module (callconv.py) and update all four locations (the
occurrences setting retval_ptr.align) to use it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 7a63c268-ad04-4299-9645-7c500649845e

📥 Commits

Reviewing files that changed from the base of the PR and between 21c668b and 4fc0c38.

📒 Files selected for processing (1)
  • numbast/src/numbast/callconv.py

…uristic

The previous fix used max(abi_alignment, min(sizeof, 16)) to guess the
required alloca alignment for CUDA vector types. This heuristic works
for power-of-2 sized types (float2, float4) but is incorrect for
non-power-of-2 types like float3/uint3 (sizeof=12 → would produce
alignment=12, which is not a valid power-of-2 LLVM alignment).

Numbast already propagates alignof_ from ast_canopy onto user-defined
bound structs, and already uses getattr(argty, "alignof_", None) for
load/store instructions. Apply the same convention to alloca: check
alignof_ on the Numba type and set it when present; when absent, leave
LLVM's default ABI alignment (correct for scalars and structs without
an explicit __align__ attribute).

Callers registering built-in CUDA vector types in CTYPE_MAPS must set
alignof_ on the Numba type to match the __align__(N) in the CUDA
headers (e.g. float32x2.alignof_ = 8 for float2's __align__(8)).
This mirrors how ast_canopy-derived struct types already work.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Remove transient marker comment at Line 4.

# NUMBAST_RETVAL_ALIGN_FIX_APPLIED looks like a tracking artifact and should not remain in production source.

Proposed fix
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
 from numbast.args import prepare_ir_types
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` at line 4, Remove the transient tracking
comment "# NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply
delete that standalone marker so the source contains no leftover transient
artifact (search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate
it).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Line 4: Remove the transient tracking comment "#
NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply delete
that standalone marker so the source contains no leftover transient artifact
(search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate it).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 18c274b5-7749-4f9e-8607-5769df380ab4

📥 Commits

Reviewing files that changed from the base of the PR and between 4fc0c38 and f32141d.

📒 Files selected for processing (1)
  • numbast/src/numbast/callconv.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@numbast/tests/test_callconv.py`:
- Around line 58-68: The test currently depends on numba-cuda vector type
internals by asserting getattr(float2, "alignof_", None) is None and similarly
for float4; instead modify the test to exercise the fallback path of
_get_alloca_alignment directly by passing a sentinel numba_ty that lacks an
alignof_ attribute (e.g., None or a simple object without alignof_) into the
calls that use context.get_value_type(float2)/float4 as the first argument and
the sentinel as the numba_ty argument; update the two assertions to call
_get_alloca_alignment(context, context.get_value_type(float2), <sentinel>) and
_get_alloca_alignment(context, context.get_value_type(float4), <sentinel>) and
assert they return 8 and 16 respectively so the test no longer depends on
float2/float4 having no alignof_.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 8e365935-c899-4c4e-b15a-811e11b25b5c

📥 Commits

Reviewing files that changed from the base of the PR and between f32141d and db5c2e4.

📒 Files selected for processing (2)
  • numbast/src/numbast/callconv.py
  • numbast/tests/test_callconv.py

Comment thread numbast/tests/test_callconv.py Outdated
@isVoid
Copy link
Copy Markdown
Collaborator Author

isVoid commented May 5, 2026

/ok to test

@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented May 5, 2026

/ok to test

@isVoid, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@isVoid
Copy link
Copy Markdown
Collaborator Author

isVoid commented May 5, 2026

/ok to test 3e95cf6

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

Doc Preview CI
🚀 View pre-built docs at
https://NVIDIA.github.io/numbast/pr-preview/pr-321/

Preview will be ready when GitHub Pages deployment finishes.

@isVoid isVoid changed the title fix(callconv): align alloca slots to CUDA vector type ABI requirements fix(types,callconv): register CUDA vector ABI alignment May 7, 2026
@isVoid
Copy link
Copy Markdown
Collaborator Author

isVoid commented May 7, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@numbast/src/numbast/types.py`:
- Around line 137-146: The code incorrectly mutates shared singleton numba_type
by assigning alignof_ on the object (via numba_type.alignof_), which causes
alias registrations to leak alignment; instead either attach alignment metadata
to the CTYPE_MAPS entry or shallow-copy the type before mutating. Concretely, in
the register path that runs _normalize_alignof and checks existing_alignof, if
alignof is not None make a new instance (e.g., copy.copy(numba_type)) and set
its alignof_ on the copy, then store that copy in CTYPE_MAPS[cxx_name] (or else
store a tuple/struct like (numba_type, alignof) in CTYPE_MAPS) so you do not
mutate the original singleton; keep the existing getattr(existing_alignof) check
against the copy/metadata to preserve behavior.
- Around line 107-113: The helper _normalize_alignof currently coerces arbitrary
numerics via int(...) which silently truncates values like 2.9; change it to
reject non-integral inputs instead: do not call int(...) on the raw
input—validate that alignof is an integer type (or explicitly integral) and
raise a TypeError for non-integer values, then perform the existing positive
power-of-two check (keep the ValueError for <=0 or non-power-of-two) and return
the integer alignof when valid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3b8421b5-4065-4644-b450-4da0e998146f

📥 Commits

Reviewing files that changed from the base of the PR and between db5c2e4 and 175383d.

📒 Files selected for processing (5)
  • numbast/src/numbast/callconv.py
  • numbast/src/numbast/types.py
  • numbast/tests/test_callconv.py
  • numbast/tests/test_cuda_vector_types.py
  • numbast/tests/test_register_cxx_type.py

Comment thread numbast/src/numbast/types.py
Comment thread numbast/src/numbast/types.py
@isVoid
Copy link
Copy Markdown
Collaborator Author

isVoid commented May 11, 2026

/ok to test 08bf7b5

@isVoid
Copy link
Copy Markdown
Collaborator Author

isVoid commented May 11, 2026

/ok to test 7ff033a

@isVoid
Copy link
Copy Markdown
Collaborator Author

isVoid commented May 11, 2026

/ok to test 7c4be8e

@isVoid isVoid merged commit 266b9df into NVIDIA:main May 12, 2026
32 checks passed
@isVoid isVoid mentioned this pull request May 22, 2026
isVoid added a commit that referenced this pull request May 23, 2026
## Summary
- update root `VERSION` to `0.10.0`
- add `0.10.0` to `docs/versions.json` and `docs/nv-versions.json` for
docs version picker support

## Changelog
- Remove direct numba imports from MLIR backend (#365)
- [codex] add numba-cuda-mlir support guide (#364)
- Split CI tests for Numba-CUDA and MLIR (#363)
- Migrate experimental MLIR backend (#346)
- Bump test-summary/action from 2.4 to 2.6 in the actions-monthly group
(#345)
- [codex] Support POD struct array fields (#343)
- fix(types,callconv): register CUDA vector ABI alignment (#321)

## Validation
- `python -m json.tool docs/versions.json`
- `python -m json.tool docs/nv-versions.json`
- `git diff --check`


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Version 0.10.0 released
  * Updated version references in documentation configuration files

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/numbast/pull/367?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant