
[C++] Implement cudaq::measure_handle#4409

Draft
khalatepradnya wants to merge 10 commits into NVIDIA:main from
khalatepradnya:pkhalate/measure-handle-pr3b-cpp-frontend

Conversation


@khalatepradnya khalatepradnya commented Apr 29, 2026

Summary

NOTE: These all apply in MLIR mode only.
Library mode (#ifdef CUDAQ_LIBRARY_MODE) keeps the legacy measure_result class in runtime/cudaq/qis/execution_manager.h verbatim and remains source-compatible.

Breaking Changes

Type identity

cudaq::measure_result is now a using alias for cudaq::measure_handle. Existing source code that spells measure_result continues to compile, but the underlying class changed:

  • measure_handle is not implicitly convertible to bool at host scope: operator bool() calls std::abort() in MLIR mode. The bridge intercepts every legitimate bool coercion inside __qpu__ and emits quake.discriminate; reaching the host body means a coercion site escaped the bridge.
  • measure_handle is not integer-constructible. The legacy measure_result(uint8_t) form no longer compiles; use bool for host-side bit storage.
  • decltype(mz(q)) resolves to measure_handle, not bool. Visible to auto, template specialization, and std::is_same checks.
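The three type-identity changes above can be sketched with a small host-side mock. The mock type and the stand-in `mz_mock` below are illustrative assumptions, not the real cudaq class (whose MLIR-mode operator bool() aborts at runtime rather than being deleted); they only pin the same compile-time surface:

```cpp
#include <cstdint>
#include <type_traits>

// Mock of the semantics described above; the real class lives in the
// CUDA-Q runtime headers. Here the bool coercion is deleted so misuse
// fails at compile time, whereas the real MLIR-mode class aborts.
struct measure_handle_mock {
  measure_handle_mock() = default;
  // No integer constructor: the legacy measure_result(uint8_t) form
  // has no counterpart here.
  explicit operator bool() const = delete;
};

// Stand-in for cudaq::mz(q): the return type is the handle, not bool.
measure_handle_mock mz_mock();

static_assert(std::is_same_v<decltype(mz_mock()), measure_handle_mock>);
static_assert(!std::is_same_v<decltype(mz_mock()), bool>);
static_assert(!std::is_constructible_v<measure_handle_mock, uint8_t>);
static_assert(!std::is_convertible_v<measure_handle_mock, bool>);
```

Any `auto`, template-specialization, or `std::is_same` check written against the old `bool` result will trip exactly these asserts.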

Return type of mz / mx / my

  • Scalar form: mz(q) / mx(q) / my(q) return measure_handle instead of bool.
  • Vector form: mz(qvec) / mx(qvec) / my(qvec) return std::vector<measure_handle> instead of std::vector<bool> (which is what the prior using measure_result = bool alias resolved to).

Required source migrations

  • __qpu__ auto kernel() { return mz(...); }: auto now deduces measure_handle (or std::vector<measure_handle>), which the host-device boundary rejects (see below). Spell the return type explicitly as bool / std::vector<bool>; for the vector case, also wrap with cudaq::to_bools(...).
  • cudaq::to_integer(mz(qvec)): the bridge rejects this shim with a spec-named frontend diagnostic. The new cudaq::to_bools(std::vector<measure_handle>) API is the spec-mandated bulk-discrimination surface.
  • Host-scope mz / mx / my: calling these outside a __qpu__ kernel now calls std::abort() instead of silently returning a meaningless bool. Move the call into a kernel.
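The vector-return migration can be sketched standalone. The `handle` struct and `to_bools` body below are hypothetical stand-ins so the shape compiles without the CUDA-Q toolchain; in real code the names are cudaq::measure_handle and cudaq::to_bools, and discrimination happens in the compiler, not in a loop:

```cpp
#include <vector>

// Hypothetical stand-ins for cudaq::measure_handle / cudaq::to_bools.
struct handle { bool value; };
std::vector<bool> to_bools(const std::vector<handle> &hs) {
  std::vector<bool> out;
  out.reserve(hs.size());
  for (const auto &h : hs)
    out.push_back(h.value); // real impl emits quake.discriminate
  return out;
}

// Before (no longer accepted under the new types):
//   __qpu__ auto kernel() { return mz(qvec); }  // deduces vector<handle>
// After: spell the return type and wrap with to_bools:
//   __qpu__ std::vector<bool> kernel() { return cudaq::to_bools(mz(qvec)); }
std::vector<bool> kernel_shape(const std::vector<handle> &measured) {
  return to_bools(measured);
}
```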

New compile-time errors

  • Host-device boundary rejection: entry-point __qpu__ kernels cannot have measure_handle (transitively, including std::vector<measure_handle>, std::function<void(measure_handle)>, and cudaq::qkernel<void(measure_handle)>) in parameter or return position. Diagnostic:
    measure_handle cannot cross the host-device boundary; entry-point kernels must discriminate first.
  • Functor operator(): __qpu__ kernels with handle-bearing signatures receive the same diagnostic.
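The "transitively" in the rejection rule can be illustrated with a recursive type trait. This is only a host-level sketch with a mock handle type; the compiler's actual check (containsMeasureHandleAtBoundary, see the commit notes below in this thread's history) operates on MLIR types, not C++ ones:

```cpp
#include <functional>
#include <type_traits>
#include <vector>

struct measure_handle_mock {}; // stand-in for cudaq::measure_handle

// Illustrative recursive trait: does T carry a handle anywhere,
// including inside callable signatures? Mirrors the idea that the
// boundary check must recurse into std::function / qkernel signatures.
template <typename T>
struct has_handle : std::is_same<T, measure_handle_mock> {};
template <typename T>
struct has_handle<std::vector<T>> : has_handle<T> {};
template <typename R, typename... A>
struct has_handle<std::function<R(A...)>>
    : std::bool_constant<(has_handle<R>::value || ... ||
                          has_handle<A>::value)> {};

static_assert(has_handle<measure_handle_mock>::value);
static_assert(has_handle<std::vector<measure_handle_mock>>::value);
static_assert(has_handle<std::function<void(measure_handle_mock)>>::value);
static_assert(!has_handle<std::function<bool(int)>>::value);
```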

Downstream Impact

  • CUDA-QX: QEC stabilizer kernels in cudaqx/libraries/qec/... that currently consume std::vector<bool> from mz(ancz, ancx) will need a cudaq::to_bools(...) wrap. Tracked for the CUDA-QX follow-up PR.

Follow-up

  • Python frontend: Separate PR will mirror this for cudaq.kernel.

@khalatepradnya khalatepradnya changed the title [C++] Measure handle support [C++] Implement cudaq::measure_handle Apr 29, 2026
@khalatepradnya khalatepradnya added the breaking change Change breaks backwards compatibility label Apr 29, 2026
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request May 1, 2026
Land 11 of 13 cold-review items in a single bundle. (C1 already
landed in a732fa55f3.) Each item is independently motivated and
tested; bundling preserves ordering invariants between them
(C4's data-layout gate must precede the I3/I4 test attribute
opt-in; C3's boundary recursion must precede I1's diagnostic
narrowing).

Critical items (correctness):
- C2a: explicit inline body for measure_handle::measure_handle()
  so user TUs that mention a handle aggregate or array
  (struct Holder { measure_handle h; }; measure_handle arr[N];)
  emit a COMDAT definition. With = default the compiler is free
  to inline the ctor away in every TU, leaving the bridge-emitted
  link-time symbol unresolved.
- C2b: isBoundHandle in ConvertExpr.cpp now walks through
  cc.compute_ptr / cc.cast to the base alloca; uninitialized
  array-element / aggregate-member discrimination correctly
  diagnoses, while a binding store anywhere in the same alloca
  switches subsequent loads to the bound-handle path.
- C3: new containsMeasureHandleAtBoundary recurses into callable
  signatures and FunctionTypes; ASTBridge.cpp hasMH switches to
  it so std::function<void(measure_handle)> and
  cudaq::qkernel<void(measure_handle)> parameters are caught at
  the host-device boundary. The plain containsMeasureHandle stays
  callable-blind for marshaling code that needs to know whether
  the value being moved is itself a handle.
- C4: FuseCastCascade ptr -> int -> ptr fold gated on integer
  hop width >= pointer width via llvm::DataLayout parsed from
  the module's llvm.data_layout attribute. ptr -> i32 -> ptr on
  a 64-bit target is no longer collapsed (the truncation is
  observable). Bridge-emitted modules always carry the
  attribute; hand-typed lit modules opt in via a module wrapper.

Important items (clarity / safety):
- I1: boundary diagnostic narrowed from isa<CXXMethodDecl> to
  CXXMethodDecl + OO_Call so non-call-operator __qpu__ members
  follow the silent-demotion path rather than getting diagnosed
  as functor entry-point violations.
- I2: silently-demoted free __qpu__ functions (handle in
  signature, not a functor) now get an aborting host stub at
  the user's mangled name. The stub calls
  __nvqpp_measureHandleHostBoundaryAbort (added in
  runtime/cudaq/platform/quantum_platform.cpp), which prints a
  clear diagnostic and aborts. Replaces what was previously an
  unresolved-symbol link error with a runtime error pointing at
  the cause.
- I3: handle_array_rw in qir_api_measure_handle.qke drops
  cudaq-entrypoint, matching the silent-demotion shape that
  bridge-produced IR for that signature actually has.
- I4: convert-to-qir-api -canonicalize end-to-end test locks the
  positive Result* round-trip fold + the narrow-int negative
  across pipeline stages, guarding against regressions in either
  the conversion pass's pattern set or the canonicalizer's
  fixpoint behavior.

Suggestions (defensive):
- S1: is_trivially_copyable_v + is_standard_layout_v
  static_asserts on measure_handle pin the host-side ABI shape
  the bridge marshaling code assumes.
- S2: to_integer in ConvertExpr.cpp loads if the argument
  arrives as cc.ptr<cc.stdvec<...>>, before the existing
  handle-stdvec discriminate insertion. Defensive: every C++
  shape exercised in tests already lvalue-to-rvalue lowers to
  the value form.
- S3: FuseCastCascade doc comment in CCOps.cpp carries the
  NVQIR-profile rationale; cast_fold.qke header now references
  it instead of duplicating.
- S4: to_bools(const std::vector<measure_handle>&) overload
  remains under #ifndef CUDAQ_LIBRARY_MODE; mode-asymmetry
  rationale comment added inline.

Lit sanity post-polish: AST-Quake + AST-error + Transforms =
304/306 pass + 2 pre-existing XFAIL (unrelated). The two new
tests (measure_handle_silent_demotion.cpp,
qir_api_measure_handle_e2e.qke) lock the I2 and I4 behavior in
place. measure_handle_to_integer.cpp brought into git from the
pre-existing working tree; it covers PR-3b-scoped to_integer
auto-discriminate plumbing.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
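The C4 item above (refusing to fold ptr -> i32 -> ptr on a 64-bit target) can be demonstrated at host level with plain integer arithmetic. This sketch models only the address math; round-tripping a real pointer through uint32_t would not be safe to dereference:

```cpp
#include <cstdint>

// Simulate a ptr -> i32 -> ptr cast cascade on a 64-bit target.
// The 32-bit hop truncates, so the cascade is not a no-op and
// must not be folded away.
uint64_t round_trip_through_i32(uint64_t addr) {
  uint32_t hop = static_cast<uint32_t>(addr); // the i32 hop truncates
  return static_cast<uint64_t>(hop);          // high 32 bits are gone
}
```

On an address like 0x7fffdeadbeef the round trip yields 0xdeadbeef, so collapsing the casts would change observable behavior; a hop at least as wide as the pointer is lossless.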

github-actions Bot commented May 1, 2026

CI Summary — ❌ failed

Run #25349882587 · trigger push · ✅ 5 · ⏩ 7 · ❌ 0 · ⛔ 1

❌ Failed or cancelled
Job Result Link
build_and_test ⛔ cancelled view
Top-level jobs (13)
Job Result
binaries ⏩ skipped
build_and_test ⛔ cancelled
config_devdeps ✅ success
config_source_build ⏩ skipped
config_wheeldeps ✅ success
devdeps ✅ success
docker_image ⏩ skipped
gen_code_coverage ⏩ skipped
metadata ✅ success
python_metapackages ⏩ skipped
python_wheels ⏩ skipped
source_build ⏩ skipped
wheeldeps ✅ success
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
Job
binaries
config_source_build
docker_image
gen_code_coverage
python_metapackages
python_wheels
source_build
All sub-jobs (50) — every matrix leg, with links
Job Status Link
Build and test (amd64, clang16, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, clang16, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc11, openmpi) / Dev environment (Debug) ⛔ cancelled view
Build and test (amd64, gcc11, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ⛔ cancelled view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ✅ success view
Build and test (arm64, clang16, openmpi) / Dev environment (Debug) ⛔ cancelled view
Build and test (arm64, clang16, openmpi) / Dev environment (Python) ✅ success view
CI Summary ❔ in_progress view
Configure build (devdeps) ✅ success view
Configure build (source_build) ⏩ skipped view
Configure build (wheeldeps) ✅ success view
Create CUDA Quantum installer ⏩ skipped view
Create Docker images ⏩ skipped view
Create Python metapackages ⏩ skipped view
Create Python wheels ⏩ skipped view
Gen code coverage ⏩ skipped view
Load dependencies (amd64, clang16) / Caching ✅ success view
Load dependencies (amd64, clang16) / Finalize ✅ success view
Load dependencies (amd64, clang16) / Metadata ✅ success view
Load dependencies (amd64, gcc11) / Caching ✅ success view
Load dependencies (amd64, gcc11) / Finalize ✅ success view
Load dependencies (amd64, gcc11) / Metadata ✅ success view
Load dependencies (amd64, gcc12) / Caching ✅ success view
Load dependencies (amd64, gcc12) / Finalize ✅ success view
Load dependencies (amd64, gcc12) / Metadata ✅ success view
Load dependencies (arm64, clang16) / Caching ✅ success view
Load dependencies (arm64, clang16) / Finalize ✅ success view
Load dependencies (arm64, clang16) / Metadata ✅ success view
Load dependencies (arm64, gcc11) / Caching ✅ success view
Load dependencies (arm64, gcc11) / Finalize ✅ success view
Load dependencies (arm64, gcc11) / Metadata ✅ success view
Load dependencies (arm64, gcc12) / Caching ✅ success view
Load dependencies (arm64, gcc12) / Finalize ✅ success view
Load dependencies (arm64, gcc12) / Metadata ✅ success view
Load source build cache ⏩ skipped view
Load wheel dependencies (amd64, 12.6) / Caching ✅ success view
Load wheel dependencies (amd64, 12.6) / Finalize ✅ success view
Load wheel dependencies (amd64, 12.6) / Metadata ✅ success view
Load wheel dependencies (amd64, 13.0) / Caching ✅ success view
Load wheel dependencies (amd64, 13.0) / Finalize ✅ success view
Load wheel dependencies (amd64, 13.0) / Metadata ✅ success view
Load wheel dependencies (arm64, 12.6) / Caching ✅ success view
Load wheel dependencies (arm64, 12.6) / Finalize ✅ success view
Load wheel dependencies (arm64, 12.6) / Metadata ✅ success view
Load wheel dependencies (arm64, 13.0) / Caching ✅ success view
Load wheel dependencies (arm64, 13.0) / Finalize ✅ success view
Load wheel dependencies (arm64, 13.0) / Metadata ✅ success view
Prepare cache clean-up ❔ queued view
Retrieve PR info ✅ success view
⚠️ Required checks (5/8) — 3 missing — declared in .github/required-checks.yml for push
Required check Status Link
Build and test (amd64, clang16, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, clang16, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc11, openmpi) / Dev environment (Debug) ⛔ cancelled view
Build and test (amd64, gcc11, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ⛔ cancelled view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ✅ success view
Build and test (arm64, clang16, openmpi) / Dev environment (Debug) ⛔ cancelled view
Build and test (arm64, clang16, openmpi) / Dev environment (Python) ✅ success view

khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request May 1, 2026
Introduce `using measure_result = measure_handle;` in MLIR mode
(library mode keeps the legacy class) so existing source names get
the new deferred-measurement semantics without generating host
wrappers for device-only handle signatures. This is the Option C
direction confirmed in the 2026-04-30 runtime sync; rationale and
the rejected alternatives live in
`.cursor/measure-handle-rename-evaluation.md`.

Library-mode CMake routing for `unittests/`
(`add_compile_definitions(CUDAQ_LIBRARY_MODE)`) keeps host-side
overloads in `qpe_ftqc.cpp` and other measurement-using GTests that
cannot be intercepted by the bridge. Temporary workaround until
library mode itself is removed.

CI green-up across the bridge, marshalling, ODS verifiers, and the
expand-measurements / QIR conversion test suites for the new types.

Squashed from four `*`-prefixed WIP commits per PR NVIDIA#4409 history
cleanup; original SHAs: d84f6d8, 8fbe92d, 0e05f4c, ec5eac5.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request May 1, 2026
PR NVIDIA#4409 widened `quake.mz` ODS to return `!cc.measure_handle`
(scalar) and `!cc.stdvec<!cc.measure_handle>` (vector). The
state-init regression test still asserted the legacy
`!quake.measure` spelling in its MLIR FileCheck blocks; update
the four affected CHECK lines.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request May 1, 2026
PR NVIDIA#4409 made MLIR-mode `mz` return `cudaq::measure_handle`,
so `__qpu__ auto operator()()` now deduces a `measure_handle`
return type. The boundary diagnostic introduced in the
polish-round (`OO_Call` + entry-point) then rejects the kernel.
Spell the return type as `bool` so the bridge inserts
`quake.discriminate` at the bool-typed return boundary, matching
the fix already used in `conditional_run.cpp` and
`sample_to_run_migration.cpp`.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request May 1, 2026
PR NVIDIA#4409 made MLIR-mode `mz(qview)` return
`std::vector<measure_handle>`, so `__qpu__ auto kernel2()` now
deduces a handle-bearing return. `Marshal.cpp::hasLegalType`
filters such kernels out of `GenKernelExecution`'s worklist, the
`.run` companion is never generated, and `cudaq::run(1000,
kernel2)` fails at runtime with `runnable kernel ... is not
present`.

Spell the return as `std::vector<bool>` and wrap `mz(q)` with
`cudaq::to_bools(...)` so the kernel signature stays in the
host-visible subset. Host-side `result[0]` is now a `bool` and
`static_cast<int>(result[0])` is well-defined; previously it
would have routed through `measure_handle::operator bool()`,
which `std::abort()`s in MLIR mode.

The `[Begin Run0]` / `[End Run0]` Sphinx literalinclude markers
are unchanged.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request May 1, 2026
PR NVIDIA#4409 review (High): the conditional-store shape

  cudaq::measure_handle h;
  if (cond) h = mz(q);
  bool b = h;

was silently accepted along the cond=false path because the
prior `isBoundHandle` walk treated *any* store reaching the
scalar-handle alloca as binding, regardless of CFG dominance.
The IR-level result was a `quake.discriminate` over an
uninitialized `i64` payload, which downstream lowering casts to
a `Result*`.

Tighten the scalar-handle alloca path to require that the store
dominate the load (`mlir::DominanceInfo`, computed lazily once
per coercion site and shared across recursion levels). The
aggregate / array path stays coarse for the same reason it was
coarse before (per-element/per-member tracking needs reasoning
about SSA `cc.compute_ptr` indices).

Trade-off documented inline: the all-paths-store shape

  if (c) h = mz(q1); else h = mz(q2);
  bool b = h;

currently fails the dominance check too -- a false positive that
the user can rewrite around with one bind, e.g. `auto h =
mz(q);`. A proper definite-assignment dataflow pass would accept
that shape; tracked as a followup along with the function-arg
case the reviewer flagged.

Test: new `ConditionalStoreUnbound` case in
`measure_handle_unbound_copy.cpp` locks the diagnostic on the
single-branch shape; `ConditionalStoreAfterBind` keeps the
already-bound-then-rewritten shape passing so dominance does not
regress legitimate code.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request May 1, 2026
PR NVIDIA#4409 review (Medium): the bridge silently inserted a
`quake.discriminate` when the user wrote
`cudaq::to_integer(mz(qvec))`, accepting code the spec
(`measure_handle.bs` §C++ API L96) explicitly mandates be
migrated to `cudaq::to_integer(cudaq::to_bools(mz(qvec)))`. The
shim hid the new bulk-discrimination API and let the
spec-mandated migration regress in user code.

Replace the auto-insert at the `to_integer` call lowering with a
spec-named frontend diagnostic. Push a placeholder `i64 zero` to
keep the value stack balanced (same pattern as the unbound-handle
diagnostic) so the outer `TraverseStmt` does not stack a generic
"statement not supported" error on top.

Migrate the two in-tree C++ callers to the explicit form:
  - `targettests/execution/to_integer.cpp:25`
  - `targettests/execution/measurement_cleanup.cpp:44`
(Pre-existing 4-space indent drift on the latter line is left as
is to avoid unrelated whitespace churn.)

Reframe `test/AST-Quake/measure_handle_to_integer.cpp`: the
`ToIntegerDirect` case asserted the auto-insert behavior we are
removing -- delete it. Keep `ToIntegerExplicit` as the positive
IR-shape lock and point at the negative path. Add the matching
negative case `ToIntegerDirectRejected` to
`test/AST-error/measure_handle.cpp` so the diagnostic text is
locked.

`unittests/integration/qpe_ftqc.cpp` is unaffected: the
`unittests/` build defines `CUDAQ_LIBRARY_MODE`, so its
`to_integer(mz(...))` call uses the host-side overload at
`runtime/cudaq/qis/qubit_qis.h:640` and never reaches the bridge.

Python callers (`test_assignments.py`, `test_to_integer.py`)
stay on the auto-insert path until PR 3c (Python frontend)
applies the equivalent change.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
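The migrated call shape cudaq::to_integer(cudaq::to_bools(mz(qvec))) can be sketched with mocks. Everything below is a hypothetical stand-in so it compiles standalone, and the bit ordering (index 0 = least significant) is an assumption for illustration, not a statement of the library's actual convention:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for cudaq::measure_handle, cudaq::to_bools,
// and cudaq::to_integer; the real to_bools discriminates handles in
// the compiler rather than reading a stored bit.
struct handle { bool value; };

std::vector<bool> to_bools_mock(const std::vector<handle> &hs) {
  std::vector<bool> out(hs.size());
  for (std::size_t i = 0; i < hs.size(); ++i)
    out[i] = hs[i].value;
  return out;
}

uint64_t to_integer_mock(const std::vector<bool> &bits) {
  uint64_t n = 0;
  for (std::size_t i = 0; i < bits.size(); ++i)
    if (bits[i])
      n |= uint64_t{1} << i; // assumed ordering: index 0 = LSB
  return n;
}
```

The point of the diagnostic above is precisely that the user writes both stages explicitly, so the bulk-discrimination step (to_bools) is visible in source rather than inserted silently.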
khalatepradnya and others added 4 commits May 4, 2026 17:12
Widen the `expand-measurements` pass to lower vector-form
`quake.mz`/`mx`/`my` whose result is `!cc.stdvec<!cc.measure_handle>`,
mirroring the legacy `!cc.stdvec<!quake.measure>` path. Adds a
secondary `ExpandStdvecHandleDiscriminate` pattern for the
post-SSA-boundary case where `quake.discriminate` consumes a handle
vector that the bridge has stored to / loaded from memory.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Implements the C++ runtime half of the measure_handle spec
(cudaq-spec/proposals/measure_handle.bs). measure_handle is the new
return type of mz / mx / my; the legacy `measure_result` spelling is
preserved via Option C — under MLIR mode `measure_result` is a `using`
alias for `measure_handle`, so existing callers compile unchanged but
gain the new handle semantics. Library mode keeps the legacy
`class measure_result` block untouched, including the
`__nvqpp__MeasureResultBoolConversion` adaptive-feedback hook.

Spec invariant: mz / mx / my are __qpu__-only entry points (Kernel
Signature Rule). MLIR-mode inline bodies trap with `std::abort()` so
host-scope misuse fails loudly instead of computing on a meaningless
value. measure_handle::operator bool() also aborts: the bridge
intercepts every legitimate bool coercion at AST time and emits
quake.discriminate, so reaching the host body means a bridge
interception path was missed and the program would otherwise compute
on a meaningless `bool`.

Files
- runtime/cudaq/qis/measure_handle.h (new): class declaration gated by
  `#ifndef CUDAQ_LIBRARY_MODE`. Tag-dispatched
  `measure_handle(handle_index, idx)` constructor reserved for runtime
  use; default constructor produces an unbound handle whose `index`
  carries a `numeric_limits<int64_t>::max()` sentinel.
  `static_assert`s pin the i64 payload width / trivially copyable /
  standard layout invariants the bridge marshalling relies on.
- runtime/cudaq/qis/execution_manager.h: under
  `#ifdef CUDAQ_LIBRARY_MODE` the existing `class measure_result` is
  unchanged; under `#else` the previous `using measure_result = bool`
  becomes `using measure_result = measure_handle`.
- runtime/cudaq/qis/qubit_qis.h: MLIR-mode bodies of `measureZ` /
  `measureX` / `measureY` and the bulk `to_bools` overload trap with
  `std::abort()`. Library-mode behavior is preserved verbatim.

Bridge wiring, byte-size machinery, QIR conversion gap fix, boundary
verifier, test migration, and docs updates follow as separate commits
in this PR.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
…nery, and QIR conversion

Consolidates the bridge-side, type-system, and QIR-conversion work for
the measure_handle PR stack. The runtime API arrived in the previous
commit; this commit makes the AST bridge produce !cc.measure_handle
SSA values, teaches the verifier to reject handles at the host-device
boundary, fills the byte-size and marshaling gaps, and patches the QIR
conversion so handle pointer/stdvec ops survive --convert-to-qir-api.

Type-system support
- include/cudaq/Optimizer/Dialect/CC/CCTypes.h, CCTypes.cpp:
  containsMeasureHandle (value-shape check) and
  containsMeasureHandleAtBoundary (recursive into callable signatures
  and bare function types). The boundary variant is required so
  `std::function<void(measure_handle)>` and `cudaq::qkernel<...>`
  parameters are caught at entry-point classification.
- lib/Optimizer/Dialect/CC/CCOps.cpp:
  MeasureHandleType case in getByteSizeOfType returning a constant
  8 bytes, the IR-mode width of a class with a single std::int64_t
  field. Without this, a pure-device kernel returning
  std::vector<measure_handle> aborts in ConvertStmt.cpp with
  "unhandled vector element type" because __nvqpp_vectorCopyCtor
  cannot get a constant element size for the heap-copy prologue.

AST bridge
- lib/Frontend/nvqpp/ConvertType.cpp, ConvertDecl.cpp:
  cudaq::measure_handle maps to !cc.measure_handle;
  std::vector<measure_handle> is recognised in measurement
  register-name handling.
- lib/Frontend/nvqpp/ConvertExpr.cpp: the central rewire.
  * mz / mx / my emit !cc.measure_handle (scalar) or
    !cc.stdvec<!cc.measure_handle> (range/variadic) directly.
  * CK_UserDefinedConversion at measure_handle::operator bool inserts
    quake.discriminate at every spec-mandated bool-coercion site.
  * The discriminate-insertion path runs an isBoundHandle check that
    walks through cc.compute_ptr / cc.cast to the base alloca and,
    on the scalar-handle alloca shape, requires that a binding store
    dominate the load (mlir::DominanceInfo, computed lazily once per
    coercion site). Conditional-store shapes that previously emitted
    a discriminate over an uninitialized i64 payload now diagnose.
  * cudaq::to_bools is intercepted by name and lowered to a vectorized
    quake.discriminate on the entire handle stdvec; it is the bulk
    counterpart to operator bool.
  * cudaq::to_integer rejects vector<measure_handle> with a
    spec-named diagnostic (per measure_handle.bs §C++ API): the
    silent auto-insert that hid the bulk-discrimination API is gone.
  * measure_handle copy/move construction and operator= are
    intercepted as value-typed aliasing of the sub-i64 stack value;
    chained `h3 = h2 = h;` works because the dispatch drops the
    callee value the visitor pushed.
  * default-construct produces only the storage slot (cc.alloca);
    VisitVarDecl binds it directly so any read at a discriminate
    site is statically diagnosed by the unbound-handle path.
- lib/Frontend/nvqpp/ASTBridge.cpp: __qpu__ entry-point classification
  rejects functor operator() shapes whose signature transitively
  mentions measure_handle, the only disambiguable spec violation at
  AST time.

Marshaling and QIR conversion
- lib/Optimizer/Builder/Marshal.cpp:
  hasLegalType extends the entry-point predicate to reject
  measure_handle alongside qubit-typed parameters/results.
  lookupHostEntryPointFunc early-returns for device-only kernels
  whose signature cannot cross the host boundary, so the host-side
  rewriter skips them entirely.
- lib/Optimizer/CodeGen/ConvertToQIRAPI.cpp:
  The TypeConverter rewrites !cc.measure_handle to i64, but
  cc.compute_ptr / cc.stdvec_data / cc.stdvec_init / cc.stdvec_size
  carrying handle pointer or stdvec types had no patterns and no
  dynamic-legality predicates, so the framework left them
  legal-by-default and inserted unrealized_conversion_casts that
  applyPartialConversion could not resolve. Add OpInterfacePattern
  instantiations and extend the legality predicate so all four ops
  participate in the same operand/result-type rewrite the existing
  CC pointer ops already use.

LLVM 22 idiom
- All 15 op-creation sites added by this commit in ConvertExpr.cpp
  use the LLVM 22 form Op::create(builder, loc, ...). Two of those
  sites are arith::ConstantIntOp poison-result fallbacks (the
  unbound-handle and to_integer-rejection paths) and additionally
  use the LLVM 22 (builder, loc, type, value) signature.

Tests, runtime helpers, and docs follow as the next commit in this
PR. The dialect type itself (PR NVIDIA#4403) and the QIR conversion's
TypeConverter entry for !cc.measure_handle (PR NVIDIA#4404) are already on
main.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Closes the test surface for the measure_handle PR stack: every existing
AST-Quake / Transforms / targettests / docs site that observed the
old `!quake.measure` SSA shape now matches the new `!cc.measure_handle`
shape, and the new test files locking down spec-mandated behavior land
alongside.

New tests
- test/AST-Quake/measure_handle.cpp: scalar-handle bridge shape lock,
  including the bind-store / load-with-dominance round-trip and the
  `quake.discriminate` insertion at every CK_UserDefinedConversion site
  the bridge intercepts.
- test/AST-Quake/measure_handle_qir.cpp: end-to-end QIR conversion
  shape for a kernel whose handle escapes a basic block, exercising
  the cc.compute_ptr / cc.stdvec_data / cc.stdvec_init / cc.stdvec_size
  patterns added to ConvertToQIRAPI in the previous commit.
- test/AST-error/measure_handle.cpp: spec-named diagnostics for the
  unbound-handle path, the host-boundary `std::vector<measure_handle>`
  rejection, and the `cudaq::to_integer(mz(qvec))` rejection.
- test/Transforms/qir_api_measure_handle.qke: lit-replay of the QIR
  conversion's measure_handle paths against hand-rolled IR, so a
  bridge-side regression cannot mask a conversion-side regression.
- test/Transforms/cast_fold.qke: extends the cast-fold checks to the
  no-op case where a discriminate result is immediately re-cast.
- test/AST-Quake/qir_profiles.cpp: rename of base_profile-1.cpp; the
  rename is the only PR-3b contribution to that file -- LLVM 22 had
  already removed the `read_result` lines this PR originally targeted.

CHECK migrations (all unchanged shape, just the type string)
- 33 test/AST-Quake/*.cpp + 11 targettests/{Kernel,execution}/*.cpp +
  3 test/Transforms/*.qke + 3 docs/sphinx/examples/cpp/**/*.cpp
  switched from `!quake.measure` to `!cc.measure_handle` and, where
  the bridge inserts the bound-handle round-trip, picked up the
  matching `cc.alloca` / `cc.store` / `cc.load` CHECK lines.

LLVM 22 conflict resolutions
- test/AST-Quake/bug_3270.cpp: layered PR 3b's type change on LLVM 22's
  loosened `result%{{.*}}` regex form; the unused SSA captures from
  the pre-LLVM-22 form are dropped (no downstream CHECK referenced
  them).
- test/AST-Quake/if.cpp: PR 3b's `kernel_short_circuit_or` CHECK form
  is preserved verbatim (alloca/store/load + discriminate + cmpi ne
  against `arith.constant false`); the `arith.constant false` CHECK
  line LLVM 22 dropped is re-added because PR 3b's bool coercion still
  emits a `cmpi ne, x, false`.
- test/AST-Quake/to_qir.cpp: layered PR 3b's load-delay intent (move
  the `load i1, ptr %VAL_9` from before the second mz to inside the
  successor block of `br i1 %VAL_12`) on top of LLVM 22's opaque-ptr
  form; the typed-pointer / `bitcast %Result*` CHECK form from the
  pre-LLVM-22 base is dropped, and basic-block label captures stay on
  LLVM 22's loose `{{[0-9]+}}` form.
- test/AST-Quake/qir_profiles.cpp: PR 3b's only intent vs the OLD base
  was removing 6 `__quantum__qis__read_result__body` CHECK lines;
  LLVM 22 had already removed the same six lines (and additionally
  added an `array_record_output` line and the typed-to-opaque pointer
  conversion). The rename is the remaining PR 3b contribution.
- test/AST-Quake/cudaq_run.cpp, test/AST-Quake/qalloc_initialization.cpp,
  test/Transforms/qir_base_profile.qke: 3-way merge resolved cleanly;
  LLVM 22's `!llvm.ptr` / opaque-pointer churn lives alongside PR 3b's
  type-name updates without conflict.

Dropped
- unittests/CMakeLists.txt: PR 3b's directory-scope
  `add_compile_definitions(CUDAQ_LIBRARY_MODE)` is already on main as
of PR NVIDIA#4427 (which split out the same workaround). PR 3b's
rewording of that comment would be a drive-by; main's wording stays.

Docs and examples
- docs/sphinx/examples/cpp/measuring_kernels.cpp,
  docs/sphinx/examples/cpp/sample_to_run_migration.cpp,
  docs/sphinx/examples/cpp/basics/mid_circuit_measurement.cpp:
  reflect the spec-mandated `cudaq::to_bools(mz(...))` and
  `bool` return-type form so the examples compile under measure_handle.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
@github-actions

github-actions Bot commented May 5, 2026

CI Summary (push) — ✅ passed

Run #25411065421 · ✅ 6 · ⏩ 7 · ❌ 0 · ⛔ 0

Top-level jobs (13)
Job Result
binaries ⏩ skipped
build_and_test ✅ success
config_devdeps ✅ success
config_source_build ⏩ skipped
config_wheeldeps ✅ success
devdeps ✅ success
docker_image ⏩ skipped
gen_code_coverage ⏩ skipped
metadata ✅ success
python_metapackages ⏩ skipped
python_wheels ⏩ skipped
source_build ⏩ skipped
wheeldeps ✅ success
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
Job
binaries
config_source_build
docker_image
gen_code_coverage
python_metapackages
python_wheels
source_build
All sub-jobs (40) — every matrix leg
Job Status
Build and test (amd64, llvm, openmpi) / Dev environment (Debug) ✅ success
Build and test (amd64, llvm, openmpi) / Dev environment (Python) ✅ success
Build and test (arm64, llvm, openmpi) / Dev environment (Debug) ✅ success
Build and test (arm64, llvm, openmpi) / Dev environment (Python) ✅ success
CI Summary ❔ in_progress
Configure build (devdeps) ✅ success
Configure build (source_build) ⏩ skipped
Configure build (wheeldeps) ✅ success
Create CUDA Quantum installer ⏩ skipped
Create Docker images ⏩ skipped
Create Python metapackages ⏩ skipped
Create Python wheels ⏩ skipped
Gen code coverage ⏩ skipped
Load dependencies (amd64, gcc12) / Caching ✅ success
Load dependencies (amd64, gcc12) / Finalize ✅ success
Load dependencies (amd64, gcc12) / Metadata ✅ success
Load dependencies (amd64, llvm) / Caching ✅ success
Load dependencies (amd64, llvm) / Finalize ✅ success
Load dependencies (amd64, llvm) / Metadata ✅ success
Load dependencies (arm64, gcc12) / Caching ✅ success
Load dependencies (arm64, gcc12) / Finalize ✅ success
Load dependencies (arm64, gcc12) / Metadata ✅ success
Load dependencies (arm64, llvm) / Caching ✅ success
Load dependencies (arm64, llvm) / Finalize ✅ success
Load dependencies (arm64, llvm) / Metadata ✅ success
Load source build cache ⏩ skipped
Load wheel dependencies (amd64, 12.6) / Caching ✅ success
Load wheel dependencies (amd64, 12.6) / Finalize ✅ success
Load wheel dependencies (amd64, 12.6) / Metadata ✅ success
Load wheel dependencies (amd64, 13.0) / Caching ✅ success
Load wheel dependencies (amd64, 13.0) / Finalize ✅ success
Load wheel dependencies (amd64, 13.0) / Metadata ✅ success
Load wheel dependencies (arm64, 12.6) / Caching ✅ success
Load wheel dependencies (arm64, 12.6) / Finalize ✅ success
Load wheel dependencies (arm64, 12.6) / Metadata ✅ success
Load wheel dependencies (arm64, 13.0) / Caching ✅ success
Load wheel dependencies (arm64, 13.0) / Finalize ✅ success
Load wheel dependencies (arm64, 13.0) / Metadata ✅ success
Prepare cache clean-up ❔ in_progress
Retrieve PR info ✅ success
⚠️ Required checks (4/6) — 2 missing — declared in .github/required-checks.yml for push
Required check Status
Build and test (amd64, llvm, openmpi) / Dev environment (Debug) ✅ success
Build and test (amd64, llvm, openmpi) / Dev environment (Python) ✅ success
Build and test (arm64, llvm, openmpi) / Dev environment (Debug) ✅ success
Build and test (arm64, llvm, openmpi) / Dev environment (Python) ✅ success
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ❔ missing
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ❔ missing

`measure_handle` bool-coercion used `mlir::DominanceInfo`, which can't
see ops in the orphan region built by `cc::IfOp::create` for the
operands of `&&`, `||`, and `?:` — so named handles like
`if (result0 && result1)` were rejected as unbound. Replace with a
structural ancestor walk that works on partially-built IR.

Also refresh two stale `||` CHECK blocks (`kernel_short_circuit_or`,
`HandleOr`) that the LLVM-22 canonicalizer broke by folding
`cmpi ne %i1, false` to `%i1`. Add three regression patterns covering
the orphan-region path.

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>

Labels

breaking change Change breaks backwards compatibility
