[C++] Implement cudaq::measure_handle #4409
Draft
khalatepradnya wants to merge 10 commits into NVIDIA:main from
Conversation
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request · May 1, 2026
Land 11 of 13 cold-review items in a single bundle. (C1 already
landed in a732fa55f3.) Each item is independently motivated and
tested; bundling preserves ordering invariants between them
(C4's data-layout gate must precede the I3/I4 test attribute
opt-in; C3's boundary recursion must precede I1's diagnostic
narrowing).
Critical items (correctness):
- C2a: explicit inline body for measure_handle::measure_handle()
so user TUs that mention a handle aggregate or array
(struct Holder { measure_handle h; }; measure_handle arr[N];)
emit a COMDAT definition. With = default the compiler is free
to inline the ctor away in every TU, leaving the bridge-emitted
link-time symbol unresolved.
- C2b: isBoundHandle in ConvertExpr.cpp now walks through
cc.compute_ptr / cc.cast to the base alloca; uninitialized
array-element / aggregate-member discrimination correctly
diagnoses, while a binding store anywhere in the same alloca
switches subsequent loads to the bound-handle path.
- C3: new containsMeasureHandleAtBoundary recurses into callable
signatures and FunctionTypes; ASTBridge.cpp hasMH switches to
it so std::function<void(measure_handle)> and
cudaq::qkernel<void(measure_handle)> parameters are caught at
the host-device boundary. The plain containsMeasureHandle stays
callable-blind for marshaling code that needs to know whether
the value being moved is itself a handle.
- C4: FuseCastCascade ptr -> int -> ptr fold gated on integer
hop width >= pointer width via llvm::DataLayout parsed from
the module's llvm.data_layout attribute. ptr -> i32 -> ptr on
a 64-bit target is no longer collapsed (the truncation is
observable). Bridge-emitted modules always carry the
attribute; hand-typed lit modules opt in via a module wrapper.
Important items (clarity / safety):
- I1: boundary diagnostic narrowed from isa<CXXMethodDecl> to
CXXMethodDecl + OO_Call so non-call-operator __qpu__ members
follow the silent-demotion path rather than getting diagnosed
as functor entry-point violations.
- I2: silently-demoted free __qpu__ functions (handle in
signature, not a functor) now get an aborting host stub at
the user's mangled name. The stub calls
__nvqpp_measureHandleHostBoundaryAbort (added in
runtime/cudaq/platform/quantum_platform.cpp), which prints a
clear diagnostic and aborts. Replaces what was previously an
unresolved-symbol link error with a runtime error pointing at
the cause.
- I3: handle_array_rw in qir_api_measure_handle.qke drops
cudaq-entrypoint, matching the silent-demotion shape that
bridge-produced IR for that signature actually has.
- I4: convert-to-qir-api -canonicalize end-to-end test locks the
positive Result* round-trip fold + the narrow-int negative
across pipeline stages, guarding against regressions in either
the conversion pass's pattern set or the canonicalizer's
fixpoint behavior.
Suggestions (defensive):
- S1: is_trivially_copyable_v + is_standard_layout_v
static_asserts on measure_handle pin the host-side ABI shape
the bridge marshaling code assumes.
- S2: to_integer in ConvertExpr.cpp loads if the argument
arrives as cc.ptr<cc.stdvec<...>>, before the existing
handle-stdvec discriminate insertion. Defensive: every C++
shape exercised in tests already lvalue-to-rvalue lowers to
the value form.
- S3: FuseCastCascade doc comment in CCOps.cpp carries the
NVQIR-profile rationale; cast_fold.qke header now references
it instead of duplicating.
- S4: to_bools(const std::vector<measure_handle>&) overload
remains under #ifndef CUDAQ_LIBRARY_MODE; mode-asymmetry
rationale comment added inline.
Lit sanity post-polish: AST-Quake + AST-error + Transforms =
304/306 pass + 2 pre-existing XFAIL (unrelated). The two new
tests (measure_handle_silent_demotion.cpp,
qir_api_measure_handle_e2e.qke) lock the I2 and I4 behavior in
place. measure_handle_to_integer.cpp brought into git from the
pre-existing working tree; it covers PR-3b-scoped to_integer
auto-discriminate plumbing.
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
CI Summary — ❌ failed · Run #25349882587 · trigger: Failed or cancelled
Top-level jobs (13)
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
All sub-jobs (50) — every matrix leg

| Required check | Status |
|---|---|
| Build and test (amd64, clang16, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, clang16, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc11, openmpi) / Dev environment (Debug) | ⛔ cancelled |
| Build and test (amd64, gcc11, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ⛔ cancelled |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (arm64, clang16, openmpi) / Dev environment (Debug) | ⛔ cancelled |
| Build and test (arm64, clang16, openmpi) / Dev environment (Python) | ✅ success |
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request · May 1, 2026
Introduce `using measure_result = measure_handle;` in MLIR mode (library mode keeps the legacy class) so existing source names get the new deferred-measurement semantics without generating host wrappers for device-only handle signatures. This is the Option C direction confirmed in the 2026-04-30 runtime sync; rationale and the rejected alternatives live in `.cursor/measure-handle-rename-evaluation.md`.

Library-mode CMake routing for `unittests/` (`add_compile_definitions(CUDAQ_LIBRARY_MODE)`) keeps host-side overloads in `qpe_ftqc.cpp` and other measurement-using GTests that cannot be intercepted by the bridge. Temporary workaround until library mode itself is removed.

CI green-up across the bridge, marshalling, ODS verifiers, and the expand-measurements / QIR conversion test suites for the new types.

Squashed from four `*`-prefixed WIP commits per PR NVIDIA#4409 history cleanup; original SHAs: d84f6d8, 8fbe92d, 0e05f4c, ec5eac5.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request · May 1, 2026
PR NVIDIA#4409 widened `quake.mz` ODS to return `!cc.measure_handle` (scalar) and `!cc.stdvec<!cc.measure_handle>` (vector). The state-init regression test still asserted the legacy `!quake.measure` spelling in its MLIR FileCheck blocks; update the four affected CHECK lines. Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request · May 1, 2026
PR NVIDIA#4409 made MLIR-mode `mz` return `cudaq::measure_handle`, so `__qpu__ auto operator()()` now deduces a `measure_handle` return type. The boundary diagnostic introduced in the polish-round (`OO_Call` + entry-point) then rejects the kernel. Spell the return type as `bool` so the bridge inserts `quake.discriminate` at the bool-typed return boundary, matching the fix already used in `conditional_run.cpp` and `sample_to_run_migration.cpp`. Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request · May 1, 2026
PR NVIDIA#4409 made MLIR-mode `mz(qview)` return `std::vector<measure_handle>`, so `__qpu__ auto kernel2()` now deduces a handle-bearing return. `Marshal.cpp::hasLegalType` filters such kernels out of `GenKernelExecution`'s worklist, the `.run` companion is never generated, and `cudaq::run(1000, kernel2)` fails at runtime with `runnable kernel ... is not present`.

Spell the return as `std::vector<bool>` and wrap `mz(q)` with `cudaq::to_bools(...)` so the kernel signature stays in the host-visible subset. Host-side `result[0]` is now a `bool` and `static_cast<int>(result[0])` is well-defined; previously it would have routed through `measure_handle::operator bool()`, which `std::abort()`s in MLIR mode.

The `[Begin Run0]` / `[End Run0]` Sphinx literalinclude markers are unchanged.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request · May 1, 2026
PR NVIDIA#4409 review (High): the conditional-store shape

    cudaq::measure_handle h;
    if (cond) h = mz(q);
    bool b = h;

was silently accepted along the cond=false path because the prior `isBoundHandle` walk treated *any* store reaching the scalar-handle alloca as binding, regardless of CFG dominance. The IR-level result was a `quake.discriminate` over an uninitialized `i64` payload, which downstream lowering casts to a `Result*`.

Tighten the scalar-handle alloca path to require that the store dominate the load (`mlir::DominanceInfo`, computed lazily once per coercion site and shared across recursion levels). The aggregate / array path stays coarse for the same reason it was coarse before (per-element/per-member tracking needs reasoning about SSA `cc.compute_ptr` indices).

Trade-off documented inline: the all-paths-store shape

    if (c) h = mz(q1); else h = mz(q2);
    bool b = h;

currently fails the dominance check too -- a false positive that the user can rewrite around with one bind, e.g. `auto h = mz(q);`. A proper definite-assignment dataflow pass would accept that shape; tracked as a followup along with the function-arg case the reviewer flagged.

Test: new `ConditionalStoreUnbound` case in `measure_handle_unbound_copy.cpp` locks the diagnostic on the single-branch shape; `ConditionalStoreAfterBind` keeps the already-bound-then-rewritten shape passing so dominance does not regress legitimate code.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
khalatepradnya added a commit to khalatepradnya/cuda-quantum that referenced this pull request · May 1, 2026
PR NVIDIA#4409 review (Medium): the bridge silently inserted a `quake.discriminate` when the user wrote `cudaq::to_integer(mz(qvec))`, accepting code the spec (`measure_handle.bs` §C++ API L96) explicitly mandates be migrated to `cudaq::to_integer(cudaq::to_bools(mz(qvec)))`. The shim hid the new bulk-discrimination API and let the spec-mandated migration regress in user code.

Replace the auto-insert at the `to_integer` call lowering with a spec-named frontend diagnostic. Push a placeholder `i64 zero` to keep the value stack balanced (same pattern as the unbound-handle diagnostic) so the outer `TraverseStmt` does not stack a generic "statement not supported" error on top.

Migrate the two in-tree C++ callers to the explicit form:
- `targettests/execution/to_integer.cpp:25`
- `targettests/execution/measurement_cleanup.cpp:44`

(Pre-existing 4-space indent drift on the latter line is left as is to avoid unrelated whitespace churn.)

Reframe `test/AST-Quake/measure_handle_to_integer.cpp`: the `ToIntegerDirect` case asserted the auto-insert behavior we are removing -- delete it. Keep `ToIntegerExplicit` as the positive IR-shape lock and point at the negative path. Add the matching negative case `ToIntegerDirectRejected` to `test/AST-error/measure_handle.cpp` so the diagnostic text is locked.

`unittests/integration/qpe_ftqc.cpp` is unaffected: the `unittests/` build defines `CUDAQ_LIBRARY_MODE`, so its `to_integer(mz(...))` call uses the host-side overload at `runtime/cudaq/qis/qubit_qis.h:640` and never reaches the bridge. Python callers (`test_assignments.py`, `test_to_integer.py`) stay on the auto-insert path until PR 3c (Python frontend) applies the equivalent change.

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
This was referenced May 2, 2026
Widen the `expand-measurements` pass to lower vector-form `quake.mz`/`mx`/`my` whose result is `!cc.stdvec<!cc.measure_handle>`, mirroring the legacy `!cc.stdvec<!quake.measure>` path. Adds a secondary `ExpandStdvecHandleDiscriminate` pattern for the post-SSA-boundary case where `quake.discriminate` consumes a handle vector that the bridge has stored to / loaded from memory. Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Implements the C++ runtime half of the measure_handle spec (cudaq-spec/proposals/measure_handle.bs). measure_handle is the new return type of mz / mx / my; the legacy `measure_result` spelling is preserved via Option C — under MLIR mode `measure_result` is a `using` alias for `measure_handle`, so existing callers compile unchanged but gain the new handle semantics. Library mode keeps the legacy `class measure_result` block untouched, including the `__nvqpp__MeasureResultBoolConversion` adaptive-feedback hook.

Spec invariant: mz / mx / my are __qpu__-only entry points (Kernel Signature Rule). MLIR-mode inline bodies trap with `std::abort()` so host-scope misuse fails loudly instead of computing on a meaningless value. measure_handle::operator bool() also aborts: the bridge intercepts every legitimate bool coercion at AST time and emits quake.discriminate, so reaching the host body means a bridge interception path was missed and the program would otherwise compute on a meaningless `bool`.

Files
- runtime/cudaq/qis/measure_handle.h (new): class declaration gated by `#ifndef CUDAQ_LIBRARY_MODE`. Tag-dispatched `measure_handle(handle_index, idx)` constructor reserved for runtime use; default constructor produces an unbound handle whose `index` carries a `numeric_limits<int64_t>::max()` sentinel. `static_assert`s pin the i64 payload width / trivially copyable / standard layout invariants the bridge marshalling relies on.
- runtime/cudaq/qis/execution_manager.h: under `#ifdef CUDAQ_LIBRARY_MODE` the existing `class measure_result` is unchanged; under `#else` the previous `using measure_result = bool` becomes `using measure_result = measure_handle`.
- runtime/cudaq/qis/qubit_qis.h: MLIR-mode bodies of `measureZ` / `measureX` / `measureY` and the bulk `to_bools` overload trap with `std::abort()`. Library-mode behavior is preserved verbatim.

Bridge wiring, byte-size machinery, QIR conversion gap fix, boundary verifier, test migration, and docs updates follow as separate commits in this PR.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
…nery, and QIR conversion
Consolidates the bridge-side, type-system, and QIR-conversion work for
the measure_handle PR stack. The runtime API arrived in the previous
commit; this commit makes the AST bridge produce !cc.measure_handle
SSA values, teaches the verifier to reject handles at the host-device
boundary, fills the byte-size and marshaling gaps, and patches the QIR
conversion so handle pointer/stdvec ops survive --convert-to-qir-api.
Type-system support
- include/cudaq/Optimizer/Dialect/CC/CCTypes.h, CCTypes.cpp:
containsMeasureHandle (value-shape check) and
containsMeasureHandleAtBoundary (recursive into callable signatures
and bare function types). The boundary variant is required so
`std::function<void(measure_handle)>` and `cudaq::qkernel<...>`
parameters are caught at entry-point classification.
- lib/Optimizer/Dialect/CC/CCOps.cpp:
MeasureHandleType case in getByteSizeOfType returning a constant
8 bytes, the IR-mode width of a class with a single std::int64_t
field. Without this, a pure-device kernel returning
std::vector<measure_handle> aborts in ConvertStmt.cpp with
"unhandled vector element type" because __nvqpp_vectorCopyCtor
cannot get a constant element size for the heap-copy prologue.
AST bridge
- lib/Frontend/nvqpp/ConvertType.cpp, ConvertDecl.cpp:
cudaq::measure_handle maps to !cc.measure_handle;
std::vector<measure_handle> is recognised in measurement
register-name handling.
- lib/Frontend/nvqpp/ConvertExpr.cpp: the central rewire.
* mz / mx / my emit !cc.measure_handle (scalar) or
!cc.stdvec<!cc.measure_handle> (range/variadic) directly.
* CK_UserDefinedConversion at measure_handle::operator bool inserts
quake.discriminate at every spec-mandated bool-coercion site.
* The discriminate-insertion path runs an isBoundHandle check that
walks through cc.compute_ptr / cc.cast to the base alloca and,
on the scalar-handle alloca shape, requires that a binding store
dominate the load (mlir::DominanceInfo, computed lazily once per
coercion site). Conditional-store shapes that previously emitted
a discriminate over an uninitialized i64 payload now diagnose.
* cudaq::to_bools is intercepted by name and lowered to a vectorized
quake.discriminate on the entire handle stdvec; it is the bulk
counterpart to operator bool.
* cudaq::to_integer rejects vector<measure_handle> with a
spec-named diagnostic (per measure_handle.bs §C++ API): the
silent auto-insert that hid the bulk-discrimination API is gone.
* measure_handle copy/move construction and operator= are
intercepted as value-typed aliasing of the sub-i64 stack value;
chained `h3 = h2 = h;` works because the dispatch drops the
callee value the visitor pushed.
* default-construct produces only the storage slot (cc.alloca);
VisitVarDecl binds it directly so any read at a discriminate
site is statically diagnosed by the unbound-handle path.
- lib/Frontend/nvqpp/ASTBridge.cpp: __qpu__ entry-point classification
rejects functor operator() shapes whose signature transitively
mentions measure_handle, the only disambiguable spec violation at
AST time.
Marshaling and QIR conversion
- lib/Optimizer/Builder/Marshal.cpp:
hasLegalType extends the entry-point predicate to reject
measure_handle alongside qubit-typed parameters/results.
lookupHostEntryPointFunc early-returns for device-only kernels
whose signature cannot cross the host boundary, so the host-side
rewriter skips them entirely.
- lib/Optimizer/CodeGen/ConvertToQIRAPI.cpp:
The TypeConverter rewrites !cc.measure_handle to i64, but
cc.compute_ptr / cc.stdvec_data / cc.stdvec_init / cc.stdvec_size
carrying handle pointer or stdvec types had no patterns and no
dynamic-legality predicates, so the framework left them
legal-by-default and inserted unrealized_conversion_casts that
applyPartialConversion could not resolve. Add OpInterfacePattern
instantiations and extend the legality predicate so all four ops
participate in the same operand/result-type rewrite the existing
CC pointer ops already use.
LLVM 22 idiom
- All 15 op-creation sites added by this commit in ConvertExpr.cpp
use the LLVM 22 form Op::create(builder, loc, ...). Two of those
sites are arith::ConstantIntOp poison-result fallbacks (the
unbound-handle and to_integer-rejection paths) and additionally
use the LLVM 22 (builder, loc, type, value) signature.
Tests, runtime helpers, and docs follow as the next commit in this
PR. The dialect type itself (PR NVIDIA#4403) and the QIR conversion's
TypeConverter entry for !cc.measure_handle (PR NVIDIA#4404) are already on
main.
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Closes the test surface for the measure_handle PR stack: every existing
AST-Quake / Transforms / targettests / docs site that observed the
old `!quake.measure` SSA shape now matches the new `!cc.measure_handle`
shape, and the new test files locking down spec-mandated behavior land
alongside.
New tests
- test/AST-Quake/measure_handle.cpp: scalar-handle bridge shape lock,
including the bind-store / load-with-dominance round-trip and the
`quake.discriminate` insertion at every CK_UserDefinedConversion site
the bridge intercepts.
- test/AST-Quake/measure_handle_qir.cpp: end-to-end QIR conversion
shape for a kernel whose handle escapes a basic block, exercising
the cc.compute_ptr / cc.stdvec_data / cc.stdvec_init / cc.stdvec_size
patterns added to ConvertToQIRAPI in the previous commit.
- test/AST-error/measure_handle.cpp: spec-named diagnostics for the
unbound-handle path, the host-boundary `std::vector<measure_handle>`
rejection, and the `cudaq::to_integer(mz(qvec))` rejection.
- test/Transforms/qir_api_measure_handle.qke: lit-replay of the QIR
conversion's measure_handle paths against hand-rolled IR, so a
bridge-side regression cannot mask a conversion-side regression.
- test/Transforms/cast_fold.qke: extends the cast-fold checks to the
no-op case where a discriminate result is immediately re-cast.
- test/AST-Quake/qir_profiles.cpp: rename of base_profile-1.cpp; the
rename is the only PR-3b contribution to that file -- LLVM 22 had
already removed the `read_result` lines this PR originally targeted.
CHECK migrations (all unchanged shape, just the type string)
- 33 test/AST-Quake/*.cpp + 11 targettests/{Kernel,execution}/*.cpp +
3 test/Transforms/*.qke + 3 docs/sphinx/examples/cpp/**/*.cpp
switched from `!quake.measure` to `!cc.measure_handle` and, where
the bridge inserts the bound-handle round-trip, picked up the
matching `cc.alloca` / `cc.store` / `cc.load` CHECK lines.
LLVM 22 conflict resolutions
- test/AST-Quake/bug_3270.cpp: layered PR 3b's type change on LLVM 22's
loosened `result%{{.*}}` regex form; the unused SSA captures from
the pre-LLVM-22 form are dropped (no downstream CHECK referenced
them).
- test/AST-Quake/if.cpp: PR 3b's `kernel_short_circuit_or` CHECK form
is preserved verbatim (alloca/store/load + discriminate + cmpi ne
against `arith.constant false`); the `arith.constant false` CHECK
line LLVM 22 dropped is re-added because PR 3b's bool coercion still
emits a `cmpi ne, x, false`.
- test/AST-Quake/to_qir.cpp: layered PR 3b's load-delay intent (move
the `load i1, ptr %VAL_9` from before the second mz to inside the
successor block of `br i1 %VAL_12`) on top of LLVM 22's opaque-ptr
form; the typed-pointer / `bitcast %Result*` CHECK form from the
pre-LLVM-22 base is dropped, and basic-block label captures stay on
LLVM 22's loose `{{[0-9]+}}` form.
- test/AST-Quake/qir_profiles.cpp: PR 3b's only intent vs the OLD base
was removing 6 `__quantum__qis__read_result__body` CHECK lines;
LLVM 22 had already removed the same six lines (and additionally
added an `array_record_output` line and the typed-to-opaque pointer
conversion). The rename is the remaining PR 3b contribution.
- test/AST-Quake/cudaq_run.cpp, test/AST-Quake/qalloc_initialization.cpp,
test/Transforms/qir_base_profile.qke: 3-way merge resolved cleanly;
LLVM 22's `!llvm.ptr` / opaque-pointer churn lives alongside PR 3b's
type-name updates without conflict.
Dropped
- unittests/CMakeLists.txt: PR 3b's directory-scope
`add_compile_definitions(CUDAQ_LIBRARY_MODE)` is already on main as
of PR NVIDIA#4427 (which split out the same workaround). PR 3b's wording
reword would be a drive-by; main's wording stays.
Docs and examples
- docs/sphinx/examples/cpp/measuring_kernels.cpp,
docs/sphinx/examples/cpp/sample_to_run_migration.cpp,
docs/sphinx/examples/cpp/basics/mid_circuit_measurement.cpp:
reflect the spec-mandated `cudaq::to_bools(mz(...))` and
`bool` return-type form so the examples compile under measure_handle.
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
CI Summary

| Job | Result |
|---|---|
| binaries | ⏩ skipped |
| build_and_test | ✅ success |
| config_devdeps | ✅ success |
| config_source_build | ⏩ skipped |
| config_wheeldeps | ✅ success |
| devdeps | ✅ success |
| docker_image | ⏩ skipped |
| gen_code_coverage | ⏩ skipped |
| metadata | ✅ success |
| python_metapackages | ⏩ skipped |
| python_wheels | ⏩ skipped |
| source_build | ⏩ skipped |
| wheeldeps | ✅ success |
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch

| Job |
|---|
| binaries |
| config_source_build |
| docker_image |
| gen_code_coverage |
| python_metapackages |
| python_wheels |
| source_build |
All sub-jobs (40) — every matrix leg, with links
| Job | Status | Link |
|---|---|---|
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| CI Summary | ❔ in_progress | view |
| Configure build (devdeps) | ✅ success | view |
| Configure build (source_build) | ⏩ skipped | view |
| Configure build (wheeldeps) | ✅ success | view |
| Create CUDA Quantum installer | ⏩ skipped | view |
| Create Docker images | ⏩ skipped | view |
| Create Python metapackages | ⏩ skipped | view |
| Create Python wheels | ⏩ skipped | view |
| Gen code coverage | ⏩ skipped | view |
| Load dependencies (amd64, gcc12) / Caching | ✅ success | view |
| Load dependencies (amd64, gcc12) / Finalize | ✅ success | view |
| Load dependencies (amd64, gcc12) / Metadata | ✅ success | view |
| Load dependencies (amd64, llvm) / Caching | ✅ success | view |
| Load dependencies (amd64, llvm) / Finalize | ✅ success | view |
| Load dependencies (amd64, llvm) / Metadata | ✅ success | view |
| Load dependencies (arm64, gcc12) / Caching | ✅ success | view |
| Load dependencies (arm64, gcc12) / Finalize | ✅ success | view |
| Load dependencies (arm64, gcc12) / Metadata | ✅ success | view |
| Load dependencies (arm64, llvm) / Caching | ✅ success | view |
| Load dependencies (arm64, llvm) / Finalize | ✅ success | view |
| Load dependencies (arm64, llvm) / Metadata | ✅ success | view |
| Load source build cache | ⏩ skipped | view |
| Load wheel dependencies (amd64, 12.6) / Caching | ✅ success | view |
| Load wheel dependencies (amd64, 12.6) / Finalize | ✅ success | view |
| Load wheel dependencies (amd64, 12.6) / Metadata | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Caching | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Finalize | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Metadata | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Caching | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Finalize | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Metadata | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Caching | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Finalize | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Metadata | ✅ success | view |
| Prepare cache clean-up | ❔ in_progress | view |
| Retrieve PR info | ✅ success | view |
⚠️ Required checks (4/6) — 2 missing — declared in .github/required-checks.yml for push
| Required check | Status | Link |
|---|---|---|
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ❔ missing | |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ❔ missing | |
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
…pkhalate/measure-handle-pr3b-cpp-frontend Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
`measure_handle` bool-coercion used `mlir::DominanceInfo`, which can't see ops in the orphan region built by `cc::IfOp::create` for the operands of `&&`, `||`, and `?:` — so named handles like `if (result0 && result1)` were rejected as unbound. Replace with a structural ancestor walk that works on partially-built IR. Also refresh two stale `||` CHECK blocks (`kernel_short_circuit_or`, `HandleOr`) that the LLVM-22 canonicalizer broke by folding `cmpi ne %i1, false` to `%i1`. Add three regression patterns covering the orphan-region path. Co-authored-by: Cursor Agent <cursoragent@cursor.com> Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Summary
- `measure_handle` proposal
- `!cc.measure_handle` through `ConvertToQIRAPI` #4404 (QIR conversion)

NOTE: These all apply in MLIR mode only.
Library mode (`#ifdef CUDAQ_LIBRARY_MODE`) keeps the legacy `class measure_result` block in `runtime/cudaq/qis/execution_manager.h` verbatim, and is source-compatible.

Breaking Changes
Type identity
- `cudaq::measure_result` is now a `using` alias for `cudaq::measure_handle`. Existing source code that spells `measure_result` continues to compile, but the underlying class changed:
  - `measure_handle` is not implicitly convertible to `bool` at host scope: `operator bool()` `std::abort()`s in MLIR mode. The bridge intercepts every legitimate bool coercion inside `__qpu__` and emits `quake.discriminate`; reaching the host body means a coercion site escaped the bridge.
  - `measure_handle` is not integer-constructible. The legacy `measure_result(uint8_t)` form no longer compiles; use `bool` for host-side bit storage.
  - `decltype(mz(q))` resolves to `measure_handle`, not `bool`. Visible to `auto`, template specialization, and `std::is_same` checks.

Return type of `mz` / `mx` / `my`
- `mz(q)` / `mx(q)` / `my(q)` return `measure_handle` instead of `bool`.
- `mz(qvec)` / `mx(qvec)` / `my(qvec)` return `std::vector<measure_handle>` instead of `std::vector<bool>` (which is what the prior `using measure_result = bool` alias resolved to).

Required source migrations
- `__qpu__ auto kernel() { return mz(...); }`: `auto` now deduces `measure_handle` (or `std::vector<measure_handle>`), which the host-device boundary rejects (see below). Spell the return type explicitly as `bool` / `std::vector<bool>`; for the vector case, also wrap with `cudaq::to_bools(...)`.
- `cudaq::to_integer(mz(qvec))`: the bridge rejects this shim with a spec-named frontend diagnostic. The new `cudaq::to_bools(std::vector<measure_handle>)` API is the spec-mandated bulk discrimination surface.
- `mz` / `mx` / `my`: calling these outside a `__qpu__` kernel now `std::abort()`s instead of silently returning a meaningless `bool`. Move the call into a kernel.

New compile-time errors
- `__qpu__` kernels cannot have `measure_handle` (transitively, including `std::vector<measure_handle>`, `std::function<void(measure_handle)>`, `cudaq::qkernel<void(measure_handle)>`) in parameter or return position. Diagnostic: `measure_handle cannot cross the host-device boundary; entry-point kernels must discriminate first.`
- Functor `operator()` `__qpu__` kernels with handle-bearing signatures: same diagnostic.

Downstream Impact
Code under `cudaqx/libraries/qec/...` that currently consumes `std::vector<bool>` from `mz(ancz, ancx)` will need a `cudaq::to_bools(...)` wrap. Tracked for the CUDA-QX follow-up PR.

Follow-up
cudaq.kernel.