
feat(qdp): Encoding + Dtype enums, static encoder dispatch #1276

Merged
ryankert01 merged 4 commits into apache:main from rich7420:encode-types on May 11, 2026
Conversation

@rich7420
Contributor

Related Issues

Changes

  • Bug fix

  • New feature

  • Refactoring

  • Documentation

  • Test

  • CI/CD pipeline

  • Other

    Why

    "amplitude" was parsed and lowercased at 6 independent sites across the stack, causing silent divergence (e.g.
    encode_from_parquet(…, "Amplitude") silently failed while encode_batch accepted it). get_encoder heap-allocated
    a Box per batch for zero-sized unit structs. float32_pipeline: bool was scattered with
    inconsistent defaults, making the published baseline ambiguous.
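The divergence described above can be illustrated with a small sketch. The two call sites here are simplified, hypothetical stand-ins for the pre-PR code paths, not the actual qdp-core functions:

```rust
// One site normalizes case before matching...
fn encode_batch(method: &str) -> bool {
    matches!(
        method.to_ascii_lowercase().as_str(),
        "amplitude" | "angle" | "basis"
    )
}

// ...while another compares raw strings, so "Amplitude" silently fails
// here even though encode_batch accepts it.
fn encode_from_parquet(method: &str) -> bool {
    matches!(method, "amplitude" | "angle" | "basis")
}
```

With six such sites, any one of them can drift; parsing once into an enum at the boundary removes the whole class of bug.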

    How

    • Added Encoding enum and Dtype alias in qdp-core/src/types.rs — single parse at every API boundary, no heap
      allocation in dispatch
    • Replaced get_encoder(&str) → Box&lt;dyn QuantumEncoder&gt; with Encoding::encoder() →
      &'static dyn QuantumEncoder via OnceLock statics for IQP variants
    • PipelineConfig: encoding_method: String → encoding: Encoding, float32_pipeline: bool → dtype: Dtype
    • Removed encoding_supports_f32 and vector_len free functions; replaced by methods on Encoding
    • Fixed streaming Parquet dispatcher case-sensitivity gap in encoding/mod.rs
    • PyO3 boundary (engine.rs, loader.rs, lib.rs, pytorch.rs): strings consumed at boundary, enum passed
      internally
    • qumat_qdp/loader.py: added _VALID_ENCODINGS frozenset, early validation in _validate_loader_args and
      .encoding() setter
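The enum-plus-static-dispatch shape described in the bullets above can be sketched roughly as follows. This is a minimal illustration: the variant set, error type, and trait surface are simplified stand-ins for the real qdp-core definitions.

```rust
use std::sync::OnceLock;

pub trait QuantumEncoder: Sync {
    fn name(&self) -> &'static str;
}

// Zero-sized unit struct: a plain static suffices for dispatch.
struct AmplitudeEncoder;
impl QuantumEncoder for AmplitudeEncoder {
    fn name(&self) -> &'static str { "amplitude" }
}

// Stand-in for a variant with construction state (like the IQP encoders
// in the PR), hence the OnceLock rather than a plain static.
struct IqpEncoder { z_only: bool }
impl QuantumEncoder for IqpEncoder {
    fn name(&self) -> &'static str {
        if self.z_only { "iqp-z" } else { "iqp" }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding { Amplitude, Iqp }

impl Encoding {
    /// The single place strings are accepted: case-insensitive, trimmed.
    pub fn from_str_ci(s: &str) -> Result<Self, String> {
        match s.trim().to_ascii_lowercase().as_str() {
            "amplitude" => Ok(Self::Amplitude),
            "iqp" => Ok(Self::Iqp),
            other => Err(format!("Unknown encoding: {other}")),
        }
    }

    /// Static dispatch: returns a &'static reference, so no Box is
    /// allocated per call.
    pub fn encoder(self) -> &'static dyn QuantumEncoder {
        static AMPLITUDE: AmplitudeEncoder = AmplitudeEncoder;
        static IQP: OnceLock<IqpEncoder> = OnceLock::new();
        match self {
            Self::Amplitude => &AMPLITUDE,
            Self::Iqp => IQP.get_or_init(|| IqpEncoder { z_only: false }),
        }
    }
}
```

Each API boundary calls `from_str_ci` once, then every internal layer passes the `Copy`-able enum and dispatches through `encoder()` without heap allocation.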

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes


Copilot AI left a comment


Pull request overview

This PR centralizes encoding/dtype parsing and encoder dispatch in qdp-core by introducing canonical Encoding/Dtype domain types, and updates Rust + PyO3/Python boundaries to pass those typed values internally (reducing duplicated string handling and avoiding per-call heap allocation for encoder selection).

Changes:

  • Add qdp_core::Encoding + Dtype and replace string-based encoder selection with static encoder dispatch (incl. OnceLock for IQP variants).
  • Refactor pipeline configuration to use encoding: Encoding and dtype (vs encoding_method: String / float32_pipeline: bool) and update pipeline runner dispatch accordingly.
  • Update Python/Rust boundaries (PyO3 + pure Python loader) to validate/parse encodings at the boundary and add regression tests for case-insensitive parsing.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Summary per file:

  • qdp/qdp-core/src/types.rs — Introduces Encoding + Dtype and static encoder dispatch; adds parsing/helpers
  • qdp/qdp-core/src/lib.rs — Switches engine APIs from get_encoder(&str) to Encoding parsing + static dispatch
  • qdp/qdp-core/src/pipeline_runner.rs — Pipeline config now uses typed encoding/dtype and avoids string parsing in hot paths
  • qdp/qdp-core/src/gpu/encodings/mod.rs — Removes the get_encoder factory and tightens QuantumEncoder to 'static
  • qdp/qdp-core/src/gpu/encodings/iqp.rs — Adds OnceLock-backed shared IQP encoder instances for static dispatch
  • qdp/qdp-core/src/encoding/mod.rs — Fixes streaming Parquet dispatch to be case-insensitive via Encoding::from_str_ci
  • qdp/qdp-python/src/engine.rs — Parses Encoding/Dtype at the boundary and passes typed values internally
  • qdp/qdp-python/src/pytorch.rs — Uses Encoding for CUDA tensor validation logic
  • qdp/qdp-python/qumat_qdp/loader.py — Adds early encoding validation against a canonical set
  • qdp/qdp-core/tests/types.rs — Adds tests for encoding/dtype parsing, vector_len, and static dispatch stability


Comment thread qdp/qdp-core/src/types.rs Outdated
Comment on lines +123 to +125
Self::Angle => n,
Self::Basis => 1,
Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n,

Copilot AI Apr 19, 2026


Encoding::vector_len is documented and used as the per-sample input feature dimension (it drives PipelineIterator file sample_size validation and synthetic batch generation), but the mapping for Iqp/IqpZ/Phase currently returns 1 << num_qubits. That contradicts the encoders’ own input validation (e.g., PhaseEncoder expects sample_size == num_qubits, and IqpEncoder expects num_qubits or num_qubits + num_qubits*(num_qubits-1)/2). This will make pipeline/file loading for these encodings reject valid inputs (or generate invalid synthetic batches). Update vector_len for these variants to match the encoder input lengths.

Suggested change
Self::Angle => n,
Self::Basis => 1,
Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n,
Self::Amplitude => 1 << n,
Self::Angle | Self::Iqp | Self::IqpZ | Self::Phase => n,
Self::Basis => 1,
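The distinction the reviewer draws — per-sample input length versus state-vector length — can be sketched as follows. The mapping matches the suggested change; the variant set is trimmed, and the extended IQP input form (num_qubits + num_qubits*(num_qubits-1)/2) mentioned in the comment is deliberately left out of this illustration:

```rust
#[derive(Clone, Copy)]
enum Encoding { Amplitude, Angle, Basis, Iqp, Phase }

impl Encoding {
    /// Per-sample *input* feature length for n qubits, matching each
    /// encoder's own validation rather than the output state size.
    fn vector_len(self, n: usize) -> usize {
        match self {
            // Amplitude consumes 2^n features (the full state vector).
            Self::Amplitude => 1 << n,
            // These consume one value per qubit.
            Self::Angle | Self::Iqp | Self::Phase => n,
            // Basis consumes a single basis-state index.
            Self::Basis => 1,
        }
    }
}
```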

Comment thread qdp/qdp-python/src/engine.rs Outdated
Comment on lines +480 to +486
validate_cuda_tensor_for_encoding(data, self.engine.device().ordinal(), encoding_method)?;

let encoding = Encoding::from_str_ci(encoding_method)
.map_err(|e| PyRuntimeError::new_err(e.to_string()))?;
let dtype = data.getattr("dtype")?;
let dtype_str: String = dtype.str()?.extract()?;
let is_f32 = dtype_str.to_ascii_lowercase().contains("float32");

Copilot AI Apr 19, 2026


validate_cuda_tensor_for_encoding already parses encoding_method into an Encoding, but _encode_from_cuda_tensor parses the same string again immediately after validation. To avoid duplicated parsing (and the risk of future divergence), consider changing the validator to return the parsed Encoding (or accept Encoding as an argument) so the caller can reuse it.
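The reviewer's suggestion can be sketched like this: the validator parses once and returns the `Encoding`, so the caller never re-parses the string. Function names and checks here are simplified stand-ins, not the actual engine.rs code:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding { Amplitude, Angle }

fn parse_encoding(s: &str) -> Result<Encoding, String> {
    match s.trim().to_ascii_lowercase().as_str() {
        "amplitude" => Ok(Encoding::Amplitude),
        "angle" => Ok(Encoding::Angle),
        other => Err(format!("Unknown encoding: {other}")),
    }
}

/// Validate the tensor for the requested encoding and hand the parsed
/// enum back to the caller, so it is parsed exactly once.
fn validate_cuda_tensor(method: &str) -> Result<Encoding, String> {
    let encoding = parse_encoding(method)?;
    // ... device/dtype/shape checks on the tensor would go here ...
    Ok(encoding)
}
```

The caller then binds `let encoding = validate_cuda_tensor(method)?;` and reuses it, removing both the duplicate parse and the risk that the two parses diverge later.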

Comment thread qdp/qdp-core/src/types.rs Outdated
Comment on lines +35 to +57
/// Dtype for pipeline configuration (alias of [`crate::gpu::memory::Precision`]).
pub type Dtype = crate::gpu::memory::Precision;

impl Dtype {
/// Parse dtype from a short user string (case-insensitive, trimmed).
pub fn from_str_ci(s: &str) -> Result<Self> {
let t = s.trim();
if t.eq_ignore_ascii_case("f32")
|| t.eq_ignore_ascii_case("float32")
|| t.eq_ignore_ascii_case("float")
{
Ok(Self::Float32)
} else if t.eq_ignore_ascii_case("f64")
|| t.eq_ignore_ascii_case("float64")
|| t.eq_ignore_ascii_case("double")
{
Ok(Self::Float64)
} else {
Err(MahoutError::InvalidInput(format!(
"Unknown dtype: {s}. Use 'f32' or 'f64'."
)))
}
}

Copilot AI Apr 19, 2026


Dtype is declared as a type alias to crate::gpu::memory::Precision, but this file adds an inherent impl Dtype { ... }. Rust does not allow inherent impls on type aliases, so this will fail to compile. Consider moving these methods onto Precision (e.g., impl Precision { ... }) and re-exporting it as Dtype, or changing Dtype into a newtype wrapper if you need a distinct type name.


// Phase has no CUDA tensor path yet.
if matches!(encoding, Encoding::Phase) {
Member


is that a todo?

@ryankert01
Member

I viewed this PR briefly; overall it makes sense.

@rich7420
Contributor Author

@ryankert01 sorry for the late reply

@ryankert01
Member

ryankert01 commented May 10, 2026

Is this PR ready to merge? I planned to cut release branch around today's community sync.

@rich7420
Contributor Author

yes, I think it's ready to merge

Rephrase the TODO at pytorch.rs:160 into a factual comment. The
user-facing error already directs callers to fall back to a CPU
tensor for Phase, so a separate tracker is not warranted.
@ryankert01 ryankert01 merged commit 6cfbf10 into apache:main May 11, 2026
8 checks passed
rich7420 added a commit to rich7420/mahout that referenced this pull request May 11, 2026
…ture

Tests still passed args using the pre-apache#1276 string/bool signature.
Update them to the post-rebase (Encoding, Precision) signature.
ryankert01 pushed a commit that referenced this pull request May 11, 2026
… and pipeline improvements (#1275)

* [Feature][QDP] F32 support for angle/basis encoders, fidelity metrics, and pipeline improvements

* update and improve

* correctness + completeness for f32 zero-copy paths

* update and improve

* update and improve

* refactor(qdp): clean up loader and gate metrics module as test-only

Cleanup pass on review feedback for the f32 angle/basis PR:

- gpu/mod.rs: mark `metrics` module and its re-exports `#[doc(hidden)]`
  to signal that fidelity / trace-distance helpers are test-only and
  not part of the supported runtime API.
- loader.py: lift inline `import os/sys/warnings` to module scope; add
  named constants for backend literals (`_BACKEND_RUST/PYTORCH/AUTO`)
  and supported file extensions (`_STREAMING_FILE_EXTS`,
  `_SUPPORTED_FILE_EXTS`); extract `_path_extension` and
  `_platform_hint` helpers to remove duplicated string parsing and
  platform-message construction; cache the IterableDataset subclass
  at module scope via `_build_torch_dataset` so `as_torch_dataset()`
  no longer redefines the class on every call.

* test(qdp): update compute_optimal_prefetch_depth tests for enum signature

Tests still passed args using the pre-#1276 string/bool signature.
Update them to the post-rebase (Encoding, Precision) signature.

* refactor(qdp): converge CUDA-tensor encoding dispatch

Three small cleanups in _encode_from_cuda_tensor:

- Hoist validate_shape(ndim, ...) to the top so the redundant per-branch
  ndim error arms (one in the f32 path, one in the f64 path with the
  same message) both collapse into a single unreachable! guard.
- Hoist the duplicate get_torch_cuda_stream_ptr(data)? call shared by
  both paths.
- Merge the six-arm f32 match-of-tuples into a single match block with
  shared num_samples/sample_size/input_len bindings, dropping ~45 lines
  of repeated unsafe { ... }.map_err(...)? scaffolding.

Net: -45 LoC. No behavior change; same error messages; clippy clean.
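The arm-collapsing refactor described in that last commit message can be illustrated generically. The shapes and names below are hypothetical, not the actual _encode_from_cuda_tensor code:

```rust
#[derive(Clone, Copy)]
enum Encoding { Amplitude, Angle, Basis }

fn describe(encoding: Encoding, ndim: usize, len: usize) -> Result<String, String> {
    // Hoisted validation: one guard replaces an identical error arm
    // previously duplicated in every branch.
    if ndim != 1 && ndim != 2 {
        return Err(format!("expected 1-D or 2-D input, got {ndim}-D"));
    }
    // Shared bindings computed once instead of once per arm
    // (stand-ins for num_samples/sample_size/input_len).
    let (num_samples, sample_size) = if ndim == 2 { (len / 4, 4) } else { (1, len) };
    // The only per-arm difference that remains: which kernel to launch.
    let kernel = match encoding {
        Encoding::Amplitude => "amplitude_kernel",
        Encoding::Angle => "angle_kernel",
        Encoding::Basis => "basis_kernel",
    };
    Ok(format!("{kernel}: {num_samples}x{sample_size}"))
}
```

Hoisting the shared work first is what lets six near-identical arms collapse into one match over the genuinely varying part.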
