
feat(qdp): Encoding + Dtype enums, static encoder dispatch #1276

Merged
ryankert01 merged 4 commits into apache:main from rich7420:encode-types on May 11, 2026
Conversation

@rich7420
Contributor

Related Issues

Changes

  • Bug fix

  • New feature

  • Refactoring

  • Documentation

  • Test

  • CI/CD pipeline

  • Other

    Why

    "amplitude" was parsed and lowercased at 6 independent sites across the stack, causing silent divergence (e.g.
    encode_from_parquet(…, "Amplitude") silently failed while encode_batch accepted it). get_encoder heap-allocated
    a Box per batch for zero-sized unit structs. float32_pipeline: bool was scattered with
    inconsistent defaults, making the published baseline ambiguous.
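The divergence described above can be illustrated with a small sketch. The two call sites here are simplified, hypothetical stand-ins for the pre-PR code paths, not the actual qdp-core functions:

```rust
// One site normalizes case before matching...
fn encode_batch(method: &str) -> bool {
    matches!(
        method.to_ascii_lowercase().as_str(),
        "amplitude" | "angle" | "basis"
    )
}

// ...while another compares raw strings, so "Amplitude" silently fails
// here even though encode_batch accepts it.
fn encode_from_parquet(method: &str) -> bool {
    matches!(method, "amplitude" | "angle" | "basis")
}
```

With six such sites, any one of them can drift; parsing once into an enum at the boundary removes the whole class of bug.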

    How

    • Added Encoding enum and Dtype alias in qdp-core/src/types.rs — single parse at every API boundary, no heap
      allocation in dispatch
    • Replaced get_encoder(&str) → Box&lt;dyn QuantumEncoder&gt; with Encoding::encoder() →
      &'static dyn QuantumEncoder via OnceLock statics for IQP variants
    • PipelineConfig: encoding_method: String → encoding: Encoding, float32_pipeline: bool → dtype: Dtype
    • Removed encoding_supports_f32 and vector_len free functions; replaced by methods on Encoding
    • Fixed streaming Parquet dispatcher case-sensitivity gap in encoding/mod.rs
    • PyO3 boundary (engine.rs, loader.rs, lib.rs, pytorch.rs): strings consumed at boundary, enum passed
      internally
    • qumat_qdp/loader.py: added _VALID_ENCODINGS frozenset, early validation in _validate_loader_args and
      .encoding() setter
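The enum-plus-static-dispatch shape described in the bullets above can be sketched roughly as follows. This is a minimal illustration: the variant set, error type, and trait surface are simplified stand-ins for the real qdp-core definitions.

```rust
use std::sync::OnceLock;

pub trait QuantumEncoder: Sync {
    fn name(&self) -> &'static str;
}

// Zero-sized unit struct: a plain static suffices for dispatch.
struct AmplitudeEncoder;
impl QuantumEncoder for AmplitudeEncoder {
    fn name(&self) -> &'static str { "amplitude" }
}

// Stand-in for a variant with construction state (like the IQP encoders
// in the PR), hence the OnceLock rather than a plain static.
struct IqpEncoder { z_only: bool }
impl QuantumEncoder for IqpEncoder {
    fn name(&self) -> &'static str {
        if self.z_only { "iqp-z" } else { "iqp" }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding { Amplitude, Iqp }

impl Encoding {
    /// The single place strings are accepted: case-insensitive, trimmed.
    pub fn from_str_ci(s: &str) -> Result<Self, String> {
        match s.trim().to_ascii_lowercase().as_str() {
            "amplitude" => Ok(Self::Amplitude),
            "iqp" => Ok(Self::Iqp),
            other => Err(format!("Unknown encoding: {other}")),
        }
    }

    /// Static dispatch: returns a &'static reference, so no Box is
    /// allocated per call.
    pub fn encoder(self) -> &'static dyn QuantumEncoder {
        static AMPLITUDE: AmplitudeEncoder = AmplitudeEncoder;
        static IQP: OnceLock<IqpEncoder> = OnceLock::new();
        match self {
            Self::Amplitude => &AMPLITUDE,
            Self::Iqp => IQP.get_or_init(|| IqpEncoder { z_only: false }),
        }
    }
}
```

Each API boundary calls `from_str_ci` once, then every internal layer passes the `Copy`-able enum and dispatches through `encoder()` without heap allocation.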

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes


Copilot AI left a comment


Pull request overview

This PR centralizes encoding/dtype parsing and encoder dispatch in qdp-core by introducing canonical Encoding/Dtype domain types, and updates Rust + PyO3/Python boundaries to pass those typed values internally (reducing duplicated string handling and avoiding per-call heap allocation for encoder selection).

Changes:

  • Add qdp_core::Encoding + Dtype and replace string-based encoder selection with static encoder dispatch (incl. OnceLock for IQP variants).
  • Refactor pipeline configuration to use encoding: Encoding and dtype (vs encoding_method: String / float32_pipeline: bool) and update pipeline runner dispatch accordingly.
  • Update Python/Rust boundaries (PyO3 + pure Python loader) to validate/parse encodings at the boundary and add regression tests for case-insensitive parsing.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Summary per file:

  • qdp/qdp-core/src/types.rs — Introduces Encoding + Dtype and static encoder dispatch; adds parsing/helpers
  • qdp/qdp-core/src/lib.rs — Switches engine APIs from get_encoder(&str) to Encoding parsing + static dispatch
  • qdp/qdp-core/src/pipeline_runner.rs — Pipeline config now uses typed encoding/dtype and avoids string parsing in hot paths
  • qdp/qdp-core/src/gpu/encodings/mod.rs — Removes the get_encoder factory and tightens QuantumEncoder to 'static
  • qdp/qdp-core/src/gpu/encodings/iqp.rs — Adds OnceLock-backed shared IQP encoder instances for static dispatch
  • qdp/qdp-core/src/encoding/mod.rs — Fixes streaming Parquet dispatch to be case-insensitive via Encoding::from_str_ci
  • qdp/qdp-python/src/engine.rs — Parses Encoding/Dtype at the boundary and passes typed values internally
  • qdp/qdp-python/src/pytorch.rs — Uses Encoding for CUDA tensor validation logic
  • qdp/qdp-python/qumat_qdp/loader.py — Adds early encoding validation against a canonical set
  • qdp/qdp-core/tests/types.rs — Adds tests for encoding/dtype parsing, vector_len, and static dispatch stability


Comment thread qdp/qdp-core/src/types.rs Outdated
Comment on lines +123 to +125
Self::Angle => n,
Self::Basis => 1,
Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n,

Copilot AI Apr 19, 2026


Encoding::vector_len is documented and used as the per-sample input feature dimension (it drives PipelineIterator file sample_size validation and synthetic batch generation), but the mapping for Iqp/IqpZ/Phase currently returns 1 << num_qubits. That contradicts the encoders’ own input validation (e.g., PhaseEncoder expects sample_size == num_qubits, and IqpEncoder expects num_qubits or num_qubits + num_qubits*(num_qubits-1)/2). This will make pipeline/file loading for these encodings reject valid inputs (or generate invalid synthetic batches). Update vector_len for these variants to match the encoder input lengths.

Suggested change
Self::Angle => n,
Self::Basis => 1,
Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n,
Self::Amplitude => 1 << n,
Self::Angle | Self::Iqp | Self::IqpZ | Self::Phase => n,
Self::Basis => 1,
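The distinction the reviewer draws — per-sample input length versus state-vector length — can be sketched as follows. The mapping matches the suggested change; the variant set is trimmed, and the extended IQP input form (num_qubits + num_qubits*(num_qubits-1)/2) mentioned in the comment is deliberately left out of this illustration:

```rust
#[derive(Clone, Copy)]
enum Encoding { Amplitude, Angle, Basis, Iqp, Phase }

impl Encoding {
    /// Per-sample *input* feature length for n qubits, matching each
    /// encoder's own validation rather than the output state size.
    fn vector_len(self, n: usize) -> usize {
        match self {
            // Amplitude consumes 2^n features (the full state vector).
            Self::Amplitude => 1 << n,
            // These consume one value per qubit.
            Self::Angle | Self::Iqp | Self::Phase => n,
            // Basis consumes a single basis-state index.
            Self::Basis => 1,
        }
    }
}
```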

Comment thread qdp/qdp-python/src/engine.rs Outdated
Comment on lines +480 to +486
validate_cuda_tensor_for_encoding(data, self.engine.device().ordinal(), encoding_method)?;

let encoding = Encoding::from_str_ci(encoding_method)
.map_err(|e| PyRuntimeError::new_err(e.to_string()))?;
let dtype = data.getattr("dtype")?;
let dtype_str: String = dtype.str()?.extract()?;
let is_f32 = dtype_str.to_ascii_lowercase().contains("float32");

Copilot AI Apr 19, 2026


validate_cuda_tensor_for_encoding already parses encoding_method into an Encoding, but _encode_from_cuda_tensor parses the same string again immediately after validation. To avoid duplicated parsing (and the risk of future divergence), consider changing the validator to return the parsed Encoding (or accept Encoding as an argument) so the caller can reuse it.
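The reviewer's suggestion can be sketched like this: the validator parses once and returns the `Encoding`, so the caller never re-parses the string. Function names and checks here are simplified stand-ins, not the actual engine.rs code:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding { Amplitude, Angle }

fn parse_encoding(s: &str) -> Result<Encoding, String> {
    match s.trim().to_ascii_lowercase().as_str() {
        "amplitude" => Ok(Encoding::Amplitude),
        "angle" => Ok(Encoding::Angle),
        other => Err(format!("Unknown encoding: {other}")),
    }
}

/// Validate the tensor for the requested encoding and hand the parsed
/// enum back to the caller, so it is parsed exactly once.
fn validate_cuda_tensor(method: &str) -> Result<Encoding, String> {
    let encoding = parse_encoding(method)?;
    // ... device/dtype/shape checks on the tensor would go here ...
    Ok(encoding)
}
```

The caller then binds `let encoding = validate_cuda_tensor(method)?;` and reuses it, removing both the duplicate parse and the risk that the two parses diverge later.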

Comment thread qdp/qdp-core/src/types.rs Outdated
Comment on lines +35 to +57
/// Dtype for pipeline configuration (alias of [`crate::gpu::memory::Precision`]).
pub type Dtype = crate::gpu::memory::Precision;

impl Dtype {
/// Parse dtype from a short user string (case-insensitive, trimmed).
pub fn from_str_ci(s: &str) -> Result<Self> {
let t = s.trim();
if t.eq_ignore_ascii_case("f32")
|| t.eq_ignore_ascii_case("float32")
|| t.eq_ignore_ascii_case("float")
{
Ok(Self::Float32)
} else if t.eq_ignore_ascii_case("f64")
|| t.eq_ignore_ascii_case("float64")
|| t.eq_ignore_ascii_case("double")
{
Ok(Self::Float64)
} else {
Err(MahoutError::InvalidInput(format!(
"Unknown dtype: {s}. Use 'f32' or 'f64'."
)))
}
}

Copilot AI Apr 19, 2026


Dtype is declared as a type alias to crate::gpu::memory::Precision, but this file adds an inherent impl Dtype { ... }. Rust does not allow inherent impls on type aliases, so this will fail to compile. Consider moving these methods onto Precision (e.g., impl Precision { ... }) and re-exporting it as Dtype, or changing Dtype into a newtype wrapper if you need a distinct type name.


// Phase has no CUDA tensor path yet.
if matches!(encoding, Encoding::Phase) {
Member


is that a todo?

@ryankert01
Member

I viewed this PR briefly; overall it makes sense.

@rich7420
Contributor Author

@ryankert01 sorry for the late reply

@ryankert01
Member

ryankert01 commented May 10, 2026

Is this PR ready to merge? I planned to cut release branch around today's community sync.

@rich7420
Contributor Author

yes, I think it's ready to merge

Rephrase the TODO at pytorch.rs:160 into a factual comment. The
user-facing error already directs callers to fall back to a CPU
tensor for Phase, so a separate tracker is not warranted.
@ryankert01 ryankert01 merged commit 6cfbf10 into apache:main May 11, 2026
8 checks passed
rich7420 added a commit to rich7420/mahout that referenced this pull request May 11, 2026
…ture

Tests still passed args using the pre-apache#1276 string/bool signature.
Update them to the post-rebase (Encoding, Precision) signature.
ryankert01 pushed a commit that referenced this pull request May 11, 2026
… and pipeline improvements (#1275)

* [Feature][QDP] F32 support for angle/basis encoders, fidelity metrics, and pipeline improvements

* update and improve

* correctness + completeness for f32 zero-copy paths

* update and improve

* update and improve

* refactor(qdp): clean up loader and gate metrics module as test-only

Cleanup pass on review feedback for the f32 angle/basis PR:

- gpu/mod.rs: mark `metrics` module and its re-exports `#[doc(hidden)]`
  to signal that fidelity / trace-distance helpers are test-only and
  not part of the supported runtime API.
- loader.py: lift inline `import os/sys/warnings` to module scope; add
  named constants for backend literals (`_BACKEND_RUST/PYTORCH/AUTO`)
  and supported file extensions (`_STREAMING_FILE_EXTS`,
  `_SUPPORTED_FILE_EXTS`); extract `_path_extension` and
  `_platform_hint` helpers to remove duplicated string parsing and
  platform-message construction; cache the IterableDataset subclass
  at module scope via `_build_torch_dataset` so `as_torch_dataset()`
  no longer redefines the class on every call.

* test(qdp): update compute_optimal_prefetch_depth tests for enum signature

Tests still passed args using the pre-#1276 string/bool signature.
Update them to the post-rebase (Encoding, Precision) signature.

* refactor(qdp): converge CUDA-tensor encoding dispatch

Three small cleanups in _encode_from_cuda_tensor:

- Hoist validate_shape(ndim, ...) to the top so the redundant per-branch
  ndim error arms (one in the f32 path, one in the f64 path with the
  same message) both collapse into a single unreachable! guard.
- Hoist the duplicate get_torch_cuda_stream_ptr(data)? call shared by
  both paths.
- Merge the six-arm f32 match-of-tuples into a single match block with
  shared num_samples/sample_size/input_len bindings, dropping ~45 lines
  of repeated unsafe { ... }.map_err(...)? scaffolding.

Net: -45 LoC. No behavior change; same error messages; clippy clean.
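The arm-collapsing refactor described in that last commit message can be illustrated generically. The shapes and names below are hypothetical, not the actual _encode_from_cuda_tensor code:

```rust
#[derive(Clone, Copy)]
enum Encoding { Amplitude, Angle, Basis }

fn describe(encoding: Encoding, ndim: usize, len: usize) -> Result<String, String> {
    // Hoisted validation: one guard replaces an identical error arm
    // previously duplicated in every branch.
    if ndim != 1 && ndim != 2 {
        return Err(format!("expected 1-D or 2-D input, got {ndim}-D"));
    }
    // Shared bindings computed once instead of once per arm
    // (stand-ins for num_samples/sample_size/input_len).
    let (num_samples, sample_size) = if ndim == 2 { (len / 4, 4) } else { (1, len) };
    // The only per-arm difference that remains: which kernel to launch.
    let kernel = match encoding {
        Encoding::Amplitude => "amplitude_kernel",
        Encoding::Angle => "angle_kernel",
        Encoding::Basis => "basis_kernel",
    };
    Ok(format!("{kernel}: {num_samples}x{sample_size}"))
}
```

Hoisting the shared work first is what lets six near-identical arms collapse into one match over the genuinely varying part.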
