Conversation

@guan404ming
Member

Purpose of PR

  • Added an encode_batch() method to the Python bindings that accepts a 2D NumPy array directly (zero-copy)
  • Updated run_mahout() to process the entire batch in a single GPU kernel call
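
For readers unfamiliar with the change, here is a minimal CPU-only sketch of the difference between per-sample and batched encoding. It is not the QDP implementation: it assumes amplitude encoding (L2-normalize each row and cast to complex), and `encode_one`, `encode_loop`, and `encode_batch_sketch` are hypothetical names used only for illustration. NumPy stands in for the GPU kernel.

```python
import numpy as np


def encode_one(sample: np.ndarray) -> np.ndarray:
    """Assumed amplitude encoding: L2-normalize one real vector, cast to complex."""
    norm = np.linalg.norm(sample)
    if norm == 0.0:
        raise ValueError("cannot encode an all-zero vector")
    return (sample / norm).astype(np.complex128)


def encode_loop(batch: np.ndarray) -> np.ndarray:
    """Per-sample path: one call per row (the pattern the benchmark moved away from)."""
    return np.stack([encode_one(row) for row in batch])


def encode_batch_sketch(batch: np.ndarray) -> np.ndarray:
    """Batched path: the whole 2D array is handled in one vectorized operation."""
    # No copy is made here when the input is already C-contiguous float64.
    batch = np.ascontiguousarray(batch, dtype=np.float64)
    if batch.ndim != 2:
        raise ValueError("expected a 2D array of shape (batch_size, vector_length)")
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    if np.any(norms == 0.0):
        raise ValueError("cannot encode an all-zero vector")
    return (batch / norms).astype(np.complex128)


# Same shape as one benchmark batch: 8 samples of 2**12 amplitudes.
rng = np.random.default_rng(0)
batch = rng.random((8, 4096))
states = encode_batch_sketch(batch)
```

Both paths produce the same states. The batched one avoids per-row call overhead, which is the effect the latency benchmark is meant to capture.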

Related Issues or PRs

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

@guan404ming
Member Author

(qdp-python) titan% uv run python benchmark/benchmark_latency.py --qubits 12 --batches 20 --batch-size 8 --prefetch 4
Uninstalled 1 package in 1ms
Installed 1 package in 5ms
Generating 160 samples of 12 qubits...
  Batch size   : 8
  Vector length: 4096
  Batches      : 20
  Prefetch     : 4
  Frameworks   : pennylane, qiskit-init, qiskit-statevector, mahout
  Generated 160 samples
  PennyLane/Qiskit format: 5.00 MB
  Mahout format: 5.00 MB

======================================================================
DATA-TO-STATE LATENCY BENCHMARK: 12 Qubits, 160 Samples
======================================================================

[PennyLane] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0423 s (0.264 ms/vector)

[Qiskit Initialize] Full Pipeline (DataLoader -> GPU)...
  Total Time: 11.8223 s (73.890 ms/vector)

[Qiskit Statevector] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0222 s (0.139 ms/vector)

[Mahout] Full Pipeline (DataLoader -> GPU)...
  Total Time: 0.0157 s (0.098 ms/vector)

======================================================================
LATENCY (Lower is Better)
Samples: 160, Qubits: 12
======================================================================
Mahout                  0.098 ms/vector
Qiskit Statevector      0.139 ms/vector
PennyLane               0.264 ms/vector
Qiskit Initialize      73.890 ms/vector
----------------------------------------------------------------------
Speedup vs PennyLane:            2.69x
Speedup vs Qiskit Init:         751.84x
Speedup vs Qiskit Statevec:       1.41x
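
As a sanity check on the numbers above, the payload size and speedup ratios can be recomputed from the printed figures. This is only a sketch: it assumes the samples are stored as 8-byte floats and that "MB" in the log means MiB, and it works from the rounded ms/vector values, so the ratios land near the logged speedups rather than matching them exactly.

```python
# Figures copied from the benchmark output above (already rounded by the script).
latency_ms = {
    "Mahout": 0.098,
    "Qiskit Statevector": 0.139,
    "PennyLane": 0.264,
    "Qiskit Initialize": 73.890,
}

samples = 160
vector_length = 2 ** 12   # 12 qubits -> 4096 amplitudes
bytes_per_value = 8       # assumption: float64 input data

# Size of the generated dataset in MiB.
payload_mib = samples * vector_length * bytes_per_value / 2 ** 20


def speedup(baseline: str, candidate: str = "Mahout") -> float:
    """Ratio of per-vector latencies; values above 1 mean the candidate is faster."""
    return latency_ms[baseline] / latency_ms[candidate]


speedups = {name: speedup(name) for name in latency_ms if name != "Mahout"}

print(f"payload: {payload_mib:.2f} MiB")
for name, ratio in speedups.items():
    print(f"speedup vs {name}: {ratio:.2f}x")
```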

@400Ping
Contributor

400Ping commented Jan 5, 2026

Thanks for the fix! I was referencing benchmark_throughput when I wrote the code, so I didn't notice this issue. I also opened an issue to switch benchmark_throughput to batch encoding.

@guan404ming guan404ming force-pushed the fix/benchmark-batch-encoding branch 2 times, most recently from e076c25 to c5897de Compare January 5, 2026 11:11
@guan404ming guan404ming force-pushed the fix/benchmark-batch-encoding branch from c5897de to 82eaf73 Compare January 5, 2026 11:13
@guan404ming
Member Author

> Thanks for the fix! I was referencing benchmark_throughput when I wrote the code, so I didn't notice this issue. I also opened an issue to switch benchmark_throughput to batch encoding.

Sure, let's do it after this one.

@guan404ming guan404ming marked this pull request as ready for review January 5, 2026 11:14
@guan404ming
Member Author

cc @ryankert01 @rich7420

@guan404ming guan404ming changed the title [QDP] Fix benchmark to use batch encoding [QDP] Fix latency benchmark to use batch encoding Jan 5, 2026
@ryankert01
Contributor

This is nice

@guan404ming guan404ming merged commit 4c99293 into apache:dev-qdp Jan 5, 2026
2 checks passed
@guan404ming guan404ming deleted the fix/benchmark-batch-encoding branch January 5, 2026 11:32
@guan404ming
Member Author

Thanks!


3 participants