# QDP Tutorial: GPU-Accelerated Quantum Data Preparation

This notebook introduces **QDP** (Quantum Data Preparation), which accelerates the encoding of classical data into quantum states on a GPU.

**What you will do in this tutorial:**
1. Set up a Colab GPU environment for QDP.
2. Initialize `QdpEngine` and run a basic amplitude-encoding example.
3. Integrate QDP output with a small PennyLane + PyTorch training loop.


## 1. Environment Setup

The setup below installs Rust (needed to build the native extension), clones the Mahout repository, and installs QDP into the active Colab kernel.


In [None]:
# Check for NVIDIA GPU
!nvidia-smi

In [None]:
# 1. Install Rust toolchain (required for QDP compilation)
!curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
import os
os.environ['PATH'] += ':/root/.cargo/bin'

# 2. Install uv and clone Mahout repository
!pip install uv
!git clone https://github.com/apache/mahout.git

# 3. Install from repository root (docs-aligned setup with QDP)
%cd /content/mahout
!uv sync --group dev --extra qdp

# 3b. Colab runs the system kernel; install QDP extension into this active kernel
!uv pip install --system -e qdp/qdp-python

# 4. Notebook dependency (if not already present in the runtime)
!pip install pennylane
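
Before importing QDP, it can help to confirm that the tools installed above are visible to the running kernel. The sketch below uses only the Python standard library; the helper name `find_tools` is hypothetical and not part of QDP or Mahout.

```python
import shutil

def find_tools(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

# Tools the setup cell relies on; None means the tool was not found on PATH.
for tool, path in find_tools(["cargo", "uv", "git", "nvidia-smi"]).items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```

If `cargo` shows up as not found, the `PATH` update in the setup cell probably did not take effect in the current kernel.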


## 2. Basic Usage

Next, we initialize `QdpEngine` and encode a simple sample into a quantum state tensor.


In [None]:
import torch
import numpy as np
from qumat.qdp import QdpEngine

print("Imported QdpEngine from qumat.qdp")

# Initialize engine on GPU 0
engine = QdpEngine(0)
print("QDP Engine initialized successfully on GPU 0")


In [None]:
# Example: encode a simple Python list
data = [0.5, 0.5, 0.5, 0.5]
n_qubits = 2

# Encode using amplitude encoding
# 4 values fill the state vector of 2 qubits (2^2 = 4 amplitudes)
qtensor = engine.encode(data, n_qubits, "amplitude")

# Convert to PyTorch tensor (zero-copy)
torch_tensor = torch.from_dlpack(qtensor)

print(f"Quantum state shape: {torch_tensor.shape}")
print(f"Quantum state data:\n{torch_tensor}")
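
Amplitude encoding stores the L2-normalized input vector as the amplitudes of the state, so `n` qubits hold `2^n` values. As a rough CPU reference for what to expect from the output above, here is a minimal pure-Python sketch. The function `amplitude_encode_reference` is a hypothetical helper, not QDP's implementation, and it assumes the input length is exactly `2^n_qubits`; how QDP treats shorter inputs is not covered here.

```python
import math

def amplitude_encode_reference(values, n_qubits):
    """Return values scaled to unit L2 norm (hypothetical CPU reference, not QDP code)."""
    dim = 1 << n_qubits  # a register of n qubits has 2^n amplitudes
    if len(values) != dim:
        raise ValueError(f"expected {dim} values for {n_qubits} qubits, got {len(values)}")
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("cannot encode an all-zero vector")
    return [v / norm for v in values]

state = amplitude_encode_reference([0.5, 0.5, 0.5, 0.5], 2)
print(state)                      # already unit-norm, so the values are unchanged
print(sum(a * a for a in state))  # squared amplitudes sum to 1
```

The input `[0.5, 0.5, 0.5, 0.5]` is a convenient first example because it already has unit norm, so the encoded amplitudes should match the raw values.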


## 3. Real-World Integration: PennyLane Training Loop

QDP is most useful when its encoded states feed directly into a quantum ML workflow.

In this section we will:
1. Generate synthetic classification data.
2. Use **QDP** to encode data into quantum states on GPU.
3. Feed those states into a **PennyLane** quantum model.
4. Train end-to-end with PyTorch.
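
Because amplitude encoding packs `2^n` features into `n` qubits, the feature dimension and the qubit count are tied together. A small sketch of that relationship, using a hypothetical helper `qubits_for_features` that is not part of QDP:

```python
def qubits_for_features(n_features):
    """Smallest n >= 1 such that 2**n >= n_features (hypothetical helper)."""
    if n_features < 1:
        raise ValueError("n_features must be positive")
    return max(1, (n_features - 1).bit_length())

for n_features in (2, 16, 17, 784):
    n = qubits_for_features(n_features)
    print(f"{n_features} features -> {n} qubits ({1 << n} amplitudes)")
```

The configuration in this section uses `n_qubits = 4`, so each sample has 16 features.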


In [None]:
import pennylane as qml  # ty: ignore[unresolved-import]
import torch.nn as nn
import torch.optim as optim

# Configuration
n_qubits = 4
batch_size = 32
n_features = 1 << n_qubits  # Amplitude encoding: 2^n features
learning_rate = 0.1
epochs = 5

# 1. Create a PennyLane Device
# Use 'default.qubit' (CPU) or 'lightning.gpu' (if installed) for simulation.
# QDP handles the heavy lifting of state preparation on GPU first.
dev = qml.device("default.qubit", wires=n_qubits)

# 2. Define the QNode (Quantum Circuit)
# This takes a pre-calculated state vector as input
@qml.qnode(dev, interface="torch")
def qnn_circuit(inputs, weights):
    # Initialize the qubit register with the state from QDP
    qml.StatePrep(inputs, wires=range(n_qubits))
    
    # Trainable Variational Layers
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    
    # Measure the Pauli-Z expectation value on each qubit
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

print("PennyLane QNode defined successfully.")
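
`qml.StronglyEntanglingLayers` expects a weight tensor of shape `(n_layers, n_wires, 3)`, that is, three rotation angles per wire per layer. The plain-Python sketch below mirrors that shape rule to show how the trainable parameter count scales. `sel_weight_shape` is a hypothetical stand-in for `qml.StronglyEntanglingLayers.shape`, written out so the arithmetic is visible without PennyLane.

```python
def sel_weight_shape(n_layers, n_wires):
    """Shape rule for StronglyEntanglingLayers weights (hypothetical stand-in)."""
    return (n_layers, n_wires, 3)  # 3 rotation angles per wire per layer

def count_params(shape):
    """Multiply the dimensions to get the number of trainable scalars."""
    total = 1
    for dim in shape:
        total *= dim
    return total

shape = sel_weight_shape(n_layers=2, n_wires=4)
print(shape, count_params(shape))  # (2, 4, 3) 24
```

The parameter count grows linearly in both layers and wires, while the encoded state vector grows as `2^n_wires`.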

In [None]:
# 3. Data preparation (synthetic)
# Generate random features and binary labels
input_data = np.random.rand(batch_size, n_features).astype(np.float64)

# Important: keep labels in float64 to avoid a dtype mismatch error (Float vs Double) in the loss
labels = torch.randint(0, 2, (batch_size,), device="cuda:0").to(torch.float64)

# 4. QDP encoding (the acceleration step)
print("Encoding data on GPU with QDP...")
qtensor_batch = engine.encode(input_data, n_qubits, "amplitude")
# Convert to a PyTorch tensor (the data stays on GPU)
train_states_gpu = torch.from_dlpack(qtensor_batch)

# 5. Define PyTorch model using the QNode
weight_shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=n_qubits)
# Initialize weights as float64 to match input precision
weights = torch.nn.Parameter(torch.rand(weight_shape, dtype=torch.float64))
optimizer = optim.Adam([weights], lr=learning_rate)
loss_fn = nn.MSELoss()  # Simple MSE for demonstration

print(f"Starting training for {epochs} epochs...")

# 6. Training loop
for epoch in range(epochs):
    optimizer.zero_grad()

    # Forward pass: feed QDP states into PennyLane circuit
    # We sum the Z-expectation values to get a single prediction per sample
    predictions = torch.stack([
        torch.sum(torch.stack(qnn_circuit(state, weights)))
        for state in train_states_gpu
    ])

    # Squash predictions into (0, 1) with a sigmoid for this toy classification
    predictions = torch.sigmoid(predictions)

    loss = loss_fn(predictions, labels)
    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch + 1}/{epochs} | Loss: {loss.item():.4f}")

print("Training complete!")
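
Each Pauli-Z expectation value lies in [-1, 1], so the summed output of an `n`-qubit circuit lies in [-n, n], and the sigmoid can only reach a bounded sub-interval of (0, 1). The sketch below uses hypothetical helper names and plain Python floats instead of tensors. It computes that interval and shows one way to turn sigmoid outputs into class labels with a 0.5 threshold.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prediction_range(n_qubits):
    """Bounds of sigmoid(sum of n Z-expectations), each expectation in [-1, 1]."""
    return sigmoid(-n_qubits), sigmoid(n_qubits)

def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    hits = sum(int(p >= threshold) == int(y) for p, y in zip(probs, labels))
    return hits / len(labels)

low, high = prediction_range(4)
print(f"reachable predictions: [{low:.4f}, {high:.4f}]")
print(accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))  # 3 of 4 correct
```

Because the labels in this tutorial are random and independent of the features, there is no real signal to learn; the loop demonstrates the data path from QDP into PennyLane, not a meaningful classifier.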
