# Module 6: AI for Quantum Error Mitigation (Sequence Learning)

## 🎓 Educational Goal
In this module, we bridge the gap between **Quantum Physics** and **Deep Learning**. We will build a model that doesn't just "count gates" but actually *reads* the quantum circuit like a sentence to predict how errors accumulate.

## 6.1 The Problem: Non-Markovian Noise

Standard error mitigation techniques such as zero-noise extrapolation (ZNE) assume that noise is "simple": that it scales predictably. However, on real hardware (like IBM's superconducting transmons), errors can be **Non-Markovian**, meaning they depend on the *history* of operations.

**Example:**
*   Sequence `H -> X -> H` might cause a drift in the qubit frequency.
*   Sequence `X -> H -> H` might cause a different heating effect.

A simple regression model that just counts "2 Hadamards and 1 X" sees these two circuits as identical. **A Recurrent Neural Network (LSTM)** sees them as different sequences, allowing it to learn these subtle memory effects.
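To make the contrast concrete, here is a minimal sketch in plain Python comparing a count-based feature with an order-preserving one for the two example sequences. The gate names are illustrative strings only; no simulator is involved.

```python
from collections import Counter

# The two example circuits: same gates, different order.
circuit_a = ["h", "x", "h"]
circuit_b = ["x", "h", "h"]

# Bag-of-gates features: what a count-based regression would see.
counts_a = Counter(circuit_a)
counts_b = Counter(circuit_b)

# Order-preserving features: what a sequence model would see.
sequence_a = tuple(circuit_a)
sequence_b = tuple(circuit_b)

same_counts = counts_a == counts_b
same_sequence = sequence_a == sequence_b
print(f"Identical gate counts: {same_counts}")
print(f"Identical sequences:   {same_sequence}")
```

Any model fed only `counts_a` and `counts_b` must give both circuits the same prediction, whereas a model fed the ordered sequences can, in principle, tell them apart.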

## 6.2 The Solution: Long Short-Term Memory (LSTM)

We will treat the quantum circuit as a language.
1.  **Tokenization:** Convert gates to numbers ($H \to 1, CX \to 6$).
2.  **Embedding:** Map each gate to a dense vector in $\mathbb{R}^d$ (Learning the "meaning" of a gate).
3.  **LSTM Layer:** Process the sequence gate-by-gate, maintaining a hidden state $h_t$ that represents the "accumulated noise".
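
The three steps can be summarized in one line of notation. This is a schematic sketch, not a full statement of the LSTM gate equations; the symbols $E$, $w$, $b$, and $T$ are introduced here for illustration. Let $x_t$ be the token at step $t$, $E \in \mathbb{R}^{V \times d}$ the embedding matrix for a vocabulary of $V$ tokens, and $T$ the (padded) sequence length:

$$
e_t = E[x_t], \qquad (h_t, c_t) = \mathrm{LSTM}(e_t,\, h_{t-1},\, c_{t-1}), \qquad \hat{y} = w^\top h_T + b
$$

Here $c_t$ is the LSTM cell state, and $\hat{y}$ is the scalar prediction produced from the final hidden state by a linear layer with weights $w$ and bias $b$.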

In [None]:
import numpy as np
import random
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from qiskit_aer import AerSimulator
from qiskit import transpile

# Import our new Shared Utilities for consistent physics
import utils

# --- HYPERPARAMETERS ---
# To learn 'real' physics, we need a 'real' dataset size.
DATASET_SIZE = 2000  # 200 was too small (overfitting). 2000 allows generalization.
MAX_SEQ_LEN = 60     # We pad all circuits to this length.
BATCH_SIZE = 32
EPOCHS = 20

## 6.3 Step 1: Tokenization

Just as a Large Language Model (LLM) needs to convert words into tokens, our QEM model needs to convert Quantum Gates into integers.

We define a vocabulary where `0` is reserved for "Padding" (empty space), and other numbers represent physical gates.

In [None]:
class CircuitTokenizer:
    def __init__(self):
        # 0 is reserved for Padding (essential for batch processing)
        self.gate_map = {"pad": 0, "h": 1, "x": 2, "z": 3, "s": 4, "id": 5, "cx": 6}
        self.vocab_size = len(self.gate_map)

    def tokenize(self, instruction_list):
        """
        Input:  ['h', 'cx', 'x']
        Output: [1, 6, 2]
        """
        # Caution: gates missing from gate_map fall back to 0, the padding id.
        return [self.gate_map.get(g, 0) for g in instruction_list]

    def pad_sequence(self, tokenized_seq, max_len):
        """
        Pads sequence with 0s to ensure every circuit has the same length
        Input:  [1, 6]
        Output: [1, 6, 0, 0, 0, ...]
        """
        if len(tokenized_seq) >= max_len:
            return tokenized_seq[:max_len]
        return tokenized_seq + [0] * (max_len - len(tokenized_seq))

tokenizer = CircuitTokenizer()
print(f"Vocabulary: {tokenizer.gate_map}")
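
One detail worth noting is that `tokenize` maps any gate outside the vocabulary to `0`, the same id used for padding. The sketch below shows one possible stricter alternative that raises an error instead. The names `GATE_MAP`, `tokenize_strict`, and `pad_to_length` are hypothetical helpers written for this illustration, not part of the notebook or of `utils`.

```python
# Same vocabulary as CircuitTokenizer above, repeated so the sketch is self-contained.
GATE_MAP = {"pad": 0, "h": 1, "x": 2, "z": 3, "s": 4, "id": 5, "cx": 6}


def tokenize_strict(instruction_list, gate_map=GATE_MAP):
    """Convert gate names to ids, refusing gates that are not in the vocabulary."""
    unknown = sorted({g for g in instruction_list if g not in gate_map})
    if unknown:
        raise KeyError(f"Gates missing from vocabulary: {unknown}")
    return [gate_map[g] for g in instruction_list]


def pad_to_length(tokens, max_len, pad_id=0):
    """Right-pad with pad_id, or truncate, so the result has exactly max_len items."""
    return (list(tokens) + [pad_id] * max_len)[:max_len]


tokens = tokenize_strict(["h", "cx", "x"])
padded = pad_to_length(tokens, 6)
print(tokens)
print(padded)
```

Failing loudly on an unknown gate keeps a real operation from being silently treated as empty space, at the cost of requiring the vocabulary to cover every gate the circuit generator can emit.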

## 6.4 Step 2: Generating the "Synthetic" Dataset

Training on a real quantum computer is impractical for this module (too slow and too expensive). Instead, we use `qiskit_aer` with a **Custom Noise Model** (from Module 4/Utils) that mimics the thermal relaxation ($T_1, T_2$) of a real device.

We generate pairs of $(\text{Sequence}, \text{Error})$:
*   **Input ($X$):** The tokenized gate sequence.
*   **Target ($y$):** The ideal expectation value minus the noisy one, $y = E_{\text{ideal}} - E_{\text{noisy}}$.
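
The expectation value used for the target is the two-qubit parity: outcomes with an even number of 1s count as $+1$ and outcomes with an odd number count as $-1$. A minimal sketch of that calculation follows; `parity_expectation` is a hypothetical helper name, and the counts dictionaries are made-up numbers for illustration, not simulator output.

```python
def parity_expectation(counts):
    """Parity expectation value from a {bitstring: count} dictionary.

    Even-parity bitstrings contribute +1 per shot, odd-parity ones -1.
    """
    shots = sum(counts.values())
    if shots == 0:
        raise ValueError("counts must contain at least one shot")
    signed_total = 0
    for bitstring, n in counts.items():
        ones = bitstring.count("1")
        signed_total += n if ones % 2 == 0 else -n
    return signed_total / shots


# Illustrative counts only (1000 shots each).
ideal_counts = {"00": 500, "11": 500}
noisy_counts = {"00": 450, "11": 430, "01": 60, "10": 60}

exp_ideal = parity_expectation(ideal_counts)
exp_noisy = parity_expectation(noisy_counts)
target = exp_ideal - exp_noisy  # what the model is trained to predict
print(exp_ideal, exp_noisy, target)
```

Dividing by the total number of counts rather than a hard-coded shot number keeps the helper correct if the number of shots changes.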

In [None]:
print(f"Generating {DATASET_SIZE} circuits...")

X_sequences = []
y_targets = []  # The Error (Ideal - Noisy)
y_noisy_vals = [] # For final evaluation
y_ideal_vals = [] # For final evaluation

sim_ideal = AerSimulator(method='stabilizer') # Fast simulation for ground truth
sim_noisy = AerSimulator(noise_model=utils.build_custom_noise_model())

for i in range(DATASET_SIZE):
    # 1. Generate Random Circuit
    depth = random.randint(5, 50)
    # utils.create_random_clifford_circuit is defined in our shared library
    qc, instructions = utils.create_random_clifford_circuit(2, depth, return_instructions=True)
    
    # 2. Tokenize
    tokens = tokenizer.tokenize(instructions)
    padded_tokens = tokenizer.pad_sequence(tokens, MAX_SEQ_LEN)
    X_sequences.append(padded_tokens)
    
    # 3. Simulate (Ideal vs Noisy)
    qc.measure_all()
    qc_t = transpile(qc, sim_noisy)
    
    # Ideal Expectation Value (Probability of even parity - odd parity)
    res_ideal = sim_ideal.run(qc_t, shots=1000).result().get_counts()
    exp_ideal = (res_ideal.get('00', 0) + res_ideal.get('11', 0) - res_ideal.get('01', 0) - res_ideal.get('10', 0)) / 1000
    
    # Noisy Expectation Value
    res_noisy = sim_noisy.run(qc_t, shots=1000).result().get_counts()
    exp_noisy = (res_noisy.get('00', 0) + res_noisy.get('11', 0) - res_noisy.get('01', 0) - res_noisy.get('10', 0)) / 1000
    
    # The AI must learn to predict this Difference
    y_targets.append(exp_ideal - exp_noisy)
    y_noisy_vals.append(exp_noisy)
    y_ideal_vals.append(exp_ideal)
    
    if i % 500 == 0:
        print(f"  Generated {i}/{DATASET_SIZE}...")

# Convert to PyTorch Tensors
X_tensor = torch.tensor(X_sequences, dtype=torch.long) # Long for Embedding index lookups
y_tensor = torch.tensor(y_targets, dtype=torch.float32).unsqueeze(1)

X_train, X_test, y_train, y_test = train_test_split(X_tensor, y_tensor, test_size=0.2)
print("✅ Dataset Ready.")

## 6.5 Step 3: Neural Network Architecture

We use a standard sequence processing architecture:

1.  **Embedding Layer:** Converts integer token `6` (CNOT) into a learnable vector (e.g., `[0.2, -0.1, 0.9, ...]`). This lets the model, in principle, learn relationships between gates in the vocabulary, for example that `Z` and `S` behave similarly (both are phase gates) while `X` differs from `Z`.
2.  **LSTM Layer:** The core logic. It reads the vectors one by one. The internal "cell state" acts as the memory of the noise.
3.  **Linear Head:** Takes the final state of the LSTM and outputs a single number: the predicted error.

In [None]:
class QEM_LSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim=16, hidden_size=32):
        super(QEM_LSTM, self).__init__()
        # 1. Embedding
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        
        # 2. LSTM
        # batch_first=True means input shape is (Batch, Seq, Feature)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)
        
        # 3. Output Head
        self.fc = nn.Linear(hidden_size, 1)
        
    def forward(self, x):
        # x shape: [Batch, Seq], e.g. [32, 60] (32 circuits, each padded to 60 tokens)
        embedded = self.embedding(x)
        
        # LSTM output shape: [32, 60, 32] (hidden state for EVERY step)
        lstm_out, (h_n, c_n) = self.lstm(embedded)
        
        # h_n holds the hidden state after the final time step of each sequence.
        # Because circuits are right-padded, that step is usually a padding token:
        # padding_idx=0 zeroes the pad embedding, but the LSTM still updates on it.
        # Shape: [1, 32, 32] (layers, batch, hidden) -> index [-1] to get [32, 32]
        last_hidden_state = h_n[-1]
        
        prediction = self.fc(last_hidden_state)
        return prediction

model = QEM_LSTM(vocab_size=tokenizer.vocab_size)
optimizer = optim.Adam(model.parameters(), lr=0.002)
criterion = nn.MSELoss()
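
Because every circuit is right-padded to `MAX_SEQ_LEN`, the final LSTM state in the model above is taken after the padding steps rather than after the last real gate. One possible way to avoid this is PyTorch's `pack_padded_sequence`. The sketch below is a hypothetical variant, not the model trained in this module. The class name `PackedQEMLSTM` is made up, and it assumes that token `0` appears only as trailing padding.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class PackedQEMLSTM(nn.Module):
    """Hypothetical variant of QEM_LSTM that skips trailing padding steps."""

    def __init__(self, vocab_size, embedding_dim=16, hidden_size=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # Count non-pad tokens per circuit (assumes 0 is used only for padding).
        # Clamp to 1 so an all-padding row does not crash the packing step.
        lengths = (x != 0).sum(dim=1).clamp(min=1).cpu()
        embedded = self.embedding(x)
        packed = pack_padded_sequence(
            embedded, lengths, batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        # h_n[-1] is the hidden state after each circuit's last real token.
        return self.fc(h_n[-1])


torch.manual_seed(0)
packed_model = PackedQEMLSTM(vocab_size=7).eval()

# Same three gates, padded to two different lengths.
short_pad = torch.tensor([[1, 6, 2, 0, 0, 0]])
long_pad = torch.tensor([[1, 6, 2, 0, 0, 0, 0, 0]])

with torch.no_grad():
    out_short = packed_model(short_pad)
    out_long = packed_model(long_pad)

padding_invariant = torch.allclose(out_short, out_long, atol=1e-6)
print(f"Prediction independent of padding length: {padding_invariant}")
```

With packing, the prediction depends only on the real gates, so changing `MAX_SEQ_LEN` does not change the output for a given circuit.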

In [None]:
print("Starting Training...")
train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)

losses = []
for epoch in range(EPOCHS):
    epoch_loss = 0
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    
    avg_loss = epoch_loss / len(train_loader)
    losses.append(avg_loss)
    if (epoch+1) % 5 == 0:
        print(f"Epoch {epoch+1}/{EPOCHS} | MSE Loss: {avg_loss:.5f}")

print("Training Complete.")

## 6.6 Verification: "Real World" Test

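Now we check whether the trained model helps in practice. We generate one fresh random circuit that was not part of the training data and see whether adding the predicted error moves the noisy result closer to the ideal one. A single circuit is only a spot check, not a full evaluation on the held-out test set.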
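
When reading the numbers from this check, it helps to know how much of the gap could be plain sampling noise. For a measurement whose single-shot outcomes are $\pm 1$ with mean $E$, the standard error of the estimate from $N$ shots is $\sqrt{(1 - E^2)/N}$. The sketch below evaluates that formula; `parity_standard_error` is a hypothetical helper name, not part of `utils`.

```python
import math


def parity_standard_error(expectation, shots):
    """Standard error of a +/-1-valued expectation estimated from `shots` samples."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    variance = max(0.0, 1.0 - expectation ** 2)  # single-shot variance of a +/-1 outcome
    return math.sqrt(variance / shots)


# Worst case (expectation near 0) at the shot counts used in this module.
se_2000 = parity_standard_error(0.0, 2000)
se_1000 = parity_standard_error(0.0, 1000)
print(f"Standard error at 2000 shots: {se_2000:.4f}")
print(f"Standard error at 1000 shots: {se_1000:.4f}")
```

Differences between the mitigated and ideal values that are comparable to this standard error cannot be distinguished from shot noise on a single circuit.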

In [None]:
model.eval()

print("Validating on single PROOF circuit...")
# Create random validation circuit
val_qc, val_instr = utils.create_random_clifford_circuit(2, 40, return_instructions=True)
val_qc.measure_all()

# 1. Calculate Ground Truth (Ideal)
v_id_counts = sim_ideal.run(transpile(val_qc, sim_ideal), shots=2000).result().get_counts()
v_ideal = (v_id_counts.get('00',0)+v_id_counts.get('11',0) - v_id_counts.get('01',0)-v_id_counts.get('10',0))/2000

# 2. Calculate Noisy Raw Result
v_no_counts = sim_noisy.run(transpile(val_qc, sim_noisy), shots=2000).result().get_counts()
v_noisy = (v_no_counts.get('00',0)+v_no_counts.get('11',0) - v_no_counts.get('01',0)-v_no_counts.get('10',0))/2000

# 3. AI Prediction
token_seq = tokenizer.tokenize(val_instr)
padded_seq = tokenizer.pad_sequence(token_seq, MAX_SEQ_LEN)
input_tensor = torch.tensor([padded_seq], dtype=torch.long)

with torch.no_grad():
    predicted_error = model(input_tensor).item()

# 4. Mitigation
# Logic: Ideal = Noisy + Error  =>  Estimate = Noisy + Predicted_Error
v_mitigated = v_noisy + predicted_error

# Results
print(f"\n--- RESULTS ---")
print(f"Target (Ideal):    {v_ideal:.3f}")
print(f"Noisy Baseline:    {v_noisy:.3f}   (Diff: {v_ideal-v_noisy:.3f})")
print(f"AI Mitigated:      {v_mitigated:.3f}   (Diff: {v_ideal-v_mitigated:.3f})")

improvement = abs(v_ideal-v_noisy) / (abs(v_ideal-v_mitigated) + 1e-9)
print(f"\n🚀 Error Reduction Factor: {improvement:.1f}x")
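
A single circuit can look much better or much worse than average, so a more informative check aggregates over many circuits. The sketch below shows one way to compare the mean absolute error before and after mitigation. `mitigation_report` is a hypothetical helper, and the three-element lists are made-up numbers for illustration, not results from this notebook.

```python
from statistics import mean


def mitigation_report(ideal, noisy, predicted_error):
    """Mean absolute error of noisy vs. mitigated estimates over several circuits."""
    if not ideal or not (len(ideal) == len(noisy) == len(predicted_error)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Same rule as above: Estimate = Noisy + Predicted_Error
    mitigated = [n + e for n, e in zip(noisy, predicted_error)]
    mae_noisy = mean(abs(i - n) for i, n in zip(ideal, noisy))
    mae_mitigated = mean(abs(i - m) for i, m in zip(ideal, mitigated))
    return {"mae_noisy": mae_noisy, "mae_mitigated": mae_mitigated}


# Illustrative values only.
ideal_vals = [1.0, 0.0, -1.0]
noisy_vals = [0.8, 0.1, -0.7]
predicted = [0.15, -0.05, -0.2]

report = mitigation_report(ideal_vals, noisy_vals, predicted)
print(f"MAE noisy:     {report['mae_noisy']:.3f}")
print(f"MAE mitigated: {report['mae_mitigated']:.3f}")
```

In the notebook, the same comparison could be run on the held-out split by passing the model's predictions for `X_test` together with the matching ideal and noisy values, provided those values are split alongside the tensors.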