# Quantum Anomaly Detector — Interactive Demo

This notebook walks through the full pipeline of using a **Quantum Autoencoder** for financial fraud detection:

1. Generate synthetic transaction data
2. Preprocess & encode into quantum states
3. Build & inspect the quantum circuit
4. Train the hybrid model
5. Evaluate anomaly detection performance

In [None]:
import sys
sys.path.insert(0, '..')

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

sns.set_theme(style='whitegrid', font_scale=1.1)
print(f'TensorFlow: {tf.__version__}')
print(f'TFQ: {tfq.__version__}')
print(f'Cirq: {cirq.__version__}')

## 1. Generate Synthetic Data

We create a dataset of **5,000 normal** and **500 fraudulent** transactions with 4 features:
- `transaction_amount` — log-normal (normal) vs. inflated (fraud)
- `time_since_last_txn` — exponential gap (normal) vs. rapid-fire (fraud)
- `distance_from_home` — moderate (normal) vs. far away (fraud)
- `merchant_risk_score` — low risk (normal) vs. high risk (fraud)
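
As a rough illustration of the feature descriptions above, the sketch below draws the four features with NumPy. The function name `sketch_transactions` and every distribution parameter are assumptions made for illustration only; the actual `create_dataset` in `src.data.generate_data` may use different distributions and values.

```python
import numpy as np

def sketch_transactions(n, fraud, rng):
    """Draw n hypothetical transactions; all parameters are illustrative guesses."""
    if fraud:
        amount = rng.lognormal(mean=6.0, sigma=1.0, size=n)   # inflated amounts
        gap = rng.exponential(scale=0.5, size=n)              # rapid-fire transactions
        distance = rng.gamma(shape=5.0, scale=40.0, size=n)   # far from home
        risk = rng.beta(5.0, 2.0, size=n)                     # skewed toward high risk
    else:
        amount = rng.lognormal(mean=3.5, sigma=1.0, size=n)   # typical amounts
        gap = rng.exponential(scale=24.0, size=n)             # longer gaps
        distance = rng.gamma(shape=2.0, scale=5.0, size=n)    # moderate distance
        risk = rng.beta(2.0, 5.0, size=n)                     # skewed toward low risk
    # Column order: amount, time gap, distance, merchant risk
    return np.column_stack([amount, gap, distance, risk])

rng = np.random.default_rng(42)
X_normal = sketch_transactions(5000, fraud=False, rng=rng)
X_fraud = sketch_transactions(500, fraud=True, rng=rng)

X_sketch = np.vstack([X_normal, X_fraud])
y_sketch = np.concatenate([np.zeros(5000, dtype=int), np.ones(500, dtype=int)])
print(X_sketch.shape, int(y_sketch.sum()))
```

The roughly 10:1 class imbalance mirrors the 5,000 / 500 split used in this notebook.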

In [None]:
from src.data.generate_data import create_dataset

df, labels = create_dataset(n_normal=5000, n_fraud=500, seed=42)
print(f'Dataset shape: {df.shape}')
print(f'Normal: {(labels == 0).sum()}, Fraud: {(labels == 1).sum()}')
df.head(10)

In [None]:
# Visualize feature distributions by class
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
features = ['transaction_amount', 'time_since_last_txn', 'distance_from_home', 'merchant_risk_score']

for ax, feat in zip(axes.flat, features):
    ax.hist(df[df['label'] == 0][feat], bins=50, alpha=0.7, label='Normal', color='#2196F3', density=True)
    ax.hist(df[df['label'] == 1][feat], bins=50, alpha=0.7, label='Fraud', color='#F44336', density=True)
    ax.set_title(feat)
    ax.legend()

fig.suptitle('Feature Distributions — Normal vs. Fraud', fontsize=14, fontweight='bold')
fig.tight_layout()
plt.show()

## 2. Preprocessing & Quantum State Preparation

We scale each feature to $[0, \pi]$ and use it as a rotation angle: feature $x_i$ is loaded onto qubit $i$ as $R_y(x_i)|0\rangle$ (**angle encoding**), so each sample becomes a product state on 4 qubits.
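
To make the encoding concrete, here is a minimal NumPy sketch of a single-qubit angle encoding. The helpers `minmax_to_pi` and `ry_state` are hypothetical names, and min-max scaling is an assumption about what `scale_features` does. It relies on the identity $R_y(x)|0\rangle = \cos(x/2)|0\rangle + \sin(x/2)|1\rangle$, which gives $\langle Z \rangle = \cos(x)$.

```python
import numpy as np

def minmax_to_pi(X):
    """Linearly rescale each column of X to the interval [0, pi]."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return np.pi * (X - lo) / (hi - lo)

def ry_state(x):
    """Amplitudes of R_y(x)|0> = cos(x/2)|0> + sin(x/2)|1>."""
    return np.array([np.cos(x / 2.0), np.sin(x / 2.0)])

# Three toy samples with two features each (made-up numbers)
raw = np.array([[10.0, 1.0],
                [20.0, 3.0],
                [30.0, 5.0]])
scaled = minmax_to_pi(raw)

# Middle sample, first feature: halfway between min and max, so x = pi/2
state = ry_state(scaled[1, 0])
z_expect = state[0] ** 2 - state[1] ** 2   # <Z> = P(0) - P(1) = cos(x)
print(scaled[1, 0], state, z_expect)
```

A feature at its minimum maps to $|0\rangle$ and a feature at its maximum maps to $|1\rangle$, so the full $[0, \pi]$ range covers one half-turn on the Bloch sphere without wrapping around.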

In [None]:
from src.data.preprocessing import scale_features, split_data, prepare_quantum_data
from src.model.quantum_autoencoder import ALL_QUBITS

X = df.drop(columns=['label']).values

# Split: train on normal ONLY
X_train, X_test, y_train, y_test = split_data(X, labels, seed=42)
print(f'Train: {len(y_train)} samples (all normal)')
print(f'Test:  {len(y_test)} samples (normal: {(y_test==0).sum()}, fraud: {(y_test==1).sum()})')

# Scale: fit on the training set only, then clip test values that fall outside [0, pi]
X_train_scaled, scaler = scale_features(X_train)
X_test_scaled = np.clip(scaler.transform(X_test), 0.0, np.pi)

print(f'\nScaled range: [{X_train_scaled.min():.3f}, {X_train_scaled.max():.3f}]')

In [None]:
# Convert to quantum circuits
q_train = prepare_quantum_data(X_train_scaled, ALL_QUBITS)
q_test = prepare_quantum_data(X_test_scaled, ALL_QUBITS)

print(f'Quantum train tensor: {q_train.shape}')
print(f'Quantum test tensor:  {q_test.shape}')

## 3. Inspect the Quantum Autoencoder Circuit

The variational ansatz uses:
- $R_y(\theta)$ single-qubit rotations
- Linear CNOT entanglement chains
- Depth $L=3$ layers
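
The structure listed above can be sketched as a small NumPy state-vector simulation. This is an illustrative stand-in, not the circuit built by `create_quantum_autoencoder_circuit`: the layer layout (one $R_y$ per qubit followed by a CNOT chain 0→1→2→3) and the resulting count of 12 parameters are assumptions, and all function names are hypothetical.

```python
import numpy as np

N_QUBITS = 4  # 4 features -> 4 qubits

def apply_ry(state, theta, q):
    """Apply R_y(theta) to qubit q of a state tensor of shape (2,) * N_QUBITS."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    gate = np.array([[c, -s], [s, c]])
    # Contract the gate's input index with axis q; the output index lands at axis 0.
    out = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(out, 0, q)

def apply_cnot(state, control, target):
    """Swap the target's |0> and |1> amplitudes wherever the control is |1>."""
    idx = [slice(None)] * N_QUBITS
    idx[control] = 1
    idx = tuple(idx)
    # Integer indexing removes the control axis, shifting later axes down by one.
    t_axis = target - 1 if target > control else target
    out = state.copy()
    out[idx] = np.flip(state[idx], axis=t_axis)
    return out

def run_ansatz(thetas):
    """thetas has shape (depth, N_QUBITS): one R_y angle per qubit per layer."""
    state = np.zeros((2,) * N_QUBITS)
    state[(0,) * N_QUBITS] = 1.0  # start in |0000>
    for layer in thetas:
        for q, theta in enumerate(layer):
            state = apply_ry(state, theta, q)
        for q in range(N_QUBITS - 1):  # linear chain: 0->1, 1->2, 2->3
            state = apply_cnot(state, q, q + 1)
    return state

def z_expectation(state, q):
    """<Z> on qubit q: P(qubit q = 0) - P(qubit q = 1)."""
    probs = np.abs(state) ** 2
    other_axes = tuple(a for a in range(N_QUBITS) if a != q)
    p = probs.sum(axis=other_axes)
    return p[0] - p[1]

rng = np.random.default_rng(0)
thetas = rng.uniform(0.0, np.pi, size=(3, N_QUBITS))  # depth L = 3
final_state = run_ansatz(thetas)
print("parameters in this sketch:", thetas.size)
print("<Z> per qubit:", [round(float(z_expectation(final_state, q)), 3) for q in range(N_QUBITS)])
```

Because $R_y$ has a real matrix, the state stays real-valued throughout, which keeps this kind of ansatz cheap to simulate classically at small qubit counts.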

In [None]:
from src.model.quantum_autoencoder import create_quantum_autoencoder_circuit

circuit, symbols, readouts = create_quantum_autoencoder_circuit(depth=3)

print(f'Number of trainable parameters: {len(symbols)}')
print(f'Trash qubit observables: {readouts}')
# len(circuit) counts Cirq moments (time steps), not ansatz layers
print(f'\nCircuit moments: {len(circuit)}')
print(f'\n{circuit}')

## 4. Train the Quantum Autoencoder

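The model minimizes the **fidelity loss** $\mathcal{L} = 1 - \text{mean}(\langle Z \rangle_{\text{trash}})$, where the mean runs over the trash qubits and over the samples in a batch. Since each $\langle Z \rangle$ lies in $[-1, 1]$, the loss lies in $[0, 2]$.

When the loss is 0, every trash qubit has $\langle Z \rangle = 1$, i.e. it is in $|0\rangle$, so all the information sits in the latent qubits: perfect compression.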
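
As a quick numerical check of the formula, the following sketch evaluates the loss on hand-written $\langle Z \rangle$ values. The function `fidelity_loss_sketch` is a hypothetical NumPy stand-in and not the Keras `fidelity_loss` imported from `src.model.quantum_autoencoder`, whose signature may differ.

```python
import numpy as np

def fidelity_loss_sketch(z_trash):
    """Loss 1 - mean(<Z>) for an array of shape (batch, n_trash)."""
    return 1.0 - float(np.mean(z_trash))

# Made-up expectation values for 2 trash qubits
perfect = np.ones((3, 2))                    # every trash qubit in |0>
partial = np.array([[1.0, 0.0],
                    [0.5, 0.5]])             # partly disentangled
worst = -np.ones((3, 2))                     # every trash qubit in |1>

loss_perfect = fidelity_loss_sketch(perfect)
loss_partial = fidelity_loss_sketch(partial)
loss_worst = fidelity_loss_sketch(worst)
print(loss_perfect, loss_partial, loss_worst)
```

The three cases span the range of the loss: 0 for perfect compression, 2 when every trash qubit ends up in $|1\rangle$.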

In [None]:
from src.model.quantum_autoencoder import build_keras_model, fidelity_loss, NUM_TRASH_QUBITS

model = build_keras_model(depth=3)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
    loss=fidelity_loss,
)
model.summary()

In [None]:
# Train: one target of 1.0 per trash qubit, the ideal ⟨Z⟩ for |0⟩
train_labels = np.ones((len(y_train), NUM_TRASH_QUBITS), dtype=np.float32)

history = model.fit(
    q_train,
    train_labels,
    epochs=50,
    batch_size=32,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10, restore_best_weights=True),
    ],
    verbose=1,
)

In [None]:
# Plot training loss
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(history.history['loss'], linewidth=2, color='#4CAF50')
ax.set_xlabel('Epoch')
ax.set_ylabel('Fidelity Loss (1 − ⟨Z⟩)')
ax.set_title('Training Loss — Quantum Autoencoder')
fig.tight_layout()
plt.show()

## 5. Evaluate — Anomaly Detection

**Key Idea:** The autoencoder is trained *only* on normal data, so its parameters are tuned to compress normal transactions. Fraudulent transactions are expected to leave the trash qubits further from $|0\rangle$ and therefore to receive higher anomaly scores.
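
Turning scores into fraud flags requires a threshold. One common choice, shown here purely as an assumption and not as this project's method, is to flag anything above a high quantile of the scores observed on held-out normal data. The scores below are randomly generated placeholders, not model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical anomaly scores (placeholders, NOT results from the trained model)
normal_val = rng.beta(2.0, 8.0, size=1000)   # mostly low scores
fraud_val = rng.beta(6.0, 3.0, size=100)     # mostly high scores

# Flag everything above the 95th percentile of the normal scores
threshold = np.quantile(normal_val, 0.95)
fpr = float((normal_val > threshold).mean())  # fraction of normal samples flagged
tpr = float((fraud_val > threshold).mean())   # fraction of fraud samples flagged
print(f'threshold={threshold:.3f}  FPR={fpr:.3f}  TPR={tpr:.3f}')
```

The quantile fixes the false positive rate on normal data by construction (about 5% here); the true positive rate then depends on how well the two score distributions separate.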

In [None]:
from src.model.quantum_autoencoder import compute_anomaly_scores

scores = compute_anomaly_scores(model, q_test)

normal_scores = scores[y_test == 0]
fraud_scores = scores[y_test == 1]

print(f'Normal — mean score: {normal_scores.mean():.4f} ± {normal_scores.std():.4f}')
print(f'Fraud  — mean score: {fraud_scores.mean():.4f} ± {fraud_scores.std():.4f}')

In [None]:
# Anomaly score distribution
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(normal_scores, bins=40, alpha=0.7, label='Normal', color='#2196F3', density=True)
ax.hist(fraud_scores, bins=40, alpha=0.7, label='Fraud', color='#F44336', density=True)
ax.set_xlabel('Anomaly Score (1 − ⟨Z⟩)')
ax.set_ylabel('Density')
ax.set_title('Quantum Autoencoder — Anomaly Score Distribution')
ax.legend()
fig.tight_layout()
plt.show()

In [None]:
from sklearn.metrics import roc_auc_score, roc_curve

# ROC Curve
auroc = roc_auc_score(y_test, scores)
fpr, tpr, _ = roc_curve(y_test, scores)

fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(fpr, tpr, linewidth=2, color='#9C27B0', label=f'QAE (AUROC = {auroc:.3f})')
ax.plot([0, 1], [0, 1], 'k--', linewidth=0.8, label='Random')
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('ROC Curve — Fraud Detection')
ax.legend(loc='lower right')
ax.set_aspect('equal')
fig.tight_layout()
plt.show()

print(f'AUROC: {auroc:.4f}')

## Summary

| Component | Detail |
|---|---|
| **Encoding** | Angle encoding via $R_y(x_i)$ — 4 features → 4 qubits |
| **Ansatz** | Hardware-efficient: $R_y$ rotations + CNOT chains, depth 3 |
| **Latent Space** | 2 latent qubits retain compressed state |
| **Trash Qubits** | 2 trash qubits trained to collapse to $\lvert 0\rangle$ |
| **Cost Function** | $\mathcal{L} = 1 - \text{mean}(\langle Z \rangle_{\text{trash}})$ |
| **Anomaly Score** | Per-sample value of the cost, $1 - \text{mean}(\langle Z \rangle_{\text{trash}})$; expected to be high for unseen (fraud) patterns |