# 🧬 Hybrid QMolNet: Quantum-Classical Drug Molecule Property Prediction

## A Complete Tutorial and Demonstration

This notebook walks through **Hybrid QMolNet**, a model that pairs a **Graph Neural Network (GNN)** with a **Variational Quantum Neural Network** to predict drug molecule properties from SMILES strings.

### Pipeline Overview

```
SMILES → RDKit Molecular Graph → GNN Encoder → Embedding Compression →
         Variational Quantum Circuit → Classifier → Property Prediction
```

### What You'll Learn

1. **Molecular Graph Construction** - Converting SMILES to PyTorch Geometric graphs
2. **GNN Encoding** - Message-passing neural networks for molecular embeddings
3. **Quantum Computing** - Variational quantum circuits with PennyLane
4. **Hybrid Architecture** - Combining classical and quantum layers
5. **Model Training** - End-to-end training with PyTorch
6. **Evaluation** - Comparing hybrid vs. classical approaches

---
## 📦 Setup and Imports

In [None]:
# Standard imports
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

# Check versions
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

In [None]:
# Project imports
from utils.helpers import set_seed, get_device
from utils.smiles_to_graph import smiles_to_graph, MoleculeGraphBuilder
from utils.data_loader import load_dataset, create_data_loaders

from models.gnn_encoder import GNNEncoder, print_gnn_summary
from models.quantum_layer import VariationalQuantumLayer, draw_quantum_circuit, print_quantum_layer_summary
from models.hybrid_model import HybridQMolNet, print_hybrid_model_summary
from models.baselines import GNNClassifier

from training.trainer import Trainer
from training.callbacks import EarlyStoppingCallback

from evaluation.evaluator import ModelEvaluator
from evaluation.metrics import compute_metrics, print_metrics

from visualization.molecule_viz import plot_molecule, plot_molecular_graph
from visualization.plots import plot_training_curves, plot_confusion_matrix, plot_metrics_comparison
from visualization.embedding_viz import plot_embedding_comparison

# Set seed for reproducibility
SEED = 42
set_seed(SEED)
device = get_device()

---
## 🧪 Part 1: Molecular Graph Construction

### Theory: From SMILES to Graphs

**SMILES** (Simplified Molecular Input Line Entry System) is a string notation for representing molecules. We convert these to **molecular graphs** where:

- **Nodes** = Atoms with features (atomic number, degree, charge, hybridization, aromaticity)
- **Edges** = Chemical bonds (bidirectional: each bond is stored as two directed edges, one per direction)

This graph representation is then processed by Graph Neural Networks.
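
The `// 2` in the bond counts printed below follows from this bidirectional storage. A minimal sketch in plain Python shows how each bond becomes two directed edges in the COO-style `[sources, targets]` layout that PyTorch Geometric uses for `edge_index`. The hand-written bond list for ethanol and the helper name `bonds_to_edge_index` are assumptions made for illustration; they are not part of the project's `MoleculeGraphBuilder`.

```python
def bonds_to_edge_index(bonds):
    """Turn undirected bonds (i, j) into a [sources, targets] pair of lists.

    Hypothetical helper: every bond contributes the edges i -> j and j -> i.
    """
    sources, targets = [], []
    for i, j in bonds:
        sources += [i, j]
        targets += [j, i]
    return [sources, targets]


# Ethanol (CCO), heavy atoms only: C0 - C1 - O2
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]

edge_index = bonds_to_edge_index(ethanol_bonds)
num_directed_edges = len(edge_index[0])

print("edge_index:", edge_index)
print("directed edges:", num_directed_edges)
print("bonds:", num_directed_edges // 2)
```

Halving the number of directed edges recovers the number of chemical bonds, which is what the `Bonds:` line in the next cell reports.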

In [None]:
# Example molecules
molecules = [
    ('CCO', 'Ethanol'),
    ('c1ccccc1', 'Benzene'),
    ('CC(=O)O', 'Acetic Acid'),
    ('CC(=O)Nc1ccc(O)cc1', 'Paracetamol'),
]

# Build graphs
builder = MoleculeGraphBuilder()
print(f"Node feature dimension: {builder.node_feature_dim}")
print(f"Edge feature dimension: {builder.edge_feature_dim}")
print()

for smiles, name in molecules:
    data = builder.build(smiles, label=1)
    print(f"{name} ({smiles}):")
    print(f"  Atoms: {data.x.shape[0]}")
    print(f"  Bonds: {data.edge_index.shape[1] // 2}")
    print(f"  Node features shape: {data.x.shape}")
    print()

In [None]:
# Visualize molecular structures
fig, axes = plt.subplots(2, 2, figsize=(12, 12))

for ax, (smiles, name) in zip(axes.flat, molecules):
    plot_molecule(smiles, title=name)
    
plt.tight_layout()
plt.show()

In [None]:
# Visualize as a graph
fig = plot_molecular_graph(
    'CC(=O)Nc1ccc(O)cc1', 
    title='Paracetamol - Graph Representation'
)
plt.show()

---
## 🧠 Part 2: Graph Neural Network Encoder

### Theory: Message Passing Neural Networks

GNNs learn molecular representations through **message passing**:

1. **Aggregation**: Each node collects features from its neighbors
2. **Update**: Node features are updated based on aggregated messages
3. **Pooling**: Node features are combined into a graph-level embedding

$$h_v^{(l+1)} = \sigma\left(W^{(l)} \cdot \text{AGG}\left(\{h_u^{(l)} : u \in \mathcal{N}(v)\}\right)\right)$$
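
As a rough illustration of the update rule above, the following plain-Python sketch applies one message-passing step with mean aggregation and a ReLU non-linearity to a three-atom chain, then mean-pools the node features into a graph-level vector. The feature values, the weight matrix `W`, and all helper names are made up for illustration. The project's `GNNEncoder` is not shown here and likely differs (GCN layers, for example, also add self-loops and degree normalization).

```python
def relu(vector):
    return [max(0.0, value) for value in vector]


def matvec(matrix, vector):
    """Matrix-vector product W @ v using plain lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def message_passing_step(h, neighbors, W):
    """h_v' = ReLU(W @ mean({h_u : u in N(v)})) for every node v."""
    return [
        relu(matvec(W, mean_vectors([h[u] for u in neighbors[v]])))
        for v in range(len(h))
    ]


# Three-atom chain 0 - 1 - 2 with two illustrative features per atom
neighbors = {0: [1], 1: [0, 2], 2: [1]}
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, -1.0], [0.5, 0.5]]

new_h = message_passing_step(h, neighbors, W)
graph_embedding = mean_vectors(new_h)  # mean pooling over nodes

print("updated node features:", new_h)
print("graph embedding:", graph_embedding)
```

Stacking several such steps lets information travel further along the molecule: after three layers, each atom's features depend on atoms up to three bonds away.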

In [None]:
# Create GNN encoder
gnn_encoder = GNNEncoder(
    input_dim=builder.node_feature_dim,
    hidden_dim=64,
    embedding_dim=32,  # Output: 32-dimensional embedding
    num_layers=3,
    conv_type='gcn',
    pooling='mean',
)

print_gnn_summary(gnn_encoder)

In [None]:
# Test encoding a molecule
from torch_geometric.data import Batch

# Create batch from example molecules
graphs = [builder.build(smiles) for smiles, _ in molecules]
batch = Batch.from_data_list(graphs)

# Forward pass
gnn_encoder.eval()
with torch.no_grad():
    embeddings = gnn_encoder.forward_batch(batch)

print(f"Input: {batch.num_graphs} molecules")
print(f"Output embeddings shape: {embeddings.shape}")
print(f"\nEmbedding vectors:")
for i, (_, name) in enumerate(molecules):
    print(f"  {name}: [{embeddings[i, :5].numpy()}...]")

---
## ⚛️ Part 3: Variational Quantum Circuit

### Theory: Quantum Neural Networks

The variational quantum circuit (VQC) is a parameterized quantum operation:

1. **Angle Encoding**: Classical features are encoded as qubit rotations
   $$|\psi_0\rangle = \bigotimes_{i=1}^{n} RY(\pi \cdot x_i)\,|0\rangle$$

2. **Entanglement**: CNOT gates create quantum correlations

3. **Parameterized Rotations**: Trainable angles $\theta$, one per rotation gate
   $$U(\theta) = \prod_l \left(\text{CNOT-ring} \cdot \prod_i RX(\theta_{i,1}^{l})\,RY(\theta_{i,2}^{l})\,RZ(\theta_{i,3}^{l})\right)$$

4. **Measurement**: Pauli-Z expectation values
   $$\langle\psi|Z_i|\psi\rangle \in [-1, 1]$$
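
To make the encoding, entanglement, and measurement steps concrete, here is a hand-rolled two-qubit statevector sketch in plain Python. It is an illustration under simplifying assumptions (two qubits, a single CNOT, no trainable rotations), not the project's `VariationalQuantumLayer`, and all function names are hypothetical. Because RY rotations acting on $|0\rangle$ produce real amplitudes, ordinary floats suffice.

```python
import math


def ry_on_zero(theta):
    """Amplitudes of RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return [math.cos(theta / 2), math.sin(theta / 2)]


def encode_two_qubits(x0, x1):
    """Angle-encode two features; amplitudes ordered |00>, |01>, |10>, |11>."""
    q0 = ry_on_zero(math.pi * x0)
    q1 = ry_on_zero(math.pi * x1)
    return [a * b for a in q0 for b in q1]


def cnot(state):
    """CNOT with qubit 0 as control, qubit 1 as target: swaps |10> and |11>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


def expval_z(state, qubit):
    """Pauli-Z expectation value: P(qubit = 0) - P(qubit = 1)."""
    total = 0.0
    for index, amplitude in enumerate(state):
        bit = (index >> (1 - qubit)) & 1  # qubit 0 is the leftmost bit
        total += (1 - 2 * bit) * amplitude ** 2
    return total


state = cnot(encode_two_qubits(0.25, 0.5))
z0 = expval_z(state, 0)
z1 = expval_z(state, 1)
print(f"<Z_0> = {z0:.4f}, <Z_1> = {z1:.4f}")
```

Without the CNOT, $\langle Z_1\rangle$ would equal $\cos(\pi x_1)$ alone. With it, $\langle Z_1\rangle = \cos(\pi x_0)\cos(\pi x_1)$, so the measurement on qubit 1 depends on both inputs.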

For rotation gates such as RX, RY, and RZ, gradients of an expectation value $f$ are computed exactly via the **parameter-shift rule**:
$$\frac{\partial f}{\partial \theta} = \frac{1}{2}\left[f(\theta + \pi/2) - f(\theta - \pi/2)\right]$$
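
The rule can be checked by hand on the simplest case. For a single qubit, $f(\theta) = \langle Z\rangle$ after $RY(\theta)|0\rangle$ equals $\cos\theta$, so the exact derivative is $-\sin\theta$. The sketch below, with illustrative function names, evaluates the two shifted circuits and compares the result with that analytic value.

```python
import math


def expval_z_after_ry(theta):
    """<Z> for the state RY(theta)|0>, computed from its two amplitudes."""
    amp0, amp1 = math.cos(theta / 2), math.sin(theta / 2)
    return amp0 ** 2 - amp1 ** 2


def parameter_shift_grad(f, theta):
    """Derivative of f at theta from two evaluations shifted by +/- pi/2."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))


theta = 0.7
grad = parameter_shift_grad(expval_z_after_ry, theta)
analytic = -math.sin(theta)

print(f"parameter-shift: {grad:.6f}")
print(f"analytic:        {analytic:.6f}")
```

Unlike a finite-difference estimate, the shift of $\pi/2$ is not a small step size: the two evaluations give the exact derivative, which is why the rule also works when $f$ is estimated on quantum hardware.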

In [None]:
# Create quantum layer
quantum_layer = VariationalQuantumLayer(
    n_qubits=8,      # 8 qubits matching compressed embedding
    n_layers=3,      # 3 variational blocks
    diff_method='parameter-shift'  # Quantum gradient method
)

print_quantum_layer_summary(quantum_layer)

In [None]:
# Visualize the quantum circuit
fig = draw_quantum_circuit(n_qubits=8, n_layers=2, figsize=(16, 10))
plt.show()

In [None]:
# Test quantum layer
x_test = torch.randn(2, 8)  # 2 samples, 8 features

print(f"Input shape: {x_test.shape}")
print(f"Input:\n{x_test}")
print()

with torch.no_grad():
    output = quantum_layer(x_test)

print(f"Output shape: {output.shape}")
print(f"Output (expectation values):\n{output}")
print(f"\nOutput range: [{output.min():.3f}, {output.max():.3f}]")
print("(Pauli-Z expectation values are in [-1, 1])")

---
## 🔗 Part 4: Hybrid Quantum-Classical Model

### Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│  SMILES → Graph → GNN → Compress → [Quantum] → Classifier       │
│                    ↓         ↓          ↓            ↓          │
│               145D →     32D →      8D  →        8D  →    2     │
└─────────────────────────────────────────────────────────────────┘
```

The hybrid model combines:
- **Classical GNN** for molecular understanding
- **Variational quantum circuit (VQC)** for non-linear transformation of the compressed embedding
- **Classical MLP** for final classification
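
The data flow through these three stages can be traced with a dependency-free sketch. Everything in it is a stand-in: the 32-dimensional "GNN embedding" is random, the weights are random, the `tanh` squashing before the quantum stage is an assumption, and the quantum layer is replaced by its closed form for unentangled RY-encoded qubits, $\langle Z_i\rangle = \cos(\pi x_i)$. It only shows how the dimensions 32 → 8 → 8 → 2 fit together, not how `HybridQMolNet` is implemented.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_linear(in_dim, out_dim, rng):
    """Random weights and zero bias (illustrative, untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def quantum_stand_in(x):
    """Classical stand-in for the VQC: <Z_i> = cos(pi * x_i), no entanglement."""
    return [math.cos(math.pi * xi) for xi in x]


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(42)
gnn_embedding = [rng.gauss(0.0, 1.0) for _ in range(32)]  # stand-in for GNN output
compress_w, compress_b = random_linear(32, 8, rng)
classifier_w, classifier_b = random_linear(8, 2, rng)

compressed = [math.tanh(z) for z in linear(gnn_embedding, compress_w, compress_b)]
quantum_out = quantum_stand_in(compressed)
probs = softmax(linear(quantum_out, classifier_w, classifier_b))

print("compressed:", len(compressed), "quantum out:", len(quantum_out))
print("class probabilities:", probs)
```

The compression step exists because each feature occupies one qubit: an 8-qubit circuit can angle-encode only 8 values, so the 32-dimensional embedding has to be reduced first.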

In [None]:
# Create hybrid model
hybrid_model = HybridQMolNet(
    node_feature_dim=builder.node_feature_dim,
    gnn_hidden_dim=64,
    gnn_embedding_dim=32,
    gnn_layers=3,
    n_qubits=8,
    quantum_layers=3,
    num_classes=2,
)

print_hybrid_model_summary(hybrid_model)

In [None]:
# Test hybrid model on example molecules
hybrid_model.eval()
with torch.no_grad():
    logits = hybrid_model.forward_batch(batch)
    probs = F.softmax(logits, dim=1)

print(f"Input: {batch.num_graphs} molecules")
print(f"Output logits shape: {logits.shape}")
print(f"\nPredictions:")
for i, (_, name) in enumerate(molecules):
    pred = logits[i].argmax().item()
    prob = probs[i, pred].item()
    print(f"  {name}: Class {pred} (confidence: {prob:.2%})")

---
## 🏋️ Part 5: Training the Models

Now we train both the hybrid model and a classical baseline to compare their performance.
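
Both training runs below pass an early-stopping callback with a `patience` setting. As a sketch of what patience-based early stopping usually means (the project's `EarlyStoppingCallback` is not shown and may differ, for instance by requiring a minimum improvement), the hypothetical function below returns the epoch at which training would halt for a given list of validation losses.

```python
def early_stop_epoch(val_losses, patience):
    """Return the epoch index at which training stops, or None if it never does.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Illustrative validation losses, not results from this notebook
losses = [0.70, 0.62, 0.60, 0.61, 0.63, 0.64, 0.59]

stop_short = early_stop_epoch(losses, patience=3)
stop_long = early_stop_epoch(losses, patience=10)
print("patience=3 stops at epoch:", stop_short)
print("patience=10 stops at epoch:", stop_long)
```

With a budget of 30 epochs, a larger patience such as the 15 used for the hybrid model tolerates longer plateaus before giving up.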

In [None]:
# Load dataset
smiles_list, labels = load_dataset(n_samples=300, seed=SEED)

train_loader, val_loader, test_loader, dataset = create_data_loaders(
    smiles_list, labels,
    batch_size=32,
    seed=SEED,
)

print(f"\nNode feature dimension: {dataset.node_feature_dim}")

In [None]:
# Train GNN Baseline
print("Training GNN Baseline...")

gnn_baseline = GNNClassifier(
    node_feature_dim=dataset.node_feature_dim,
    num_classes=2,
)

gnn_trainer = Trainer(
    model=gnn_baseline,
    device=device,
    callbacks=[EarlyStoppingCallback(patience=10)],
    model_name="GNN_Baseline",
)

gnn_history = gnn_trainer.fit(
    train_loader, val_loader,
    num_epochs=30,
    verbose=True
)

In [None]:
# Train Hybrid Model (this will take longer due to quantum simulation)
print("Training Hybrid QMolNet...")
print("(Note: Quantum simulation is computationally intensive)")

hybrid_train = HybridQMolNet(
    node_feature_dim=dataset.node_feature_dim,
    n_qubits=8,
    quantum_layers=2,
    num_classes=2,
)

hybrid_trainer = Trainer(
    model=hybrid_train,
    device=device,
    callbacks=[EarlyStoppingCallback(patience=15)],
    model_name="Hybrid_QMolNet",
)

hybrid_history = hybrid_trainer.fit(
    train_loader, val_loader,
    num_epochs=30,
    verbose=True
)

In [None]:
# Plot training curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# GNN
ax = axes[0]
ax.plot(gnn_history.train_loss, label='Train Loss')
ax.plot(gnn_history.val_loss, label='Val Loss')
ax.set_title('GNN Baseline Training')
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.legend()

# Hybrid
ax = axes[1]
ax.plot(hybrid_history.train_loss, label='Train Loss')
ax.plot(hybrid_history.val_loss, label='Val Loss')
ax.set_title('Hybrid QMolNet Training')
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.legend()

plt.tight_layout()
plt.show()

---
## 📊 Part 6: Evaluation and Comparison

In [None]:
# Evaluate GNN Baseline
gnn_evaluator = ModelEvaluator(gnn_baseline, device=device, model_name="GNN Baseline")
gnn_metrics = gnn_evaluator.evaluate(test_loader)
gnn_evaluator.print_results()

In [None]:
# Evaluate Hybrid Model
hybrid_evaluator = ModelEvaluator(hybrid_train, device=device, model_name="Hybrid QMolNet")
hybrid_metrics = hybrid_evaluator.evaluate(test_loader)
hybrid_evaluator.print_results()

In [None]:
# Compare metrics
all_metrics = {
    'GNN Baseline': gnn_metrics,
    'Hybrid QMolNet': hybrid_metrics,
}

fig = plot_metrics_comparison(all_metrics, title='Model Performance Comparison')
plt.show()

In [None]:
# Plot confusion matrices
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

cm_gnn = gnn_evaluator.get_confusion_matrix()
cm_hybrid = hybrid_evaluator.get_confusion_matrix()

sns.heatmap(cm_gnn, annot=True, fmt='d', cmap='Blues', ax=axes[0])
axes[0].set_title('GNN Baseline')
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('True')

sns.heatmap(cm_hybrid, annot=True, fmt='d', cmap='Greens', ax=axes[1])
axes[1].set_title('Hybrid QMolNet')
axes[1].set_xlabel('Predicted')
axes[1].set_ylabel('True')

plt.tight_layout()
plt.show()

In [None]:
# Visualize embeddings
# (separate names avoid overwriting the `embeddings` and `labels` variables defined earlier)
test_embeddings, test_labels = hybrid_evaluator.get_embeddings(test_loader, layer='gnn')

fig = plot_embedding_comparison(
    test_embeddings, test_labels,
    title='Learned Molecular Embeddings',
    class_names=['Inactive', 'Active']
)
plt.show()

---
## 🎯 Summary

### Key Takeaways

1. **Molecular Graphs**: SMILES strings are converted to graphs with atom features and bond connectivity

2. **GNN Encoding**: Message-passing layers learn molecular representations by aggregating neighbor information

3. **Quantum Processing**: The VQC uses angle encoding and parameterized rotations to process compressed embeddings

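4. **Hybrid Architecture**: A classical GNN encoder, a quantum layer, and a classical classifier are chained into one model; the comparison with the GNN baseline in Part 6 tests whether the quantum layer helps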

5. **Parameter-Shift Gradients**: The parameter-shift rule gives exact gradients of the circuit's expectation values, so the quantum layer trains end-to-end with the classical layers

### Research Directions

- Larger datasets (BBBP, Tox21, HIV)
- More qubits and deeper circuits
- Different quantum ansatzes
- Hardware execution (IBM Quantum, IonQ)

---

**Thank you for exploring Hybrid QMolNet!** 🧬⚛️