# **Graph Classification with MUTAG Dataset**
This notebook demonstrates **graph classification** using a **Graph Neural Network (GNN)** on the **MUTAG dataset**, which consists of molecular graphs.

## **1. Load the MUTAG Dataset** 📂
The **MUTAG dataset** is a collection of 188 molecular graphs, each labeled as mutagenic or non-mutagenic.

In [1]:
import torch
import torch.nn.functional as F
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool

# Load the MUTAG dataset
dataset = TUDataset(root='data', name='MUTAG')
print(f'Dataset: {dataset}')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features per node: {dataset.num_node_features}')
print(f'Number of classes: {dataset.num_classes}')



Dataset: MUTAG(188)
Number of graphs: 188
Number of features per node: 7
Number of classes: 2


## **2. Define a GNN for Graph Classification** 🧠
We classify entire graphs with two **Graph Convolutional Network (GCN)** layers, followed by **global mean pooling** over each graph's nodes and a linear output layer.
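
To make the GCN step less of a black box, here is a minimal pure-Python sketch of the neighborhood aggregation a GCN layer performs: each node's features are replaced by a degree-normalized sum over itself and its neighbors. The function name `gcn_propagate` and the 3-node path graph are illustrative assumptions, and the learned weight matrix and bias that `GCNConv` applies are deliberately left out.

```python
import math

def gcn_propagate(x, edges):
    """Symmetric-normalized aggregation with self-loops: D^-1/2 (A + I) D^-1/2 X.

    x: list of per-node feature lists; edges: undirected pairs (i, j).
    The learned weight matrix and bias of a real GCN layer are omitted.
    """
    n = len(x)
    neighbors = [{i} for i in range(n)]  # every node gets a self-loop
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    deg = [len(nb) for nb in neighbors]  # degree including the self-loop
    out = []
    for i in range(n):
        row = [0.0] * len(x[0])
        for j in neighbors[i]:
            w = 1.0 / math.sqrt(deg[i] * deg[j])  # symmetric normalization
            for k, v in enumerate(x[j]):
                row[k] += w * v
        out.append(row)
    return out

# Hypothetical 3-node path graph 0 - 1 - 2 with one feature per node.
features = [[1.0], [0.0], [0.0]]
smoothed = gcn_propagate(features, [(0, 1), (1, 2)])
```

After one step, the signal on node 0 has spread to its neighbor (node 1) but not yet to node 2. Stacking two such layers, as the model in this notebook does, lets information travel two hops.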

In [2]:
class GraphClassifier(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        # Two graph convolutions produce node embeddings
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        # Linear layer maps the pooled graph embedding to class logits
        self.fc = torch.nn.Linear(hidden_channels, out_channels)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        # Average node embeddings per graph; `batch` maps each node to its graph
        x = global_mean_pool(x, batch)
        return self.fc(x)  # Raw logits, one row per graph
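
The pooling step in the model above can be illustrated without any libraries. The sketch below is a plain-Python stand-in for what `global_mean_pool` computes, assuming the usual mini-batch layout in which a `batch` vector assigns each node to a graph index. The helper `mean_pool` and the toy values are hypothetical, not part of PyTorch Geometric.

```python
def mean_pool(x, batch):
    """Average node feature vectors per graph; batch[i] is node i's graph index."""
    num_graphs = max(batch) + 1
    dim = len(x[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feats, g in zip(x, batch):
        counts[g] += 1
        for k, v in enumerate(feats):
            sums[g][k] += v
    return [[s / counts[g] for s in sums[g]] for g in range(num_graphs)]

# Hypothetical mini-batch: graph 0 has two nodes, graph 1 has three.
node_features = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [3.0, 6.0], [6.0, 0.0]]
batch_vector = [0, 0, 1, 1, 1]
pooled = mean_pool(node_features, batch_vector)
print(pooled)  # [[2.0, 3.0], [3.0, 2.0]]
```

Because the result has one row per graph regardless of how many nodes each graph contains, a single linear layer can classify molecules of different sizes.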


## **3. Train the Model** 🚀
We train the GNN for 100 epochs using the **Adam optimizer** (learning rate 0.01, weight decay 5e-4) and **cross-entropy loss**, with mini-batches of 32 graphs.

In [3]:
# Create data loader
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Initialize model, optimizer, and loss function
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GraphClassifier(dataset.num_node_features, 64, dataset.num_classes).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

# Training loop
def train():
    model.train()
    total_loss = 0
    for batch in loader:
        batch = batch.to(device)
        optimizer.zero_grad()
        out = model(batch.x, batch.edge_index, batch.batch)
        loss = criterion(out, batch.y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# Train the model for 100 epochs
for epoch in range(100):
    loss = train()
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss:.4f}')

Epoch 0, Loss: 0.6662
Epoch 10, Loss: 0.5787
Epoch 20, Loss: 0.5240
Epoch 30, Loss: 0.5136
Epoch 40, Loss: 0.5087
Epoch 50, Loss: 0.5133
Epoch 60, Loss: 0.5148
Epoch 70, Loss: 0.5127
Epoch 80, Loss: 0.5032
Epoch 90, Loss: 0.4936
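
To help read these loss values, the sketch below computes cross-entropy for a single two-class prediction in plain Python. The function `cross_entropy` is a hypothetical helper written for illustration. `torch.nn.CrossEntropyLoss` applies the same formula to a batch of logits and, by default, averages the result.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Equal logits: the model has no preference between the two classes.
chance_loss = cross_entropy([0.0, 0.0], 1)

# Logits that strongly favor the correct class give a loss near zero.
confident_loss = cross_entropy([-2.0, 2.0], 1)
```

An uninformative two-class prediction yields a loss of ln 2 ≈ 0.693, which is why the epoch 0 loss of 0.6662 sits close to that value. The loss plateau around 0.50 in later epochs indicates the model is only moderately confident in its predictions.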


## **4. Evaluate the Model** 📊
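We measure the accuracy of the trained model. Note that `test()` iterates over the same `loader` used for training, so the reported number is training accuracy rather than an estimate of performance on unseen graphs.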

In [4]:
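@torch.no_grad()  # Gradients are not needed for evaluation
def test():
    model.eval()
    correct = 0
    total = 0
    # `loader` is the training loader, so this measures training accuracy
    for batch in loader:
        batch = batch.to(device)
        out = model(batch.x, batch.edge_index, batch.batch)
        pred = out.argmax(dim=1)  # Predicted class = index of the largest logit
        correct += (pred == batch.y).sum().item()
        total += batch.y.size(0)
    return correct / total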

accuracy = test()
print(f'Graph Classification Accuracy: {accuracy:.4f}')

Graph Classification Accuracy: 0.7713
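
The accuracy above is computed on the graphs the model was trained on. One way to estimate performance on unseen graphs is to hold some graphs out before training. The sketch below shows only the index bookkeeping in plain Python. The helper `split_indices`, the 80/20 ratio, and the seed are illustrative assumptions rather than part of the original notebook.

```python
import random

def split_indices(num_items, test_fraction=0.2, seed=0):
    """Shuffle item indices reproducibly and split them into train and test lists."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    num_test = int(round(num_items * test_fraction))
    return indices[num_test:], indices[:num_test]

# MUTAG contains 188 graphs; with a 20% test fraction, 38 are held out.
train_idx, test_idx = split_indices(188)
```

In the notebook, these index lists could then select two subsets (for example `dataset[train_idx]` and `dataset[test_idx]`, assuming the installed PyTorch Geometric version supports list indexing), each wrapped in its own `DataLoader`. Training would use the first and `test()` the second.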


### **Summary of Graph Classification Pipeline**
- **MUTAG Dataset**: Molecular graphs labeled as mutagenic/non-mutagenic.
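- **GNN Model**: Uses two **GCN layers** + **global mean pooling** + a linear classifier.
- **Training**: Optimized using **Adam** with **cross-entropy loss**, trained for 100 epochs.
- **Evaluation**: Computes **classification accuracy** on the training graphs (no held-out split is used).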