# **Graph Learning with Cora Dataset**
This notebook walks through three graph learning tasks on the **Cora dataset**, a well-known citation network: **node classification**, **edge classification** (link prediction), and a model definition for **graph classification**.

## **1. Load the Cora Dataset** 📂
The **Cora dataset** is a citation network of scientific papers, each assigned to one subject category.
- **Nodes** represent research papers.
- **Edges** represent citations between papers.
- **Node features** are binary bag-of-words vectors (1433 dimensions), row-normalized here by `T.NormalizeFeatures()`.
- **Node labels** indicate the paper category (7 classes).

In [1]:
import os
import os.path as osp
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
import torch_geometric.transforms as T

# Load the Cora dataset
path = osp.join(os.getcwd(), 'data', 'Cora')
dataset = Planetoid(path, 'Cora', transform=T.NormalizeFeatures())
data = dataset[0]

print(f'Dataset: {dataset}')
print(f'Number of Nodes: {data.num_nodes}')
print(f'Number of Edges: {data.num_edges}')
print(f'Node Feature Dimension: {data.num_node_features}')
print(f'Number of Classes: {dataset.num_classes}')



Dataset: Cora()
Number of Nodes: 2708
Number of Edges: 10556
Node Feature Dimension: 1433
Number of Classes: 7
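
The edge count above deserves a second look. PyTorch Geometric stores a graph as a list of directed (source, target) pairs, so an undirected citation link normally appears twice, once per direction. The following is a minimal pure-Python sketch of that convention on a made-up three-paper graph; `to_directed_pairs` is a hypothetical helper written for this illustration, not a library function.

```python
def to_directed_pairs(undirected_edges):
    """Expand each undirected edge (u, v) into the two directed pairs (u, v) and (v, u)."""
    pairs = set()
    for u, v in undirected_edges:
        pairs.add((u, v))
        pairs.add((v, u))
    return sorted(pairs)

# Toy graph (illustrative only): paper 0 -- paper 1 -- paper 2
undirected = [(0, 1), (1, 2)]
directed = to_directed_pairs(undirected)

# Same layout as a 2 x num_edges edge index: one row of sources, one row of targets.
sources = [u for u, _ in directed]
targets = [v for _, v in directed]

print(directed)
print(f'{len(undirected)} undirected edges -> {len(directed)} directed entries')
```

Under this convention, and assuming Cora's edge list is symmetric, the 10556 entries reported above would correspond to 5278 undirected citation links. Calling `data.is_undirected()` is one way to check that assumption.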


## **2. Node Classification with Graph Convolutional Networks (GCN)** 🟢
**Goal**: Predict the label of a node using a GCN.
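
To make the layer concrete, here is a minimal NumPy sketch of the standard GCN propagation rule: add self-loops, normalize the adjacency matrix symmetrically, then mix the aggregated features through a weight matrix. The toy graph, the feature values, and the name `gcn_layer` are assumptions made for illustration; the bias term and the sparse implementation used by `GCNConv` are left out.

```python
import numpy as np

def gcn_layer(adj, x, weight):
    """One GCN propagation step: D^{-1/2} (A + I) D^{-1/2} X W."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degree, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale entry (i, j) by 1 / sqrt(deg_i * deg_j)
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return a_norm @ x @ weight

# Toy path graph 0 - 1 - 2 with two features per node (made-up values).
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
x = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
weight = np.eye(2)  # identity weights, so the output shows the aggregation alone

# ReLU after the first layer, mirroring `.relu()` in the model of this section
h = np.maximum(gcn_layer(adj, x, weight), 0.0)
print(h)
```

Stacking two such layers lets each node see information from its two-hop neighborhood, which is why the model in this section uses `conv1` followed by `conv2`.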

In [4]:
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def encode(self, x, edge_index):
        """Return the learned node embeddings."""
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)  # Final node embeddings

    def forward(self, x, edge_index):
        x = self.encode(x, edge_index)
        return F.log_softmax(x, dim=1)


# Train the GCN model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN(dataset.num_features, 16, dataset.num_classes).to(device)
data = data.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

def train():
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()

# Training loop
for epoch in range(200):
    loss = train()
    if epoch % 20 == 0:
        print(f'Epoch {epoch}, Loss: {loss:.4f}')

# Evaluate on the held-out test nodes (gradients are not needed here)
model.eval()
with torch.no_grad():
    pred = model(data.x, data.edge_index).argmax(dim=1)
correct = (pred[data.test_mask] == data.y[data.test_mask]).sum()
accuracy = int(correct) / int(data.test_mask.sum())
print(f'Node Classification Accuracy: {accuracy:.4f}')

Epoch 0, Loss: 1.9454
Epoch 20, Loss: 1.6305
Epoch 40, Loss: 1.1211
Epoch 60, Loss: 0.7123
Epoch 80, Loss: 0.5006
Epoch 100, Loss: 0.3905
Epoch 120, Loss: 0.3252
Epoch 140, Loss: 0.2822
Epoch 160, Loss: 0.2519
Epoch 180, Loss: 0.2293
Node Classification Accuracy: 0.8050
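
The loss in `train()` is computed only on the nodes selected by `train_mask`, even though the forward pass runs over the whole graph. The pure-Python sketch below reproduces that masked negative log-likelihood on made-up logits; `log_softmax` and `masked_nll` are hypothetical helpers for illustration and assume mean reduction, the default of `F.nll_loss`.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]

def masked_nll(all_logits, labels, mask):
    """Mean negative log-likelihood over the rows where mask is True."""
    losses = [
        -log_softmax(row)[y]
        for row, y, keep in zip(all_logits, labels, mask)
        if keep
    ]
    return sum(losses) / len(losses)

# Three nodes, two classes (made-up values); only the first two are training nodes.
logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
labels = [0, 1, 0]
train_mask = [True, True, False]

loss = masked_nll(logits, labels, train_mask)
print(f'Masked loss: {loss:.4f}')
```

The third node is misclassified by these logits, yet it does not affect the loss because it lies outside the mask. The same masking idea is what keeps the test nodes out of training in the cell above.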


## **3. Edge Classification (Link Prediction)** 🔗
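**Goal**: Predict whether an edge (citation) exists between two nodes. No separate link-prediction model is trained here: the cell reuses the node embeddings of the GCN from Section 2 and scores each node pair by the dot product of its two embeddings.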

In [6]:
from torch_geometric.utils import negative_sampling

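# Positive edges are the observed citations; negative edges are sampled node pairs with no citation
pos_edge_index = data.edge_index
neg_edge_index = negative_sampling(
    pos_edge_index,
    num_nodes=data.num_nodes,
    num_neg_samples=pos_edge_index.size(1),
)

# Score a candidate edge by the dot product of its endpoint embeddings (no extra training)
def edge_predictor(z, edge_index):
    return (z[edge_index[0]] * z[edge_index[1]]).sum(dim=1)

# Reuse the embeddings of the GCN trained for node classification
with torch.no_grad():
    z = model.encode(data.x, data.edge_index)
pos_pred = edge_predictor(z, pos_edge_index)
neg_pred = edge_predictor(z, neg_edge_index)

# A positive score counts as "edge", a negative score as "no edge"
num_correct = ((pos_pred > 0).sum() + (neg_pred < 0).sum()).item()
edge_acc = num_correct / (pos_pred.size(0) + neg_pred.size(0))
print(f'Edge Classification Accuracy: {edge_acc:.4f}')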

Edge Classification Accuracy: 0.7474
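
The threshold at zero used above has a probabilistic reading: passing the dot-product score through a sigmoid gives an edge probability, and a score of 0 maps to exactly 0.5. Below is a small pure-Python sketch of that decision rule. The embeddings, node pairs, and helper names are made up for illustration and are not taken from the trained model.

```python
import math

def dot_score(z_u, z_v):
    """Dot product of two embedding vectors."""
    return sum(a * b for a, b in zip(z_u, z_v))

def edge_probability(score):
    """Sigmoid of the score; 0.5 exactly when the score is 0."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical 2-dimensional embeddings for three nodes
embeddings = {
    0: [1.0, 0.5],
    1: [0.8, 0.4],
    2: [-1.0, 0.2],
}
pos_pairs = [(0, 1)]            # pairs assumed to be connected
neg_pairs = [(0, 2), (1, 2)]    # pairs assumed to be unconnected

pos_scores = [dot_score(embeddings[u], embeddings[v]) for u, v in pos_pairs]
neg_scores = [dot_score(embeddings[u], embeddings[v]) for u, v in neg_pairs]

# Same rule as the cell above: positive score -> edge, negative score -> no edge
correct = sum(s > 0 for s in pos_scores) + sum(s < 0 for s in neg_scores)
accuracy = correct / (len(pos_scores) + len(neg_scores))
print(f'Toy edge accuracy: {accuracy:.4f}')
```

Note that the positive edges scored in this section are the same edges the GCN used for message passing, so the reported accuracy is likely optimistic. A held-out edge split, for example with `T.RandomLinkSplit`, would give a more honest estimate.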


## **4. Graph Classification** 📊
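**Goal**: Predict the category of an entire graph.

Graph classification assigns one label per graph, so it normally requires a dataset of many graphs. Cora is a single graph; one workaround is to sample synthetic subgraphs from it. The `GraphClassifier` cell in this section only defines the model and does not build subgraphs or train it.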
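
One way to build synthetic subgraphs from a single graph, sketched here under assumptions, is to take the k-hop neighborhood (ego network) around a seed node and label the resulting subgraph with the seed's class. The snippet below is plain Python on a toy edge list; `k_hop_nodes` and `induced_edges` are hypothetical helpers written for this illustration, and labeling by seed class is only one possible choice.

```python
from collections import deque

def k_hop_nodes(seed, num_hops, edges):
    """Return the sorted nodes within num_hops of seed, treating edges as undirected."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == num_hops:
            continue  # do not expand beyond the hop limit
        for nbr in neighbors.get(node, ()):
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return sorted(distance)

def induced_edges(nodes, edges):
    """Keep only the edges whose two endpoints are both in nodes."""
    keep = set(nodes)
    return [(u, v) for u, v in edges if u in keep and v in keep]

# Toy path graph 0 - 1 - 2 - 3 - 4 with made-up class labels
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}

seed = 2
sub_nodes = k_hop_nodes(seed, 1, edges)
sub_edges = induced_edges(sub_nodes, edges)
graph_label = labels[seed]  # assumption: the subgraph inherits the seed's class

print(sub_nodes, sub_edges, graph_label)
```

On tensors, PyTorch Geometric offers `torch_geometric.utils.k_hop_subgraph` for the same kind of extraction, which would be the natural choice when applying this idea to Cora itself.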

In [7]:
from torch_geometric.nn import global_mean_pool

class GraphClassifier(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GraphClassifier, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

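    def forward(self, x, edge_index, batch):
        # Two rounds of message passing produce per-node outputs
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        # Average node outputs within each graph; `batch` maps every node to its graph index
        return global_mean_pool(x, batch)

# Use a separate name so the trained node-classification `model` is not overwritten
graph_model = GraphClassifier(dataset.num_features, 16, dataset.num_classes)
print(graph_model)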

GraphClassifier(
  (conv1): GCNConv(1433, 16)
  (conv2): GCNConv(16, 7)
)
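
The pooling step relies on a `batch` vector that assigns each node to a graph and then averages the node outputs per graph. A minimal NumPy sketch of that reduction follows, with made-up values and a hypothetical `mean_pool` helper; it assumes every graph index owns at least one node.

```python
import numpy as np

def mean_pool(x, batch, num_graphs):
    """Average the rows of x that share the same graph index in batch."""
    sums = np.zeros((num_graphs, x.shape[1]))
    np.add.at(sums, batch, x)                        # accumulate node rows per graph
    counts = np.bincount(batch, minlength=num_graphs)
    return sums / counts[:, None]                    # assumes no graph is empty

# Three nodes: the first two belong to graph 0, the last one to graph 1.
x = np.array([[1., 2.],
              [3., 4.],
              [10., 20.]])
batch = np.array([0, 0, 1])

pooled = mean_pool(x, batch, num_graphs=2)
print(pooled)
```

For a single graph such as Cora, `batch` would simply be all zeros, for example `torch.zeros(data.num_nodes, dtype=torch.long)`, and the model would return one output row for the whole graph.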
