
In [1]:
import os
import torch

os.environ['TORCH'] = torch.__version__
print(torch.__version__)

1.12.1+cu113


In [2]:
!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q git+https://github.com/pyg-team/pytorch_geometric.git



In [3]:
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='data/TUDataset', name='MUTAG')

Downloading https://www.chrsmrrs.com/graphkerneldatasets/MUTAG.zip
Extracting data/TUDataset/MUTAG/MUTAG.zip
Processing...
Done!


In [4]:
torch.manual_seed(12345)
dataset = dataset.shuffle()

train_dataset = dataset[:150]
test_dataset = dataset[150:]

print(f'Number of training graphs:{len(train_dataset)}')
print(f'Number of test graphs:{len(test_dataset)}')

Number of training graphs:150
Number of test graphs:38


In [5]:
from torch_geometric.loader import DataLoader

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

for step, data in enumerate(train_loader):
  print(f'Step {step+1}:')
  print('=======')
  print(f'Number of graphs in the current batch: {data.num_graphs}')
  print(data)
  print()

Step 1:
Number of graphs in the current batch: 64
DataBatch(edge_index=[2, 2636], x=[1188, 7], edge_attr=[2636, 4], y=[64], batch=[1188], ptr=[65])

Step 2:
Number of graphs in the current batch: 64
DataBatch(edge_index=[2, 2506], x=[1139, 7], edge_attr=[2506, 4], y=[64], batch=[1139], ptr=[65])

Step 3:
Number of graphs in the current batch: 22
DataBatch(edge_index=[2, 852], x=[387, 7], edge_attr=[852, 4], y=[22], batch=[387], ptr=[23])
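
The `batch=[1188]` and `ptr=[65]` fields in the output above come from how a mini-batch is assembled: the graphs are stacked node-wise into one large disconnected graph, and two bookkeeping vectors record which node belongs to which graph. The following is a minimal pure-Python sketch of that bookkeeping, not PyG's actual code. `make_batch_vectors` is a hypothetical helper and the node counts are made up for illustration.

```python
def make_batch_vectors(num_nodes_per_graph):
    """Return (batch, ptr) for graphs stacked node-wise into one big graph."""
    batch = []   # batch[i] = index of the graph that node i belongs to
    ptr = [0]    # ptr[g] = offset of graph g's first node; length = num_graphs + 1
    for graph_idx, n in enumerate(num_nodes_per_graph):
        batch.extend([graph_idx] * n)
        ptr.append(ptr[-1] + n)
    return batch, ptr


# Three toy graphs with 3, 2 and 4 nodes (assumed values)
batch, ptr = make_batch_vectors([3, 2, 4])
print(batch)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(ptr)    # [0, 3, 5, 9]
```

This is consistent with the shapes printed above: 64 graphs give `ptr=[65]`, 22 graphs give `ptr=[23]`, and `batch` has one entry per node.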



Use GraphConv instead of GCNConv
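
For reference, the two layers differ in their update rules roughly as follows, going by my reading of the PyG documentation and assuming default settings (sum aggregation for GraphConv, added self-loops for GCNConv). Here $e_{j,i}$ is the edge weight from node $j$ to node $i$ (1 if none is given) and $\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}$.

```latex
% GCNConv: self-loops added, messages scaled by symmetric degree normalization
\mathbf{x}_i' = \mathbf{\Theta}^{\top} \sum_{j \in \mathcal{N}(i) \cup \{i\}}
    \frac{e_{j,i}}{\sqrt{\hat{d}_j \hat{d}_i}} \, \mathbf{x}_j

% GraphConv: separate weight for the node itself, unnormalized neighbor sum
\mathbf{x}_i' = \mathbf{W}_1 \mathbf{x}_i
    + \mathbf{W}_2 \sum_{j \in \mathcal{N}(i)} e_{j,i} \, \mathbf{x}_j
```

GraphConv therefore keeps the central node's own features through a dedicated weight matrix $\mathbf{W}_1$ and does not rescale neighbor messages by degree.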

In [6]:
from torch.nn import Linear
import torch.nn.functional as F
from torch_geometric.nn import GraphConv
from torch_geometric.nn import global_mean_pool


class GNN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GNN, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GraphConv(dataset.num_node_features, hidden_channels)
        self.conv2 = GraphConv(hidden_channels, hidden_channels)
        self.conv3 = GraphConv(hidden_channels, hidden_channels)
        self.lin = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        x = x.relu()
        x = self.conv3(x, edge_index)

        x = global_mean_pool(x, batch)

        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin(x)
        
        return x

model = GNN(hidden_channels=64)
print(model)

GNN(
  (conv1): GraphConv(7, 64)
  (conv2): GraphConv(64, 64)
  (conv3): GraphConv(64, 64)
  (lin): Linear(in_features=64, out_features=2, bias=True)
)
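
The `global_mean_pool(x, batch)` call in `forward` turns per-node embeddings into one embedding per graph by averaging all nodes that share the same graph index. The following is a minimal pure-Python sketch of that readout, assuming toy 2-dimensional features. `mean_pool` is a hypothetical stand-in, not the PyG implementation.

```python
def mean_pool(node_features, batch, num_graphs):
    """Average node feature vectors per graph, using the batch vector."""
    dim = len(node_features[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feat, graph_idx in zip(node_features, batch):
        counts[graph_idx] += 1
        for k, value in enumerate(feat):
            sums[graph_idx][k] += value
    # Divide each graph's feature sum by its node count
    return [[s / c for s in row] for row, c in zip(sums, counts)]


# Nodes 0 and 1 belong to graph 0, node 2 belongs to graph 1 (assumed toy data)
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
batch = [0, 0, 1]
print(mean_pool(x, batch, num_graphs=2))  # [[2.0, 3.0], [10.0, 20.0]]
```

In the model above the pooled tensor has one 64-dimensional row per graph, which is what the final `Linear(64, 2)` layer classifies.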


In [7]:
from IPython.display import Javascript
display(Javascript('''google.colab.output.setIframeHeight(0,true,{maxHeight:300})'''))

model = GNN(hidden_channels=64)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

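def train():
  model.train()

  for data in train_loader:  # iterate over mini-batches of graphs
    out = model(data.x, data.edge_index, data.batch)  # forward pass
    loss = criterion(out, data.y)  # cross-entropy on graph-level labels
    loss.backward()  # compute gradients
    optimizer.step()  # update parameters
    optimizer.zero_grad()  # clear gradients for the next batch

@torch.no_grad()  # no gradients are needed during evaluation
def test(loader):
  model.eval()

  correct = 0
  for data in loader:
    out = model(data.x, data.edge_index, data.batch)
    pred = out.argmax(dim=1)  # predicted class = highest logit
    correct += int((pred == data.y).sum())  # count correct predictions
  return correct / len(loader.dataset)  # fraction of correctly classified graphs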

for epoch in range(1,171):
  train()
  train_acc = test(train_loader)
  test_acc = test(test_loader)
  print(f'Epoch: {epoch:03d}, Train Acc:{train_acc:.4f}, Test Acc: {test_acc:.4f}')

Epoch: 001, Train Acc:0.7333, Test Acc: 0.7895
Epoch: 002, Train Acc:0.6467, Test Acc: 0.7368
Epoch: 003, Train Acc:0.6467, Test Acc: 0.7368
Epoch: 004, Train Acc:0.6467, Test Acc: 0.7368
Epoch: 005, Train Acc:0.6467, Test Acc: 0.7368
Epoch: 006, Train Acc:0.6533, Test Acc: 0.7368
Epoch: 007, Train Acc:0.7333, Test Acc: 0.8158
Epoch: 008, Train Acc:0.7267, Test Acc: 0.8158
Epoch: 009, Train Acc:0.7867, Test Acc: 0.8421
Epoch: 010, Train Acc:0.7733, Test Acc: 0.8158
Epoch: 011, Train Acc:0.7733, Test Acc: 0.7895
Epoch: 012, Train Acc:0.7933, Test Acc: 0.8421
Epoch: 013, Train Acc:0.7733, Test Acc: 0.8421
Epoch: 014, Train Acc:0.7733, Test Acc: 0.7895
Epoch: 015, Train Acc:0.7933, Test Acc: 0.8421
Epoch: 016, Train Acc:0.7667, Test Acc: 0.7632
Epoch: 017, Train Acc:0.7933, Test Acc: 0.8421
Epoch: 018, Train Acc:0.7867, Test Acc: 0.7895
Epoch: 019, Train Acc:0.7867, Test Acc: 0.7895
Epoch: 020, Train Acc:0.8133, Test Acc: 0.8421
Epoch: 021, Train Acc:0.8000, Test Acc: 0.7632
Epoch: 022, T

We can confirm that test accuracy rose from the 76% range to the 81% range.
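
To put that gain in perspective, the test split holds only 38 graphs, so every accuracy value in the log is a multiple of 1/38. The short sketch below converts logged accuracies back into counts of correctly classified graphs. It assumes the 76% and 81% figures correspond to the values 0.7632 and 0.8158 that appear in the log format above. `correct_count` is a hypothetical helper.

```python
num_test_graphs = 38  # size of the test split printed earlier


def correct_count(accuracy, total=num_test_graphs):
    """Recover the number of correctly classified graphs from an accuracy."""
    return round(accuracy * total)


for acc in (0.7632, 0.8158, 0.8421):
    print(f'{acc:.4f} -> {correct_count(acc)}/{num_test_graphs} graphs')
# 0.7632 -> 29/38 graphs
# 0.8158 -> 31/38 graphs
# 0.8421 -> 32/38 graphs
```

Under that assumption, the move from about 76% to about 81% amounts to two additional correctly classified test graphs.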