# DGL 101: Use DGL to implement simple node classification on the Karate Club data

Almost every introductory programming class starts with a "Hello World" example. What MNIST is to deep learning, Zachary's Karate Club is to graph learning. The karate club is a social network of 34 members that records pairwise links between members who interact outside the club. The club later splits into two communities, led by the instructor and the club president. The network is visualized below, with color indicating the community.
<img src='./images/karat_club.png' align='center' width="400px" height="300px" />
The club is a typical node classification task that relies purely on graph structure information. In this tutorial, we will use a Graph Convolutional Network (GCN), a basic Graph Neural Network, to classify the nodes.

You will learn:
- How to define a graph by adding nodes and edges;
- How to set up features and labels for nodes;
- How to define a GCN model using DGL's building modules;
- How to train the GCN model; and
- How to check the results.

Note: this tutorial uses PyTorch as the backend. An MXNet version of GCN is available <a href='https://github.com/dmlc/dgl/blob/master/examples/mxnet/gcn/gcn.py'>here</a> and a TensorFlow version <a href='https://github.com/dmlc/dgl/blob/master/examples/tensorflow/gcn/gcn.py'>here</a>. More examples can be found in our <a href="https://github.com/dmlc/dgl/examples">GitHub repository</a>.
<!--
### Implementing simple node classification with DGL
-->

In [10]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
import dgl
from dgl.nn.pytorch import GraphConv
import networkx as nx
import pandas as pd

In [3]:
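# the five steps of training
# ----------- 1. data loader --------------- #
# first, create an empty graph and add the 34 club members as nodes
g = dgl.DGLGraph()
g.add_nodes(34)
# second, add the edges (each pair is one friendship between two members)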
edge_list = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2),
        (4, 0), (5, 0), (6, 0), (6, 4), (6, 5), (7, 0), (7, 1),
        (7, 2), (7, 3), (8, 0), (8, 2), (9, 2), (10, 0), (10, 4),
        (10, 5), (11, 0), (12, 0), (12, 3), (13, 0), (13, 1), (13, 2),
        (13, 3), (16, 5), (16, 6), (17, 0), (17, 1), (19, 0), (19, 1),
        (21, 0), (21, 1), (25, 23), (25, 24), (27, 2), (27, 23),
        (27, 24), (28, 2), (29, 23), (29, 26), (30, 1), (30, 8),
        (31, 0), (31, 24), (31, 25), (31, 28), (32, 2), (32, 8),
        (32, 14), (32, 15), (32, 18), (32, 20), (32, 22), (32, 23),
        (32, 29), (32, 30), (32, 31), (33, 8), (33, 9), (33, 13),
        (33, 14), (33, 15), (33, 18), (33, 19), (33, 20), (33, 22),
        (33, 23), (33, 26), (33, 27), (33, 28), (33, 29), (33, 30),
        (33, 31), (33, 32)]
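# split the pairs into parallel source and destination tuples
src, dst = tuple(zip(*edge_list))
# DGL edges are directed, so add every friendship in both directions
g.add_edges(src, dst)
g.add_edges(dst, src)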
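To make the effect of the two `add_edges` calls concrete, here is a minimal plain-Python sketch on a toy edge list. The three edges and the variable names `toy_edges`, `directed`, `in_degree`, and `out_degree` are made up for illustration and are not part of the karate club data. It shows that storing each pair in both directions doubles the edge count and gives every node matching in-degree and out-degree.

```python
# Toy edge list (hypothetical, not the karate club data): each pair is (src, dst).
toy_edges = [(1, 0), (2, 0), (2, 1)]

# Split into parallel source/destination tuples, as zip(*edge_list) does.
src, dst = tuple(zip(*toy_edges))

# Store every edge in both directions so information can flow both ways.
directed = list(zip(src, dst)) + list(zip(dst, src))

# Count incoming and outgoing edges per node.
num_nodes = 3
in_degree = [0] * num_nodes
out_degree = [0] * num_nodes
for s, d in directed:
    out_degree[s] += 1
    in_degree[d] += 1

print(len(directed))  # 6 directed edges for 3 undirected ones
print(in_degree)      # [2, 2, 2]
```

Applied to the karate club, the same doubling turns the 78 pairs in `edge_list` into 156 directed edges.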

In [3]:
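# third, give every node a one-hot feature vector (row i of the identity matrix)
g.ndata['feats'] = torch.eye(34)
# only the instructor (node 0) and the club president (node 33) are labeled
labeled_nodes = torch.tensor([0, 33])
labeled_labels = torch.tensor([0, 1])
# fourth, create the initial inputs (the same one-hot features)
inputs = torch.eye(34)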
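One way to read the one-hot inputs: multiplying an identity matrix by a weight matrix simply returns that weight matrix, so each node effectively gets its own learnable vector. The NumPy sketch below illustrates this under stated assumptions. The random matrix `W` is a hypothetical stand-in for a first-layer weight, and the hidden size of 16 is chosen only to match the model defined later.

```python
import numpy as np

num_nodes, hidden = 34, 16
rng = np.random.default_rng(0)

# One-hot inputs: row i is all zeros except for a 1 in column i.
inputs = np.eye(num_nodes)

# Hypothetical first-layer weight matrix (random, for illustration only).
W = rng.normal(size=(num_nodes, hidden))

# Multiplying one-hot rows by W just selects rows of W,
# so each node effectively owns a 16-dimensional embedding.
projected = inputs @ W
print(np.allclose(projected, W))  # True

# Only two nodes carry labels; all others are unlabeled during training.
labeled_nodes = np.array([0, 33])
labeled_labels = np.array([0, 1])
print(projected[labeled_nodes].shape)  # (2, 16)
```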

In [4]:
# ----------- 2. create model -------------- #
# build a two-layer GCN
class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        # layer 1 maps input features to hidden features
        self.gcn_layer1 = GraphConv(in_feats, h_feats)
        # layer 2 maps hidden features to one score per class
        self.gcn_layer2 = GraphConv(h_feats, num_classes)
    
    def forward(self, g, in_feat):
        h = self.gcn_layer1(g, in_feat)
        h = F.relu(h)
        h = self.gcn_layer2(g, h)
        return h
    
# create a GCN with 34 input features, 16 hidden units and 2 output classes
net = GCN(34, 16, 2)
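To see roughly what the two stacked graph convolutions compute, here is a NumPy sketch of a two-layer forward pass on a toy 4-node path graph. It assumes the symmetric normalization D^{-1/2} A D^{-1/2} from the original GCN formulation, which to our understanding corresponds to `GraphConv`'s default behavior, and it omits bias terms. The helper names `normalize_adjacency` and `gcn_forward`, the toy graph, and the random weights are all illustrative assumptions, not DGL code.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_hat, X, W1, W2):
    """Two graph-convolution layers with a ReLU in between (no bias)."""
    h = np.maximum(A_hat @ X @ W1, 0.0)  # layer 1 followed by ReLU
    return A_hat @ h @ W2                # layer 2 returns raw logits

# Toy 4-node path graph 0-1-2-3 (hypothetical, not the karate club).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = normalize_adjacency(A)

rng = np.random.default_rng(0)
X = np.eye(4)                  # one-hot node features
W1 = rng.normal(size=(4, 8))   # in_feats -> h_feats
W2 = rng.normal(size=(8, 2))   # h_feats -> num_classes

logits = gcn_forward(A_hat, X, W1, W2)
print(logits.shape)  # (4, 2)
```

Each layer mixes a node's features with those of its neighbors, so after two layers a node's logits depend on nodes up to two hops away.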

In [None]:
# ----------- 3. set up loss and optimizer -------------- #
# in this case, the loss is computed inside the training loop
optimizer = optim.Adam(net.parameters(), lr=0.01)

# ----------- 4. training ------------- #
for e in range(40):
    logits = net(g, inputs)
    
    # compute loss
    logp = F.log_softmax(logits, 1)
    loss = F.nll_loss(logp[labeled_nodes], labeled_labels)
    
    # backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    print('In epoch {}, loss: {}'.format(e, loss))
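The loss in the loop is computed only on the two labeled nodes. As a rough illustration, the NumPy sketch below re-implements log-softmax followed by a mean negative log-likelihood over selected rows. The function names `log_softmax` and `masked_nll` are our own, and the all-zero logits are a made-up input. For a model that cannot yet tell the two classes apart, the loss is ln 2 ≈ 0.693, which is close to the 0.71 reported for epoch 0.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def masked_nll(logits, labeled_nodes, labeled_labels):
    """Mean negative log-likelihood over the labeled nodes only."""
    logp = log_softmax(logits)
    picked = logp[labeled_nodes, labeled_labels]
    return -picked.mean()

# Identical scores for both classes mean maximal uncertainty,
# so the loss equals ln(2).
logits = np.zeros((34, 2))
loss = masked_nll(logits, np.array([0, 33]), np.array([0, 1]))
print(round(float(loss), 4))  # 0.6931
```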

In epoch 0, loss: 0.7104997634887695
In epoch 1, loss: 0.6838003396987915
In epoch 2, loss: 0.6629501581192017
In epoch 3, loss: 0.6447422504425049
In epoch 4, loss: 0.627848744392395
In epoch 5, loss: 0.6115652322769165
In epoch 6, loss: 0.5942103266716003
In epoch 7, loss: 0.5763446092605591
In epoch 8, loss: 0.5582635402679443
In epoch 9, loss: 0.5393538475036621
In epoch 10, loss: 0.5194204449653625
In epoch 11, loss: 0.49799326062202454
In epoch 12, loss: 0.4752960205078125
In epoch 13, loss: 0.451964795589447
In epoch 14, loss: 0.428019642829895
In epoch 15, loss: 0.40288984775543213
In epoch 16, loss: 0.3768406808376312
In epoch 17, loss: 0.35059037804603577
In epoch 18, loss: 0.3250768184661865
In epoch 19, loss: 0.29963386058807373
In epoch 20, loss: 0.2741641402244568
In epoch 21, loss: 0.24918541312217712
In epoch 22, loss: 0.22496524453163147
In epoch 23, loss: 0.20184579491615295
In epoch 24, loss: 0.17940904200077057
In epoch 25, loss: 0.1583772599697113
In epoch 26, loss
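Once training has finished, the results can be checked by taking, for each node, the class with the highest score. The NumPy sketch below uses made-up logits and made-up ground-truth labels for five nodes. None of these numbers come from the model above. In PyTorch, `logits.argmax(dim=1)` plays the same role as `argmax(axis=1)` here.

```python
import numpy as np

# Hypothetical logits for five nodes (made-up numbers, not real model output).
logits = np.array([[ 2.1, -1.3],
                   [ 0.4,  0.2],
                   [-0.7,  1.5],
                   [ 1.0, -0.2],
                   [-2.0,  2.2]])

# The predicted community is the column with the larger score.
predicted = logits.argmax(axis=1)
print(predicted.tolist())  # [0, 0, 1, 0, 1]

# With ground-truth labels available, accuracy is the fraction of matches.
true_labels = np.array([0, 0, 1, 1, 1])  # hypothetical ground truth
accuracy = (predicted == true_labels).mean()
print(accuracy)  # 0.8
```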

## Basic operations on DGL graph

1. Generate graphs in different ways and save them
2. Explore graph structures and different types of graphs
3. Assign features to nodes and edges
4. Message passing functions and reduce (aggregation) functions
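The last item can be sketched in plain Python before touching any DGL API. In this toy version, a message function decides what a source node sends along an edge, and a reduce function combines everything a node receives. The graph, the feature values, and the names `message` and `reduce_sum` are illustrative assumptions. DGL ships its own built-in message and reduce functions, which are not shown here.

```python
# Toy undirected graph stored as directed edges in both directions (hypothetical).
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
features = {0: 1.0, 1: 10.0, 2: 100.0}

def message(src_feature):
    """Message function: what a source node sends along an edge."""
    return src_feature

def reduce_sum(mailbox):
    """Reduce (aggregation) function: combine all messages a node received."""
    return sum(mailbox)

# Step 1: every edge carries a message from its source to its destination.
mailboxes = {node: [] for node in features}
for src, dst in edges:
    mailboxes[dst].append(message(features[src]))

# Step 2: every node aggregates its mailbox into a new feature.
new_features = {node: reduce_sum(box) for node, box in mailboxes.items()}
print(new_features)  # {0: 10.0, 1: 101.0, 2: 10.0}
```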

In [12]:
# save the graph as an edge list with the help of networkx
nx_g = g.to_networkx()
nx.write_edgelist(nx_g, 'karat_club.txt', delimiter=',', data=False)

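# read the edge list back with pandas before loading it into a DGL graph
# note: the file has no header row, so read_csv uses the first edge (0,1)
# as column names; pass header=None to keep every edge as a data row
edgelist = pd.read_csv('karat_club.txt')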

In [15]:
edgelist.tail()

      0   1
150  33  28
151  33  29
152  33  30
153  33  31
154  33  32
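The saved file contains one `src,dst` pair per line and no header row, so whatever reads it back must not treat the first line as column names. As a minimal sketch, the snippet below swaps in Python's standard `csv` module (instead of pandas) and parses a hypothetical three-line excerpt into two lists. The sample text and variable names are assumptions for illustration. With pandas, passing `header=None` to `read_csv` serves the same purpose.

```python
import csv
import io

# Stand-in for the saved file: comma-separated "src,dst" lines with no header row
# (a hypothetical three-edge excerpt, not the full karate club file).
text = "0,1\n0,2\n1,0\n"

src, dst = [], []
for row in csv.reader(io.StringIO(text)):
    if not row:  # skip blank lines
        continue
    src.append(int(row[0]))
    dst.append(int(row[1]))

print(src)  # [0, 0, 1]
print(dst)  # [1, 2, 0]
```

Lists in this form can be passed to `g.add_edges(src, dst)` in the same way as the tuples built from `edge_list` earlier.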


## Take-home exercise

Print out each club member's features during training.