## $ \text{Load and Process Graph Data} $

We use the [Zachary's Karate Club network](https://en.wikipedia.org/wiki/Zachary%27s_karate_club) dataset.

* `nodes.csv` contains every club member together with the members' attributes.
* `edges.csv` contains the pairwise `interaction`s between members.

```python
import pandas as pd
from IPython.display import Image

nodes_data = pd.read_csv('data/nodes.csv')
nodes_data.head()  # attributes of the nodes (club members)
```

| Unnamed: 0 | Id | Club | Age |
|---|---|---|---|
| 0 | 0 | Mr. Hi | 45 |
| 1 | 1 | Mr. Hi | 33 |
| 2 | 2 | Mr. Hi | 36 |
| 3 | 3 | Mr. Hi | 31 |
| 4 | 4 | Mr. Hi | 41 |
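The `Unnamed: 0` column is most likely a leftover row index written out when the CSV was saved (pandas names a header field that is empty `Unnamed: 0`). A minimal sketch, using the five rows shown above as an in-memory stand-in for `data/nodes.csv` (an assumption; the real file has 34 rows), of how `index_col=0` turns that column back into the index:

```python
import io

import pandas as pd

# In-memory stand-in for data/nodes.csv, built from the five rows printed above.
# The empty first header field is what produces the 'Unnamed: 0' name.
csv_text = """,Id,Club,Age
0,0,Mr. Hi,45
1,1,Mr. Hi,33
2,2,Mr. Hi,36
3,3,Mr. Hi,31
4,4,Mr. Hi,41
"""

# Default read: the unnamed first column becomes an ordinary column.
default_df = pd.read_csv(io.StringIO(csv_text))
print(list(default_df.columns))  # ['Unnamed: 0', 'Id', 'Club', 'Age']

# index_col=0: the first column is used as the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_df.columns))  # ['Id', 'Club', 'Age']
```

Either way, the `Id`, `Club`, and `Age` columns used in the rest of the notebook are unaffected.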


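```python
edges_data = pd.read_csv('data/edges.csv')
edges_data.head()  # edge list: source node, destination node, interaction weight
```

| Unnamed: 0 | Src | Dst | Weight |
|---|---|---|---|
| 0 | 0 | 1 | 0.318451 |
| 1 | 0 | 2 | 0.551215 |
| 2 | 0 | 3 | 0.227416 |
| 3 | 0 | 4 | 0.266919 |
| 4 | 0 | 5 | 0.475449 |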


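To build the graph in DGL, pass the two columns to `dgl.graph` as a pair of node-ID arrays. Entries at the same position form one edge, so the first row of `edges.csv` becomes edge 0, the second row becomes edge 1, and so on.

Here `src` holds the source (starting) node of each edge and `dst` holds the destination (ending) node.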

```python
import dgl

src = edges_data['Src'].values
dst = edges_data['Dst'].values
```

```python
g = dgl.graph((src, dst))
g
```

```
Graph(num_nodes=34, num_edges=156,
      ndata_schemes={}
      edata_schemes={})
```
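When `num_nodes` is not passed, `dgl.graph` infers the number of nodes as the largest node ID in the data plus one, which is why 34 nodes appear even though no node table was given. A plain-NumPy sketch of that bookkeeping, using only the first five edges printed above as an illustrative subset (the real edge list has 156 rows):

```python
import numpy as np

# First five rows of edges.csv as printed above (illustrative subset only).
src = np.array([0, 0, 0, 0, 0])
dst = np.array([1, 2, 3, 4, 5])

# Edge i connects src[i] -> dst[i]; its edge ID is simply its position i.
edges = list(zip(src.tolist(), dst.tolist()))
print(edges)  # [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]

# Node count inferred the way dgl.graph does by default:
# the largest node ID appearing in either array, plus one.
num_nodes = int(max(src.max(), dst.max())) + 1
print(num_nodes)  # 6
```

On the full edge list the largest node ID is 33, which gives the 34 nodes reported above.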

![figure_1](asset/figure_1.PNG)

```python
import networkx as nx

# The actual graph is undirected, so convert it for visualization purposes.
nx_g = g.to_networkx().to_undirected()
# The Kamada-Kawai layout usually looks good for arbitrary graphs.
pos = nx.kamada_kawai_layout(nx_g)
nx.draw(nx_g, pos, with_labels=True, node_color=[[.7, .7, .7]])
```
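For drawing, `to_undirected()` treats an edge and its reverse as a single line between two nodes. Node 0 has 16 in-edges and 16 successors in the outputs further down, so the edge list appears to store every friendship in both directions; if so, the 156 directed edges should collapse to the 78 friendships of Zachary's network. A dependency-free sketch of that collapse on a small hypothetical edge list:

```python
# Hypothetical directed edge list in which every friendship is stored twice,
# once per direction (the convention the Karate Club edge list appears to follow).
directed_edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]

# An undirected edge is an unordered pair, so a frozenset forgets the direction.
undirected_edges = {frozenset(e) for e in directed_edges}
print(len(directed_edges), len(undirected_edges))  # 6 3
```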

```python
print('#Nodes', g.number_of_nodes())
print('#Edges', g.number_of_edges())
```

```
#Nodes 34
#Edges 156
```


```python
# Get the in-degree of node 0:
g.in_degrees(0)
```

```
16
```
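The in-degree of a node is the number of edges whose destination is that node. Given `src`/`dst` arrays, the same count for every node at once can be sketched with NumPy (the tiny edge list below is hypothetical):

```python
import numpy as np

# Hypothetical 4-node edge list: edge i goes src[i] -> dst[i].
src = np.array([0, 0, 1, 2, 3, 3])
dst = np.array([1, 2, 2, 0, 0, 2])
num_nodes = 4

# Count how many times each node ID appears as a destination.
in_degrees = np.bincount(dst, minlength=num_nodes)
print(in_degrees)  # [2 1 3 0]
```

If the Karate Club edges are indeed stored in both directions, node 0's in-degree of 16 also equals its number of friends.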

```python
# `successors`: the nodes that node 0 has an outgoing edge to, i.e., its neighbors.
g.successors(0)
```

```
tensor([ 1,  2,  3,  4,  5,  6,  7,  8, 10, 11, 12, 13, 17, 19, 21, 31])
```
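`successors(v)` returns the destination of every edge that starts at `v`. The same selection expressed as a boolean mask over a hypothetical edge list; this sketch returns neighbors in edge-ID order, and the ordering DGL uses may differ:

```python
import numpy as np

# Hypothetical 4-node edge list: edge i goes src[i] -> dst[i].
src = np.array([0, 0, 1, 2, 3, 3])
dst = np.array([1, 2, 2, 0, 0, 2])


def successors(v):
    """Destinations of all edges whose source is v (in edge-ID order)."""
    return dst[src == v]


def predecessors(v):
    """Sources of all edges whose destination is v (in edge-ID order)."""
    return src[dst == v]


print(successors(0))    # [1 2]
print(successors(3))    # [0 2]
print(predecessors(2))  # [0 1 3]
```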

### $ \text{Load node and edge features} $

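Nodes and edges can carry many kinds of attributes (categorical values, text content, and so on).

Because DGL stores features as tensors, graph attributes first have to be converted into tensors with numerical contents. Attributes that typically need such a conversion include: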

* categorical attributes (gender, occupation)
* variable length string contents (news article, quote)
* images

This dataset has the following attribute columns:
* `Age`: integer attribute.
* `Club`: categorical attribute indicating which community each member belongs to.
* `Weight`: floating-point number indicating the strength of each interaction.
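`Club` has only two values, so a simple equality test is enough to map it to 0/1. For a categorical attribute with more levels, such as the occupation example in the list above, one common approach is to factorize the strings into integer codes and then one-hot encode them. A sketch with hypothetical occupation values (not part of this dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical categorical node attribute with three levels.
occupation = pd.Series(["student", "teacher", "student", "engineer"])

# factorize assigns integer codes in order of first appearance.
codes, levels = pd.factorize(occupation)
print(codes)         # [0 1 0 2]
print(list(levels))  # ['student', 'teacher', 'engineer']

# One-hot encoding: row i is the identity-matrix row for code i.
onehot = np.eye(len(levels), dtype=np.int64)[codes]
print(onehot)
```

This mirrors what `F.one_hot` does for the two-class `club` tensor, with the number of classes taken from the number of distinct levels.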

```python
import torch
import torch.nn.functional as F

# Scale ages by 1/100 so the values fall roughly in [0, 1].
age = torch.tensor(nodes_data['Age'].values, dtype=torch.float) / 100
print(age)
```

```
tensor([0.4500, 0.3300, 0.3600, 0.3100, 0.4100, 0.4200, 0.4800, 0.4100, 0.3000,
        0.3500, 0.3800, 0.4400, 0.3700, 0.3900, 0.3600, 0.3800, 0.4700, 0.4500,
        0.4100, 0.3100, 0.3100, 0.4400, 0.4200, 0.3200, 0.3000, 0.5000, 0.3000,
        0.4300, 0.4800, 0.4000, 0.3900, 0.4500, 0.4700, 0.3300])
```
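Dividing by 100 is a fixed scaling that keeps every age in [0, 1) as long as no member is 100 or older. A common alternative, swapped in here for comparison and not what this notebook does, is z-score standardization, which uses the data's own mean and standard deviation. A NumPy sketch on the first five ages shown above:

```python
import numpy as np

# First five ages from nodes.csv (illustrative subset).
ages = np.array([45, 33, 36, 31, 41], dtype=np.float64)

# Fixed scaling, as in the notebook cell above.
scaled = ages / 100
print(scaled)  # [0.45 0.33 0.36 0.31 0.41]

# Z-score standardization: zero mean, unit (population) standard deviation.
standardized = (ages - ages.mean()) / ages.std()
print(standardized)
```

The fixed scaling does not depend on the data, so it is unaffected by which members are present; standardization adapts to the actual spread of ages.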


```python
g.ndata['age'] = age
print(g)
```

```
Graph(num_nodes=34, num_edges=156,
      ndata_schemes={'age': Scheme(shape=(), dtype=torch.float32)}
      edata_schemes={})
```
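Every tensor stored in `g.ndata` must have the number of nodes as its first dimension (row i is the feature of node i), which is why the 34-element `age` tensor fits the 34-node graph and shows up with `shape=()` per node. A hypothetical helper, written against NumPy arrays purely for illustration, that performs the same check before assignment:

```python
import numpy as np


def check_node_feature(feature, num_nodes):
    """Hypothetical pre-check mirroring DGL's rule for g.ndata:
    the leading dimension must equal the number of nodes."""
    if feature.shape[0] != num_nodes:
        raise ValueError(f"expected {num_nodes} rows, got {feature.shape[0]}")
    return feature


check_node_feature(np.zeros(34), num_nodes=34)       # one scalar per node
check_node_feature(np.zeros((34, 2)), num_nodes=34)  # a 2-dim vector per node

try:
    check_node_feature(np.zeros(33), num_nodes=34)
except ValueError as err:
    print(err)  # expected 34 rows, got 33
```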


```python
club = nodes_data['Club'].to_list()

# Encode the two communities as integers: 'Officer' -> 1, 'Mr. Hi' -> 0
club = torch.tensor([c == 'Officer' for c in club]).long()

# One-hot encoding: one row of length 2 per member
club_onehot = F.one_hot(club)
```


```python
g.ndata.update({'club': club, 'club_onehot': club_onehot})
```

```python
edge_weight = torch.tensor(edges_data['Weight'].values)
g.edata['weight'] = edge_weight
g
```

```
Graph(num_nodes=34, num_edges=156,
      ndata_schemes={'age': Scheme(shape=(), dtype=torch.float32), 'club': Scheme(shape=(), dtype=torch.int64), 'club_onehot': Scheme(shape=(2,), dtype=torch.int64)}
      edata_schemes={'weight': Scheme(shape=(), dtype=torch.float64)})
```
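The `weight` feature ends up as `float64` because pandas stores the `Weight` column as 64-bit floats and `torch.tensor` keeps the NumPy dtype, whereas `age` was explicitly created with `dtype=torch.float` (32-bit). PyTorch layers default to `float32`, so casting (for example with `dtype=torch.float` or `.float()`) is often needed before the weights enter a model. The dtype chain, sketched with pandas and NumPy on the first five weights shown earlier:

```python
import numpy as np
import pandas as pd

# First five rows of edges.csv as printed earlier (illustrative subset).
edges = pd.DataFrame({
    "Src": [0, 0, 0, 0, 0],
    "Dst": [1, 2, 3, 4, 5],
    "Weight": [0.318451, 0.551215, 0.227416, 0.266919, 0.475449],
})

# pandas stores Python floats as 64-bit floats.
weights = edges["Weight"].values
print(weights.dtype)  # float64

# Cast to 32-bit floats to match the usual model dtype.
weights32 = weights.astype(np.float32)
print(weights32.dtype)  # float32
```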