#### How Does DGL Represent a Graph
* You will learn to:
   1. Construct a graph in DGL from scratch
   2. Assign node and edge features to a graph
   3. Query properties of a DGL graph, such as node and edge counts and node degrees
   4. Transform a DGL graph into another graph
   5. Load and save DGL graphs
* Goal: learn how a graph is built in DGL, that is, study the storage structure and attributes of a DGL graph

#### DGL Graph Construction
1. DGL represents a directed graph as a **DGLGraph object**. You construct one by specifying **the number of nodes** in the graph together with **the lists of source and destination nodes**.

In [5]:
import dgl
import numpy as np
import torch

g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]), num_nodes=6)
# Equivalently, torch.LongTensor objects can be passed as the node lists
g = dgl.graph((torch.LongTensor([0, 0, 0, 0, 0]), torch.LongTensor([1, 2, 3, 4, 5])), num_nodes=6)
# When the source and destination node lists are given, num_nodes may be omitted;
# DGL then infers it as the largest node ID plus one
g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))

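* Note: the graph object here is built from two lists, a subjects list and an objects list. This resembles how the source code of the CompGCN paper I reproduced earlier ultimately processes its graph data, and it is probably just the common way of handling graph data.
* In that paper's source code, data processing ultimately produces an edge_index and an edge_type, that is, the edge sequence and the edge types, whose indices correspond one to one. edge_index is a (num_edges * 2) tensor whose 0th dimension holds the source nodes of all edges and whose 1st dimension holds the destination nodes. In other words, these are the subjects and the objects.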
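
The layout described above can be sketched in plain Python. The triples below are made-up illustration data, not taken from the CompGCN code, and the names `triples`, `src`, `dst`, `edge_type`, and `edge_index` are hypothetical. The only point is that position i of every list describes the same edge, which is also the form that `dgl.graph((src, dst))` takes.

```python
# Hypothetical (subject, relation, object) triples; the IDs are made up.
triples = [(0, 0, 1), (0, 1, 2), (3, 0, 2)]

# Split into parallel lists: index i of each list describes edge i.
src = [s for s, _, _ in triples]        # subjects -> source nodes
edge_type = [r for _, r, _ in triples]  # relation ID of each edge
dst = [o for _, _, o in triples]        # objects -> destination nodes

# An edge-index structure that pairs every source with its destination.
edge_index = list(zip(src, dst))

print(src, dst, edge_type)
print(edge_index)
```

Keeping the edge types in a separate list that shares its ordering with the edges means they can later be attached as an edge feature, as long as they are first wrapped in a tensor.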

In [6]:
g

Graph(num_nodes=6, num_edges=5,
      ndata_schemes={}
      edata_schemes={})

In [8]:
print(g.edges())

(tensor([0, 0, 0, 0, 0]), tensor([1, 2, 3, 4, 5]))


* Attention
* If you want to **handle undirected graphs**, consider **treating them as bidirectional graphs**. See **Graph Transformations** for an example of making a bidirectional graph.
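
As a rough illustration of what "bidirectional" means here, the sketch below adds a reverse edge for every edge of the star graph, using plain Python lists. The names `bi_src` and `bi_dst` are hypothetical. The resulting lists could be passed to `dgl.graph` as before. DGL also provides `dgl.add_reverse_edges`, which operates on an existing graph, and that route is probably preferable in practice.

```python
# Directed edges of the star graph used above: node 0 points to nodes 1..5.
src = [0, 0, 0, 0, 0]
dst = [1, 2, 3, 4, 5]

# Add a reverse edge for every original edge so that each connection
# can be traversed in both directions.
bi_src = src + dst
bi_dst = dst + src

print(list(zip(bi_src, bi_dst)))
```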

#### Assigning Node and Edge Features to Graph
* This section adds feature attributes to the graph created above.
* DGLGraph only accepts **attributes stored in tensors (with numerical contents)**. Consequently, a given attribute must have **the same shape for all nodes (or for all edges)**. In the context of deep learning, those attributes are often called features.

In [10]:
# Assign a 3-dim node feature vector for each node
g.ndata['x'] = torch.randn(6, 3)
# Assign a 4-dim edge feature vector for each edge
g.edata['a'] = torch.randn(5, 4)
# Assign a 5x4 node feature matrix for each node; node and edge features in DGL can be multi-dimensional
g.ndata['y'] = torch.randn(6, 5, 4)

In [15]:
print(g.edata['a'])

tensor([[ 0.3728, -0.6051, -0.1773,  0.3692],
        [-1.9710, -1.6117,  0.4212, -1.0328],
        [ 2.4561, -2.6404,  0.5772,  0.3688],
        [ 0.7520,  0.1186,  0.1175,  0.0425],
        [ 0.6102, -0.4714,  1.3186,  0.3512]])


* Attention
* There are many ways to encode various types of attributes into numerical features in deep learning.
   1. For **categorical attributes** (e.g. gender, occupation), consider converting them to **integers or one-hot encodings**.
   2. For **variable-length string contents** (e.g. news, articles, quotes), consider applying **a language model**.
   3. For images, consider applying a **vision model such as a CNN**.
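
The first option can be sketched in a few lines of plain Python. The occupation values and all variable names below are hypothetical illustration data. The sketch only shows the integer and one-hot encodings themselves, not a recommended pipeline.

```python
# Hypothetical categorical attribute, one value per node.
occupations = ["engineer", "teacher", "engineer", "doctor"]

# Map each distinct category to an integer ID (sorted for a stable order).
categories = sorted(set(occupations))
to_id = {name: i for i, name in enumerate(categories)}
int_codes = [to_id[name] for name in occupations]

# Expand each integer ID into a one-hot row of length len(categories).
one_hot = [[1.0 if j == code else 0.0 for j in range(len(categories))]
           for code in int_codes]

print(int_codes)
print(one_hot)
```

Every row of `one_hot` has the same length, so wrapping it in a tensor yields a feature of uniform shape, which is the shape requirement stated for node features in this section.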

#### Querying Graph Structures

In [17]:
g

Graph(num_nodes=6, num_edges=5,
      ndata_schemes={'x': Scheme(shape=(3,), dtype=torch.float32), 'y': Scheme(shape=(5, 4), dtype=torch.float32)}
      edata_schemes={'a': Scheme(shape=(4,), dtype=torch.float32)})

In [21]:
print(g.num_nodes())
print(g.num_edges())
print(g.out_degrees(0))  # out-degree of node 0; out_degree is the older, deprecated spelling

6
5
5
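
To see where the degree value comes from, the sketch below recomputes degrees directly from the edge lists in plain Python. It mirrors the meaning of the query rather than DGL's implementation, and the names `out_deg` and `in_deg` are hypothetical.

```python
from collections import Counter

# Edge list of the star graph built earlier: node 0 points to nodes 1..5.
src = [0, 0, 0, 0, 0]
dst = [1, 2, 3, 4, 5]
num_nodes = 6

# Out-degree counts how often a node appears as a source;
# in-degree counts how often it appears as a destination.
out_count = Counter(src)
in_count = Counter(dst)
out_deg = [out_count[n] for n in range(num_nodes)]
in_deg = [in_count[n] for n in range(num_nodes)]

print(out_deg)  # node 0 is the source of all five edges
print(in_deg)
```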


#### Graph Transformations
* DGL provides many APIs to transform a graph into another, such as **extracting a subgraph**.

In [30]:
print(g.num_nodes())
# Induce a subgraph from nodes 0, 1, and 3 of the original graph
sg1 = g.subgraph([0, 1, 3])
# Induce a subgraph from edges 0, 1, and 3 of the original graph
sg2 = g.edge_subgraph([0, 1, 3])

6
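
The difference between the two kinds of subgraph can be sketched in plain Python. This illustrates the semantics only and is not DGL's implementation. The helper names `node_induced` and `edge_induced` are hypothetical, and relabeling nodes in sorted order is an assumption of this sketch.

```python
# Edges of the original star graph; edge i goes from src[i] to dst[i].
src = [0, 0, 0, 0, 0]
dst = [1, 2, 3, 4, 5]

def node_induced(nodes):
    """Keep edges whose endpoints are both in `nodes`, relabeling nodes from 0."""
    new_id = {old: new for new, old in enumerate(nodes)}
    return [(new_id[u], new_id[v]) for u, v in zip(src, dst)
            if u in new_id and v in new_id]

def edge_induced(edge_ids):
    """Keep the chosen edges and only the nodes they touch, relabeled from 0."""
    kept = [(src[e], dst[e]) for e in edge_ids]
    nodes = sorted({n for edge in kept for n in edge})
    new_id = {old: new for new, old in enumerate(nodes)}
    return nodes, [(new_id[u], new_id[v]) for u, v in kept]

print(node_induced([0, 1, 3]))  # edges among nodes 0, 1, 3
print(edge_induced([0, 1, 3]))  # edges 0, 1, 3 and their endpoint nodes
```

The node-induced result has 3 nodes and 2 edges, and the edge-induced result has 4 nodes and 3 edges. These counts agree with the sizes printed for the two subgraphs in the loading output at the end of these notes. In DGL the original IDs are kept in the `_ID` feature, readable through `sg1.ndata[dgl.NID]` and `sg1.edata[dgl.EID]`.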


#### Loading and Saving Graphs
* You can save a graph or a list of graphs with **dgl.save_graphs** and load them back with **dgl.load_graphs**, which returns the list of graphs together with a dictionary of labels.

In [31]:
dgl.save_graphs('graph.dgl', g)
dgl.save_graphs('graphs.dgl', [g, sg1, sg2])


In [32]:
# Load graphs
(g,), _ = dgl.load_graphs('graph.dgl')
print(g)
(g, sg1, sg2), _ = dgl.load_graphs('graphs.dgl')
print(g)
print(sg1)
print(sg2)

Graph(num_nodes=6, num_edges=5,
      ndata_schemes={'y': Scheme(shape=(5, 4), dtype=torch.float32), 'x': Scheme(shape=(3,), dtype=torch.float32)}
      edata_schemes={'a': Scheme(shape=(4,), dtype=torch.float32)})
Graph(num_nodes=6, num_edges=5,
      ndata_schemes={'y': Scheme(shape=(5, 4), dtype=torch.float32), 'x': Scheme(shape=(3,), dtype=torch.float32)}
      edata_schemes={'a': Scheme(shape=(4,), dtype=torch.float32)})
Graph(num_nodes=3, num_edges=2,
      ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), 'x': Scheme(shape=(3,), dtype=torch.float32), 'y': Scheme(shape=(5, 4), dtype=torch.float32)}
      edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), 'a': Scheme(shape=(4,), dtype=torch.float32)})
Graph(num_nodes=4, num_edges=3,
      ndata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), 'x': Scheme(shape=(3,), dtype=torch.float32), 'y': Scheme(shape=(5, 4), dtype=torch.float32)}
      edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64), 'a': Scheme