# Supervised Learning with GraphSAGE

Graph neural networks (GNNs) combine the strengths of graph analytics and machine learning. 
GraphScope can run graph learning tasks directly on a loaded graph. In this tutorial, we demonstrate 
how GraphScope trains a supervised GraphSAGE model.

The learning task is node classification on a citation network. In this task, the algorithm has 
to predict the label of each node in the [Cora](https://linqs.soe.ucsc.edu/data) dataset. 
The dataset consists of academic publications as the nodes and the citations between them as the links: if publication A cites publication B, then the graph has an edge from A to B. Each node belongs to one of seven subjects, and our model will learn to predict this subject.

This tutorial has the following steps:

- Launching the learning engine and attaching the loaded graph.
- Defining the training process with the built-in GraphSAGE model and configuring the hyperparameters.
- Training and evaluating the model.


In [None]:
# Install graphscope package if you are NOT in the Playground

!pip3 install graphscope

In [None]:
# Import the graphscope module.

import graphscope

graphscope.set_option(show_log=False)  # set to True to enable logging

In [None]:
# Load cora dataset

from graphscope.dataset import load_cora

graph = load_cora()


Then, we need to define a feature list for training. The training features should be selected from the vertex properties. In this case, we choose all the properties prefixed with "feat_" as the training features.

With the feature list, we next launch a learning engine with the [graphlearn](https://graphscope.io/docs/reference/session.html#graphscope.Session.graphlearn) method of graphscope.

In this case, we specify that the model is trained over "paper" nodes and "cites" edges.

With "gen_labels", we split the "paper" nodes into three parts: 75% are used for training, 10% for validation, and 15% for testing.
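
To make the percentage ranges concrete, the sketch below mimics the idea in plain Python. It is an illustration only, not GraphScope's implementation: it assumes that every node falls into one of 100 buckets and that a range `(lo, hi)` selects the buckets with `lo <= bucket < hi`. The helper `split_by_range` and the random bucket assignment are hypothetical.

```python
import random


def split_by_range(node_ids, ranges, buckets=100, seed=0):
    """Group nodes by the bucket range they fall into.

    Hypothetical helper for illustration; GraphScope's internal
    logic for gen_labels may differ.
    """
    rng = random.Random(seed)
    splits = {name: [] for name in ranges}
    for node in node_ids:
        bucket = rng.randrange(buckets)  # bucket in [0, buckets)
        for name, (lo, hi) in ranges.items():
            if lo <= bucket < hi:
                splits[name].append(node)
                break
    return splits


ranges = {"train": (0, 75), "val": (75, 85), "test": (85, 100)}
splits = split_by_range(range(2708), ranges)  # Cora has 2,708 publications
for name, nodes in splits.items():
    print(name, len(nodes))
```

Because the three ranges cover 0 to 100 without overlapping, every node lands in exactly one split, and the split sizes come out close to 75%, 10%, and 15% of the nodes.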


In [None]:
# define the features for learning
paper_features = [f"feat_{i}" for i in range(1433)]

# launch a learning engine.
lg = graphscope.graphlearn(
    graph,
    nodes=[("paper", paper_features)],
    edges=[("paper", "cites", "paper")],
    gen_labels=[
        ("train", "paper", 100, (0, 75)),
        ("val", "paper", 100, (75, 85)),
        ("test", "paper", 100, (85, 100)),
    ],
)

We use the built-in GraphSAGE model to define the training process, with TensorFlow as the neural network (NN) backend.
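
As a rough guide to the hyperparameters used in the next cell, the sketch below works through the layer dimensions and the size of one sampled mini-batch. It assumes fixed-size neighbor sampling, where every node at hop `i` contributes exactly `nbrs_num[i]` neighbors; the actual sampler may behave differently. The values mirror the defaults in the training cell, and `nodes_per_hop` is a name introduced here for illustration.

```python
features_num, hidden_dim, class_num = 1433, 128, 7
hops_num, nbrs_num, batch_size = 2, [25, 10], 140

# One layer per hop: input features -> hidden -> class logits.
dims = [features_num] + [hidden_dim] * (hops_num - 1) + [class_num]
print(dims)  # [1433, 128, 7]

# Nodes touched per hop for one batch, assuming a fixed fan-out per node.
nodes_per_hop = [batch_size]
for fanout in nbrs_num:
    nodes_per_hop.append(nodes_per_hop[-1] * fanout)
print(nodes_per_hop)  # [140, 3500, 35000]
```

Under this assumption, a larger fan-out or an extra hop multiplies the number of sampled nodes, which is why `nbrs_num` and `hops_num` dominate the cost of each training step.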

In [None]:
try:
  # https://www.tensorflow.org/guide/migrate
  import tensorflow.compat.v1 as tf
  tf.disable_v2_behavior()
except ImportError:
  import tensorflow as tf

import argparse
import graphscope.learning as gl
import graphscope.learning.graphlearn.python.nn.tf as tfg
from graphscope.learning.examples import EgoGraphSAGE
from graphscope.learning.examples import EgoSAGESupervisedDataLoader
from graphscope.learning.examples.tf.trainer import LocalTrainer

def parse_args():
  argparser = argparse.ArgumentParser("Train EgoSAGE Supervised.")
  argparser.add_argument('--class_num', type=int, default=7)
  argparser.add_argument('--features_num', type=int, default=1433)
  argparser.add_argument('--train_batch_size', type=int, default=140)
  argparser.add_argument('--val_batch_size', type=int, default=300)
  argparser.add_argument('--test_batch_size', type=int, default=1000)
  argparser.add_argument('--hidden_dim', type=int, default=128)
  argparser.add_argument('--in_drop_rate', type=float, default=0.5)
  argparser.add_argument('--hops_num', type=int, default=2)
  argparser.add_argument('--nbrs_num', type=int, nargs='+', default=[25, 10])
  argparser.add_argument('--agg_type', type=str, default="gcn")
  argparser.add_argument('--learning_algo', type=str, default="adam")
  argparser.add_argument('--learning_rate', type=float, default=0.05)
  argparser.add_argument('--weight_decay', type=float, default=0.0005)
  argparser.add_argument('--epochs', type=int, default=40)
  argparser.add_argument('--node_type', type=str, default='paper')
  argparser.add_argument('--edge_type', type=str, default='cites')
  # Parse an empty list so argparse ignores the notebook kernel's own arguments.
  return argparser.parse_args(args=[])
args = parse_args()

def supervised_loss(logits, labels):
  loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
      labels=labels, logits=logits)
  return tf.reduce_mean(loss)

def accuracy(logits, labels):
  indices = tf.math.argmax(logits, 1, output_type=tf.int32)
  correct = tf.reduce_sum(tf.cast(tf.math.equal(indices, labels), tf.float32))
  return correct / tf.cast(tf.shape(labels)[0], tf.float32)

# Define Model
dims = [args.features_num] + [args.hidden_dim] * (args.hops_num - 1) \
    + [args.class_num]
model = EgoGraphSAGE(dims,
                    agg_type=args.agg_type,
                    act_func=tf.nn.relu,
                    dropout=args.in_drop_rate)

# prepare train dataset
train_data = EgoSAGESupervisedDataLoader(lg, gl.Mask.TRAIN, 'random', args.train_batch_size,
                                        node_type=args.node_type, edge_type=args.edge_type,
                                        nbrs_num=args.nbrs_num, hops_num=args.hops_num)
train_embedding = model.forward(train_data.src_ego)
loss = supervised_loss(train_embedding, train_data.src_ego.src.labels)
optimizer = tf.train.AdamOptimizer(learning_rate=args.learning_rate)

# prepare test dataset
test_data = EgoSAGESupervisedDataLoader(lg, gl.Mask.TEST, 'random', args.test_batch_size,
                                        node_type=args.node_type, edge_type=args.edge_type,
                                        nbrs_num=args.nbrs_num, hops_num=args.hops_num)
test_embedding = model.forward(test_data.src_ego)
test_acc = accuracy(test_embedding, test_data.src_ego.src.labels)
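
For readers less familiar with the two metrics, here is a minimal pure-Python sketch of what `supervised_loss` and `accuracy` compute on a small batch. It is not the TensorFlow implementation; the function names `softmax_cross_entropy` and `accuracy_py` and the sample logits are made up for illustration.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between integer labels and softmax(logits)."""
    losses = []
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum - row[label])
    return sum(losses) / len(losses)


def accuracy_py(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    correct = sum(
        1 for row, label in zip(logits, labels) if row.index(max(row)) == label
    )
    return correct / len(labels)


logits = [[2.0, 0.5, 0.1],
          [0.2, 0.3, 3.0],
          [1.0, 1.5, 0.2]]
labels = [0, 2, 0]
print(softmax_cross_entropy(logits, labels))
print(accuracy_py(logits, labels))
```

In this sample the first two rows are classified correctly and the third is not, so the accuracy is two thirds, while the loss also penalizes how little probability the third row assigns to its true class.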

Now we can start the training and testing process:

In [None]:
# train and test
trainer = LocalTrainer()
trainer.train(train_data.iterator, loss, optimizer, epochs=args.epochs)
trainer.test(test_data.iterator, test_acc)