# Unsupervised Graph Learning with GraphSAGE


GraphScope can run graph learning tasks. In this tutorial, we demonstrate how to train a model with GraphSAGE in GraphScope.

The task is link prediction, which estimates the probability of links between nodes in a graph.

In this task, we use our implementation of the GraphSAGE algorithm to build a model that predicts protein-protein links in the [PPI](https://humgenomics.biomedcentral.com/articles/10.1186/1479-7364-3-3-291) dataset, in which every node represents a protein. The task can be treated as unsupervised link prediction on a homogeneous network.

Here, the GraphSAGE algorithm compresses both the structural and the attribute information of the graph into a low-dimensional embedding vector for each node. These embeddings can then be used to predict links between nodes.
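
To make the last point concrete, here is a minimal sketch of one common way to turn two node embeddings into a link score: take their dot product and squash it with a sigmoid. The helper `link_probability` and the toy vectors are hypothetical and only for illustration; they are not part of GraphScope, and the trained model may score links differently.

```python
import math

def link_probability(src_emb, dst_emb):
    """Return sigmoid(dot(src_emb, dst_emb)) as a link score in (0, 1)."""
    logit = sum(s * d for s, d in zip(src_emb, dst_emb))
    return 1.0 / (1.0 + math.exp(-logit))

# Toy 4-dimensional vectors standing in for trained node embeddings.
embeddings = {
    "protein_a": [1.0, 0.0, 1.0, 0.0],
    "protein_b": [1.0, 0.0, 1.0, 0.0],    # points the same way as protein_a
    "protein_c": [-1.0, 0.0, -1.0, 0.0],  # points the opposite way
}

likely = link_probability(embeddings["protein_a"], embeddings["protein_b"])
unlikely = link_probability(embeddings["protein_a"], embeddings["protein_c"])
print(f"a-b: {likely:.3f}, a-c: {unlikely:.3f}")
```

Nodes whose embeddings point in similar directions get a score above 0.5, while dissimilar ones fall below it.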

This tutorial has the following steps:
- Launching the learning engine and attaching it to the loaded graph
- Defining the training process with the builtin GraphSAGE model and hyperparameters
- Training and evaluating


In [None]:
# Install graphscope package if you are NOT in the Playground

!pip3 install graphscope

In [None]:
# Import the graphscope module.

import graphscope

graphscope.set_option(show_log=False)  # set show_log=True to enable logging

In [None]:
# Load ppi dataset

from graphscope.dataset import load_ppi

graph = load_ppi()

## Launch learning engine 
Next, we define a feature list for training. The training features must be selected from the vertex properties. In this case, we choose all the properties prefixed with "feat-" as the training features.

With the feature list in place, we launch a learning engine with the [graphlearn](https://graphscope.io/docs/reference/session.html#graphscope.Session.graphlearn) method of graphscope.

In this case, we specify the GraphSAGE training over "protein" nodes and "link" edges.

With `gen_labels`, we take all the protein nodes as the training set.

In [None]:
# define the features for learning
protein_features = []
for i in range(50):
    protein_features.append("feat-" + str(i))

# launch a learning engine.
lg = graphscope.graphlearn(
    graph,
    nodes=[("protein", protein_features)],
    edges=[("protein", "link", "protein")],
    gen_labels=[
        ("train", "protein", 100, (0, 100)),
    ],
)
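
We read the `gen_labels` entry above as (label name, node type, number of pieces, (start, end)): the protein nodes are divided into 100 pieces and the pieces in the range 0 to 100 are labeled "train", which selects every node. The sketch below only illustrates that idea. The helper `select_range` and the modulo bucketing are hypothetical and are not how GraphScope assigns pieces internally.

```python
def select_range(node_ids, pieces, id_range):
    """Keep nodes whose bucket (id % pieces) falls in [start, end)."""
    start, end = id_range
    return [n for n in node_ids if start <= n % pieces < end]

node_ids = list(range(1000))  # toy node ids, not the real PPI ids

# (0, 100) out of 100 pieces keeps every node, as in this tutorial.
train_all = select_range(node_ids, 100, (0, 100))

# A narrower range such as (0, 75) would keep about 75% of the nodes.
train_75 = select_range(node_ids, 100, (0, 75))

print(len(train_all), len(train_75))
```

Because this task is unsupervised, there is no need to hold out nodes for validation or testing here, so the full range is used.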


We use the builtin GraphSAGE model to define the training process.

In this example, we use TensorFlow as the neural network (NN) backend for training.
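
Before running the training cell, it can help to see what its default hyperparameters imply. The sketch below recomputes the layer dimensions with the same expression the cell uses, then estimates how many nodes a batch touches. It assumes that every node yields exactly its fan-out of sampled neighbors, which is a simplification made for this illustration.

```python
# Default hyperparameters from the training cell in this tutorial.
features_num, hidden_dim, output_dim = 50, 128, 128
nbrs_num = [5, 5]   # neighbors sampled at hop 1 and hop 2
batch_size = 512

# One layer per hop: input -> hidden ... -> output.
dims = [features_num] + [hidden_dim] * (len(nbrs_num) - 1) + [output_dim]

# Nodes per hop in one ego graph: the seed node, then a growing fan-out.
nodes_per_hop = [1]
for fanout in nbrs_num:
    nodes_per_hop.append(nodes_per_hop[-1] * fanout)

ego_size = sum(nodes_per_hop)
print("layer dims:", dims)
print("nodes per hop:", nodes_per_hop)
print("nodes per batch of ego graphs:", ego_size * batch_size)
```

Under that assumption, each seed node expands into a small two-hop ego graph, so the fan-out values in `nbrs_num` have a direct effect on memory use and step time.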

In [None]:
try:
  # https://www.tensorflow.org/guide/migrate
  import tensorflow.compat.v1 as tf
  tf.disable_v2_behavior()
except ImportError:
  import tensorflow as tf

import argparse
import graphscope.learning.graphlearn.python.nn.tf as tfg
from graphscope.learning.examples import EgoGraphSAGE
from graphscope.learning.examples import EgoSAGEUnsupervisedDataLoader
from graphscope.learning.examples.tf.trainer import LocalTrainer

def parse_args():
  argparser = argparse.ArgumentParser("Train EgoSAGE Unsupervised.")
  argparser.add_argument('--batch_size', type=int, default=512)
  argparser.add_argument('--features_num', type=int, default=50)
  argparser.add_argument('--hidden_dim', type=int, default=128)
  argparser.add_argument('--output_dim', type=int, default=128)
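  # nargs='+' parses "--nbrs_num 5 5" into [5, 5]; type=list would split a string into characters
  argparser.add_argument('--nbrs_num', type=int, nargs='+', default=[5, 5])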
  argparser.add_argument('--learning_rate', type=float, default=0.01)
  argparser.add_argument('--epochs', type=int, default=2)
  argparser.add_argument('--drop_out', type=float, default=0.0)
  argparser.add_argument('--temperature', type=float, default=0.07)
  argparser.add_argument('--node_type', type=str, default="protein")
  argparser.add_argument('--edge_type', type=str, default="link")
  # parse_known_args ignores the extra arguments that Jupyter passes to the kernel
  return argparser.parse_known_args()[0]
args = parse_args()

# Define Model
dims = [args.features_num] + [args.hidden_dim] * (len(args.nbrs_num) - 1) + [args.output_dim]
model = EgoGraphSAGE(dims, dropout=args.drop_out)

# Prepare train dataset
train_data = EgoSAGEUnsupervisedDataLoader(lg, None, batch_size=args.batch_size,
    node_type=args.node_type, edge_type=args.edge_type, nbrs_num=args.nbrs_num)
src_emb = model.forward(train_data.src_ego)
dst_emb = model.forward(train_data.dst_ego)
neg_dst_emb = model.forward(train_data.neg_dst_ego)
loss = tfg.unsupervised_softmax_cross_entropy_loss(
    src_emb, dst_emb, neg_dst_emb, temperature=args.temperature)
optimizer = tf.train.AdamOptimizer(learning_rate=args.learning_rate)
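
The loss above compares each source embedding with its true destination and with sampled negative destinations. As a rough illustration of that idea, the sketch below implements one common formulation: a temperature-scaled softmax cross-entropy in which the positive pair competes against the negatives. The helper `softmax_ce_loss` is hypothetical, and we do not claim it matches the internals of `tfg.unsupervised_softmax_cross_entropy_loss`.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax_ce_loss(src, pos, negs, temperature=0.07):
    """Return -log softmax of the positive logit among positive and negative logits."""
    logits = [dot(src, pos) / temperature]
    logits += [dot(src, n) / temperature for n in negs]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

src = [0.6, 0.8]
pos = [0.6, 0.8]                    # aligned with src
negs = [[-0.6, -0.8], [0.8, -0.6]]  # opposite and orthogonal to src

# Low loss when the true neighbor is the most similar candidate.
good = softmax_ce_loss(src, pos, negs)
# High loss when a dissimilar node is treated as the true neighbor.
bad = softmax_ce_loss(src, negs[0], [pos, negs[1]])
print(good, bad)
```

A small temperature such as 0.07 sharpens the softmax, so even modest differences in similarity produce a strong training signal.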

## Run training process

After defining the training process and hyperparameters, we can start training.

In [None]:
trainer = LocalTrainer()
trainer.train(train_data.iterator, loss, optimizer, epochs=args.epochs)