# Graph Neural Network
## Notebook 3

In this notebook, we will define, train, and test a Graph Neural Network to predict the USD sale prices of NFT transactions.

## Connect to TigerGraph Database

The code block below connects to a TigerGraph database. Replace the placeholder host and secret (`YOUR_HOSTNAME_HERE`, `YOUR_SECRET_HERE`) with your own instance's details so the connection succeeds.

In [1]:
!pip install --force-reinstall --no-deps git+https://github.com/tigergraph/pyTigerGraph.git@topicDelay

Collecting git+https://github.com/tigergraph/pyTigerGraph.git@topicDelay
  Cloning https://github.com/tigergraph/pyTigerGraph.git (to revision topicDelay) to /tmp/pip-req-build-okiz1m_o
  Running command git clone --filter=blob:none --quiet https://github.com/tigergraph/pyTigerGraph.git /tmp/pip-req-build-okiz1m_o
  Running command git checkout -b topicDelay --track origin/topicDelay
  Switched to a new branch 'topicDelay'
  branch 'topicDelay' set up to track 'origin/topicDelay'.
  Resolved https://github.com/tigergraph/pyTigerGraph.git to commit a23b6dfc951c1de16c53bee117da0ce633f7c7d3
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: pyTigerGraph
  Building wheel for pyTigerGraph (pyproject.toml) ... done
  Created wheel for pyTigerGraph: filename=pyTigerGraph-1.0.2-py3-none-any.whl size=113622 sha256=f8ae6edc834

In [2]:
from pyTigerGraph import TigerGraphConnection

conn=TigerGraphConnection(
    host="YOUR_HOSTNAME_HERE",
    graphname="KDD_2022_NFT",
    gsqlSecret="YOUR_SECRET_HERE"
)
conn.getToken("YOUR_SECRET_HERE")

('5jcbhq5c1sdeeq2ekkvljbbq0q0sa1jb', 1662843611, '2022-09-10 21:00:11')
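Rather than pasting the secret into the notebook, one option is to read the connection details from environment variables. The sketch below assumes hypothetical variable names `TG_HOST`, `TG_SECRET`, and an optional `TG_GRAPH`; it only builds the keyword arguments used in the cell above and does not call pyTigerGraph itself.

```python
import os


def tigergraph_settings(env=os.environ):
    """Build TigerGraphConnection keyword arguments from environment variables.

    TG_HOST, TG_SECRET and TG_GRAPH are hypothetical names chosen for this sketch.
    """
    missing = [name for name in ("TG_HOST", "TG_SECRET") if not env.get(name)]
    if missing:
        raise RuntimeError("set these environment variables first: " + ", ".join(missing))
    return {
        "host": env["TG_HOST"],
        "graphname": env.get("TG_GRAPH", "KDD_2022_NFT"),  # default matches this notebook
        "gsqlSecret": env["TG_SECRET"],
    }
```

With the variables set, the connection cell could become `settings = tigergraph_settings()`, `conn = TigerGraphConnection(**settings)`, and `conn.getToken(settings["gsqlSecret"])`.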

In [3]:
import warnings
warnings.filterwarnings('ignore')

conn.gds.configureKafka(kafka_address="kaf.kdd.tigergraphlabs.com:19092")

## Create Graph Features

Some vertex types have no attributes we can pass into the Graph Neural Network defined later. To fix this, we use FastRP (Fast Random Projection) to generate a feature vector for each of these vertices: an embedding derived from the graph's topology.

We only run FastRP on Category, NFT_Collection, and NFT vertices (and the edges between them) so that information about Transactions, whose prices we are predicting, does not leak into the embeddings. A future improvement could be to use image-derived features for NFTs.
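To make the parameters in the next cells more concrete, here is a minimal plain-Python sketch of the FastRP recurrence from the original paper: a very sparse random projection `R`, degree scaling `L = D^beta`, repeated averaging over neighbors `A = D^-1 S`, and a weighted sum of the iterates. It is a simplified illustration, not TigerGraph's `tg_fastRP` implementation, which may differ in details such as normalization; we assume `sampling_constant` plays the role of the sparsity constant `s`, and that `k` iterations pair with the `k` entries of `weights`.

```python
import math
import random


def sparse_projection(num_nodes, dim, s, rng):
    """Very sparse random projection: each entry is +sqrt(s) or -sqrt(s)
    with probability 1/(2s) each, and 0 otherwise."""
    scale = math.sqrt(s)
    rows = []
    for _ in range(num_nodes):
        row = []
        for _ in range(dim):
            r = rng.random()
            if r < 1 / (2 * s):
                row.append(scale)
            elif r < 1 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        rows.append(row)
    return rows


def fastrp(adj, dim=8, weights=(1.0, 2.0, 4.0), beta=-0.1, s=3, seed=42):
    """Simplified FastRP: N_1 = A L R, N_i = A N_{i-1}, result = sum_i w_i N_i."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    deg = [len(adj[v]) for v in nodes]
    rng = random.Random(seed)
    proj = sparse_projection(len(nodes), dim, s, rng)

    # L R: scale each node's random vector by deg^beta (isolated nodes stay zero)
    current = [
        [(deg[i] ** beta) * x for x in proj[i]] if deg[i] else [0.0] * dim
        for i in range(len(nodes))
    ]
    embedding = [[0.0] * dim for _ in nodes]
    for w in weights:  # one propagation step per weight, so k == len(weights)
        nxt = []
        for i, v in enumerate(nodes):
            acc = [0.0] * dim
            for u in adj[v]:
                for k, x in enumerate(current[index[u]]):
                    acc[k] += x
            # A = D^-1 S: average the neighbors' vectors
            nxt.append([a / deg[i] for a in acc] if deg[i] else acc)
        current = nxt
        for i in range(len(nodes)):
            for k in range(dim):
                embedding[i][k] += w * current[i][k]
    return {v: embedding[index[v]] for v in nodes}
```

Because the embedding depends only on neighborhoods, two vertices with identical neighbors (such as `"a"` and `"c"` in the graph `{"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}`) receive identical vectors, which is what makes FastRP a topological feature.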

In [4]:
f = conn.gds.featurizer()

f.installAlgorithm("tg_fastRP")

'tg_fastRP'

In [5]:
params = {"v_type": ["Category", "NFT_Collection", "NFT"], 
          "e_type": ["COLLECTION_HAS_NFT", "CATEGORY_HAS_NFT", "NFT_IN_CATEGORY", "NFT_IN_COLLECTION"], 
          "weights": "1,2,4", 
          "beta": -0.1,
          "k": 3,
          "reduced_dim": 64, 
          "sampling_constant": 3,
          "random_seed": 42,
          "print_accum": False,
          "result_attr": "fastrp_embedding"}

f.runAlgorithm("tg_fastRP", params)

[]

## Define Data Loader

Here we define a neighbor loader that samples subgraphs around batches of training transactions to train our GNN. This neighbor sampling approach was introduced in the GraphSAGE paper.

By default, the loader samples 2 hops, with up to 10 neighbors per vertex at each hop.
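The sampling idea can be sketched in plain Python: starting from a batch of seed vertices, each hop keeps at most `num_neighbors` randomly chosen neighbors of every vertex on the current frontier. This is a conceptual illustration only; pyTigerGraph performs the sampling inside the database, and the function and argument names below are hypothetical.

```python
import random


def sample_neighborhood(adj, seeds, num_hops=2, num_neighbors=10, seed=0):
    """GraphSAGE-style neighbor sampling over an adjacency dict.

    Returns the set of sampled vertices and the list of sampled (src, dst) edges.
    """
    rng = random.Random(seed)
    visited = set(seeds)
    frontier = list(seeds)
    edges = []
    for _ in range(num_hops):
        next_frontier = []
        for v in frontier:
            nbrs = adj.get(v, [])
            # Keep every neighbor if there are few, otherwise a random subset
            picked = nbrs if len(nbrs) <= num_neighbors else rng.sample(nbrs, num_neighbors)
            for u in picked:
                edges.append((v, u))
                if u not in visited:
                    visited.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return visited, edges
```

On a star graph with 50 leaves, one hop from the center yields 10 sampled leaves; the second hop only adds edges back to the already-visited center, so the subgraph size stays bounded regardless of the hub's degree.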

In [6]:
# DEFINE NEIGHBOR LOADER HERE. SEE code_answers/neighborLoader.py for correct implementation

train_loader = conn.gds.neighborLoader(
    v_in_feats={"Transaction": ["seller_k_size", "buyer_k_size"], 
                "NFT_User": ["pagerank", "kcore_size"], 
                "NFT": ["fastrp_embedding"], 
                "NFT_Collection": ["fastrp_embedding"], 
                "Category": ["fastrp_embedding"]},
    v_out_labels={"Transaction": ["usd_price"]},
    v_extra_feats={"Transaction":  ["train"]},
    filter_by={"Transaction": "train"},
    shuffle=True,
    batch_size=2048,
    buffer_size=4,
    add_self_loop=True,
    reverse_edge=True
)

In [7]:
for batch in train_loader:
    print(batch.metadata())
    break

(['Transaction', 'NFT', 'NFT_User', 'NFT_Collection', 'Category'], [('Transaction', 'NFT_SOLD_BY', 'NFT_User'), ('Transaction', 'NFT_BOUGHT_BY', 'NFT_User'), ('Transaction', 'FOR_SALE_OF', 'NFT'), ('NFT_User', 'USER_SOLD_NFT', 'Transaction'), ('NFT_User', 'USER_SOLD_TO', 'NFT_User'), ('NFT_User', 'USER_BOUGHT_FROM', 'NFT_User'), ('NFT_User', 'USER_BOUGHT_NFT', 'Transaction'), ('NFT', 'HAD_TRANSACTION', 'Transaction'), ('NFT', 'NFT_IN_COLLECTION', 'NFT_Collection'), ('NFT', 'NFT_IN_CATEGORY', 'Category')])


In [8]:
train_loader.num_batches

24
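Assuming the loader computes `num_batches` as the number of seed vertices (training transactions) divided by `batch_size`, rounded up, the output of 24 with `batch_size=2048` bounds the size of the training split. The small sketch below (hypothetical helper names) does that arithmetic.

```python
import math


def num_batches(num_seeds, batch_size):
    """Batches needed to cover every seed vertex once."""
    return math.ceil(num_seeds / batch_size)


def seed_count_range(batches, batch_size):
    """Smallest and largest seed counts that produce exactly `batches` batches."""
    return (batches - 1) * batch_size + 1, batches * batch_size
```

Under that assumption, `seed_count_range(24, 2048)` gives `(47105, 49152)`, i.e. roughly 47 to 49 thousand training transactions.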

## Define Graph Attention Network

We define a Graph Attention Network (GAT) that we will train to perform our regression task. PyTorch Geometric provides a utility, `to_hetero`, that converts a homogeneous GNN model into one that works on heterogeneous graphs; we use it here.
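For intuition about what each `GATConv` layer computes, here is a plain-Python sketch of a single attention head: project features with `W`, score each neighbor with a LeakyReLU of two attention dot products, softmax the scores, and take the weighted sum. It omits attention dropout and bias, so treat it as a simplified illustration rather than PyTorch Geometric's implementation. In the model below, hidden layers use `heads=4` and concatenate the heads, which is why the next layer's input size is `hidden_dim * num_heads`.

```python
import math


def leaky_relu(x, negative_slope=0.2):
    return x if x > 0 else negative_slope * x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    return [dot(row, x) for row in W]


def gat_head(features, neighbors, W, a_src, a_dst):
    """One single-head GAT layer.

    features:  {node: input feature vector}
    neighbors: {node: nodes to attend over (include the node itself for a self-loop)}
    W:         weight matrix as a list of rows (out_dim x in_dim)
    a_src, a_dst: attention vectors of length out_dim
    """
    h = {v: matvec(W, x) for v, x in features.items()}
    out, attention = {}, {}
    for i, nbrs in neighbors.items():
        # e_ij = LeakyReLU(a_dst . W x_i + a_src . W x_j)
        scores = [leaky_relu(dot(a_dst, h[i]) + dot(a_src, h[j])) for j in nbrs]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]  # softmax over the neighborhood
        attention[i] = alphas
        # x_i' = sum_j alpha_ij * W x_j
        out[i] = [sum(a * h[j][k] for a, j in zip(alphas, nbrs)) for k in range(len(W))]
    return out, attention
```

A node whose only "neighbor" is itself simply returns its projected features `W x`, since its single attention weight is 1.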

In [9]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, to_hetero


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a normal (homogeneous) GAT model
# SEE GAT model definition in code_answers/gat.py for correct implementation
class GAT(torch.nn.Module):
    def __init__(
        self, num_layers, out_dim, dropout, hidden_dim, num_heads
    ):
        super().__init__()
        self.dropout = dropout
        self.layers = torch.nn.ModuleList()
        for i in range(num_layers):
            in_units = (-1, -1) if i == 0 else hidden_dim * num_heads
            out_units = out_dim if i == (num_layers - 1) else hidden_dim
            heads = 1 if i == (num_layers - 1) else num_heads
            self.layers.append(
                GATConv(in_units, out_units, heads=heads, dropout=dropout)
            )
        self.double()

    def reset_parameters(self):
        for layer in self.layers:
            layer.reset_parameters()

    def forward(self, x, edge_index):
        x = x.float()
        for layer in self.layers[:-1]:
            x = layer(x, edge_index)
            x = F.elu(x)
            x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.layers[-1](x, edge_index)
        return x

    
model = GAT(
    num_layers=2,
    out_dim=1,
    dropout=0.8,
    hidden_dim=8,
    num_heads=4,
)

# Convert it to a heterogeneous model. See https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.to_hetero_transformer.to_hetero for details.
model = to_hetero(model, batch.metadata(), aggr='mul').to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

mae = torch.nn.L1Loss()
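`to_hetero` gives each edge type its own copy of every message-passing layer; results that land on the same destination node type are then merged using the `aggr` argument (`'mul'` in the cell above). The plain-Python sketch below illustrates only that merge step. It is a simplified illustration, not PyG's implementation, which rewrites the model's forward pass with `torch.fx`; the helper names are hypothetical.

```python
from collections import defaultdict


def combine(vectors, aggr="sum"):
    """Element-wise reduction of the per-edge-type results for one node."""
    if aggr == "sum":
        return [sum(col) for col in zip(*vectors)]
    if aggr == "mean":
        return [sum(col) / len(col) for col in zip(*vectors)]
    if aggr == "max":
        return [max(col) for col in zip(*vectors)]
    if aggr == "mul":
        result = []
        for col in zip(*vectors):
            prod = 1.0
            for x in col:
                prod *= x
            result.append(prod)
        return result
    raise ValueError(f"unknown aggr: {aggr}")


def hetero_combine(edge_outputs, aggr="sum"):
    """edge_outputs maps (src_type, relation, dst_type) -> {dst_node: vector},
    each produced by that edge type's own copy of the layer."""
    grouped = defaultdict(lambda: defaultdict(list))
    for (_, _, dst_type), outputs in edge_outputs.items():
        for node, vec in outputs.items():
            grouped[dst_type][node].append(vec)
    return {
        dst_type: {node: combine(vecs, aggr) for node, vecs in nodes.items()}
        for dst_type, nodes in grouped.items()
    }
```

With `'mul'`, a Transaction reached through two edge types gets the element-wise product of both results, while one reached through a single edge type keeps that result unchanged; `'sum'` would add them instead.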

## Train GNN

We train the GNN for 10 epochs and log the per-batch loss and MAE to TensorBoard.

In [10]:
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime
# default `log_dir` is "runs" - we'll be more specific here
writer = SummaryWriter('runs/gnn_training_' + datetime.now().strftime('%Y%m%d-%H%M%S'))

In [11]:
for i in range(10):  # 10 epochs
    epochLoss = 0
    epochMae = 0

    model.train()
    for j, batch in enumerate(train_loader):
        optimizer.zero_grad()
        out = model(batch.x_dict, batch.edge_index_dict)
        # Score only transactions in the training split (the `train` attribute);
        # sampled neighbor transactions from other splits are excluded
        mask = batch["Transaction"].train
        pred = out["Transaction"][mask].flatten()
        target = batch["Transaction"].y[mask]
        loss = F.smooth_l1_loss(pred, target)
        loss.backward()
        optimizer.step()
        epochLoss += loss.item()
        batchMae = mae(pred, target)
        epochMae += batchMae.item()
        #print("Batch:", j, "Loss:", loss.item(), "MAE:", batchMae.item())

        # Log per-batch loss and MAE; the global step counts batches across epochs
        step = i * train_loader.num_batches + j
        writer.add_scalar('training loss', loss.item(), step)
        writer.add_scalar('training mae', batchMae.item(), step)

    print("EPOCH:", i, "LOSS:", epochLoss / train_loader.num_batches, "MAE:", epochMae / train_loader.num_batches)

EPOCH: 0 LOSS: 85.27291761465274 MAE: 85.63408809073924
EPOCH: 1 LOSS: 83.98076552629963 MAE: 84.34152881763622
EPOCH: 2 LOSS: 86.93331092411626 MAE: 87.2941620931299
EPOCH: 3 LOSS: 85.60263742033158 MAE: 85.96256543173915
EPOCH: 4 LOSS: 83.39928797380585 MAE: 83.75923085905602
EPOCH: 5 LOSS: 86.62987410545702 MAE: 86.99048633737402
EPOCH: 6 LOSS: 83.91957104965908 MAE: 84.28048034965917
EPOCH: 7 LOSS: 85.02264730016785 MAE: 85.38314597898149
EPOCH: 8 LOSS: 84.21897639932214 MAE: 84.57996342240907
EPOCH: 9 LOSS: 84.10481188456062 MAE: 84.46582055728906
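In every epoch the reported LOSS sits about 0.36 below the MAE. That follows from the loss definition: `F.smooth_l1_loss` with its default `beta=1.0` is quadratic (`0.5 * x**2`) for errors smaller than 1 and `|x| - 0.5` otherwise, so for errors of at least 1 (in the units of `usd_price`) it equals the absolute error minus 0.5, and the gap can never exceed 0.5. The plain-Python sketch below restates both metrics for comparison (hypothetical helper names, not the PyTorch implementation).

```python
def smooth_l1(preds, targets, beta=1.0):
    """Mean smooth L1 (Huber-style) loss, matching the piecewise definition above."""
    total = 0.0
    for p, t in zip(preds, targets):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(preds)


def mean_abs_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)
```

For a single prediction that is off by 10, `smooth_l1` gives 9.5 while `mean_abs_error` gives 10.0; for one that is off by 0.5, `smooth_l1` gives 0.125.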


## Test GNN

We define a test data loader that samples subgraphs around the transactions in the test split, and then evaluate the trained GNN on them.

In [12]:
test_loader = conn.gds.neighborLoader(
    v_in_feats={"Transaction": ["seller_k_size", "buyer_k_size"], 
                "NFT_User": ["pagerank", "kcore_size"], 
                "NFT": ["fastrp_embedding"], 
                "NFT_Collection": ["fastrp_embedding"], 
                "Category": ["fastrp_embedding"]},
    v_out_labels={"Transaction": ["usd_price"]},
    v_extra_feats={"Transaction":  ["test"]},
    filter_by={"Transaction": "test"},
    shuffle=False,
    batch_size=2048,
    add_self_loop=True,
    reverse_edge=True
)



In [13]:
model.eval()
totLoss = 0
totMAE = 0
with torch.no_grad():
    for batch in test_loader:
        out = model(batch.x_dict, batch.edge_index_dict)
        # Score only transactions in the test split (the `test` attribute)
        mask = batch["Transaction"].test
        pred = out["Transaction"][mask].flatten()
        target = batch["Transaction"].y[mask]
        totLoss += F.smooth_l1_loss(pred, target).item()
        totMAE += mae(pred, target).item()
print("LOSS:", totLoss / test_loader.num_batches, "MAE:", totMAE / test_loader.num_batches)

LOSS: 68.7366580001595 MAE: 69.09861176956103
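The evaluation above averages per-batch means, which gives every batch equal weight even when batches contain different numbers of test transactions (the last batch is usually smaller). A sample-weighted alternative accumulates each batch's MAE times the number of scored transactions (for example `mask.sum().item()` in the loop above) and divides by the total count. The sketch below, with hypothetical helper names, compares the two on toy numbers.

```python
def mean_of_batch_means(batch_stats):
    """batch_stats: list of (batch_mae, num_scored) pairs; ignores batch sizes."""
    return sum(m for m, _ in batch_stats) / len(batch_stats)


def sample_weighted_mae(batch_stats):
    """Weights each batch's MAE by how many transactions it scored."""
    total = sum(m * n for m, n in batch_stats)
    count = sum(n for _, n in batch_stats)
    return total / count
```

With one batch of two transactions at MAE 1.0 and one batch of a single transaction at MAE 4.0, the batch average is 2.5, while the per-transaction MAE is 2.0.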
