# GNN Inference on Azure ML using TigerGraph

In this notebook, we will train a GNN model and deploy it to Azure ML as an inference endpoint.

## Setup

We are going to create a working directory that will eventually be uploaded to the Azure inference endpoint. The cell below uses `os.makedirs` with `exist_ok=True`, so it is safe to re-run if the directory already exists.

In [None]:
import os

source_directory = "gat_cora"

os.makedirs(source_directory, exist_ok=True)

## Define the Model

We are going to define a Graph Attention Network (GAT) model, and write it to a file called `model.py`.

In [None]:
%%writefile $source_directory/model.py

import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(
        self, num_features, num_layers, out_dim, dropout, hidden_dim, num_heads
    ):
        super().__init__()
        self.dropout = dropout
        self.layers = torch.nn.ModuleList()
        for i in range(num_layers):
            in_units = num_features if i == 0 else hidden_dim * num_heads
            out_units = out_dim if i == (num_layers - 1) else hidden_dim
            heads = 1 if i == (num_layers - 1) else num_heads
            self.layers.append(
                GATConv(in_units, out_units, heads=heads, dropout=dropout)
            )

    def reset_parameters(self):
        for layer in self.layers:
            layer.reset_parameters()

    def forward(self, data):
        x, edge_index = data.x.float(), data.edge_index
        for layer in self.layers[:-1]:
            x = layer(x, edge_index)
            x = F.elu(x)
            x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.layers[-1](x, edge_index)
        return x
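
To make the layer sizing in the constructor concrete, the following sketch recomputes the input width, output width, and head count of each layer without building any PyTorch modules. The helper name `gat_layer_shapes` is hypothetical and not part of the notebook; it only mirrors the loop in `__init__`. Hidden layers feed `hidden_dim * num_heads` features forward because `GATConv` concatenates its attention heads by default.

```python
def gat_layer_shapes(num_features, num_layers, out_dim, hidden_dim, num_heads):
    """Return (in_units, out_units, heads) per layer (hypothetical helper)."""
    shapes = []
    for i in range(num_layers):
        is_last = i == num_layers - 1
        # The first layer reads raw features; later layers read concatenated heads.
        in_units = num_features if i == 0 else hidden_dim * num_heads
        # The last layer emits one logit per class through a single head.
        out_units = out_dim if is_last else hidden_dim
        heads = 1 if is_last else num_heads
        shapes.append((in_units, out_units, heads))
    return shapes


shapes = gat_layer_shapes(
    num_features=1433, num_layers=2, out_dim=7, hidden_dim=8, num_heads=8
)
print(shapes)
```

With the Cora settings used later in this notebook, the second layer therefore expects 64 input features (8 hidden units times 8 heads).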

## Model Parameters

Here, we define a single dictionary that holds the configuration for the model, the data loaders, the optimizer, and the database connection.

In [None]:
parameters = {
    "model_name": "GAT",
    "model_config": {
        "num_features": 1433, # Number of features on Cora vertices 
        "out_dim": 7,         # Number of classes in Cora
        "num_heads": 8,       # Number of attention heads in GAT model
        "hidden_dim": 8,      # Number of hidden units in GAT model
        "num_layers": 2,      # Number of GAT layers in GAT model
        "dropout": 0.6        # Dropout probability in GAT model
    },
    "infer_loader_config": {
        "v_in_feats": ["x"],     # List of vertex features to be loaded
        "v_out_labels": ["y"],   # List of vertex labels to be loaded
        "v_extra_feats": [],     # Don't need any extra features for inference
        "output_format": "PyG",  # Using PyTorch Geometric format
        "batch_size": 64,        # Batch size for inference
        "num_neighbors": 10,     # Number of neighbors per vertex
        "num_hops": 2,           # How deep to go in the graph
        "shuffle": False         # Don't shuffle the data
    },
    "training_loader_config": {
        "v_in_feats": ["x"],
        "v_out_labels": ["y"],
        "v_extra_feats": ["train_mask","val_mask","test_mask"],
        "output_format": "PyG",
        "batch_size": 64, 
        "num_neighbors": 10, 
        "num_hops": 2,
        "shuffle": True
    },
    "optimizer_config": {
        "lr": 0.01,
        "weight_decay": 5e-4,
    },
    "connection_config": {
        "host": "http://35.230.92.92", 
        "graphname": "Cora", 
        "username": "tigergraph", 
        "password": "tigergraph"
    }
}

### Write Parameters to JSON File
We will write the parameters dictionary to a JSON file so that we can easily access the parameters when creating the inference container.

In [None]:
import json

# Use a context manager so the file is flushed and closed before it is uploaded
with open("{}/config.json".format(source_directory), "w") as f:
    json.dump(parameters, f)
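
Because the inference container will rebuild everything from this file, it can be worth confirming that the configuration survives a round trip through JSON. The sketch below does this with a small, hypothetical stand-in dictionary (`sample_parameters`) and a temporary directory rather than the notebook's real paths.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the notebook's `parameters` dictionary.
sample_parameters = {
    "model_name": "GAT",
    "model_config": {"num_layers": 2, "dropout": 0.6},
    "optimizer_config": {"lr": 0.01, "weight_decay": 5e-4},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "config.json")
    # Write the configuration, then read it back from disk.
    with open(config_path, "w") as f:
        json.dump(sample_parameters, f, indent=2)
    with open(config_path) as f:
        restored = json.load(f)

print(restored == sample_parameters)
```

Note that JSON has no tuple type, so any tuple in the configuration would come back as a list.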

## Train a GNN Model

### Load the Model
Here, we load the model class dynamically: we add `source_directory` to `sys.path` and import the `model` module written earlier. With the directory on the import path, this is equivalent to writing `from model import ModelName`.

Since `source_directory` and `ModelName` are unique to each developer's configs, we look the class up by name with `getattr` instead of hard-coding the import.

In [None]:
import sys
sys.path.append(source_directory)

import model
GAT = getattr(model, parameters["model_name"])

In [None]:
GAT
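
The same lookup can also be written with `importlib`, which makes the module name configurable as well as the class name. The sketch below is self-contained: it writes a hypothetical `toy_model.py` into a temporary directory and loads a class from it by name. The file name and its contents are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module contents, written to a temporary directory for illustration.
module_source = (
    "class GAT:\n"
    "    def __init__(self, num_layers):\n"
    "        self.num_layers = num_layers\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "toy_model.py"), "w") as f:
        f.write(module_source)
    # Make the directory importable, then resolve module and class by name.
    sys.path.append(tmp_dir)
    try:
        toy_model = importlib.import_module("toy_model")
        model_class = getattr(toy_model, "GAT")
    finally:
        sys.path.remove(tmp_dir)

instance = model_class(num_layers=2)
print(model_class.__name__, instance.num_layers)
```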

#### Instantiate the Model Class
Here, we use `kwargs` to pass in the parameters of the model from the parameters dictionary.

In [None]:
gat = GAT(**parameters["model_config"])
gat

### Create Data Loaders
Here, we instantiate a connection to our TigerGraph database with `pyTigerGraph`. Then we create data loaders for training, validation, and testing datasets. We will use the **Neighbor Sampling** technique introduced in the GraphSAGE paper to generate batches of data.

In [None]:
from pyTigerGraph import TigerGraphConnection

conn = TigerGraphConnection(**parameters["connection_config"])

In [None]:
train_loader = conn.gds.neighborLoader(
    **parameters["training_loader_config"],
    filter_by="train_mask"
)

In [None]:
valid_loader = conn.gds.neighborLoader(
    **parameters["training_loader_config"],
    filter_by="val_mask"
)

In [None]:
test_loader = conn.gds.neighborLoader(
    **parameters["training_loader_config"],
    filter_by="test_mask"
)
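
To illustrate what `num_neighbors` and `num_hops` control, here is a minimal pure-Python sketch of neighbor sampling on a toy adjacency list. It is not the pyTigerGraph implementation; the function `sample_neighborhood` and the toy graph are hypothetical, and the sketch only shows the general idea of expanding from seed vertices while capping the fan-out at each hop.

```python
import random


def sample_neighborhood(adjacency, seeds, num_neighbors, num_hops, rng):
    """Collect seeds plus up to `num_neighbors` sampled neighbors per vertex, per hop."""
    visited = set(seeds)
    frontier = list(seeds)
    edges = []
    for _ in range(num_hops):
        next_frontier = []
        for vertex in frontier:
            neighbors = adjacency.get(vertex, [])
            k = min(num_neighbors, len(neighbors))
            for neighbor in rng.sample(neighbors, k):
                edges.append((neighbor, vertex))  # message flows neighbor -> vertex
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return visited, edges


toy_graph = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 5], 4: [1], 5: [3]}
vertices, edges = sample_neighborhood(
    toy_graph, seeds=[0], num_neighbors=2, num_hops=2, rng=random.Random(0)
)
print(sorted(vertices), edges)
```

Capping the fan-out keeps each batch's subgraph bounded in size even when some vertices have very many neighbors.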

### Set Up the Optimizer
Here, we move the model to the available device (GPU if present, otherwise CPU) and define the `Adam` optimizer.

In [None]:
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

gat.to(device)

optimizer = torch.optim.Adam(
    gat.parameters(), **parameters["optimizer_config"]
)

### Train the Model

In [None]:
from datetime import datetime
from pyTigerGraph.gds.metrics import Accumulator, Accuracy

In [None]:
global_steps = 0
logs = {}
for epoch in range(10):
    # Train
    gat.train()
    epoch_train_loss = Accumulator()
    epoch_train_acc = Accuracy()
    for bid, batch in enumerate(train_loader):
        batchsize = batch.x.shape[0]
        batch.to(device)
        # Forward pass
        out = gat(batch)
        # Calculate loss
        loss = F.cross_entropy(out[batch.train_mask], batch.y[batch.train_mask])
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_train_loss.update(loss.item() * batchsize, batchsize)
        # Predict on training data
        with torch.no_grad():
            pred = out.argmax(dim=1)
            epoch_train_acc.update(pred[batch.train_mask], batch.y[batch.train_mask])
        # Log training status after each batch
        logs["loss"] = epoch_train_loss.mean
        logs["acc"] = epoch_train_acc.value
        print(
            "Epoch {}, Train Batch {}, Loss {:.4f}, Accuracy {:.4f}".format(
                epoch, bid, logs["loss"], logs["acc"]
            )
        )
        global_steps += 1
    # Evaluate
    gat.eval()
    epoch_val_loss = Accumulator()
    epoch_val_acc = Accuracy()
    for batch in valid_loader:
        batchsize = batch.x.shape[0]
        batch.to(device)
        with torch.no_grad():
            # Forward pass
            out = gat(batch)
            # Calculate loss
            valid_loss = F.cross_entropy(out[batch.val_mask], batch.y[batch.val_mask])
            epoch_val_loss.update(valid_loss.item() * batchsize, batchsize)
            # Prediction
            pred = out.argmax(dim=1)
            epoch_val_acc.update(pred[batch.val_mask], batch.y[batch.val_mask])
    # Log validation result after each epoch
    logs["val_loss"] = epoch_val_loss.mean
    logs["val_acc"] = epoch_val_acc.value
    print(
        "Epoch {}, Valid Loss {:.4f}, Valid Accuracy {:.4f}".format(
            epoch, logs["val_loss"], logs["val_acc"]
        )
    )
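
The loop above passes `loss.item() * batchsize` together with `batchsize` to `Accumulator.update`, which suggests a size-weighted running mean. The sketch below shows that bookkeeping with a hypothetical `WeightedMean` class and made-up batch losses. It is an assumption about the behaviour, not the pyTigerGraph implementation.

```python
class WeightedMean:
    """Hypothetical size-weighted running mean, for illustration only."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, weighted_sum, count):
        # Accumulate the already-weighted sum and the number of items it covers.
        self.total += weighted_sum
        self.count += count

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


epoch_loss = WeightedMean()
# Made-up (mean batch loss, batch size) pairs; the last batch is smaller.
for batch_loss, batch_size in [(0.9, 64), (0.5, 64), (0.1, 12)]:
    epoch_loss.update(batch_loss * batch_size, batch_size)
print(round(epoch_loss.mean, 4))
```

Weighting by batch size keeps a small final batch from counting as much as a full one in the epoch average.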

### Test the Model

In [None]:
gat.eval()
acc = Accuracy()
for batch in test_loader:
    batch.to(device)
    with torch.no_grad():
        pred = gat(batch).argmax(dim=1)
        acc.update(pred[batch.test_mask], batch.y[batch.test_mask])
print("Accuracy: {:.4f}".format(acc.value))

### Save the Trained Model Weights

In [None]:
torch.save(gat.state_dict(), "{}/model.pth".format(source_directory))

## Set Up the Inference Container for Azure ML
Using the Azure ML tools, we will define the inference container. The Azure ML libraries will take care of building the container, but we need to define the custom inference script and environment.

### Define Scoring File
Azure ML requires an entry script, here called `score.py`, to run inference. The script defines an `init()` function that loads the model, model configuration, and model weights, and a `run()` function that processes each request.

In [None]:
%%writefile $source_directory/score.py

import pyTigerGraph as tg
import torch
import json
import os
import sys


def init():
    # Configure device usage
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    # Load configuration JSON file
    with open(os.path.join(os.getenv("AZUREML_SOURCE_DIR"), "config.json")) as json_file:
        data = json.load(json_file)
        model_config = data["model_config"]
        connection_config = data["connection_config"]
        loader_config = data["infer_loader_config"]
        model_name = data["model_name"]

    sys.path.append(os.getenv("AZUREML_SOURCE_DIR"))

    # Load model definition
    import model
    global mdl
    mdl = getattr(model, model_name)(**model_config)

    # Setup Connection to TigerGraph Database
    global conn
    conn = tg.TigerGraphConnection(**connection_config)

    # Load the trained model weights; map_location lets weights saved on a GPU load on a CPU-only container
    with open(os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model.pth"), 'rb') as f:
        mdl.load_state_dict(torch.load(f, map_location=device))
    mdl.to(device).eval()

    # Configure data loader
    global infer_loader
    infer_loader = conn.gds.neighborLoader(**loader_config)


def run(request):
    # load data from JSON request
    input_data = json.loads(request)
    
    # fetch subgraphs for JSON requests
    sub_graphs = infer_loader.fetch(input_data["vertices"])

    # move data to device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    mdl.to(device)
    sub_graphs.to(device)

    # run inference
    with torch.no_grad():
        output = mdl(sub_graphs)

    # process and return results
    returnJSON = {}
    for i in range(len(input_data["vertices"])):
        returnJSON[input_data["vertices"][i]["primary_id"]] = list(output[i].tolist())
    return returnJSON
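
The request and response shapes handled by `run()` can be exercised without a database or a model. The sketch below uses a hypothetical helper, `build_response`, and made-up logits to show how each requested vertex's `primary_id` is paired with one row of model output. It assumes, as `run()` does, that output rows appear in the same order as the requested vertices.

```python
import json


def build_response(request_body, output_rows):
    """Pair each requested vertex ID with its output row (hypothetical helper)."""
    vertices = json.loads(request_body)["vertices"]
    return {v["primary_id"]: list(row) for v, row in zip(vertices, output_rows)}


request_body = json.dumps(
    {
        "vertices": [
            {"primary_id": "100", "type": "Paper"},
            {"primary_id": "55", "type": "Paper"},
        ]
    }
)
# Made-up logits standing in for the model output (three classes for brevity).
fake_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]]
response = build_response(request_body, fake_logits)
print(response)
```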


### Create Azure ML Workspace
First, we are going to create an Azure ML workspace. This requires an Azure subscription ID and a resource group name. You can define these in a `config.py` file that follows the same format as `config-temp.py` in this repository. Change the `location` parameter of `Workspace.create()` to your own Azure region.

In [None]:
from azureml.core import Workspace
import config as cfg

ws = Workspace.create(name="gat_cora",
                      subscription_id=cfg.subscription_id,
                      resource_group=cfg.resource_group,
                      create_resource_group=True,
                      location="eastus")

### Register Model
Here, we register the model with Azure ML.

In [None]:
from azureml.core.model import Model
from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.environment import CondaDependencies

model = Model.register(ws,
                       model_name="cora-gat",
                       model_path="{}/model.pth".format(source_directory))

### Setup Environment
Here, we define the environment variables and the Python packages that will be used for the inference process.

In [None]:
env = Environment(name='cora-gat')
conda_dep = CondaDependencies()
conda_dep.add_pip_package("pyTigerGraph")

conda_dep.add_channel("pytorch")
conda_dep.add_conda_package("cpuonly")

conda_dep.add_channel("pyg")
conda_dep.add_conda_package("pyg")

env.environment_variables = {'AZUREML_SOURCE_DIR': source_directory}
env.python.conda_dependencies = conda_dep

### Define Inference Config
Here, we define the configuration of the inference container. We will pass in the environment, source directory, and the scoring file.

In [None]:
inference_config = InferenceConfig(environment=env, 
                                   source_directory=source_directory,
                                   entry_script='./score.py')

## Deploy Model Locally

Using the Azure ML tools, we will first deploy the trained model locally. This will allow us to test the model before deploying it to Azure. This may take a while because a Docker image has to be downloaded and built. **Prerequisite:** You must have Docker installed on your machine.

In [None]:
from azureml.core.webservice import LocalWebservice
deployment_config = LocalWebservice.deploy_configuration(port=6789)

service = Model.deploy(
    ws,
    "cora-gat",
    [model],
    inference_config,
    deployment_config,
    overwrite=True,
)

In [None]:
service.wait_for_deployment(show_output=True)

print(service.get_logs())
print("URL: {}".format(service.scoring_uri))

## Call Local Model
Here, we call the local REST endpoint to make predictions. Note that the endpoint returns the raw, unnormalized logits from the GAT model. Passing them through a softmax function would yield class probabilities, but because softmax preserves ordering, the index of the largest logit already identifies the most likely class.

In [None]:
import requests

uri = "http://localhost:6789/score" # URL of local web service
requests.get("http://localhost:6789") # Ping the service root to confirm it is reachable
headers = {"Content-Type": "application/json"}

# The scoring function expects a JSON body with a list of vertices and their types.
data = {"vertices": [{"primary_id": "100", "type": "Paper"},
                    {"primary_id": "55", "type": "Paper"}]}

data = json.dumps(data)
response = requests.post(uri, data=data, headers=headers)
print(response.json())
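
As a minimal sketch of post-processing the returned logits, the code below applies a numerically stable softmax in plain Python and picks the predicted class. The response dictionary `fake_response` is made up for illustration; a real response would contain the logits returned by the endpoint.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up response: one vertex ID mapped to seven class logits.
fake_response = {"100": [0.2, 3.1, -0.5, 0.0, 1.2, -2.0, 0.4]}

for vertex_id, logits in fake_response.items():
    probs = softmax(logits)
    # Argmax over the raw logits; softmax would give the same index.
    predicted = max(range(len(logits)), key=logits.__getitem__)
    print(vertex_id, predicted, round(probs[predicted], 3))
```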

## Deploy to Azure ML
Once we verify the model locally, we can deploy it to Azure ML.

Azure ML can provide authentication on the REST endpoint. If desired, you can change the `auth` boolean variable below to `True` to enable authentication.

In [None]:
auth = False

from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=2, memory_gb=8, auth_enabled=auth
)

service = Model.deploy(
    ws,
    "cora-gat",
    [model],
    inference_config,
    deployment_config,
    overwrite=True
)

service.wait_for_deployment(show_output=True)

## Call Model on Azure ML
Once deployed, we can call the endpoint hosted on Azure ML. As with the local service, the endpoint returns the raw, unnormalized logits from the GAT model. The index of the largest logit is the most likely class, and a softmax function can be applied if probabilities are needed.

In [None]:
from azureml.core import Webservice

service = Webservice(workspace=ws, name="cora-gat")
scoring_uri = service.scoring_uri
    
# Set the appropriate headers
headers = {"Content-Type": "application/json"}

# If the service is authenticated, set the key or token
if auth:
    key, _ = service.get_keys()
    headers["Authorization"] = f"Bearer {key}"

# Make the request and display the response
data = {"vertices": [{"primary_id": "100", "type": "Paper"},
                    {"primary_id": "55", "type": "Paper"}]}

data = json.dumps(data)
resp = requests.post(scoring_uri, data=data, headers=headers)
print(resp.text)
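
The header logic above can be factored into a small function so the same client code serves both authenticated and unauthenticated deployments. `build_headers` is a hypothetical helper, and the key shown is a placeholder rather than a real credential.

```python
def build_headers(auth_enabled, key=None):
    """Build request headers, adding a bearer key only when auth is enabled."""
    headers = {"Content-Type": "application/json"}
    if auth_enabled:
        if not key:
            raise ValueError("A key is required when authentication is enabled")
        headers["Authorization"] = f"Bearer {key}"
    return headers


print(build_headers(False))
print(build_headers(True, key="<placeholder-key>"))
```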