# CoolGraph usage Examples

1. [Multitarget prediction](#p1)
2. [Predict_proba](#p2)
3. [Edge attributes (dataset YelpChi)](#p3)
4. [HypeRunner example](#p4)

# 1. <a id="p1"> Multitarget prediction </a> 

### Training `Runner` on `Multitarget 10k` dataset which has 4 targets

In [2]:
# importing loader of dataset
from cool_graph.datasets.multitarget import Multitarget
import torch
import pandas as pd
from torch_geometric.data import Data
# initializing loader and downloading dataset heterogeneous Multitarget 50k
dataset = Multitarget(root="./data", name="50k")
hetero_data = dataset.data
# Take just <node_1> node type
# So we use homogenious data (standard)
data_50k = Data(**hetero_data['node_1'], **hetero_data[('node_1', 'to', 'node_1')])
data_50k

Using existing file ./data/50k/50k_data.pt


Data(x=[5860353, 162], edge_index=[2, 4444748], edge_attr=[4444748, 44], y=[5860353, 4], label_3=[5860353], label_4=[5860353], label_5=[5860353], label_6=[5860353], label_mask=[5860353], index=[5860353])

In [3]:
def dataset_info(data):
    n_features = data.x.shape[1]
    n_nodes = data.x.shape[0]
    n_edges = data.edge_index.shape[1]
    if len(data.y.shape) == 1:
        print(f'# nodes    {n_nodes} \n# features {n_features} \n# edges    {n_edges} \n# classes  {len(data.y.unique())}')
    else:
        print(f'# nodes    {n_nodes} \n# features {n_features} \n# edges    {n_edges} \n# tasks    {data.y.shape[1]}')

In [4]:
dataset_info(data_50k)

# nodes    5860353 
# features 162 
# edges    4444748 
# tasks    4


#### As we see, data_50k  has 4 tasks

In [38]:
from cool_graph.runners import Runner
# initializing runner
runner = Runner(data_50k, 
                metrics=["accuracy", "roc_auc"],
                overrides=["training.n_epochs=20"], #using overrides to change config parameters e.g num of epochs
                use_edge_attr=True,
                verbose=False) # clear output 
# running
result = runner.run()

In [39]:
result["best_loss"]["tasks"]

{'y0': {'accuracy': 0.7876106194690266, 'roc_auc': 0.6368211610486891},
 'y1': {'accuracy': 0.8382789317507419, 'roc_auc': 0.7481407810343428},
 'y2': {'accuracy': 0.8130030959752322, 'roc_auc': 0.7789663762662671},
 'y3': {'accuracy': 0.9183564567769477, 'roc_auc': 0.772605226479513}}

#### Success! Runner returned result for each of 4 tasks

# 2. Predict_proba

#### Now when `Runner` is trained on `Multitarget 50k` let's try to make predictions of probabilities on its smaller version `Multitarget 10k` using `predict_proba()` function

In [40]:
# initializing loader and downloading dataset heterogeneous Multitarget 50k
dataset_10k = Multitarget(root="./data", name="10k")
hetero_data_10k = dataset.data
# Take just <node_1> node type
# So we use homogenious data (standard)
data_10k = Data(**hetero_data_10k['node_1'], **hetero_data_10k[('node_1', 'to', 'node_1')])
data_10k

Using existing file ./data/10k/10k_data.pt


Data(x=[5860353, 162], edge_index=[2, 4444748], edge_attr=[4444748, 44], y=[5860353, 4], label_3=[5860353], label_4=[5860353], label_5=[5860353], label_6=[5860353], label_mask=[5860353], index=[5860353])

In [41]:
# getting prediction of probabilities of nodes from dataset 10k belonging to each of 2 classes 
# for each of 4 tasks
preds, indices = runner.predict_proba(data_10k)

In [42]:
preds

{'y0': array([[0.9017693 , 0.09823073],
        [0.7370497 , 0.26295033],
        [0.8555638 , 0.14443615],
        ...,
        [0.47313455, 0.5268655 ],
        [0.8173959 , 0.18260406],
        [0.73850167, 0.26149836]], dtype=float32),
 'y1': array([[0.9371028 , 0.06289722],
        [0.69243616, 0.30756384],
        [0.9101161 , 0.08988392],
        ...,
        [0.5783753 , 0.4216247 ],
        [0.9145268 , 0.08547316],
        [0.8037039 , 0.1962961 ]], dtype=float32),
 'y2': array([[0.9348664 , 0.06513356],
        [0.68636143, 0.31363857],
        [0.9185518 , 0.08144818],
        ...,
        [0.5885619 , 0.41143805],
        [0.8848064 , 0.11519361],
        [0.84842014, 0.15157993]], dtype=float32),
 'y3': array([[0.9589429 , 0.04105709],
        [0.7329957 , 0.26700428],
        [0.9534491 , 0.04655091],
        ...,
        [0.68484616, 0.3151538 ],
        [0.93870944, 0.06129053],
        [0.89100754, 0.10899248]], dtype=float32)}

In [43]:
#lets try to calculate accuracy on our predictions with calc_metrics function
from cool_graph.train.metrics import calc_metrics
# passing data_10k as our true labels, predictions from predict_proba and indices of nodes from predict_proba
metrics = calc_metrics(data_10k, preds, metrics=["roc_auc", "accuracy"], indices=indices)
metrics

{'y0': {'roc_auc': 0.7670744391065326, 'accuracy': 0.7940828402366864},
 'y1': {'roc_auc': 0.7991345626270197, 'accuracy': 0.8475287251504651},
 'y2': {'roc_auc': 0.8403112528995542, 'accuracy': 0.8343036978756885},
 'y3': {'roc_auc': 0.8375078288720048, 'accuracy': 0.914281946459185}}

#### Success!

# 3. <a id="p3"> Edge attributes (dataset YelpChi) </a>

#### Training `Runner` on `YelpChi` dataset which has edge attributes <br> Firstly without using edge_attributes <br> Secondly with using edge attributes

In [26]:
# importing loader for YelpChi dataset
from cool_graph.datasets.antifraud import AntiFraud
# initializing dataset
dataset = AntiFraud(root="./data", name="YelpChi")
data = dataset.data
data

Using existing file ./data/yelpchi/YelpChi_data.pt


Data(x=[45954, 32], edge_index=[2, 7693958], edge_attr=[7693958, 12], y=[45954])

#### As we see, data has edge_attributes

### Running with flag `use_edge_attr=False`

In [19]:
seed = 42 # using same seed for both runs
runner_no_edge = Runner(data, 
                        use_edge_attr=False,
                        metrics = ['roc_auc','accuracy', 'f1'], # defining metrics
                        seed=seed,
                        verbose=False) # clear output 
result_no_edge = runner_no_edge.run()

### Running with flag `use_edge_attr=True`

In [27]:
runner_with_edge = Runner(data, 
                        use_edge_attr=True,
                        metrics = ['roc_auc','accuracy', 'f1'], # defining metrics
                        seed=seed,
                        verbose=False) # clear output 
result_with_edge = runner_with_edge.run()

In [30]:
print("with no edge attributes:", result_no_edge["best_loss"]["roc_auc"])
print("with using edge attributes:", result_with_edge["best_loss"]["roc_auc"])

with no edge attributes: 0.869
with using edge attributes: 0.906


### Roc_auc 0.869 vs 0.906 on same seed, so using edge attributes makes predictions better

# 4. <a id="p4"> HypeRunner example </a>

### Unlike Runner, HypeRunner selects hyperparameters with `Optuna`, it has `optimize_run()` instead of `run()` and the rest of the usage is no different from the runner

In [2]:
from torch_geometric import datasets
import torch
import pandas as pd
from torch_geometric.data import Data
# use simple Amazon dataset with Computers
dataset = datasets.Amazon(root='./data/Amazon', name='Computers')
data = dataset.data
data

Data(x=[13752, 767], edge_index=[2, 491722], y=[13752])

In [None]:
from cool_graph.runners import HypeRunner
# initializing hyperunner
hyperunner = HypeRunner(data, 
                seed=42,
                verbose=False)
hyperunner_result = hyperunner.optimize_run(n_trials=10)

In [None]:
hyperunner_result