# Table of contents <a name="toc"></a>
1. [Default](#p1)
2. [Integration with MLflow](#p2)
3. [Multitarget](#p3)
4. [Command line mode](#p4)
5. [Optuna](#p5)


In [1]:
def dataset_info(data):
    n_features = data.x.shape[1]
    n_nodes = data.x.shape[0]
    n_edges = data.edge_index.shape[1]
    if len(data.y.shape) == 1:
        print(f'# nodes    {n_nodes} \n# features {n_features} \n# edges    {n_edges} \n# classes  {len(data.y.unique())}')
    else:
        print(f'# nodes    {n_nodes} \n# features {n_features} \n# edges    {n_edges} \n# tasks    {data.y.shape[1]}')

# 1. Default <a name="p1"></a>

In [2]:
import cool_graph
from cool_graph.runners import Runner
from torch_geometric import datasets


In [9]:
# use the simple Amazon "Computers" dataset
dataset = datasets.Amazon(root='./data/Amazon', name='Computers')
data = dataset.data
data

Data(x=[13752, 767], edge_index=[2, 491722], y=[13752])

In [11]:
dataset_info(data)

# nodes    13752 
# features 767 
# edges    491722 
# classes  10


In [15]:
runner = Runner(data)

In [16]:
%%time
result = runner.run()

Sample data: 100%|██████████| 42/42 [00:03<00:00, 12.23it/s]
Sample data: 100%|██████████| 14/14 [00:01<00:00, 10.52it/s]
2023-11-20 10:50:39.324 | INFO     | cool_graph.train.trainer:train:230 - 
Epoch 000: 
2023-11-20 10:50:39.698 | INFO     | cool_graph.train.helpers:eval_epoch:179 - test:
 {'accuracy': 0.456, 'cross_entropy': 1.466, 'f1_weighted': 0.382, 'calc_time': 0.006, 'main_metric': 0.456}
2023-11-20 10:50:39.700 | INFO     | cool_graph.train.trainer:train:257 - Epoch 000: 
2023-11-20 10:50:40.497 | INFO     | cool_graph.train.helpers:eval_epoch:179 - train:
 {'accuracy': 0.453, 'cross_entropy': 1.51, 'f1_weighted': 0.381, 'calc_time': 0.013, 'main_metric': 0.453}
2023-11-20 10:50:47.532 | INFO     | cool_graph.train.trainer:train:230 - 
Epoch 005: 
2023-11-20 10:50:47.804 | INFO     | cool_graph.train.helpers:eval_epoch:179 - test:
 {'accuracy': 0.884, 'cross_entropy': 0.322, 'f1_weighted': 0.868, 'calc_time': 0.005, 'main_metric': 0.884}
2023-11-20 10:50:47.805 | INFO     |

CPU times: user 42min 10s, sys: 2min 6s, total: 44min 17s
Wall time: 37.8 s


In [17]:
result['best_loss']

{'accuracy': 0.935,
 'cross_entropy': 0.217,
 'f1_weighted': 0.935,
 'calc_time': 0.004,
 'main_metric': 0.935,
 'epoch': 10}

In [18]:
result['test_metric']

| | accuracy | cross_entropy | f1_weighted | calc_time | main_metric | epoch |
|---|---|---|---|---|---|---|
| 0 | 0.456 | 1.466 | 0.382 | 0.006 | 0.456 | 0 |
| 1 | 0.884 | 0.322 | 0.868 | 0.005 | 0.884 | 5 |
| 2 | 0.935 | 0.217 | 0.935 | 0.004 | 0.935 | 10 |
| 3 | 0.935 | 0.222 | 0.935 | 0.006 | 0.935 | 15 |
| 4 | 0.932 | 0.234 | 0.931 | 0.004 | 0.932 | 20 |
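
The per-epoch test metrics above can be scanned for the best epoch. The following is a minimal sketch in plain Python with the rows copied from the output; `best_row` is a hypothetical helper, not part of cool_graph. Judging by `result['best_loss']`, the library seems to pick the epoch with the lowest `cross_entropy`, but that is an inference from the output, not a documented rule.

```python
# Rows copied from the result['test_metric'] output above.
test_metric = [
    {"epoch": 0, "cross_entropy": 1.466, "main_metric": 0.456},
    {"epoch": 5, "cross_entropy": 0.322, "main_metric": 0.884},
    {"epoch": 10, "cross_entropy": 0.217, "main_metric": 0.935},
    {"epoch": 15, "cross_entropy": 0.222, "main_metric": 0.935},
    {"epoch": 20, "cross_entropy": 0.234, "main_metric": 0.932},
]


def best_row(rows, key, higher_is_better):
    """Return the row with the best value of `key` (hypothetical helper)."""
    pick = max if higher_is_better else min
    # On ties, max/min keep the first matching row, i.e. the earliest epoch.
    return pick(rows, key=lambda row: row[key])


by_loss = best_row(test_metric, "cross_entropy", higher_is_better=False)
by_metric = best_row(test_metric, "main_metric", higher_is_better=True)
print(by_loss["epoch"], by_metric["epoch"])  # 10 10
```

Both criteria agree on epoch 10 here, which matches the `'epoch': 10` entry in `result['best_loss']`.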


In [19]:
# local logging 
str(runner.chkpt_dir)

'checkpoints/2023-11-20 10:50:32'

# 2. Integration with MLflow <a name="p2"></a>

In [20]:
# using MLflow

In [21]:
dataset = datasets.Amazon(root='./data/Amazon', name='Computers')
data = dataset.data

In [22]:
dataset_info(data)

# nodes    13752 
# features 767 
# edges    491722 
# classes  10


In [29]:
# fill in your MLflow settings in the config logging/in_memory_data or, if you use CLI mode, in logging/default

# 3. Multitarget <a name="p3"></a>

In [33]:
from torch_geometric import datasets
from cool_graph.runners import Runner

dataset = datasets.Yelp(root='../data/Yelp')
data = dataset.data
data.y = data.y.long()

In [22]:
dataset_info(data)

# nodes    716847 
# features 300 
# edges    13954819 
# tasks    100


In [23]:
# you can override default params 

In [34]:
config_path = "../config/full.yaml"

In [24]:
runner = Runner(data, metrics=['roc_auc','accuracy', 'f1'], batch_size='auto', train_size=0.1, test_size=0.02,
                overrides=['training.n_epochs=1'], config_path=config_path)
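
The `overrides` list takes dotted `key=value` strings, which resembles the Hydra override syntax. How cool_graph applies them internally is not shown in this notebook, so the following is only a plain-Python sketch of the idea. The `apply_override` helper and the config fragment are hypothetical; only `training.n_epochs=1` is taken from the cell above.

```python
import ast


def apply_override(config, override):
    """Apply one 'dotted.key=value' override to a nested dict, in place."""
    path, _, raw_value = override.partition("=")
    try:
        value = ast.literal_eval(raw_value)  # "1" -> 1, "0.5" -> 0.5
    except (ValueError, SyntaxError):
        value = raw_value  # keep plain strings such as "relu"
    *parents, leaf = path.split(".")
    node = config
    for name in parents:
        # Walk down the nested dict, creating missing levels on the way.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical config fragment, made up for this illustration.
config = {"training": {"n_epochs": 10, "batch_size": "auto"}}
apply_override(config, "training.n_epochs=1")
print(config)  # {'training': {'n_epochs': 1, 'batch_size': 'auto'}}
```

Only the addressed leaf changes; sibling keys in the same section keep their values.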

In [25]:
%%time
results = runner.run()

Sample data: 100%|██████████| 135/135 [00:42<00:00,  3.19it/s]
Sample data: 100%|██████████| 27/27 [00:08<00:00,  3.26it/s]
2023-04-03 17:35:28.882 | INFO     | cool_graph.train.trainer:train:199 - 
Epoch 000: 
2023-04-03 17:35:35.135 | INFO     | cool_graph.train.helpers:eval_epoch:157 - test:
 {'roc_auc': 0.7849970598018029, 'accuracy': 0.9154697635488594, 'f1': 0.09385270380622077, 'calc_time': 0.10418558915456136, 'main_metric': 0.7849970598018029}
2023-04-03 17:35:36.353 | INFO     | cool_graph.train.trainer:train:226 - Epoch 000: 
2023-04-03 17:36:05.703 | INFO     | cool_graph.train.helpers:eval_epoch:157 - train:
 {'roc_auc': 0.7854606437487387, 'accuracy': 0.9154900675185539, 'f1': 0.0908856985764578, 'calc_time': 0.4891481280326843, 'main_metric': 0.7854606437487387}


CPU times: user 2h 13min 53s, sys: 9min 1s, total: 2h 22min 54s
Wall time: 3min 14s


# 4. Command line mode <a name="p4"></a>

- features on edges: trains an attention NN
- 2+ groups of nodes with different features
- multitarget
- categorical features

In [36]:
!head -n 25 ../cool_graph/config/data/default.yaml

# this config is using  for data where nodes have groups
# path to train/validation data in parquet
train:
  nodes_path: ../../tests/sample_data/nodes.parquet
  edges_path: ../../tests/sample_data/edges.parquet
  labels_path: ../../tests/sample_data/labels.parquet
validation:
  nodes_path: ../../tests/sample_data/nodes.parquet
  edges_path: ../../tests/sample_data/edges.parquet
  labels_path: ../../tests/sample_data/labels.parquet
read_edge_attr: True # set True if you read data from disk and it has feats on edges
group_mask_col: node_feature_1 # column for mask in groups
label_mask_col: label_2 # column for mask in labels
label_index_col: index # index column in labels data
# columns with indices (edges between nodes)
edge_index_cols: 
  - index2
  - index1
# names in node groups 
groups_names: 
  1: group_2
  0: group_1
# target columns 
label_cols: 
  - index
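
The `Data` objects earlier in this notebook store edges as a `[2, n_edges]` `edge_index`. Assuming that `edge_index_cols` names the two columns that become its two rows, in the listed order (an assumption, since the loader code is not shown here), the mapping can be sketched in plain Python. The edge rows below are made up; only the column names `index1` and `index2` come from the config.

```python
# Hypothetical edge rows; only the column names come from the config above.
edges = [
    {"index1": 0, "index2": 1},
    {"index1": 1, "index2": 2},
    {"index1": 2, "index2": 0},
]
edge_index_cols = ["index2", "index1"]  # order as listed in the config

# One list per configured column, giving a 2 x n_edges layout.
edge_index = [[row[col] for row in edges] for col in edge_index_cols]
print(edge_index)  # [[1, 2, 0], [0, 1, 2]]
```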


In [41]:
!coolrun --config /cool_graph/config/data/full.yaml

# 5. Optuna <a name="p5"></a>

 - **Searching for the best model parameters**
 
 - `HypeRunner`:
 * 1st trial: default parameters from the config
 * next trials: you can pass your own trials through the `enqueue_trial` argument of the `optimize_run` method; after those, Optuna samples model parameters randomly. If `enqueue_trial` is None, random sampling starts right after the 1st default trial
 
 Returns a DataFrame with trials and metrics.
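
The trial order described above can be sketched in plain Python. This is not cool_graph's implementation and makes no Optuna calls: `plan_trials` is a hypothetical helper, and the random draw is only a stand-in for Optuna's sampler.

```python
import random


def plan_trials(n_trials, default_params, enqueue_trial=None, seed=0):
    """Return (source, params) pairs in the order described above."""
    rng = random.Random(seed)
    # The config defaults always go first, then any user-supplied trials.
    queue = [("default", default_params)]
    queue += [("enqueued", params) for params in (enqueue_trial or [])]
    plan = queue[:n_trials]
    while len(plan) < n_trials:
        # Stand-in for the sampler: draw a random value for one parameter.
        rate = round(rng.uniform(0.0, 0.5), 3)
        plan.append(("random", {"lin_prep_dropout_rate": rate}))
    return plan


default = {"activation": "leakyrelu"}
mine = [{"activation": "relu"}, {"activation": "prelu"}]
for number, (source, params) in enumerate(plan_trials(5, default, mine)):
    print(number, source, params)
```

With five trials and two enqueued dicts, the order is default, enqueued, enqueued, random, random. With `enqueue_trial=None`, every trial after the first is random.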

In [42]:
from cool_graph.runners import HypeRunner
from torch_geometric import datasets

In [43]:
data = datasets.Amazon(root='./data/Amazon/', name='Computers').data

In [44]:
runner = HypeRunner(data)

In [45]:
# own dict with model params for trial
my_params1 = {'conv_type': 'GraphConv',
 'activation': 'relu',
 'lin_prep_dropout_rate': 0.4,
 'lin_prep_len': 1,
 'lin_prep_size_common': 512,
 'lin_prep_sizes': [256],
 'lin_prep_weight_norm_flag': True,
 'graph_conv_weight_norm_flag': True,
 'n_hops': 2,
 'conv1_aggrs': {'mean': 64, 'max': 32, 'add': 16},
 'conv1_dropout_rate': 0.2,
 'conv2_aggrs': {'mean': 32, 'max': 16, 'add': 8},
 'conv2_dropout_rate': 0.2}

my_params2 = {'conv_type': 'GraphConv',
 'activation': 'prelu',
 'lin_prep_dropout_rate': 0.5,
 'lin_prep_len': 1,
 'lin_prep_size_common': 512,
 'lin_prep_sizes': [256],
 'lin_prep_weight_norm_flag': False,
 'graph_conv_weight_norm_flag': True,
 'n_hops': 2,
 'conv1_aggrs': {'mean': 64, 'max': 32, 'add': 16},
 'conv1_dropout_rate': 0.2,
 'conv2_aggrs': {'mean': 32, 'max': 16, 'add': 8},
 'conv2_dropout_rate': 0.2}

In [46]:
result = runner.optimize_run(n_trials=5, enqueue_trial=[my_params1, my_params2])

Sample data: 100%|██████████| 42/42 [00:04<00:00,  9.93it/s]
Sample data: 100%|██████████| 14/14 [00:01<00:00, 10.15it/s]
[I 2023-11-20 11:04:50,648] A new study created in memory with name: no-name-b21ba7cf-0d18-49fb-9b2d-0860c880a72d
2023-11-20 11:04:52.607 | INFO     | cool_graph.train.trainer:train:230 - 
Epoch 000: 
2023-11-20 11:04:52.819 | INFO     | cool_graph.train.helpers:eval_epoch:179 - test:
 {'accuracy': 0.459, 'cross_entropy': 1.47, 'f1_weighted': 0.335, 'calc_time': 0.004, 'main_metric': 0.459}
2023-11-20 11:04:52.820 | INFO     | cool_graph.train.trainer:train:257 - Epoch 000: 
2023-11-20 11:04:53.484 | INFO     | cool_graph.train.helpers:eval_epoch:179 - train:
 {'accuracy': 0.479, 'cross_entropy': 1.415, 'f1_weighted': 0.359, 'calc_time': 0.011, 'main_metric': 0.479}
2023-11-20 

Study statistics: 
  Number of finished trials:  5
  Number of complete trials:  5
Best trial:
  Value:  0.916
  Params: 
{'conv_type': 'GraphConv', 'activation': 'leakyrelu', 'lin_prep_len': 1, 'lin_prep_dropout_rate': 0.4, 'lin_prep_weight_norm_flag': True, 'lin_prep_size_common': 512, 'lin_prep_sizes': [256], 'n_hops': 2, 'conv1_aggrs': {'mean': 128, 'max': 64, 'add': 32}, 'conv1_dropout_rate': 0.2, 'conv2_aggrs': {'mean': 64, 'max': 32, 'add': 16}, 'conv2_dropout_rate': 0.2, 'graph_conv_weight_norm_flag': True}


In [47]:
result

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,system_attrs_fixed_params,state,conv_type,activation,lin_prep_len,lin_prep_dropout_rate,lin_prep_weight_norm_flag,lin_prep_size_common,lin_prep_sizes,n_hops,conv1_aggrs,conv1_dropout_rate,conv2_aggrs,conv2_dropout_rate,graph_conv_weight_norm_flag
0,0,0.916,2023-11-20 11:04:50.651106,2023-11-20 11:05:23.925253,0 days 00:00:33.274147,"{'activation': 'leakyrelu', 'lin_prep_len': 1,...",COMPLETE,GraphConv,leakyrelu,1,0.4,True,512,[256],2,"{'mean': 128, 'max': 64, 'add': 32}",0.2,"{'mean': 64, 'max': 32, 'add': 16}",0.2,True
1,1,0.913,2023-11-20 11:05:23.933594,2023-11-20 11:05:55.882016,0 days 00:00:31.948422,"{'activation': 'relu', 'lin_prep_len': 1, 'lin...",COMPLETE,GraphConv,relu,1,0.4,True,512,[256],2,"{'mean': 64, 'max': 32, 'add': 16}",0.2,"{'mean': 32, 'max': 16, 'add': 8}",0.2,True
2,2,0.91,2023-11-20 11:05:55.885662,2023-11-20 11:06:29.517902,0 days 00:00:33.632240,"{'activation': 'prelu', 'lin_prep_len': 1, 'li...",COMPLETE,GraphConv,prelu,1,0.5,False,512,[256],2,"{'mean': 64, 'max': 32, 'add': 16}",0.2,"{'mean': 32, 'max': 16, 'add': 8}",0.2,True
3,3,0.787,2023-11-20 11:06:29.520728,2023-11-20 11:06:56.705546,0 days 00:00:27.184818,,COMPLETE,GraphConv,relu,2,0.492288,False,64,"[20, 16]",2,"{'mean': 8, 'max': 5, 'add': 2}",0.476272,"{'mean': 8, 'max': 7, 'add': 3}",0.035856,True
4,4,0.907,2023-11-20 11:06:56.707726,2023-11-20 11:07:31.458560,0 days 00:00:34.750834,,COMPLETE,GraphConv,gelu,2,0.033315,False,934,"[586, 134]",2,"{'mean': 47, 'max': 10, 'add': 9}",0.308789,"{'mean': 39, 'max': 27, 'add': 4}",0.169701,True


In [48]:
runner2 = HypeRunner(data)

In [49]:
# you can also optimize without your own parameter dicts; Optuna then samples the parameters randomly
result2 = runner2.optimize_run(n_trials=3)

Sample data: 100%|██████████| 42/42 [00:03<00:00, 12.13it/s]
Sample data: 100%|██████████| 14/14 [00:00<00:00, 15.77it/s]
[I 2023-11-20 11:07:36,220] A new study created in memory with name: no-name-ca69d684-d645-47ca-937a-0ce95d9dfa29
2023-11-20 11:07:38.563 | INFO     | cool_graph.train.trainer:train:230 - 
Epoch 000: 
2023-11-20 11:07:38.889 | INFO     | cool_graph.train.helpers:eval_epoch:179 - test:
 {'accuracy': 0.572, 'cross_entropy': 1.332, 'f1_weighted': 0.526, 'calc_time': 0.005, 'main_metric': 0.572}
2023-11-20 11:07:38.890 | INFO     | cool_graph.train.trainer:train:257 - Epoch 000: 
2023-11-20 11:07:39.772 | INFO     | cool_graph.train.helpers:eval_epoch:179 - train:
 {'accuracy': 0.582, 'cross_entropy': 1.318, 'f1_weighted': 0.537, 'calc_time': 0.015, 'main_metric': 0.582}
2023-11-20 11:07:46.621 | INFO     | cool_graph.tr

Study statistics: 
  Number of finished trials:  3
  Number of complete trials:  3
Best trial:
  Value:  0.924
  Params: 
{'conv_type': 'GraphConv', 'activation': 'leakyrelu', 'lin_prep_len': 1, 'lin_prep_dropout_rate': 0.4, 'lin_prep_weight_norm_flag': True, 'lin_prep_size_common': 512, 'lin_prep_sizes': [256], 'n_hops': 2, 'conv1_aggrs': {'mean': 128, 'max': 64, 'add': 32}, 'conv1_dropout_rate': 0.2, 'conv2_aggrs': {'mean': 64, 'max': 32, 'add': 16}, 'conv2_dropout_rate': 0.2, 'graph_conv_weight_norm_flag': True}


In [50]:
result2

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,system_attrs_fixed_params,state,conv_type,activation,lin_prep_len,lin_prep_dropout_rate,lin_prep_weight_norm_flag,lin_prep_size_common,lin_prep_sizes,n_hops,conv1_aggrs,conv1_dropout_rate,conv2_aggrs,conv2_dropout_rate,graph_conv_weight_norm_flag
0,0,0.924,2023-11-20 11:07:36.222776,2023-11-20 11:08:08.834058,0 days 00:00:32.611282,"{'activation': 'leakyrelu', 'lin_prep_len': 1,...",COMPLETE,GraphConv,leakyrelu,1,0.4,True,512,[256],2,"{'mean': 128, 'max': 64, 'add': 32}",0.2,"{'mean': 64, 'max': 32, 'add': 16}",0.2,True
1,1,0.868,2023-11-20 11:08:08.836232,2023-11-20 11:08:37.657511,0 days 00:00:28.821279,,COMPLETE,GraphConv,relu,2,0.492288,False,64,"[20, 16]",2,"{'mean': 8, 'max': 5, 'add': 2}",0.476272,"{'mean': 8, 'max': 7, 'add': 3}",0.035856,True
2,2,0.922,2023-11-20 11:08:37.660058,2023-11-20 11:09:14.773085,0 days 00:00:37.113027,,COMPLETE,GraphConv,gelu,2,0.033315,False,934,"[586, 134]",2,"{'mean': 47, 'max': 10, 'add': 9}",0.308789,"{'mean': 39, 'max': 27, 'add': 4}",0.169701,True


# 6. Multitarget + groups of nodes (heterogeneous graph) in Jupyter for Data <a name="p6"></a>

In [51]:
from cool_graph.runners import MultiRunner
import torch

In [52]:
data = torch.load("../tests/sample_data/sample_of_graph")

In [55]:
runner = MultiRunner(data)

In [56]:
result = runner.run()

Sample data: 100%|██████████| 863/863 [01:23<00:00, 10.34it/s]
Sample data: 100%|██████████| 288/288 [00:26<00:00, 10.99it/s]
2023-11-20 11:43:30.952 | INFO     | cool_graph.train.trainer:train:230 - 
Epoch 000: 
2023-11-20 11:43:54.536 | INFO     | cool_graph.train.helpers:eval_epoch:179 - test:
 {'label_3__accuracy__group_1': 0.928, 'label_3__accuracy__group_2': 0.944, 'label_3__cross_entropy__group_1': 0.257, 'label_3__cross_entropy__group_2': 0.208, 'label_3__f1_weighted__group_1': 0.894, 'label_3__f1_weighted__group_2': 0.917, 'label_3__roc_auc__group_1': 0.591, 'label_3__roc_auc__group_2': 0.69, 'label_4__accuracy__group_1': 0.952, 'label_4__accuracy__group_2': 0.969, 'label_4__cross_entropy__group_1': 0.194, 'label_4__cross_entropy__group_2': 0.13, 'label_4__f1_weighted__group_1': 0.928, 'label_4__f1_weighted__group_2': 0.954, 'label_4__roc_auc__group_1': 0.593, 'label_4__roc_auc__group_2': 0.747, 'label_5__accuracy__group_1': 0.944, 'label_5__accuracy__group_2': 0.96, 'label_5_

In [58]:
result["best_loss"]

{'label_3__accuracy__group_1': 0.928,
 'label_3__accuracy__group_2': 0.944,
 'label_3__cross_entropy__group_1': 0.257,
 'label_3__cross_entropy__group_2': 0.208,
 'label_3__f1_weighted__group_1': 0.894,
 'label_3__f1_weighted__group_2': 0.917,
 'label_3__roc_auc__group_1': 0.591,
 'label_3__roc_auc__group_2': 0.69,
 'label_4__accuracy__group_1': 0.952,
 'label_4__accuracy__group_2': 0.969,
 'label_4__cross_entropy__group_1': 0.194,
 'label_4__cross_entropy__group_2': 0.13,
 'label_4__f1_weighted__group_1': 0.928,
 'label_4__f1_weighted__group_2': 0.954,
 'label_4__roc_auc__group_1': 0.593,
 'label_4__roc_auc__group_2': 0.747,
 'label_5__accuracy__group_1': 0.944,
 'label_5__accuracy__group_2': 0.96,
 'label_5__cross_entropy__group_1': 0.204,
 'label_5__cross_entropy__group_2': 0.161,
 'label_5__f1_weighted__group_1': 0.916,
 'label_5__f1_weighted__group_2': 0.94,
 'label_5__roc_auc__group_1': 0.708,
 'label_5__roc_auc__group_2': 0.755,
 'label_6__accuracy__group_1': 0.98,
 'label_6__ac

In [59]:
result["test_metric"]

Unnamed: 0,label_3__accuracy__group_1,label_3__accuracy__group_2,label_3__cross_entropy__group_1,label_3__cross_entropy__group_2,label_3__f1_weighted__group_1,label_3__f1_weighted__group_2,label_3__roc_auc__group_1,label_3__roc_auc__group_2,label_4__accuracy__group_1,label_4__accuracy__group_2,...,label_6__accuracy__group_2,label_6__cross_entropy__group_1,label_6__cross_entropy__group_2,label_6__f1_weighted__group_1,label_6__f1_weighted__group_2,label_6__roc_auc__group_1,label_6__roc_auc__group_2,calc_time,main_metric,epoch
0,0.928,0.944,0.257,0.208,0.894,0.917,0.591,0.69,0.952,0.969,...,0.982,0.093,0.079,0.971,0.973,0.686,0.801,0.393,0.957,0
1,0.928,0.944,0.251,0.201,0.894,0.917,0.661,0.731,0.952,0.969,...,0.982,0.092,0.078,0.971,0.973,0.695,0.807,0.291,0.957,5
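
The metric names above follow a `label__metric__group` pattern, which makes the flat dict awkward to read. The following is a minimal sketch that regroups such keys into a nested structure; `nest_metrics` is a hypothetical helper, not part of cool_graph, and the sample entries are copied from `result["best_loss"]`.

```python
from collections import defaultdict

# A few entries copied from result["best_loss"] above.
best_loss = {
    "label_3__accuracy__group_1": 0.928,
    "label_3__accuracy__group_2": 0.944,
    "label_3__roc_auc__group_1": 0.591,
    "label_3__roc_auc__group_2": 0.69,
    "label_4__roc_auc__group_1": 0.593,
    "label_4__roc_auc__group_2": 0.747,
}


def nest_metrics(flat):
    """Turn 'label__metric__group' keys into nested[label][metric][group]."""
    nested = defaultdict(lambda: defaultdict(dict))
    for key, value in flat.items():
        parts = key.split("__")
        if len(parts) != 3:
            continue  # skip keys such as calc_time, main_metric, epoch
        label, metric, group = parts
        nested[label][metric][group] = value
    return nested


nested = nest_metrics(best_loss)
print(nested["label_4"]["roc_auc"])  # {'group_1': 0.593, 'group_2': 0.747}
```

Splitting on the double underscore keeps metric names such as `f1_weighted` and `cross_entropy` intact, because they contain only single underscores.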
