# Graph Features Usage Examples

1. [Node Features](#p1)
2. [Edge Features](#p2)
3. [Using Node Features When data.x Is Empty](#p3)

In [2]:
from cool_graph.runners import Runner, HypeRunner
from cool_graph.datasets import AntiFraud, S_FFSD

# 1. Node Features <a class="anchor" id="p1"></a>

You can use structural properties of the nodes in the graph as additional node attributes. <br>
To enable this, set the use_graph_node_features flag. <br>
By default, networkx.degree_centrality and networkx.pagerank are computed and added after a quantile transformation.
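
To make the two default quantities concrete, here is a minimal, dependency-free sketch of degree centrality, PageRank (by power iteration), and a rank-based quantile transformation on a toy graph. All function names below are hypothetical helpers written for illustration; this is not how cool_graph or networkx implement them, and the damping factor 0.85 is an assumption (it is the usual networkx default).

```python
from bisect import bisect_left, bisect_right

# Toy undirected graph as adjacency lists (illustrative data, not from the dataset).
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}


def degree_centrality(adj):
    """Degree of each node divided by (n - 1)."""
    n = len(adj)
    scale = 1.0 / (n - 1) if n > 1 else 1.0
    return {node: len(neigh) * scale for node, neigh in adj.items()}


def pagerank(adj, alpha=0.85, n_iter=100, tol=1e-10):
    """Power-iteration PageRank; assumes every node has at least one neighbor."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(n_iter):
        new_rank = {}
        for node in adj:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(rank[nb] / len(adj[nb]) for nb in adj[node])
            new_rank[node] = (1.0 - alpha) / n + alpha * incoming
        delta = sum(abs(new_rank[k] - rank[k]) for k in adj)
        rank = new_rank
        if delta < tol:
            break
    return rank


def quantile_transform(values):
    """Map values to [0, 1] by average rank, so tied values get the same output."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 1:
        return [0.5]
    out = []
    for v in values:
        lo = bisect_left(ordered, v)   # number of strictly smaller values
        hi = bisect_right(ordered, v)  # number of smaller-or-equal values
        out.append(((lo + hi - 1) / 2) / (n - 1))
    return out


dc = degree_centrality(adj)
pr = pagerank(adj)
nodes = sorted(adj)
dc_q = quantile_transform([dc[n] for n in nodes])
pr_q = quantile_transform([pr[n] for n in nodes])

# One row per node, two columns: the two extra node features.
features = [[a, b] for a, b in zip(dc_q, pr_q)]
print(features)
```

The result has two columns per node, which matches the two extra columns that appear in data.x when the flag is enabled.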

In [3]:
amazonfraud = AntiFraud(root='./data', name='Amazon')
amazonfraud.data

Using existing file ./data/amazon/Amazon_data.pt


Data(x=[11944, 25], edge_index=[2, 8835152], edge_attr=[8835152, 12], y=[11944])

In [4]:
runner = Runner(
    amazonfraud.data,
    use_graph_node_features=True,
    overrides=['training.n_epochs=20']
)
runner.data

Data(x=[11944, 27], edge_index=[2, 8835152], edge_attr=[8835152, 12], y=[11944], group_mask=[11944], label_mask=[11944])

Computing these features may take some time on large datasets.

In [5]:
result = runner.run()
print(result['best_loss'])

Sample data: 100%|██████████| 36/36 [00:08<00:00,  4.50it/s]
Sample data: 100%|██████████| 12/12 [00:02<00:00,  4.69it/s]
2024-07-26 17:25:46 - epoch 0 test:            
 {'accuracy': 0.93, 'cross_entropy': 0.284, 'f1_weighted': 0.896, 'calc_time': 0.007, 'main_metric': 0.93}
2024-07-26 17:25:47 - epoch 0 train:           
 {'accuracy': 0.932, 'cross_entropy': 0.282, 'f1_weighted': 0.899, 'calc_time': 0.018, 'main_metric': 0.932}
2024-07-26 17:26:01 - epoch 5 test:            
 {'accuracy': 0.98, 'cross_entropy': 0.085, 'f1_weighted': 0.979, 'calc_time': 0.009, 'main_metric': 0.98}
2024-07-26 17:26:03 - epoch 5 train:           
 {'accuracy': 0.98, 'cross_entropy': 0.086, 'f1_weighted': 0.979, 'calc_time': 0.032, 'main_metric': 0.98}
2024-07-26 17:26:16 - epoch 10 test:           
 {'accuracy': 0.981, 'cross_entropy': 0.079, 'f1_weighted': 0.98, 'calc_time': 0.009, 'main_metric': 0.981}
2024-07-26 17:26:18 - epoch 10 train:          
 {'accuracy': 0.98, 'cross_entropy': 0.079, 'f1_weig

{'accuracy': 0.982, 'cross_entropy': 0.078, 'f1_weighted': 0.981, 'calc_time': 0.012, 'main_metric': 0.982, 'tasks': {'y': {'accuracy': 0.9815807099799062, 'cross_entropy': 0.07787884771823883, 'f1_weighted': 0.9807472823965868}}, 'epoch': 15}


# 2. Edge Features <a class="anchor" id="p2"></a>

You can also compute edge features. <br>
To enable this, set the use_graph_edge_features flag. <br>
By default, two features are computed for each edge: the total degree of the two nodes it connects and the number of their common neighbors. Both are processed with a quantile transformation.
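
As a rough illustration of these two quantities, the sketch below computes them for a toy undirected edge list. The helper name edge_features is hypothetical, and reading "total degree" as the sum of the two endpoint degrees is an assumption; the library's actual implementation may differ. The quantile transformation is omitted to keep the raw values visible.

```python
def edge_features(edges):
    """Return (total_degree, common_neighbors) for each undirected edge."""
    # Build neighbor sets from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    rows = []
    for u, v in edges:
        # Assumption: "total degree" is the sum of both endpoint degrees.
        total_degree = len(neighbors[u]) + len(neighbors[v])
        # Nodes adjacent to both endpoints.
        common = len(neighbors[u] & neighbors[v])
        rows.append((total_degree, common))
    return rows


# Toy graph: a triangle 0-1-2 plus a pendant node 3 attached to node 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
rows = edge_features(edges)
for edge, row in zip(edges, rows):
    print(edge, row)
```

Each edge gets two values, which corresponds to the two-column edge_attr that appears once the flag is enabled.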

In [6]:
s_ffsd = S_FFSD(root='./data')
s_ffsd.data

Using existing file ./data/S-FFSD_data.pt


Data(x=[77881, 126], edge_index=[2, 233164], y=[77881])

In [7]:
runner = Runner(
    s_ffsd.data,
    use_graph_edge_features=True,
    use_edge_attr=True,
    overrides=['training.n_epochs=20']
)
runner.data

Data(x=[77881, 126], edge_index=[2, 233164], y=[77881], edge_attr=[233164, 2], group_mask=[77881], label_mask=[77881])

In [8]:
result = runner.run()
print(result['best_loss'])

Sample data: 100%|██████████| 89/89 [00:00<00:00, 181.35it/s]
Sample data: 100%|██████████| 30/30 [00:00<00:00, 162.36it/s]
2024-07-26 17:27:01 - epoch 0 test:            
 {'accuracy': 0.887, 'cross_entropy': 0.334, 'f1_weighted': 0.866, 'calc_time': 0.014, 'main_metric': 0.887}
2024-07-26 17:27:04 - epoch 0 train:           
 {'accuracy': 0.887, 'cross_entropy': 0.331, 'f1_weighted': 0.866, 'calc_time': 0.041, 'main_metric': 0.887}
2024-07-26 17:27:31 - epoch 5 test:            
 {'accuracy': 0.893, 'cross_entropy': 0.294, 'f1_weighted': 0.875, 'calc_time': 0.01, 'main_metric': 0.893}
2024-07-26 17:27:33 - epoch 5 train:           
 {'accuracy': 0.891, 'cross_entropy': 0.29, 'f1_weighted': 0.874, 'calc_time': 0.038, 'main_metric': 0.891}
2024-07-26 17:28:02 - epoch 10 test:           
 {'accuracy': 0.896, 'cross_entropy': 0.275, 'f1_weighted': 0.88, 'calc_time': 0.014, 'main_metric': 0.896}
2024-07-26 17:28:04 - epoch 10 train:          
 {'accuracy': 0.894, 'cross_entropy': 0.266, '

{'accuracy': 0.896, 'cross_entropy': 0.275, 'f1_weighted': 0.88, 'calc_time': 0.014, 'main_metric': 0.896, 'tasks': {'y': {'accuracy': 0.8956955876399946, 'cross_entropy': 0.275429368019104, 'f1_weighted': 0.880179528219482}}, 'epoch': 10}




# 3. Using Node Features When data.x Is Empty <a class="anchor" id="p3"></a>
You can use graph node features even when data.x is empty. <br>
In this case, data.x is created from the graph features alone. <br>
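
The shape changes seen in this notebook (25 columns becoming 27 when data.x exists, and a 2-column data.x when it is None) are consistent with a simple "append or create" rule. The sketch below shows that rule on plain Python lists; attach_graph_features is a hypothetical helper, not part of cool_graph, and with tensors the analogous step would be a concatenation along the feature dimension.

```python
def attach_graph_features(x, graph_features):
    """Append graph features to x, or use them as x when x is None."""
    if x is None:
        # No original attributes: the graph features become the feature matrix.
        return [list(row) for row in graph_features]
    if len(x) != len(graph_features):
        raise ValueError("x and graph_features must have one row per node")
    # Otherwise, concatenate column-wise, row by row.
    return [list(a) + list(b) for a, b in zip(x, graph_features)]


# Illustrative values: 3 nodes, 2 graph-derived features per node.
graph_features = [[1.0, 0.9], [0.5, 0.4], [0.0, 0.1]]

with_x = attach_graph_features([[7, 8, 9], [1, 2, 3], [4, 5, 6]], graph_features)
without_x = attach_graph_features(None, graph_features)

print(len(with_x[0]))     # 3 original columns + 2 graph columns
print(len(without_x[0]))  # only the 2 graph columns
```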

In [9]:
amazonfraud = AntiFraud(root='./data', name='Amazon')

Using existing file ./data/amazon/Amazon_data.pt


In [10]:
amazonfraud.data.x = None

In [11]:
hyperunner = HypeRunner(
    amazonfraud.data,
    use_graph_node_features=True,
    overrides=['training.n_epochs=20']
)
hyperunner.data

Data(edge_index=[2, 8835152], edge_attr=[8835152, 12], y=[11944], x=[11944, 2], group_mask=[11944], label_mask=[11944])

In [12]:
result = hyperunner.optimize_run(n_trials=1)

Sample data: 100%|██████████| 36/36 [00:09<00:00,  3.81it/s]
Sample data: 100%|██████████| 12/12 [00:02<00:00,  4.98it/s]
[I 2024-07-26 17:31:07,890] A new study created in memory with name: no-name-a8169872-d463-47e0-9842-31f2b9cd3461
2024-07-26 17:31:17 - epoch 0 test:            
 {'accuracy': 0.928, 'cross_entropy': 0.217, 'f1_weighted': 0.894, 'calc_time': 0.005, 'main_metric': 0.928}
2024-07-26 17:31:18 - epoch 0 train:           
 {'accuracy': 0.932, 'cross_entropy': 0.216, 'f1_weighted': 0.9, 'calc_time': 0.016, 'main_metric': 0.932}
2024-07-26 17:31:30 - epoch 5 test:            
 {'accuracy': 0.928, 'cross_entropy': 0.236, 'f1_weighted': 0.894, 'calc_time': 0.01, 'main_metric': 0.928}
2024-07-26 17:31:31 - epoch 5 train:           
 {'accuracy': 0.932, 'cross_entropy': 0.238, 'f1_weighted': 0.9, 'calc_time': 0.024, 'main_metric': 0.932}
2024-07-26 17:31:45 - epoch 10 test:           
 {'accuracy': 0.928, 'cross_entropy': 0.223, 'f1_weighted': 0.894, 'calc_time': 

Study statistics: 
  Number of finished trials:  1
  Number of complete trials:  1
Best trial:
  Value:  0.928
  Params: 
{'conv_type': 'GraphConv', 'activation': 'leakyrelu', 'lin_prep_len': 1, 'lin_prep_dropout_rate': 0.4, 'lin_prep_weight_norm_flag': True, 'lin_prep_size_common': 512, 'lin_prep_sizes': [256], 'n_hops': 2, 'conv1_aggrs': {'mean': 128, 'max': 64, 'add': 32}, 'conv1_dropout_rate': 0.2, 'conv2_aggrs': {'mean': 64, 'max': 32, 'add': 16}, 'conv2_dropout_rate': 0.2, 'graph_conv_weight_norm_flag': True}


Even without the original node features, accuracy remains high (0.928 for the best trial).