# How to work with datasets in CoolGraph <a name="toc"></a>

1. [Loading a dataset](#p1)
2. [Using data from a dataset](#p2)
3. [Loading a dataset by name](#p3)

CoolGraph implements several graph datasets that are not available in other libraries. \
The syntax for working with them closely follows that of PyG (PyTorch Geometric).

# 1. Loading a dataset <a name="p1"></a>
All datasets are located in the `cool_graph.datasets` module. To load a dataset, import its class (the runners imported alongside are used in the later sections):

```python
from cool_graph.datasets import S_FFSD, AntiFraud
from cool_graph.runners import Runner, HypeRunner
```

When creating a dataset, specify the folder (`root`) where its files will be saved. \
The first time a dataset is created, its raw file is downloaded from the Internet and preprocessed.

```python
s_ffsd = S_FFSD(root='./data/s_ffsd')
```

```
Downloading https://drive.usercontent.google.com/download?id=1pODQWJFS7-dwUmnwl6YNFYQ17241j26b&export=download&confirm=t
Preprocessing
Source: 100%|██████████| 30346/30346 [00:05<00:00, 5136.23it/s]
Target: 100%|██████████| 886/886 [00:00<00:00, 1555.96it/s]
Location: 100%|██████████| 296/296 [00:00<00:00, 658.03it/s]
Type: 100%|██████████| 166/166 [00:00<00:00, 301.12it/s]
dataset saved as ./data/s_ffsd/S-FFSD_data.pt
```


When you re-create the dataset with the same `root`, the previously saved file is reused, so nothing is downloaded or preprocessed again:

```python
s_ffsd_copy = S_FFSD(root='./data/s_ffsd')
```

```
Using existing file ./data/s_ffsd/S-FFSD_data.pt
```
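The download-once, reuse-afterwards behaviour is a common caching pattern. Below is a minimal plain-Python sketch of that pattern, not CoolGraph's actual implementation: the helper `load_or_build` and the `build` callback are hypothetical, `build()` stands in for the real download and preprocessing, and `pickle` replaces whatever serialization CoolGraph uses for its `.pt` files.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_build(root: str, filename: str, build: Callable[[], Any]) -> Any:
    """Hypothetical helper: reuse a saved file under `root`, else build and save it."""
    path = Path(root) / filename
    if path.exists():
        print(f"Using existing file {path}")
        with path.open("rb") as f:
            return pickle.load(f)
    # First run: produce the data (stands in for download + preprocessing) and cache it.
    obj = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    print(f"dataset saved as {path}")
    return obj


build_calls = []


def build():
    # Counts how often the expensive step actually runs.
    build_calls.append(1)
    return {"num_nodes": 3}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(tmp, "toy_data.pkl", build)   # builds and saves
    second = load_or_build(tmp, "toy_data.pkl", build)  # reads the cached file
```

Keying the cache on `root` is why re-creating the dataset with the same path is cheap, while a new path triggers a fresh download.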


# 2. Using data from a dataset <a name="p2"></a>

The graph itself is stored in the dataset's `data` attribute as a `torch_geometric.data.data.Data` object:

```python
data = s_ffsd.data
```
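In PyG's `Data` format, connectivity lives in `edge_index`, a tensor of shape `[2, num_edges]` whose column `j` is an edge from `edge_index[0][j]` to `edge_index[1][j]`; node features live in `x` (one row per node) and labels in `y`. The sketch below mimics that layout with plain lists on a made-up three-node graph, so it runs without PyTorch. On the real `data` object, PyG exposes the same quantities as `data.num_nodes`, `data.num_edges` and `data.num_node_features`.

```python
from collections import Counter

# Toy graph in PyG's COO layout, using plain lists instead of torch tensors.
# Column j of edge_index is the directed edge edge_index[0][j] -> edge_index[1][j].
edge_index = [
    [0, 1, 1, 2],  # source nodes
    [1, 0, 2, 0],  # target nodes
]
x = [[0.1, 1.0], [0.5, 0.0], [0.9, 0.3]]  # one feature row per node
y = [0, 1, 0]                             # one label per node

num_nodes = len(x)
num_edges = len(edge_index[0])
num_node_features = len(x[0])

# In-degree of each node: how many edges point at it.
in_degree = Counter(edge_index[1])
in_degrees = [in_degree.get(n, 0) for n in range(num_nodes)]

print(num_nodes, num_edges, num_node_features, in_degrees)  # 3 4 2 [2, 1, 1]
```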

The `data` object can be passed straight to a `Runner`, which trains a model and logs metrics on the train and test samples every few epochs:

```python
runner = Runner(data)
result = runner.run()
```

```
Sample data: 100%|██████████| 89/89 [00:00<00:00, 455.38it/s]
Sample data: 100%|██████████| 30/30 [00:00<00:00, 459.25it/s]
2024-07-26 03:12:11 - epoch 0 test:
 {'accuracy': 0.878, 'cross_entropy': 0.33, 'f1_weighted': 0.855, 'calc_time': 0.009, 'main_metric': 0.878}
2024-07-26 03:12:12 - epoch 0 train:
 {'accuracy': 0.889, 'cross_entropy': 0.311, 'f1_weighted': 0.868, 'calc_time': 0.026, 'main_metric': 0.889}
2024-07-26 03:12:30 - epoch 5 test:
 {'accuracy': 0.884, 'cross_entropy': 0.293, 'f1_weighted': 0.867, 'calc_time': 0.007, 'main_metric': 0.884}
2024-07-26 03:12:31 - epoch 5 train:
 {'accuracy': 0.896, 'cross_entropy': 0.268, 'f1_weighted': 0.881, 'calc_time': 0.021, 'main_metric': 0.896}
2024-07-26 03:12:49 - epoch 10 test:
 {'accuracy': 0.884, 'cross_entropy': 0.284, 'f1_weighted': 0.866, 'calc_time': 0.009, 'main_metric': 0.884}
2024-07-26 03:12:50 - epoch 10 train:
 {'accuracy': 0.9, 'cross_entropy': 0.248, '
```
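The logged `accuracy`, `cross_entropy` and `f1_weighted` are standard classification scores, and in these logs `main_metric` equals `accuracy`. As a rough guide to reading them, here is a plain-Python sketch that computes all three on made-up binary predictions. It follows the usual textbook definitions (natural-log cross-entropy, per-class F1 averaged by class support); CoolGraph's own metric code may differ in details, which is an assumption here.

```python
import math

# Made-up binary labels and predicted probabilities of class 1.
y_true = [0, 0, 0, 0, 1, 1]
p_one = [0.1, 0.2, 0.3, 0.6, 0.8, 0.4]
y_pred = [1 if p >= 0.5 else 0 for p in p_one]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Mean negative log-likelihood of the true class (natural log).
cross_entropy = -sum(
    math.log(p if t == 1 else 1 - p) for t, p in zip(y_true, p_one)
) / len(y_true)


def f1_for(cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Weighted F1: per-class F1 averaged with weights equal to class support.
classes = sorted(set(y_true))
f1_weighted = sum(f1_for(c) * y_true.count(c) for c in classes) / len(y_true)

print(round(accuracy, 3), round(cross_entropy, 3), round(f1_weighted, 3))  # 0.667 0.457 0.667
```

Because the weighting follows class support, the majority class dominates `f1_weighted`, just as it dominates accuracy.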

# 3. Loading a dataset by name <a name="p3"></a>

If a class provides several datasets, pass the name of the one you want in the `name` argument:

```python
yelpchi = AntiFraud(root='./data/yelpchi', name='YelpChi')
```

```
Using existing file ./data/yelpchi/yelpchi/YelpChi_data.pt
```
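One common way for a single class to serve several datasets is a name-to-file registry that rejects unknown names. The sketch below shows only that general pattern; the `REGISTRY` contents, the `resolve` helper and the `"OtherDataset"` entry are hypothetical, and the lowercase subfolder merely mirrors the path in the log above. CoolGraph's `AntiFraud` may work differently.

```python
from pathlib import Path

# Hypothetical registry: dataset name -> processed file name.
# Only "YelpChi" comes from the example above; "OtherDataset" is made up.
REGISTRY = {
    "YelpChi": "YelpChi_data.pt",
    "OtherDataset": "OtherDataset_data.pt",
}


def resolve(root: str, name: str) -> Path:
    """Map a dataset name to the file it would be cached in under `root`."""
    if name not in REGISTRY:
        raise ValueError(f"unknown dataset {name!r}; choose one of {sorted(REGISTRY)}")
    # Assumed layout, copied from the log above: <root>/<lowercased name>/<file>.
    return Path(root) / name.lower() / REGISTRY[name]


print(resolve("./data/yelpchi", "YelpChi"))  # data/yelpchi/yelpchi/YelpChi_data.pt
```

Failing fast on an unknown name is friendlier than a missing-file error after a download attempt.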


The loaded graph can also be passed to `HypeRunner`, which runs a hyperparameter search; `n_trials` sets the number of trials:

```python
runner = HypeRunner(yelpchi.data)
result = runner.optimize_run(n_trials=1)
```

```
Sample data: 100%|██████████| 138/138 [00:33<00:00,  4.13it/s]
Sample data: 100%|██████████| 46/46 [00:11<00:00,  4.12it/s]
[I 2024-07-26 03:14:07,905] A new study created in memory with name: no-name-9983a2a8-a5e3-4607-b99e-aadfc94bed2c
2024-07-26 03:14:17 - epoch 0 test:
 {'accuracy': 0.854, 'cross_entropy': 0.359, 'f1_weighted': 0.787, 'calc_time': 0.022, 'main_metric': 0.854}
2024-07-26 03:14:20 - epoch 0 train:
 {'accuracy': 0.855, 'cross_entropy': 0.354, 'f1_weighted': 0.788, 'calc_time': 0.045, 'main_metric': 0.855}
2024-07-26 03:14:53 - epoch 5 test:
 {'accuracy': 0.872, 'cross_entropy': 0.324, 'f1_weighted': 0.856, 'calc_time': 0.023, 'main_metric': 0.872}
2024-07-26 03:14:55 - epoch 5 train:
 {'accuracy': 0.872, 'cross_entropy': 0.322, 'f1_weighted': 0.855, 'calc_time': 0.045, 'main_metric': 0.872}
2024-07-26 03:15:25 - epoch 10 test:
 {'accuracy': 0.884, 'cross_entropy': 0.294, 'f1_weighted': 0.8

Study statistics:
  Number of finished trials:  1
  Number of complete trials:  1
Best trial:
  Value:  0.889
  Params:
{'conv_type': 'GraphConv', 'activation': 'leakyrelu', 'lin_prep_len': 1, 'lin_prep_dropout_rate': 0.4, 'lin_prep_weight_norm_flag': True, 'lin_prep_size_common': 512, 'lin_prep_sizes': [256], 'n_hops': 2, 'conv1_aggrs': {'mean': 128, 'max': 64, 'add': 32}, 'conv1_dropout_rate': 0.2, 'conv2_aggrs': {'mean': 64, 'max': 32, 'add': 16}, 'conv2_dropout_rate': 0.2, 'graph_conv_weight_norm_flag': True}
```
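The line "A new study created in memory" matches the message Optuna prints when it creates a study, which suggests that `HypeRunner` delegates the search to Optuna; that is an inference from the log, not something stated here. To show what `n_trials` controls, here is a plain random-search stand-in: it samples configurations from a small hypothetical search space (keys borrowed from the best-trial params above, value lists invented), scores each with a made-up objective, and keeps the best. Optuna's default sampler (TPE) chooses candidates more cleverly, and the real objective trains the model and reports `main_metric`.

```python
import random

# Hypothetical search space: keys borrowed from the best-trial params above,
# candidate values invented for illustration only.
SEARCH_SPACE = {
    "conv_type": ["GraphConv", "HypotheticalConv"],
    "activation": ["relu", "leakyrelu", "elu"],
    "n_hops": [1, 2],
    "conv1_dropout_rate": [0.0, 0.2, 0.4],
}


def sample(rng):
    """Draw one configuration uniformly at random."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}


def fake_objective(params):
    # Stand-in for "train the model and return main_metric"; purely made up.
    score = 0.85
    score += 0.02 if params["conv_type"] == "GraphConv" else 0.0
    score += 0.01 * params["n_hops"]
    score -= 0.02 * params["conv1_dropout_rate"]
    return score


def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_value, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = sample(rng)
        value = fake_objective(params)
        if value > best_value:  # maximize, like accuracy
            best_value, best_params = value, params
    return best_value, best_params


value, params = random_search(n_trials=20)
print("Best value:", round(value, 3))
print("Params:", params)
```

With `n_trials=1`, as in the run above, the "best trial" is simply the only configuration tried, so more trials are needed before the reported parameters mean much.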
