# Graph Neural Network based Regression


This notebook provides a few examples of model training and testing. The codebase is a Python script that uses `argparse`, so it can also be run directly from a terminal:

```
python main.py
```

## Setting up the directory

The final directory should have the following structure.

```
.
├── ckpt/
├── data
│   ├── top_gun_opendata_0.parquet
│   ├── top_gun_opendata_1.parquet
│   ├── top_gun_opendata_2.parquet
│   ├── top_gun_opendata_3.parquet
│   ├── top_gun_opendata_4.parquet
│   ├── top_gun_opendata_5.parquet
│   └── top_gun_opendata_6.parquet
├── dataset.py
├── dataset_utils.py
├── example.ipynb
├── gps_layer.py
├── layers.py
├── main.py
├── model.py
├── README.md
├── run_tests.py
├── tester.py
├── tests.py
├── trainer.py
├── train_utils.py
└── transforms.py
```

The `data` directory is populated in the next steps; the rest of the structure should already be in place. The `ckpt` directory is created automatically by the Python script.

In [None]:
!mkdir data
%cd data

## Downloading/Extracting the dataset

In [None]:
!wget https://cernbox.cern.ch/index.php/s/cmVxUG4GJzRWKWV/download

In [None]:
!tar -xvf download
!rm download

In [None]:
%cd ..
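
As a quick sanity check after extraction, the sketch below verifies that the seven parquet files from the directory listing are present. The helper name `missing_data_files` is hypothetical and not part of the codebase; it only checks file names, not file contents.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [f"top_gun_opendata_{i}.parquet" for i in range(7)]


def missing_data_files(data_dir="data"):
    """Return the expected parquet files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Run from the repository root (i.e. after the `%cd ..` cell).
missing = missing_data_files("data")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All 7 parquet files are present.")
```

If any file is reported missing, re-running the download and extraction cells is the first thing to try.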

## Installing the requirements

In [None]:
!pip install -r requirements.txt

## Running the training script

### Logging in with a Weights and Biases account

In [None]:
!wandb login

Training is launched from `main.py`. Because the script is built on argparse, `--help` lists every training argument along with its description.

In [None]:
!python main.py --help
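
For readers unfamiliar with argparse, here is a minimal sketch of how such a help listing comes about. It is not the actual contents of `main.py`: the flag names are taken from the training command in this notebook, but the types, defaults, and help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical subset of options; the real main.py defines many more,
    # and its types, defaults, and help strings may differ.
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Train a GNN regression model (illustrative sketch)",
    )
    parser.add_argument("--model", type=str, default="gps",
                        help="model type, e.g. gps for GraphGPS")
    parser.add_argument("--lr", type=float, default=5e-4,
                        help="starting learning rate")
    parser.add_argument("--num_epochs", type=int, default=36,
                        help="number of training epochs")
    parser.add_argument("--use_pe", action="store_true",
                        help="apply positional encoding to the inputs")
    return parser


parser = build_parser()
# format_help() returns the same text that `--help` prints.
print(parser.format_help())

# Parsing an explicit argument list instead of sys.argv.
args = parser.parse_args(["--model", "gps", "--lr", "5e-4", "--use_pe"])
print(args.model, args.lr, args.use_pe)
```

Each `add_argument` call contributes one entry to the help output, which is why `--help` can serve as the reference for all training options.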

### Training the best (GraphGPS+Performer+GatedGCN) network

In [None]:
!python main.py --model gps --gps_mpnn_type gatedgcn --gps_global_type performer --scale_histogram --use_pe --num_pe_scales 10 --num_gps_layers 5 --lr 5e-4 --optim adamw --name gps+perf+gatedgcn_5_full --train_batch_size 16 --val_batch_size 16 --test_batch_size 16 --num_epochs 36 --sched_type ca_wm

Explanation: 
- `model` specifies the model type. Here, `gps` selects GraphGPS.
- `gps_mpnn_type` specifies the local GNN model type used by GraphGPS. Here, `gatedgcn` selects the ResGatedGCN module.
- `gps_global_type` specifies the global self-attention model type used by GraphGPS. Here, `performer` selects Performer self-attention.
- `scale_histogram` scales all inputs and the output to the range $(0, 1]$.
- `use_pe` enables positional encoding on the inputs.
- `num_pe_scales` specifies the number of power scales to use in the positional encoding.
- `num_gps_layers` specifies the number of GraphGPS layers to stack.
- `lr` specifies the starting learning rate for training.
- `optim` specifies the optimizer type.
- `name` specifies the run name used for logging to Weights and Biases. Pass `--debug` to disable logging to Weights and Biases.
- `train/val/test_batch_size` each specify the batch size for the train, validation, and test splits, respectively.
- `num_epochs` specifies the number of epochs to train for.
- `sched_type` specifies the learning rate scheduler to use. Here, `ca_wm` selects cosine annealing with warm restarts.
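
Because the command line is long, it can be easier to keep the settings in a dictionary and build the argument list from it. The sketch below does this with the exact values from the command above; `build_command` is a hypothetical helper, not part of the codebase.

```python
import shlex

# Values copied from the training command above.
options = {
    "model": "gps",
    "gps_mpnn_type": "gatedgcn",
    "gps_global_type": "performer",
    "scale_histogram": True,  # boolean flags carry no value
    "use_pe": True,
    "num_pe_scales": 10,
    "num_gps_layers": 5,
    "lr": "5e-4",
    "optim": "adamw",
    "name": "gps+perf+gatedgcn_5_full",
    "train_batch_size": 16,
    "val_batch_size": 16,
    "test_batch_size": 16,
    "num_epochs": 36,
    "sched_type": "ca_wm",
}


def build_command(options, script="main.py"):
    """Turn an options dict into an argv list for the training script."""
    cmd = ["python", script]
    for key, value in options.items():
        if value is True:
            cmd.append(f"--{key}")  # switch without a value
        elif value is False or value is None:
            continue  # disabled switch: omit it entirely
        else:
            cmd.extend([f"--{key}", str(value)])
    return cmd


command = build_command(options)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to launch training from Python, and adding `"debug": True` to the dictionary appends the `--debug` switch mentioned above.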

The final model is saved in the `ckpt` directory.