You can run this notebook directly on Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DaniAffCH/Vessel-Geometric-Transformers/blob/main/main.ipynb)

In [1]:
import sys
import warnings

warnings.filterwarnings('ignore')

COLAB_RUNTIME = 'google.colab' in sys.modules
!nvidia-smi &> /dev/null || echo -e "\e[31mWarning: No GPU found. Please check your runtime settings.\e[0m"
if COLAB_RUNTIME:
    !git config --global init.defaultBranch main
    !git init
    !git remote add origin https://github.com/DaniAffCH/Vessel-Geometric-Transformers.git
    !git pull origin main
    !pip install -q -r requirements.txt
else:  # Development mode: install dependencies and set up the pre-commit hooks
    !pip install -r requirements.txt
    !pre-commit autoupdate
    !pre-commit install


[https://github.com/psf/black] already up to date!
[https://github.com/pycqa/isort] already up to date!
[https://github.com/PyCQA/flake8] already up to date!
[https://github.com/pre-commit/mirrors-mypy] already up to date!
pre-commit installed at .git/hooks/pre-commit


Loading the configuration (customizable by editing config.yaml).

In [2]:
from src.utils import load_config
import os

config_path = os.path.join("config","config.yaml")
config = load_config(config_path)
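`load_config` is not shown in this notebook. The cells below read settings as attributes (`config.dataset`, `config.mlp`, `config.trainer`, ...). Here is a minimal sketch of one way to get that behaviour, assuming the YAML file parses into nested dicts: recursively wrap them in `types.SimpleNamespace`. The dict is a hypothetical stand-in for the parsed `config.yaml`, and its keys and values are made up. The result is named `example_config` so it does not overwrite the real `config`.

```python
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively convert nested dicts/lists (e.g. parsed YAML) into attribute-access objects."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{key: to_namespace(value) for key, value in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(item) for item in obj]
    return obj


# Hypothetical stand-in for the parsed config.yaml: section names mirror this notebook,
# but the keys and values inside them are invented for illustration.
raw = {
    "dataset": {"name": "example"},
    "trainer": {"max_epochs": 5},
    "mlp": {"hidden_dim": 32},
}

example_config = to_namespace(raw)
print(example_config.trainer.max_epochs)
```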

---

Loading the dataset and showing statistics

In [3]:
from src.data import VesselDataModule
from src.utils.data_analysis import data_info

data = VesselDataModule(config.dataset)

data_info(data)

Train size: 2999
Val size: 599
Test size: 401
One Sample: Data(pos=[9736, 3], wss=[9736, 3], pressure=[9736], face=[3, 19468], inlet_index=[248], label=Category.Single)


100%|██████████| 3999/3999 [00:02<00:00, 1398.55it/s]


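| Feature  | Mean        | Median  | Std         | Min   | Max   |
|----------|-------------|---------|-------------|-------|-------|
| WSS      | 13011.76019 | 11387.0 | 4271.667184 | 5466  | 24800 |
| POS      | 13011.76019 | 11387.0 | 4271.667184 | 5466  | 24800 |
| FACE     | 26019.52038 | 22770.0 | 8543.334368 | 10928 | 49596 |
| PRESSURE | 13011.76019 | 11387.0 | 4271.667184 | 5466  | 24800 |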
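The statistics appear to count mesh entries per sample. WSS, POS and PRESSURE give identical numbers despite their different tensor shapes, which suggests one value per node. The FACE numbers satisfy `F = 2V - 4` exactly, including for the single sample printed above (`pos=[9736, 3]`, `face=[3, 19468]`). That relation is what Euler's formula `V - E + F = 2` predicts for a closed, genus-0 triangle mesh: every face has 3 edges and every edge is shared by 2 faces, so `E = 3F/2`. That the vessel meshes are closed surfaces is an inference from these numbers, not something the notebook states. The check below uses only values copied from the outputs above.

```python
# Summary statistics copied from the table above (POS ~ vertices, FACE ~ triangles per sample).
pos_stats = {"mean": 13011.76019, "median": 11387.0, "std": 4271.667184, "min": 5466, "max": 24800}
face_stats = {"mean": 26019.52038, "median": 22770.0, "std": 8543.334368, "min": 10928, "max": 49596}


def faces_from_vertices(v):
    """Triangle count of a closed genus-0 mesh with v vertices (V - E + F = 2 with E = 3F/2)."""
    return 2 * v - 4


# Location statistics follow the affine map F = 2V - 4 directly...
for key in ("mean", "median", "min", "max"):
    assert abs(faces_from_vertices(pos_stats[key]) - face_stats[key]) < 1e-4, key

# ...while the standard deviation is only scaled by 2 (the constant -4 drops out).
assert abs(2 * pos_stats["std"] - face_stats["std"]) < 1e-4

# The single sample printed earlier: pos=[9736, 3], face=[3, 19468].
assert faces_from_vertices(9736) == 19468
print("F = 2V - 4 holds for every statistic")
```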


Showing the label distribution to check whether the train, validation and test sets are balanced.

In [None]:
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

train_labels = data.train_set.getLabels()
val_labels = data.val_set.getLabels()
test_labels = data.test_set.getLabels()

combined_labels = np.concatenate([train_labels, val_labels, test_labels])
subsets = ['train'] * len(train_labels) + ['val'] * len(val_labels) + ['test'] * len(test_labels)

df = pd.DataFrame({'label': combined_labels, 'subset': subsets})

df['count'] = df.groupby(['subset', 'label'])['label'].transform('count')
df['total'] = df.groupby('subset')['label'].transform('count')
df['frequency'] = df['count'] / df['total']

df_normalized = df.drop_duplicates(subset=['label', 'subset'])

sns.set_theme(style='whitegrid')

plt.figure(figsize=(12, 6))

sns.barplot(x='subset', y='frequency', hue='label', data=df_normalized)

plt.title('Normalized Label Distribution Across Train, Validation, and Test Sets')
plt.xlabel('Dataset Subset')
plt.ylabel('Normalized Frequency')
plt.legend(title='Label', loc='upper right')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
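The groupby/transform/drop_duplicates pipeline above computes per-subset label frequencies. `pd.crosstab` with `normalize="index"` produces the same table in one call. Below is a sketch on toy labels: the names `A`/`B` are hypothetical and are not the dataset's categories. `melt` then produces the long format that `sns.barplot` expects. New variable names are used so the notebook's `df` is left untouched.

```python
import pandas as pd

# Toy stand-ins for the output of getLabels(); label names are hypothetical.
toy_train = ["A", "A", "B", "A"]
toy_val = ["A", "B"]

toy_df = pd.DataFrame({
    "label": toy_train + toy_val,
    "subset": ["train"] * len(toy_train) + ["val"] * len(toy_val),
})

# normalize="index" divides each row (one subset) by its total: label frequencies per subset.
freq = pd.crosstab(toy_df["subset"], toy_df["label"], normalize="index")

# Long format with columns (subset, label, frequency),
# usable as sns.barplot(x="subset", y="frequency", hue="label", data=freq_long).
freq_long = freq.reset_index().melt(id_vars="subset", var_name="label", value_name="frequency")
print(freq_long)
```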


Plotting the data distribution projected in 2D using Principal Component Analysis.

In [None]:
from src.utils.definitions import Feature, Category
from src.utils.data_analysis import plot_data

wss, labels = data.extract_feature(Feature.WSS)
pos, labels = data.extract_feature(Feature.POS)
pressure, labels = data.extract_feature(Feature.PRESSURE)
face, labels = data.extract_feature(Feature.FACE)
plot_data(pos, labels, Category, "Position")
plot_data(wss, labels, Category, "Wall Shear Stress")
plot_data(pressure, labels, Category, "Pressure")
plot_data(face, labels, Category, "Face")
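`plot_data` is not shown here, so it is unknown how it turns variable-size meshes into fixed-length vectors (padding, resampling, pooling, ...). Assume each sample has already been flattened to a fixed-length vector. The 2D PCA projection then reduces to centering the data and projecting it onto the top two right singular vectors. Here is a NumPy sketch on random data. `sklearn.decomposition.PCA(n_components=2)` would give the same coordinates up to the sign of each component.

```python
import numpy as np


def pca_2d(x: np.ndarray) -> np.ndarray:
    """Project rows of x (n_samples, n_features) onto their first two principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 12))  # hypothetical flattened per-sample feature vectors
coords = pca_2d(features)
print(coords.shape)  # (100, 2)
```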

---

Running an equivariance check on random samples from the dataset.  
The Geometric Layer is expected to fail this check because it lacks a distance-aware dot product.

In [None]:
from src.lib.geometricAlgebraElements import GeometricAlgebraBase
from src.test.test_equivariance import TestEquivariance
import unittest

dl = data.train_dataloader()

batch = next(iter(dl)).data[0]
batch = batch.view(-1, GeometricAlgebraBase.GA_size)[:10]
TestEquivariance.INPUT_DATA = batch

suite = unittest.TestSuite()
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestEquivariance))
test_runner = unittest.TextTestRunner(verbosity=0)
test_result = test_runner.run(suite)  # verbosity=0 prints only the summary and any failures
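The internals of `TestEquivariance` are not shown here. The property such a test checks is `f(g·x) = g·f(x)`: transforming the input and then applying the layer must match applying the layer and then transforming the output. The sketch below illustrates that check with plain 3D vectors and random rotations rather than geometric-algebra multivectors. It demonstrates the property only and is not the repository's test. A function that mixes in something the transformation does not act on (here, a fixed offset) fails the check.

```python
import numpy as np


def random_rotation(rng):
    """Sample a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the sampled matrix is uniformly distributed
    if np.linalg.det(q) < 0:     # flip one axis to land in SO(3) instead of O(3)
        q[:, 0] = -q[:, 0]
    return q


def is_equivariant(f, x, rot, atol=1e-8):
    """Check f(x R^T) == f(x) R^T for a batch of row vectors x with shape (n, 3)."""
    return np.allclose(f(x @ rot.T), f(x) @ rot.T, atol=atol)


def scale_by_norm(x):
    """Rescale each vector by its own length: depends only on rotation-invariant norms."""
    return x * np.linalg.norm(x, axis=1, keepdims=True)


def add_offset(x):
    """Add a fixed vector that does not rotate with the input."""
    return x + np.array([1.0, 0.0, 0.0])


rng = np.random.default_rng(0)
points = rng.normal(size=(10, 3))
rot = random_rotation(rng)
print(is_equivariant(scale_by_norm, points, rot))  # True
print(is_equivariant(add_offset, points, rot))     # False
```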

---

# MLP Baseline

In [None]:
from src.trainer import VesselTrainer
from src.models import BaselineMLP

model = BaselineMLP(config.mlp)
trainer = VesselTrainer(config.trainer, "mlp")
trainer.fit(model, data)

[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mneverorfrog[0m ([33mneverorfrog-sapienza[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/neverorfrog/.netrc


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

  | Name           | Type              | Params | Mode 
-------------------------------------------------------------
0 | loss_fn        | BCEWithLogitsLoss | 0      | train
1 | train_accuracy | BinaryAccuracy    | 0      | train
2 | val_accuracy   | BinaryAccuracy    | 0      | train
3 | test_accuracy  | BinaryAccuracy    | 0      | train
4 | train_f1       | BinaryF1Score     | 0      | train
5 | val_f1         | BinaryF1Score     | 0      | train
6 | test_f1        | BinaryF1Score     | 0      | train
7 | fc             | Sequential        | 307 K  | train
-------------------------------------------------------------
307 K     Trainable params
0         Non-trainable params
307 K     Total params
1.229     Total estimated model params size (MB)
13        Modules in train mode
0         Modules in eval mode
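The size estimate in the summary is consistent with 4 bytes per parameter (float32): about 307 K parameters × 4 B ≈ 1.229 MB, with MB meaning 10^6 bytes. The sketch below reproduces that arithmetic and adds a parameter count for a stack of dense layers, where each layer has `in × out` weights plus `out` biases. The layer widths are hypothetical. The real `BaselineMLP` architecture is defined in `src.models` and is not shown here.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one dense layer."""
    return in_features * out_features + out_features


def mlp_params(widths):
    """Total parameters of consecutive dense layers with the given layer widths."""
    return sum(linear_params(a, b) for a, b in zip(widths[:-1], widths[1:]))


def size_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Parameter memory in MB (10**6 bytes), assuming float32 (4 bytes) by default."""
    return n_params * bytes_per_param / 1e6


# Reported size -> implied parameter count: 1.229 MB / 4 bytes.
print(round(1.229e6 / 4))  # 307250

# Hypothetical widths, NOT the actual BaselineMLP architecture.
n = mlp_params([1024, 256, 128, 1])
print(n, size_mb(n))
```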


Epoch 0: 100%|██████████| 375/375 [00:04<00:00, 78.67it/s, v_num=9e0d, val/loss=0.156, val/acc=0.970, val/f1=0.970, train/loss=0.383, train/acc=0.900, train/f1=0.897]

Metric val/loss improved. New best score: 0.156
Epoch 0, global step 375: 'val/loss' reached 0.15650 (best 0.15650), saving model to '/home/neverorfrog/code/deep-learning/gatr/ckpt/epoch=0-step=375.ckpt' as top 1


Epoch 1: 100%|██████████| 375/375 [00:05<00:00, 71.77it/s, v_num=9e0d, val/loss=0.0851, val/acc=0.980, val/f1=0.980, train/loss=0.106, train/acc=0.980, train/f1=0.981]

Metric val/loss improved by 0.071 >= min_delta = 1e-05. New best score: 0.085
Epoch 1, global step 750: 'val/loss' reached 0.08509 (best 0.08509), saving model to '/home/neverorfrog/code/deep-learning/gatr/ckpt/epoch=1-step=750.ckpt' as top 1



Epoch 2: 100%|██████████| 375/375 [00:05<00:00, 70.30it/s, v_num=9e0d, val/loss=0.0661, val/acc=0.980, val/f1=0.980, train/loss=0.0685, train/acc=0.983, train/f1=0.983]

Metric val/loss improved by 0.019 >= min_delta = 1e-05. New best score: 0.066
Epoch 2, global step 1125: 'val/loss' reached 0.06606 (best 0.06606), saving model to '/home/neverorfrog/code/deep-learning/gatr/ckpt/epoch=2-step=1125.ckpt' as top 1


Epoch 3: 100%|██████████| 375/375 [00:05<00:00, 69.42it/s, v_num=9e0d, val/loss=0.0571, val/acc=0.982, val/f1=0.981, train/loss=0.0517, train/acc=0.985, train/f1=0.986]

Metric val/loss improved by 0.009 >= min_delta = 1e-05. New best score: 0.057
Epoch 3, global step 1500: 'val/loss' reached 0.05706 (best 0.05706), saving model to '/home/neverorfrog/code/deep-learning/gatr/ckpt/epoch=3-step=1500.ckpt' as top 1


Epoch 4: 100%|██████████| 375/375 [00:04<00:00, 86.99it/s, v_num=9e0d, val/loss=0.0571, val/acc=0.982, val/f1=0.981, train/loss=0.0517, train/acc=0.985, train/f1=0.986]
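The "Metric val/loss improved by ... >= min_delta = 1e-05" lines come from an early-stopping rule. It monitors `val/loss` and counts only improvements of at least `min_delta`, and the checkpoint callback saves the model on each new best. Below is a pure-Python sketch of that rule, replayed on the four validation losses logged above. It mirrors the logged behaviour rather than Lightning's exact implementation. The `patience` value does not appear in the log, so the one used here is hypothetical.

```python
def early_stopping(losses, min_delta=1e-5, patience=3):
    """Return (best_epoch, stop_epoch) for a loss to minimize; stop_epoch is None if never triggered."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(losses):
        if best - loss >= min_delta:  # counts as an improvement only if it beats min_delta
            best, best_epoch, wait = loss, epoch, 0  # a checkpoint would be saved here
        else:
            wait += 1
            if wait >= patience:  # hypothetical patience: stop after this many non-improving epochs
                return best_epoch, epoch
    return best_epoch, None


# val/loss values logged above for epochs 0-3.
val_losses = [0.15650, 0.08509, 0.06606, 0.05706]
print(early_stopping(val_losses))  # (3, None)
```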

In [None]:
trainer.test(model, data)

# Attention-based Baseline

Running hyperparameter optimization to find the hyperparameters that maximize validation accuracy.

In [None]:
from src.utils.hpo import baseline_hpo

baseline_hpo(config, data) # Hyperparameter optimization: writes the config file with the best hyperparameters
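The internals of `src.utils.hpo` are not shown; it may rely on a dedicated library. The sketch below is plain random search, not the repository's method. It samples configurations from a discrete search space, scores each one, and keeps the best. In the notebook the score would be validation accuracy after training with the sampled hyperparameters. Here a toy objective stands in for that, and the search space is hypothetical.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample configurations uniformly from `space` and keep the one with the highest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)  # in practice: train, then read validation accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space and a toy objective peaking at lr=1e-3, num_layers=2.
hpo_space = {"num_layers": [1, 2, 4], "hidden_dim": [32, 64, 128], "lr": [1e-4, 1e-3, 1e-2]}


def toy_objective(p):
    return -abs(p["lr"] - 1e-3) - abs(p["num_layers"] - 2) * 0.01


best_params, best_score = random_search(toy_objective, hpo_space)
print(best_params, best_score)
```

Random search is only a baseline. Model-based samplers usually need fewer trials, which matters when every trial is a full training run.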

Training the model using the best hyperparameters

In [None]:
from src.trainer import VesselTrainer
from src.models import BaselineTransformer

model = BaselineTransformer(config.baseline)
trainer = VesselTrainer(config.trainer, "transformer")
trainer.fit(model, data)

Testing the model on unseen test data.

In [None]:
trainer.test(model, data)

---

# GATr

Running hyperparameter optimization to find the hyperparameters that maximize validation accuracy.

In [None]:
from src.models import Gatr
from src.utils.hpo import gatr_hpo

gatr_hpo(config, data) # Hyperparameter optimization: writes the config file with the best hyperparameters

Training the model using the best hyperparameters

In [None]:
from src.trainer import VesselTrainer

model = Gatr(config.gatr)
trainer = VesselTrainer(config.trainer, "gatr")
trainer.fit(model, data)

Testing the model on unseen test data.

In [None]:
trainer.test(model, data)

----