# CPRD Notebook:
## Evaluation of fine-tuning the pre-trained SurvivEHR-CR model on a supervised cohort study.

Cohort study: predicting Cardiovascular Disease in a Type 2 Diabetes Mellitus population.

This notebook quantifies the performance obtained when fine-tuning the pre-trained model to a sub-population.

In [1]:
import os
from pathlib import Path
import sys
node_type = os.getenv('BB_CPU')
venv_dir = f'/rds/homes/g/gaddcz/Projects/CPRD/virtual-envTorch2.0-{node_type}'
venv_site_pkgs = Path(venv_dir) / 'lib' / f'python{sys.version_info.major}.{sys.version_info.minor}' / 'site-packages'
if venv_site_pkgs.exists():
    sys.path.insert(0, str(venv_site_pkgs))
    print(f"Added path '{venv_site_pkgs}' at start of search paths.")
else:
    print(f"Path '{venv_site_pkgs}' not found. Check that it exists and/or that it exists for node-type '{node_type}'.")

%load_ext autoreload
%autoreload 2

print(os.getcwd())

Added path '/rds/homes/g/gaddcz/Projects/CPRD/virtual-envTorch2.0-icelake/lib/python3.10/site-packages' at start of search paths.
/rds/homes/g/gaddcz/Projects/CPRD/examples/modelling/SurvStreamGPT/notebooks/CompetingRisk/CVD


In [2]:
import torch
import numpy as np
import matplotlib.pyplot as plt
import logging
import wandb
from tqdm import tqdm
import pickle
from hydra import compose, initialize
from omegaconf import OmegaConf
from CPRD.examples.modelling.SurvStreamGPT.run_experiment import run
from CPRD.data.foundational_loader import FoundationalDataModule
from CPRD.src.models.survival.task_heads.causal import SurvStreamGPTForCausalModelling

import time
import pyarrow.dataset as ds
import pyarrow.parquet as pq
import polars as pl
pl.Config.set_tbl_rows(10000)
import pandas as pd
pd.options.display.max_rows = 10000

torch.manual_seed(1337)
torch.set_float32_matmul_precision('medium')

logging.basicConfig(level=logging.INFO)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# device = "cpu"    # if more informative debugging statements are needed
print(f"Using device: {device}.")

# TODO: this environment variable works around a local HPC environment issue; it shouldn't be needed
%env SLURM_NTASKS_PER_NODE=28

Using device: cuda.
env: SLURM_NTASKS_PER_NODE=28


# Fine-tuning on full dataset
The default configuration is for pre-training, so here we override it as necessary for fine-tuning.

We load the configuration for a small **pre-trained** 11.4M-parameter model, named "CR_11M", and specify the `fine-tune` experiment type, which runs the ```SupervisedExperiment```.

We tell this experiment to perform both training and testing (each is true by default). Because this is a supervised model, testing measures the ability to predict the outcomes of interest. In this notebook these are the outcomes of the cohort study for predicting Cardiovascular Disease in a Type 2 Diabetes Mellitus population, so we add the folder containing this dataset to the configuration.

```Note: As this is a supervised dataset, we need to tell the DataModule that the last event observed is a target and must be stripped. This is done by passing a list of targets to the configuration, overriding the null default. This lets the DataModule know that it should process batches as supervised.```
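
As a rough illustration of what stripping the last event means, the sketch below separates a patient's final event from their history and checks it against the configured outcomes. The function `split_last_event`, the token `TYPE2DIABETES`, the ages, and the convention that a non-outcome final event counts as censored are all assumptions made for illustration; this is not the DataModule's actual implementation.

```python
def split_last_event(tokens, ages, outcomes):
    """Split a record into (history, target), treating the final event as the target.

    Hypothetical helper: the target counts as an observed outcome only if its
    token is one of the configured `outcomes`; otherwise it is treated as censored.
    """
    if len(tokens) != len(ages) or not tokens:
        raise ValueError("tokens and ages must be non-empty and the same length")
    history = {"tokens": tokens[:-1], "ages": ages[:-1]}
    target = {"token": tokens[-1], "age": ages[-1], "observed": tokens[-1] in outcomes}
    return history, target


# Outcome names taken from the CVD configuration used in this notebook.
cvd_outcomes = ["IHDINCLUDINGMI_OPTIMALV2", "ISCHAEMICSTROKE_V2", "MINFARCTION",
                "STROKEUNSPECIFIED_V2", "STROKE_HAEMRGIC"]

# Hypothetical patient record: event tokens with the age at each event in days.
history, target = split_last_event(
    tokens=["TYPE2DIABETES", "HYPERTENSION", "MINFARCTION"],
    ages=[18250, 19000, 20100],
    outcomes=cvd_outcomes,
)
print(history["tokens"])   # ['TYPE2DIABETES', 'HYPERTENSION']
print(target["observed"])  # True
```

The model only ever sees `history` as input, while `target` supplies the label for the survival loss.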

We set the number of workers to be appropriate for the number of CPUs available to reduce bottlenecking, and tell the experiment that we do not want to limit the number of testing batches. In addition, we specify where we want any checkpoints to be saved to avoid bloating the repository.

We design a new optimisation strategy for fine-tuning. Pre-training used a warmup followed by cosine annealing, with rates that are not appropriate for the much smaller datasets seen in clinical prediction models (CPMs). Here we choose a simpler strategy: ReduceOnPlateau with no warmup, more epochs (the default is 1), shorter validation intervals, and early stopping. Additionally, as this is not a causal model we can increase the batch size. Finally, as this CPM is not trying to predict the value of any outcome, we set the value weight to zero, allowing the model to focus entirely on optimising survival outcome prediction.
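
To make the reduce-on-plateau idea concrete, here is a simplified plain-Python stand-in for such a schedule. It is not the PyTorch or Lightning implementation; the class `PlateauSchedule`, the factor of 0.5, the patience of 1, and the validation losses are illustrative assumptions.

```python
class PlateauSchedule:
    """Simplified reduce-on-plateau rule (illustrative, not the library code).

    The learning rate is multiplied by `factor` once more than `patience`
    consecutive validation checks pass without a new best loss.
    """

    def __init__(self, lr, factor, patience):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss):
        if val_loss < self.best:
            # New best loss: remember it and reset the counter.
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        if self.bad_checks > self.patience:
            # Plateau detected: shrink the learning rate and start counting again.
            self.lr *= self.factor
            self.bad_checks = 0
        return self.lr


schedule = PlateauSchedule(lr=1e-3, factor=0.5, patience=1)
hypothetical_val_losses = [0.60, 0.58, 0.59, 0.59, 0.57]
lrs = [schedule.step(loss) for loss in hypothetical_val_losses]
print(lrs)  # [0.001, 0.001, 0.001, 0.0005, 0.0005]
```

Unlike a warmup plus cosine schedule, this rule needs no estimate of the total number of training steps, which is convenient when early stopping makes the run length unknown in advance.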

In [4]:
pre_trained_model_ids = ['SurvivEHR-cr-small', 'SurvivEHR-cr-small-v1', 'SurvivEHR-cr', 'SurvivEHR-cr-v1', 'SurvivEHR-cr-v1-v1', 'SurvivEHR-cr-384', 'SurvivEHR-cr-384-v1',]
# pre_trained_models = ['SurvivEHR-cr-v1-v1', ] # SurvivEHR-cr-small  notebook-test3 [""]   # , "NULL-CR" "CR_11M_24_11_01_big_posencscale_"
experiments = ["cvd"] #, "hypertension"] 
experiment_types = ["fine-tune-cr"]#, "fine-tune-sr"]

sweep = [#[1, 6, 126], 
         #[6, 1, 128], 
         #[1, 6, 384],
         #[6, 1, 384],
         [2, 2, 256]
        ]
names = ['SCRATCH_SWEEP ' + str(i) for i in sweep]

# for pre_trained_model in pre_trained_model_ids[0:1]:
for sweep_i in range(len(sweep)):
    pre_trained_model = names[sweep_i]

    for experiment in experiments:
    
        for experiment_type in experiment_types:

            wandb.finish()
            # load the configuration file, override any settings 
            with initialize(version_base=None, config_path="../../../confs", job_name="testing_notebook"):
                cfg = compose(config_name="config_CompetingRisk11M", 
                              overrides=[# Experiment setup
                                         f"experiment.type='{experiment_type}'",
                                         f"experiment.run_id='{pre_trained_model}'",
                                         f"experiment.fine_tune_id='{experiment}-{experiment_type}-notebook'",
                                         "experiment.train=True",
                                         "experiment.test=True",
                                         "experiment.notes=Testing run stopped early",
                                         # Dataloader
                                         "data.batch_size=256",
                                         "data.meta_information_path=/rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/PreTrain/meta_information_QuantJenny.pickle",
                                         "data.min_workers=3",
                                         # Optimiser
                                         "optim.num_epochs=20",
                                         "optim.limit_test_batches=null",
                                         "optim.scheduler=ReduceOnPlateau",
                                         "optim.scheduler_warmup=False",
                                         "optim.learning_rate=1e-3",
                                         "optim.val_check_interval=50",
                                         "optim.early_stop=True",
                                         "optim.early_stop_patience=4",
                                         "optim.limit_val_batches=0.035",
                                         # Model
                                         f"transformer.n_layer={sweep[sweep_i][0]}",  # 2, 1, 1, 6,
                                         f"transformer.n_head={sweep[sweep_i][1]}", # 2, 6, 6, 1,
                                         f"transformer.n_embd={sweep[sweep_i][2]}", # 128, 384, 126. 128
                                         "transformer.block_size=512", # 512, 512
                                         # "transformer.n_embd=1024",
                                        ]
                             )
            
            
            match experiment.lower():
                case "cvd":
                    cfg.data.path_to_ds="/rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/"
                    cfg.experiment.fine_tune_outcomes=["IHDINCLUDINGMI_OPTIMALV2", "ISCHAEMICSTROKE_V2", "MINFARCTION", "STROKEUNSPECIFIED_V2", "STROKE_HAEMRGIC"]
                case "hypertension":
                    cfg.data.path_to_ds="/rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_Hypertension/"
                    cfg.experiment.fine_tune_outcomes=["HYPERTENSION"]
            
            
            model, dm = run(cfg)
            print(f"Loaded model with {sum(p.numel() for p in model.parameters())/1e6} M parameters")
            wandb.finish()


INFO:root:Running cr on 72 CPUs and 1 GPUs
INFO:root:# Loading DataModule for dataset /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/. This will be loaded in supervised form.
INFO:root:Creating supervised collator for DataModule
INFO:root:Using meta information from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/PreTrain/meta_information_QuantJenny.pickle
INFO:root:Using train file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_train.pickle
INFO:root:Using test file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_test.pickle
INFO:root:Using val file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_val.pickle
INFO:root:Tokenzier created based on 

/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:630: Checkpoint directory /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:root:Using ReduceLROnPlateau scheduler
INFO:root:Not using warm-up in scheduler

  | Name       | Type                            | Params
---------------------------------------------------------------
0 | model      | SurvStreamGPTForCausalModelling | 1.9 M 
1 | surv_layer | ODESurvCompetingRiskLayer       | 19.0 K
---------------------------------------------------------------
1.9 M     Trainable params
30        Non-trainable params
1.9 M     Total params
7.687     Total estimated model params size (MB)
SLURM auto-requeueing enabled. Setting signal handlers.


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved. New best score: 0.576
Epoch 0, global step 50: 'val_loss' reached 0.57608 (best 0.57608), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/SCRATCH_SWEEP [2, 2, 256]_cvd-fine-tune-cr-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.003 >= min_delta = 0. New best score: 0.573
Epoch 0, global step 100: 'val_loss' reached 0.57281 (best 0.57281), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/SCRATCH_SWEEP [2, 2, 256]_cvd-fine-tune-cr-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.008 >= min_delta = 0. New best score: 0.564
Epoch 0, global step 150: 'val_loss' reached 0.56444 (best 0.56444), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/SCRATCH_SWEEP [2, 2, 256]_cvd-fine-tune-cr-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.002 >= min_delta = 0. New best score: 0.563
Epoch 0, global step 200: 'val_loss' reached 0.56272 (best 0.56272), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/SCRATCH_SWEEP [2, 2, 256]_cvd-fine-tune-cr-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch 0, global step 250: 'val_loss' was not in top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch 0, global step 300: 'val_loss' was not in top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch 0, global step 350: 'val_loss' was not in top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.002 >= min_delta = 0. New best score: 0.561
Epoch 0, global step 400: 'val_loss' reached 0.56093 (best 0.56093), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/SCRATCH_SWEEP [2, 2, 256]_cvd-fine-tune-cr-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch 0, global step 450: 'val_loss' was not in top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.004 >= min_delta = 0. New best score: 0.557
Epoch 0, global step 500: 'val_loss' reached 0.55715 (best 0.55715), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/SCRATCH_SWEEP [2, 2, 256]_cvd-fine-tune-cr-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch 0, global step 550: 'val_loss' was not in top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch 0, global step 600: 'val_loss' was not in top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Epoch 0, global step 650: 'val_loss' was not in top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Monitored metric val_loss did not improve in the last 4 records. Best score: 0.557. Signaling Trainer to stop.
Epoch 0, global step 700: 'val_loss' was not in top 1
INFO:root:Re-loading from best cached checkpoint /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/SCRATCH_SWEEP [2, 2, 256]_cvd-fine-tune-cr-notebook.ckpt
INFO:root:Using Temporal Positional Encoding. This module uses the patient's age at an event within their time series.
INFO:root:Using Competing-Risk DeSurv head.
INFO:root:In generation forwarding DeSurv on the grid between [0.0, 1.0] with 1000 intervals
INFO:root:Training all Transformer parameters
INFO:root:Using Competing-Risk DeSurv head.
INFO:root:In generation forwarding DeSurv on the grid between [0.0, 1.0] with 1000 intervals
INFO:root:Testing model.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
SLURM auto-requeueing enabled. Setting signal handlers.


Testing: |          | 0/? [00:00<?, ?it/s]

Loaded model with 1.921646 M parameters



Run history:
metric,sparkline
Scheduler,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Test:OutcomePerformanceMetricsctd,▁
Test:OutcomePerformanceMetricsibs,▁
Test:OutcomePerformanceMetricsinbll,▁
Val:OutcomePerformanceMetricsctd,▁▂▄▇▆▃▅▆▇█▅▅▂▁
Val:OutcomePerformanceMetricsibs,█▄▅▃▄▅▅▃▂▁▄▂▅▅
Val:OutcomePerformanceMetricsinbll,█▄▄▃▄▅▅▃▁▁▄▂▄▄
epoch,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█
test_loss,▁
train_loss,▆▅▃▆▅▆▅▁▂▄▅▃▅▄▄▄▁▄▃▄▅▄▄▆▃▄▂▄█▃▃▄▂▆▃

Run summary:
metric,value
Scheduler,0.001
Test:OutcomePerformanceMetricsctd,0.64082
Test:OutcomePerformanceMetricsibs,0.03375
Test:OutcomePerformanceMetricsinbll,0.14452
Val:OutcomePerformanceMetricsctd,0.62932
Val:OutcomePerformanceMetricsibs,0.0364
Val:OutcomePerformanceMetricsinbll,0.15587
epoch,1.0
test_loss,0.57203
train_loss,0.52162
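
The stopping point in the log above follows from the early-stopping settings: validation runs every 50 steps (`optim.val_check_interval=50`) and training stops after 4 consecutive checks without improvement (`optim.early_stop_patience=4`). The sketch below replays the improvement pattern read off the log; `early_stop_step` is a hypothetical helper written for this check, not Lightning's callback.

```python
def early_stop_step(checks, patience):
    """Return the step at which patience runs out, or None if it never does.

    `checks` is a sequence of (global_step, improved) pairs in training order.
    """
    misses = 0
    for step, improved in checks:
        misses = 0 if improved else misses + 1
        if misses >= patience:
            return step
    return None


# Steps at which the log reports a new best val_loss.
improved_steps = {50, 100, 150, 200, 400, 500}
# One validation check every 50 steps, up to the last logged step.
checks = [(step, step in improved_steps) for step in range(50, 701, 50)]

stop = early_stop_step(checks, patience=4)
print(stop)  # 700
```

The three misses at steps 250 to 350 do not trigger a stop because the improvement at step 400 resets the counter; the four misses from step 550 to step 700 do.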


In [13]:
wandb.finish()
# [_i if _i == _i.upper() else 0 for _i in dm.train_set.tokenizer._stoi.keys()]

# Fine-tuning on a subset of the data
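
In the run logged below, the trainer reports "Fixing Transformer parameters and training only new head", and the model summary lists 33.9 K trainable parameters against 129 M non-trainable ones. A minimal PyTorch sketch of that pattern follows, using small stand-in modules rather than the SurvivEHR classes; the layer sizes and the names `backbone` and `head` are assumptions chosen only for illustration.

```python
import torch.nn as nn

# Stand-in for a pre-trained backbone (hypothetical sizes).
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
# Stand-in for a newly initialised prediction head.
head = nn.Linear(32, 1)

# Freeze the backbone so that only the head receives gradient updates.
for param in backbone.parameters():
    param.requires_grad = False

model = nn.Sequential(backbone, head)

# Same counting idiom as the notebook's print statement, split by trainability.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(trainable, frozen)  # 33 1600
```

Note that `sum(p.numel() for p in model.parameters())`, as printed at the end of each run, counts frozen and trainable parameters together.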

In [4]:
pre_trained_models = ["CR_11M_24_11_01_big_posencscale_"]   # , "NULL-CR"
experiments = ["cvd"]  # "hypertension"
experiment_types = ["fine-tune-sr"] #"fine-tune-cr", 

for pre_trained_model in pre_trained_models:

    for experiment in experiments:
    
        for experiment_type in experiment_types:

            for sample_size in [3000, 12500, 30000, 60000, 100000]: # 600, 1200, 

                wandb.finish()
                # load the configuration file, override any settings 
                with initialize(version_base=None, config_path="../../../confs", job_name="testing_notebook"):
                    cfg = compose(config_name="config_CompetingRisk37M", 
                                  overrides=[# Experiment setup
                                             f"experiment.type='{experiment_type}'",
                                             f"experiment.run_id='{pre_trained_model}'",
                                             f"experiment.fine_tune_id='{experiment}-{experiment_type}-{sample_size}-notebook'",
                                             "experiment.train=True",
                                             "experiment.test=True",
                                             # Dataloader
                                             "data.batch_size=256",
                                             "data.meta_information_path=/rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/PreTrain/meta_information_QuantJenny.pickle",
                                             "data.min_workers=12",
                                             f"data.subsample_training={sample_size}",
                                             # Optimiser
                                             "optim.num_epochs=500",
                                             "optim.scheduler=ReduceOnPlateau",
                                             "optim.scheduler_warmup=False",
                                             "optim.learning_rate=1e-3",
                                             "optim.val_check_interval=1.0",
                                             "optim.limit_val_batches=0.25",
                                             "optim.limit_test_batches=null",
                                             "optim.early_stop=True",
                                             "optim.early_stop_patience=1",
                                            ]
                                 )
                
                
                match experiment.lower():
                    case "cvd":
                        cfg.data.path_to_ds="/rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/"
                        cfg.experiment.fine_tune_outcomes=["IHDINCLUDINGMI_OPTIMALV2", "ISCHAEMICSTROKE_V2", "MINFARCTION", "STROKEUNSPECIFIED_V2", "STROKE_HAEMRGIC"]
                    case "hypertension":
                        cfg.data.path_to_ds="/rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_Hypertension/"
                        cfg.experiment.fine_tune_outcomes=["HYPERTENSION"]
                
                
                model, dm = run(cfg)
                print(f"Loaded model with {sum(p.numel() for p in model.parameters())/1e6} M parameters")
                wandb.finish()


INFO:root:Running cr on 72 CPUs and 1 GPUs
INFO:root:# Loading DataModule for dataset /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/. This will be loaded in supervised form.
INFO:root:Creating supervised collator for DataModule
INFO:root:Using meta information from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/PreTrain/meta_information_QuantJenny.pickle
INFO:root:Using train file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_train.pickle
INFO:root:Using test file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_test.pickle
INFO:root:Using val file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_val.pickle
INFO:root:Tokenzier created based on 

/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:630: Checkpoint directory /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:root:Using ReduceLROnPlateau scheduler
INFO:root:Not using warm-up in scheduler

  | Name       | Type                            | Params
---------------------------------------------------------------
0 | model      | SurvStreamGPTForCausalModelling | 129 M 
1 | surv_layer | ODESurvSingleRiskLayer          | 34.0 K
2 | dropout    | Dropout                         | 0     
---------------------------------------------------------------
33.9 K    Trainable params
129 M     Non-trainable params
129 M     Total params
517.561   Total estimated model params size (MB)
SLURM auto-requeueing enabled. Setting signal handlers.


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:293: The number of training batches (12) is smaller than the logging interval Trainer(log_every_n_steps=20). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Training: |          | 0/? [00:00<?, ?it/s]

/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:377: ReduceLROnPlateau conditioned on metric val_loss which is not available but strict is set to `False`. Skipping learning rate update.


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved. New best score: 0.806
Epoch 0, global step 12: 'val_loss' reached 0.80616 (best 0.80616), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.297 >= min_delta = 0. New best score: 0.509
Epoch 1, global step 24: 'val_loss' reached 0.50874 (best 0.50874), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.119 >= min_delta = 0. New best score: 0.389
Epoch 2, global step 36: 'val_loss' reached 0.38925 (best 0.38925), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.003 >= min_delta = 0. New best score: 0.387
Epoch 3, global step 48: 'val_loss' reached 0.38666 (best 0.38666), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.009 >= min_delta = 0. New best score: 0.377
Epoch 4, global step 60: 'val_loss' reached 0.37741 (best 0.37741), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.003 >= min_delta = 0. New best score: 0.375
Epoch 5, global step 72: 'val_loss' reached 0.37466 (best 0.37466), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.003 >= min_delta = 0. New best score: 0.372
Epoch 6, global step 84: 'val_loss' reached 0.37205 (best 0.37205), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.002 >= min_delta = 0. New best score: 0.370
Epoch 7, global step 96: 'val_loss' reached 0.37037 (best 0.37037), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.369
Epoch 8, global step 108: 'val_loss' reached 0.36922 (best 0.36922), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.368
Epoch 9, global step 120: 'val_loss' reached 0.36843 (best 0.36843), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.368
Epoch 10, global step 132: 'val_loss' reached 0.36785 (best 0.36785), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.367
Epoch 11, global step 144: 'val_loss' reached 0.36694 (best 0.36694), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.366
Epoch 12, global step 156: 'val_loss' reached 0.36631 (best 0.36631), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.366
Epoch 13, global step 168: 'val_loss' reached 0.36570 (best 0.36570), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.365
Epoch 14, global step 180: 'val_loss' reached 0.36515 (best 0.36515), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.365
Epoch 15, global step 192: 'val_loss' reached 0.36482 (best 0.36482), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.365
Epoch 16, global step 204: 'val_loss' reached 0.36458 (best 0.36458), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.364
Epoch 17, global step 216: 'val_loss' reached 0.36408 (best 0.36408), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.364
Epoch 18, global step 228: 'val_loss' reached 0.36366 (best 0.36366), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.363
Epoch 19, global step 240: 'val_loss' reached 0.36349 (best 0.36349), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.363
Epoch 20, global step 252: 'val_loss' reached 0.36334 (best 0.36334), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.363
Epoch 21, global step 264: 'val_loss' reached 0.36324 (best 0.36324), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.363
Epoch 22, global step 276: 'val_loss' reached 0.36300 (best 0.36300), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.363
Epoch 23, global step 288: 'val_loss' reached 0.36297 (best 0.36297), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt' as top 1


Validation: |          | 0/? [00:00<?, ?it/s]

Monitored metric val_loss did not improve in the last 1 records. Best score: 0.363. Signaling Trainer to stop.
Epoch 24, global step 300: 'val_loss' was not in top 1
INFO:root:Re-loading from best cached checkpoint /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-3000-notebook.ckpt
INFO:root:Using Temporal Positional Encoding. This module uses the patient's age at an event within their time series.
INFO:root:Using Competing-Risk DeSurv head.
INFO:root:In generation forwarding DeSurv on the grid between [0.0, 1.0] with 1000 intervals
INFO:root:Fixing Transformer parameters and training only new head.
INFO:root:Testing model.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
SLURM auto-requeueing enabled. Setting signal handlers.


Testing: |          | 0/? [00:00<?, ?it/s]

Loaded model with 129.390253 M parameters


Run history:
metric,sparkline
Scheduler,██████████████▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",▁
Test:OutcomePerformanceMetricsctd,▁
Test:OutcomePerformanceMetricsibs,▁
Test:OutcomePerformanceMetricsinbll,▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁▁▁▁▁▂▃▄▄▅▅▆▆▆▇▇▇▇▇██████
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",█▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",█▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Metric,Value
Scheduler,0.0009
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.60921
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03382
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14567
Test:OutcomePerformanceMetricsctd,0.60921
Test:OutcomePerformanceMetricsibs,0.03382
Test:OutcomePerformanceMetricsinbll,0.14567
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.60929
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03269
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14184


INFO:root:Running cr on 72 CPUs and 1 GPUs
INFO:root:# Loading DataModule for dataset /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/. This will be loaded in supervised form.
INFO:root:Creating supervised collator for DataModule
INFO:root:Using meta information from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/PreTrain/meta_information_QuantJenny.pickle
INFO:root:Using train file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_train.pickle
INFO:root:Using test file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_test.pickle
INFO:root:Using val file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_val.pickle
INFO:root:Tokenzier created based on 

/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:630: Checkpoint directory /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:root:Using ReduceLROnPlateau scheduler
INFO:root:Not using warm-up in scheduler

  | Name       | Type                            | Params
---------------------------------------------------------------
0 | model      | SurvStreamGPTForCausalModelling | 129 M 
1 | surv_layer | ODESurvSingleRiskLayer          | 34.0 K
2 | dropout    | Dropout                         | 0     
---------------------------------------------------------------
33.9 K    Trainable params
129 M     Non-trainable params
129 M     Total params
517.561   Total estimated model params size (MB)
SLURM auto-requeueing enabled. Setting signal handlers.


/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:377: ReduceLROnPlateau conditioned on metric val_loss which is not available but strict is set to `False`. Skipping learning rate update.


Metric val_loss improved. New best score: 0.379
Epoch 0, global step 49: 'val_loss' reached 0.37908 (best 0.37908), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-12500-notebook.ckpt' as top 1


Metric val_loss improved by 0.008 >= min_delta = 0. New best score: 0.371
Epoch 1, global step 98: 'val_loss' reached 0.37114 (best 0.37114), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-12500-notebook.ckpt' as top 1


Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.370
Epoch 2, global step 147: 'val_loss' reached 0.36976 (best 0.36976), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-12500-notebook.ckpt' as top 1


Metric val_loss improved by 0.004 >= min_delta = 0. New best score: 0.366
Epoch 3, global step 196: 'val_loss' reached 0.36601 (best 0.36601), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-12500-notebook.ckpt' as top 1


Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.365
Epoch 4, global step 245: 'val_loss' reached 0.36529 (best 0.36529), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-12500-notebook.ckpt' as top 1


Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.365
Epoch 5, global step 294: 'val_loss' reached 0.36509 (best 0.36509), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-12500-notebook.ckpt' as top 1


Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.364
Epoch 6, global step 343: 'val_loss' reached 0.36402 (best 0.36402), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-12500-notebook.ckpt' as top 1


Monitored metric val_loss did not improve in the last 1 records. Best score: 0.364. Signaling Trainer to stop.
Epoch 7, global step 392: 'val_loss' was not in top 1
INFO:root:Re-loading from best cached checkpoint /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-12500-notebook.ckpt
INFO:root:Using Temporal Positional Encoding. This module uses the patient's age at an event within their time series.
INFO:root:Using Competing-Risk DeSurv head.
INFO:root:In generation forwarding DeSurv on the grid between [0.0, 1.0] with 1000 intervals
INFO:root:Fixing Transformer parameters and training only new head.
INFO:root:Testing model.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
SLURM auto-requeueing enabled. Setting signal handlers.


Loaded model with 129.390253 M parameters


Metric,History
Scheduler,███▇▆▅▅▅▄▄▃▃▃▂▂▂▁▁▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",▁
Test:OutcomePerformanceMetricsctd,▁
Test:OutcomePerformanceMetricsibs,▁
Test:OutcomePerformanceMetricsinbll,▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁▅▇▇████
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",█▅▄▂▂▁▁▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",█▄▃▂▂▁▁▁

Metric,Value
Scheduler,0.00025
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.60832
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03393
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14659
Test:OutcomePerformanceMetricsctd,0.60832
Test:OutcomePerformanceMetricsibs,0.03393
Test:OutcomePerformanceMetricsinbll,0.14659
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.61829
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03268
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14189


INFO:root:Running cr on 72 CPUs and 1 GPUs
INFO:root:# Loading DataModule for dataset /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/. This will be loaded in supervised form.
INFO:root:Creating supervised collator for DataModule
INFO:root:Using meta information from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/PreTrain/meta_information_QuantJenny.pickle
INFO:root:Using train file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_train.pickle
INFO:root:Using test file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_test.pickle
INFO:root:Using val file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_val.pickle
INFO:root:Tokenzier created based on 

/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:630: Checkpoint directory /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:root:Using ReduceLROnPlateau scheduler
INFO:root:Not using warm-up in scheduler

  | Name       | Type                            | Params
---------------------------------------------------------------
0 | model      | SurvStreamGPTForCausalModelling | 129 M 
1 | surv_layer | ODESurvSingleRiskLayer          | 34.0 K
2 | dropout    | Dropout                         | 0     
---------------------------------------------------------------
33.9 K    Trainable params
129 M     Non-trainable params
129 M     Total params
517.561   Total estimated model params size (MB)
SLURM auto-requeueing enabled. Setting signal handlers.


/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:377: ReduceLROnPlateau conditioned on metric val_loss which is not available but strict is set to `False`. Skipping learning rate update.


Metric val_loss improved. New best score: 0.365
Epoch 0, global step 118: 'val_loss' reached 0.36508 (best 0.36508), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-30000-notebook.ckpt' as top 1


Metric val_loss improved by 0.004 >= min_delta = 0. New best score: 0.361
Epoch 1, global step 236: 'val_loss' reached 0.36093 (best 0.36093), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-30000-notebook.ckpt' as top 1


Monitored metric val_loss did not improve in the last 1 records. Best score: 0.361. Signaling Trainer to stop.
Epoch 2, global step 354: 'val_loss' was not in top 1
INFO:root:Re-loading from best cached checkpoint /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-30000-notebook.ckpt
INFO:root:Using Temporal Positional Encoding. This module uses the patient's age at an event within their time series.
INFO:root:Using Competing-Risk DeSurv head.
INFO:root:In generation forwarding DeSurv on the grid between [0.0, 1.0] with 1000 intervals
INFO:root:Fixing Transformer parameters and training only new head.
INFO:root:Testing model.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
SLURM auto-requeueing enabled. Setting signal handlers.


Loaded model with 129.390253 M parameters


Metric,History
Scheduler,██████▇▇▆▅▄▃▃▂▁▁▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",▁
Test:OutcomePerformanceMetricsctd,▁
Test:OutcomePerformanceMetricsibs,▁
Test:OutcomePerformanceMetricsinbll,▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁▇█
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",█▁▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",█▂▁

Metric,Value
Scheduler,0.00039
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.61746
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03381
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14558
Test:OutcomePerformanceMetricsctd,0.61746
Test:OutcomePerformanceMetricsibs,0.03381
Test:OutcomePerformanceMetricsinbll,0.14558
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.63149
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03257
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14069


INFO:root:Running cr on 72 CPUs and 1 GPUs
INFO:root:# Loading DataModule for dataset /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/. This will be loaded in supervised form.
INFO:root:Creating supervised collator for DataModule
INFO:root:Using meta information from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/PreTrain/meta_information_QuantJenny.pickle
INFO:root:Using train file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_train.pickle
INFO:root:Using test file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_test.pickle
INFO:root:Using val file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_val.pickle
INFO:root:Tokenzier created based on 

/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:630: Checkpoint directory /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:root:Using ReduceLROnPlateau scheduler
INFO:root:Not using warm-up in scheduler

  | Name       | Type                            | Params
---------------------------------------------------------------
0 | model      | SurvStreamGPTForCausalModelling | 129 M 
1 | surv_layer | ODESurvSingleRiskLayer          | 34.0 K
2 | dropout    | Dropout                         | 0     
---------------------------------------------------------------
33.9 K    Trainable params
129 M     Non-trainable params
129 M     Total params
517.561   Total estimated model params size (MB)
SLURM auto-requeueing enabled. Setting signal handlers.


/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:377: ReduceLROnPlateau conditioned on metric val_loss which is not available but strict is set to `False`. Skipping learning rate update.


Metric val_loss improved. New best score: 0.363
Epoch 0, global step 235: 'val_loss' reached 0.36254 (best 0.36254), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-60000-notebook.ckpt' as top 1


Metric val_loss improved by 0.002 >= min_delta = 0. New best score: 0.360
Epoch 1, global step 470: 'val_loss' reached 0.36030 (best 0.36030), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-60000-notebook.ckpt' as top 1


Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.360
Epoch 2, global step 705: 'val_loss' reached 0.36006 (best 0.36006), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-60000-notebook.ckpt' as top 1


Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.360
Epoch 3, global step 940: 'val_loss' reached 0.35968 (best 0.35968), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-60000-notebook.ckpt' as top 1


Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.359
Epoch 4, global step 1175: 'val_loss' reached 0.35910 (best 0.35910), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-60000-notebook.ckpt' as top 1


Monitored metric val_loss did not improve in the last 1 records. Best score: 0.359. Signaling Trainer to stop.
Epoch 5, global step 1410: 'val_loss' was not in top 1
INFO:root:Re-loading from best cached checkpoint /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-60000-notebook.ckpt
INFO:root:Using Temporal Positional Encoding. This module uses the patient's age at an event within their time series.
INFO:root:Using Competing-Risk DeSurv head.
INFO:root:In generation forwarding DeSurv on the grid between [0.0, 1.0] with 1000 intervals
INFO:root:Fixing Transformer parameters and training only new head.
INFO:root:Testing model.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
SLURM auto-requeueing enabled. Setting signal handlers.


Loaded model with 129.390253 M parameters


Metric,History
Scheduler,███████▇▆▅▅▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",▁
Test:OutcomePerformanceMetricsctd,▁
Test:OutcomePerformanceMetricsibs,▁
Test:OutcomePerformanceMetricsinbll,▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁▆▇███
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",█▃▂▁▁▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",█▃▂▁▁▁

Metric,Value
Scheduler,0.0001
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.63183
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03376
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14483
Test:OutcomePerformanceMetricsctd,0.63183
Test:OutcomePerformanceMetricsibs,0.03376
Test:OutcomePerformanceMetricsinbll,0.14483
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.63925
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03252
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14004


INFO:root:Running cr on 72 CPUs and 1 GPUs
INFO:root:# Loading DataModule for dataset /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/. This will be loaded in supervised form.
INFO:root:Creating supervised collator for DataModule
INFO:root:Using meta information from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/PreTrain/meta_information_QuantJenny.pickle
INFO:root:Using train file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_train.pickle
INFO:root:Using test file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_test.pickle
INFO:root:Using val file-row count dictionary from /rds/projects/g/gokhalkm-optimal/OPTIMAL_MASTER_DATASET/data/FoundationalModel/FineTune_CVD/file_row_count_dict_val.pickle
INFO:root:Tokenzier created based on 

/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:630: Checkpoint directory /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:root:Using ReduceLROnPlateau scheduler
INFO:root:Not using warm-up in scheduler

  | Name       | Type                            | Params
---------------------------------------------------------------
0 | model      | SurvStreamGPTForCausalModelling | 129 M 
1 | surv_layer | ODESurvSingleRiskLayer          | 34.0 K
2 | dropout    | Dropout                         | 0     
---------------------------------------------------------------
33.9 K    Trainable params
129 M     Non-trainable params
129 M     Total params
517.561   Total estimated model params size (MB)
SLURM auto-requeueing enabled. Setting signal handlers.


/rds/bear-apps/2022a/EL8-ice/software/PyTorch-Lightning/2.1.0-foss-2022a-CUDA-11.7.0/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:377: ReduceLROnPlateau conditioned on metric val_loss which is not available but strict is set to `False`. Skipping learning rate update.
wandb: ERROR Error while calling W&B API: context deadline exceeded (<Response [500]>)


Metric val_loss improved. New best score: 0.359
Epoch 0, global step 391: 'val_loss' reached 0.35935 (best 0.35935), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-100000-notebook.ckpt' as top 1


Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.359
Epoch 1, global step 782: 'val_loss' reached 0.35934 (best 0.35934), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-100000-notebook.ckpt' as top 1


Metric val_loss improved by 0.001 >= min_delta = 0. New best score: 0.359
Epoch 2, global step 1173: 'val_loss' reached 0.35855 (best 0.35855), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-100000-notebook.ckpt' as top 1
wandb: ERROR Error while calling W&B API: context deadline exceeded (<Response [500]>)


Metric val_loss improved by 0.000 >= min_delta = 0. New best score: 0.358
Epoch 3, global step 1564: 'val_loss' reached 0.35836 (best 0.35836), saving model to '/rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-100000-notebook.ckpt' as top 1


Monitored metric val_loss did not improve in the last 1 records. Best score: 0.358. Signaling Trainer to stop.
Epoch 4, global step 1955: 'val_loss' was not in top 1
INFO:root:Re-loading from best cached checkpoint /rds/projects/s/subramaa-mum-predict/CharlesGadd_Oxford/FoundationModelOutput/checkpoints/CR_11M_24_11_01_big_posencscale__cvd-fine-tune-sr-100000-notebook.ckpt
INFO:root:Using Temporal Positional Encoding. This module uses the patient's age at an event within their time series.
INFO:root:Using Competing-Risk DeSurv head.
INFO:root:In generation forwarding DeSurv on the grid between [0.0, 1.0] with 1000 intervals
INFO:root:Fixing Transformer parameters and training only new head.
INFO:root:Testing model.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
SLURM auto-requeueing enabled. Setting signal handlers.


Loaded model with 129.390253 M parameters


Metric,History
Scheduler,█████████▆▅▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",▁
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",▁
Test:OutcomePerformanceMetricsctd,▁
Test:OutcomePerformanceMetricsibs,▁
Test:OutcomePerformanceMetricsinbll,▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",▁▆▇██
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",█▃▁▁▁
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",█▃▂▂▁

Metric,Value
Scheduler,0.0001
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.63913
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03372
"Test:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.14434
Test:OutcomePerformanceMetricsctd,0.63913
Test:OutcomePerformanceMetricsibs,0.03372
Test:OutcomePerformanceMetricsinbll,0.14434
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ctd",0.64504
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]ibs",0.03251
"Val:OutcomePerformanceMetrics_[95, 41, 67, 65, 28]inbll",0.13974


In [None]:
# Close the active wandb run so its logged history and summary are flushed.
wandb.finish()
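
The runs logged above differ in the `sr-N` suffix of their checkpoint names (3000, 12500, 30000, 60000, 100000). As a minimal sketch for comparing them side by side, the test-set values from the run summaries can be copied by hand into one table. Reading `N` as the number of fine-tuning samples is an assumption inferred from the checkpoint names and is not stated in the logs; `test_metrics` and `format_table` are hypothetical names introduced here for illustration. The metric keys `ctd`, `ibs` and `inbll` are kept exactly as logged.

```python
# Test-set metrics copied by hand from the wandb run summaries above.
# Keys are the N in each checkpoint's `sr-N` suffix (ASSUMED to be the
# number of fine-tuning samples; the logs do not confirm this).
test_metrics = {
    3000:   {"ctd": 0.60921, "ibs": 0.03382, "inbll": 0.14567},
    12500:  {"ctd": 0.60832, "ibs": 0.03393, "inbll": 0.14659},
    30000:  {"ctd": 0.61746, "ibs": 0.03381, "inbll": 0.14558},
    60000:  {"ctd": 0.63183, "ibs": 0.03376, "inbll": 0.14483},
    100000: {"ctd": 0.63913, "ibs": 0.03372, "inbll": 0.14434},
}


def format_table(metrics):
    """Return a plain-text table with one row per run, ordered by N."""
    rows = [f"{'N':>7}  {'ctd':>8}  {'ibs':>8}  {'inbll':>8}"]
    for n in sorted(metrics):
        m = metrics[n]
        rows.append(
            f"{n:>7}  {m['ctd']:>8.5f}  {m['ibs']:>8.5f}  {m['inbll']:>8.5f}"
        )
    return "\n".join(rows)


print(format_table(test_metrics))

# Change in each metric between the smallest and the largest run.
smallest, largest = min(test_metrics), max(test_metrics)
delta = {
    name: round(test_metrics[largest][name] - test_metrics[smallest][name], 5)
    for name in test_metrics[smallest]
}
print(delta)
```

In these logs the test `ctd` goes from 0.60921 at N=3000 to 0.63913 at N=100000, with a slight dip at N=12500, while `ibs` and `inbll` change only in the third or fourth decimal place.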

In [None]:
# Inspect the tokenizer's event counts (note: `_event_counts` is a private attribute).
dm.tokenizer._event_counts