# Load your data

Before finetuning one of the pretrained models from the experiments in our repository (or the precomputed ones provided [here](https://datacloud.hhi.fraunhofer.de/nextcloud/s/NCjYws3mamLrkKq)), first load your custom 12-lead ECG data sampled at 100 Hz as numpy arrays: signals `X` of shape `[N,L,12]` in millivolts (mV) and multi-hot encoded labels `y` of shape `[N,C]`, where `N` is the total number of samples in the dataset and `C` the number of classes. Although PTB-XL comes with a fixed `L=1000` (i.e. 10 seconds), `L` does not need to be fixed, **BUT** the shortest sample must be longer than the `input_size` of the specific model (e.g. 2.5 seconds for our fastai models).
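The requirements above can be checked before finetuning. The following is a minimal sketch, not part of the repository: `check_finetuning_inputs` is a hypothetical helper, and it assumes 100 Hz data and the 2.5 s `input_size` of the fastai models (250 samples). It accepts either a 3D array or a sequence of variable-length `[L_i, 12]` samples; it cannot verify that the signals are actually in millivolts.

```python
import numpy as np

def check_finetuning_inputs(X, y, sampling_frequency=100, min_seconds=2.5):
    """Hypothetical helper: validate signals and labels before finetuning.

    X: array [N, L, 12] or a sequence of [L_i, 12] arrays (variable lengths)
    y: multi-hot array [N, C]
    Returns the length (in samples) of the shortest recording.
    """
    y = np.asarray(y)
    if y.ndim != 2:
        raise ValueError(f"y must have shape [N, C], got {y.shape}")
    if not np.isin(y, (0, 1)).all():
        raise ValueError("y must be multi-hot encoded (only 0s and 1s)")
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} samples but y has {len(y)}")
    lengths = []
    for i, x in enumerate(X):
        shape = np.shape(x)
        if len(shape) != 2 or shape[1] != 12:
            raise ValueError(f"sample {i} must have shape [L, 12], got {shape}")
        lengths.append(shape[0])
    # The shortest sample must be longer than the model's input_size,
    # e.g. 2.5 s * 100 Hz = 250 samples for the fastai models.
    input_size = int(round(min_seconds * sampling_frequency))
    shortest = min(lengths)
    if shortest <= input_size:
        raise ValueError(f"shortest sample has {shortest} samples; it must exceed {input_size}")
    return shortest
```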

For proper finetuning, split your data into four numpy arrays: `X_train`, `y_train`, `X_val` and `y_val`.
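If your dataset has no predefined folds, a plain random split is one option. The sketch below uses a hypothetical `random_train_val_split` helper and assumes `X` and `y` are numpy arrays. Unlike the PTB-XL folds, which keep all recordings of a patient in the same fold, it neither stratifies labels nor groups by patient, so a patient-wise split is preferable if your data contains several recordings per patient.

```python
import numpy as np

def random_train_val_split(X, y, val_fraction=0.1, seed=0):
    """Hypothetical helper: shuffle N samples and split them into train and validation.

    X: numpy array [N, L, 12] (or a 1D object array of [L_i, 12] samples)
    y: multi-hot numpy array [N, C]
    Returns X_train, y_train, X_val, y_val.
    """
    y = np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = max(1, int(round(val_fraction * len(X))))  # at least one validation sample
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```

Usage: `X_train, y_train, X_val, y_val = random_train_val_split(X, y, val_fraction=0.1)`.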

### Example: finetune a model pretrained on `all` (71 classes) on `superdiagnostic` (5 classes)
Below we provide an example of loading [PTB-XL](https://physionet.org/content/ptb-xl/1.0.1/) aggregated at the `superdiagnostic` level, using the provided folds for the train-validation split:

In [1]:
!git clone https://github.com/helme/ecg_ptbxl_benchmarking.git

Cloning into 'ecg_ptbxl_benchmarking'...
remote: Enumerating objects: 185, done.
remote: Counting objects: 100% (106/106), done.
remote: Compressing objects: 100% (76/76), done.
remote: Total 185 (delta 64), reused 27 (delta 27), pack-reused 79 (from 2)
Receiving objects: 100% (185/185), 46.07 MiB | 9.83 MiB/s, done.
Resolving deltas: 100% (71/71), done.
Updating files: 100% (51/51), done.


In [3]:
%pwd

'/content'

In [4]:
!rm -rf ecg_ptbxl_benchmarking/data  # relative to /content; the absolute path /ecg_ptbxl_benchmarking/data does not exist

In [5]:
%cd ecg_ptbxl_benchmarking

/content/ecg_ptbxl_benchmarking


In [6]:
!./get_datasets.sh


In [7]:
%cd code/

/content/ecg_ptbxl_benchmarking/code


In [9]:
!pip install wfdb

Collecting wfdb
  Downloading wfdb-4.3.0-py3-none-any.whl.metadata (3.8 kB)
Collecting pandas>=2.2.3 (from wfdb)
  Downloading pandas-2.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (91 kB)
Downloading wfdb-4.3.0-py3-none-any.whl (163 kB)
Downloading pandas-2.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)
Installing collected packages: pandas, wfdb
  Attempting uninstall: pandas
    Found existing installation: pandas 2.2.2
    Uninstalling pandas-2.2.2:
      Successfully uninstalled pandas-2.2.2
ERROR: pip's dependency resolver does not currently take into account

In [23]:
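import os
from utils import utils

sampling_frequency=100
task='superdiagnostic'
outputfolder='../output/'

# utils.load_dataset selects the dataset from the name of the last folder in the path
# (it must be 'ptbxl'; otherwise X is never assigned, hence the UnboundLocalError)
# and appends file names directly to it, so the trailing slash is required.
# The download above mirrored PTB-XL 1.0.3 elsewhere, so expose it as ../data/ptbxl/.
datafolder='../data/ptbxl/'
if not os.path.exists(datafolder):
    os.symlink(os.path.abspath('../data/physionet.org/files/ptb-xl/1.0.3'), '../data/ptbxl')

# Load PTB-XL data
data, raw_labels = utils.load_dataset(datafolder, sampling_frequency)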

In [19]:
# Preprocess label data
labels = utils.compute_label_aggregations(raw_labels, datafolder, task)
# Select relevant data and convert labels to multi-hot vectors
data, labels, Y, _ = utils.select_data(data, labels, task, min_samples=0, outputfolder=outputfolder)

# 1-9 for training
X_train = data[labels.strat_fold < 10]
y_train = Y[labels.strat_fold < 10]
# 10 for validation
X_val = data[labels.strat_fold == 10]
y_val = Y[labels.strat_fold == 10]

num_classes = 5         # <=== number of classes in the finetuning dataset
input_shape = [1000,12] # <=== shape of samples, [None, 12] in case of different lengths

X_train.shape, y_train.shape, X_val.shape, y_val.shape

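`compute_label_aggregations` and `select_data` turn each record's SCP statements into a multi-hot row over the five superdiagnostic classes. The sketch below illustrates that idea only and is not the repository's implementation: `SUPERCLASS_OF` is a small, illustrative excerpt of the mapping (the full mapping is the `diagnostic_class` column of PTB-XL's `scp_statements.csv`), the alphabetical class order and the helper name are assumptions, and likelihood values are ignored.

```python
import numpy as np

# Illustrative excerpt of the SCP code -> diagnostic superclass mapping; the full
# mapping is the `diagnostic_class` column of PTB-XL's scp_statements.csv.
SUPERCLASS_OF = {"NORM": "NORM", "IMI": "MI", "ASMI": "MI",
                 "NDT": "STTC", "LAFB": "CD", "LVH": "HYP"}
CLASSES = ["CD", "HYP", "MI", "NORM", "STTC"]  # assumed column order

def to_superdiagnostic_multihot(scp_codes_per_record, classes=CLASSES):
    """Hypothetical helper: map each record's SCP codes (dict code -> likelihood)
    to a multi-hot row; codes without a superclass in the mapping are ignored."""
    col = {c: j for j, c in enumerate(classes)}
    Y = np.zeros((len(scp_codes_per_record), len(classes)), dtype=np.int64)
    for i, codes in enumerate(scp_codes_per_record):
        for code in codes:
            superclass = SUPERCLASS_OF.get(code)
            if superclass is not None:
                Y[i, col[superclass]] = 1
    return Y
```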

# Train or download models
There are two possibilities:
   1. Run the experiments as described in the README. Afterwards you will find the trained models in `output/expX/models/`.
   2. Download the precomputed `output` folder with all experiments and models from [here](https://datacloud.hhi.fraunhofer.de/nextcloud/s/NCjYws3mamLrkKq).

# Load pretrained model

To load a pretrained model:
   1. specify `modelname`; the available names can be found in `code/configs/` (e.g. `modelname='fastai_xresnet1d101'`)
   2. provide `experiment` to build the path `pretrainedfolder` (here, `exp0` refers to the experiment with `all` 71 SCP statements)

This returns the pretrained model with its classification head replaced by a randomly initialized head whose number of outputs equals the number of classes in your dataset.
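Both pretrained artifacts live under the same experiment folder. A minimal sketch with a hypothetical `pretrained_paths` helper that builds the two paths used in this notebook, the model folder (with a trailing slash, as in the cell below) and the scaler file, and reports which of them are missing, e.g. before the `output` folder has been downloaded:

```python
import os

def pretrained_paths(experiment, modelname, output_root="../output/"):
    """Hypothetical helper: build the model folder and scaler path of an experiment
    and list the ones that do not exist yet."""
    # Trailing "" makes os.path.join end the folder path with a separator
    pretrainedfolder = os.path.join(output_root, experiment, "models", modelname, "")
    scaler_path = os.path.join(output_root, experiment, "data", "standard_scaler.pkl")
    missing = [p for p in (pretrainedfolder, scaler_path) if not os.path.exists(p)]
    return pretrainedfolder, scaler_path, missing
```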

In [None]:
from models.fastai_model import fastai_model

experiment = 'exp0'
modelname = 'fastai_xresnet1d101'
pretrainedfolder = '../output/'+experiment+'/models/'+modelname+'/'
mpath='../output/' # <=== path where the finetuned model will be stored
n_classes_pretrained = 71 # <=== fixed by the experiment: exp0 was trained on all 71 SCP statements

model = fastai_model(
    modelname,
    num_classes,
    sampling_frequency,
    mpath,
    input_shape=input_shape,
    pretrainedfolder=pretrainedfolder,
    n_classes_pretrained=n_classes_pretrained,
    pretrained=True,
    epochs_finetuning=2,
)

# Preprocess data with pretrained Standardizer

Since we standardize inputs to zero mean and unit variance, your custom data must be standardized with the same mean and variance that were computed on the pretraining data. They are stored in the respective experiment folder as `output/expX/data/standard_scaler.pkl`.
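Conceptually, standardization subtracts the pretraining mean and divides by the pretraining standard deviation. The sketch below assumes one scalar mean and one scalar standard deviation shared by all leads (for a fitted scikit-learn `StandardScaler` with a single feature, `mean_[0]` and `scale_[0]`); `standardize_like_pretraining` is a hypothetical helper, not part of the repository.

```python
import numpy as np

def standardize_like_pretraining(X, mean, scale):
    """Hypothetical helper: z-score every sample with statistics of the pretraining data.

    X: array [N, L, 12] or a sequence of [L_i, 12] arrays (variable lengths)
    mean, scale: scalars shared by all leads (assumption, see text)
    """
    out = [(np.asarray(x, dtype=float) - mean) / scale for x in X]
    try:
        return np.stack(out)  # equal lengths -> array [N, L, 12]
    except ValueError:
        # variable lengths -> 1D object array of [L_i, 12] arrays
        arr = np.empty(len(out), dtype=object)
        for i, x in enumerate(out):
            arr[i] = x
        return arr
```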

In [None]:
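import pickle

# Load the StandardScaler fitted on the training data of the pretraining experiment
with open('../output/'+experiment+'/data/standard_scaler.pkl', 'rb') as f:
    standard_scaler = pickle.load(f)

# Scalers pickled with an older scikit-learn lack `n_features_in_`, which newer versions
# check in `transform` (RuntimeError "... there is no n_features_in_ attribute").
# apply_standardizer passes the scaler a single flattened column, so the value is 1.
if not hasattr(standard_scaler, 'n_features_in_'):
    standard_scaler.n_features_in_ = 1

X_train = utils.apply_standardizer(X_train, standard_scaler)
X_val = utils.apply_standardizer(X_val, standard_scaler)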

# Finetune model

Calling `model.fit` on a model with `pretrained=True` performs finetuning as proposed in our work, i.e. **gradual unfreezing and discriminative learning rates**.
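The two techniques can be illustrated independently of fastai. The sketch below is a generic illustration, not the repository's exact schedule: discriminative learning rates assign geometrically spaced rates to the layer groups, from `max_lr / factor` for the earliest group up to `max_lr` for the head (roughly what fastai does when given a `slice(start, stop)` of learning rates), and gradual unfreezing trains the head first and then unfreezes earlier groups stage by stage. Both helper names and the default `factor=100` are assumptions.

```python
def discriminative_lrs(max_lr, n_groups, factor=100.0):
    """Hypothetical helper: geometrically spaced learning rates, one per layer group.
    The earliest group gets max_lr / factor, the last group (the head) gets max_lr."""
    if n_groups == 1:
        return [max_lr]
    start = max_lr / factor
    ratio = factor ** (1.0 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

def gradual_unfreezing_stages(n_groups):
    """Hypothetical helper: trainable flags per stage. Stage k trains only the
    last k + 1 layer groups; all earlier groups stay frozen."""
    return [[g >= n_groups - 1 - k for g in range(n_groups)] for k in range(n_groups)]
```

For three layer groups, the first stage trains only the head and the last stage trains all groups, each with its own learning rate.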

In [None]:
model.fit(X_train, y_train, X_val, y_val)

Finetuning...
model: fastai_xresnet1d101


LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.


| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.27171 | 0.271859 | 00:27 |
| 1 | 0.237324 | 0.268521 | 00:24 |


LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.


| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.230869 | 0.270421 | 00:33 |
| 1 | 0.230395 | 0.268627 | 00:33 |


# Evaluate model on validation data

In [None]:
y_val_pred = model.predict(X_val)
utils.evaluate_experiment(y_val, y_val_pred)

aggregating predictions...


| | macro_auc | Fmax |
|---|---|---|
| 0 | 0.931458 | 0.827961 |
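`macro_auc` is the ROC AUC computed for each class separately and then averaged with equal weight per class. The sketch below computes it from the pairwise definition (the probability that a positive sample is scored above a negative one, ties counting one half); scikit-learn's `roc_auc_score(y_true, y_pred, average='macro')` computes the same quantity. The helper name is hypothetical, and `Fmax` is not reproduced here.

```python
import numpy as np

def macro_auc(y_true, y_score):
    """Hypothetical helper: macro-averaged ROC AUC over the columns (classes).

    Per class, AUC = fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count 1/2. Classes containing only positives
    or only negatives have no AUC and raise an error.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    aucs = []
    for c in range(y_true.shape[1]):
        pos = y_score[y_true[:, c], c]
        neg = y_score[~y_true[:, c], c]
        if len(pos) == 0 or len(neg) == 0:
            raise ValueError(f"class {c} contains only one label value; AUC undefined")
        diff = pos[:, None] - neg[None, :]  # all positive/negative score differences
        aucs.append(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
    return float(np.mean(aucs))
```

Usage: `macro_auc(y_val, y_val_pred)` with the multi-hot `y_val` and the predicted scores from above.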
