<a href="https://colab.research.google.com/github/bchenley/TorchTimeSeries/blob/main/Baroreflex/notebooks/SubjectDataCollection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook builds a dataset of subjects from the publicly available PhysioNet [Autonomic Aging dataset](https://physionet.org/content/autonomic-aging-cardiovascular/1.0.0/), for users who want a different set of subjects than the one provided in the [Baroreflex](https://github.com/bchenley/Baroreflex) repository. If you'd rather use the data already in the repo (three subjects), you can find it [here](https://github.com/bchenley/Baroreflex/blob/main/data/cv_data.pkl).

To create your own dataset, you'll need to have already downloaded the folder containing the .hea header files, the matching .dat signal files, and the subject-info .csv file.

To begin, clone the Baroreflex repo.

In [None]:
!git clone https://github.com/bchenley/Baroreflex.git

Next, install the WFDB library.

In [None]:
!pip install wfdb

Import the necessary libraries.

In [None]:
import wfdb, glob, random, os
import pandas as pd
import numpy as np
import pickle

If the folder containing the data is on your Google Drive, mount your Drive:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

The script below loops over the subjects in the folder in random order until 10 usable subjects have been collected. You may increase (up to 1121) or decrease the number of subjects, but keep your available memory in mind. Subjects whose data were found to be too corrupt for preprocessing are ignored. There may be other bad recordings, so visualize the data to check. To save space, only six minutes of data are kept for each subject. Again, you may increase or decrease this as desired (recording lengths range from 8 to 45 minutes across all subjects).

In [None]:
# Folder containing the .hea, .dat and subject-info .csv files (replace with your own path)
record_path = '/content/drive/MyDrive/BIG_IDEAs_Lab/autonomic-aging-a-dataset-to-quantify-changes-of-cardiovascular-autonomic-function-during-healthy-aging-1.0.0/'

hea_files = glob.glob(record_path + '/*.hea')
subject_info_csv = glob.glob(record_path + '/*.csv')[0]
subject_info_df = pd.read_csv(subject_info_csv)

num_subjects = len(hea_files)
num_subjects_used = 10

random.shuffle(hea_files)

# some data you should ignore because the recordings are bad
ignore_ids = [963, 587, 633, 554, 793, 96, 41, 16, 653, 209, 31, 1060, 332, 936,
              559, 140, 186, 584, 365]

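dt = 1/1000 # sampling interval in seconds (assumes 1000 Hz recordings)

max_minute = 6.0 # minutes of data kept per subject
offset = 30/60 # minutes skipped at the start of each recording (30 seconds)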

cv_data = [] # list of dictionaries, one per selected subject
n = -1

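# stop once enough subjects are collected or the list of files is exhausted
while len(cv_data) < num_subjects_used and n + 1 < num_subjects: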
  n += 1

  file_path = hea_files[n]

  if file_path.endswith('.hea'): file_path = file_path[:-4]

  id_n = int(os.path.basename(file_path))

  if os.path.exists(file_path + '.dat') and (id_n not in ignore_ids):

    info_n = subject_info_df[subject_info_df['ID'] == id_n]

    record = wfdb.rdrecord(file_path)
    signal = record.p_signal

    dict_n = {'id': id_n}
    t = np.arange(signal.shape[0])*dt
    # keep max_minute minutes of data, starting offset minutes into the recording
    in_window = (t/60 > offset) & (t/60 < (max_minute+offset))
    signal, t = signal[in_window, :], t[in_window]

    dict_n['ecg'], dict_n['abp'] = signal[:, 0], signal[:, -1]
    dict_n['t'] = t

    if (np.sum(dict_n['abp'] < 50)/dict_n['abp'].shape[0] > 0.05) \
       | (np.sum(dict_n['abp'] > 200)/dict_n['abp'].shape[0] > 0.05):
      print(f"Subject {id_n} has bad abp")
    else:
      # .item() extracts the scalar from the one-row selection; missing values become NaN
      dict_n['age'] = int(info_n['Age_group'].item()) if pd.notna(info_n['Age_group'].item()) else np.nan
      dict_n['sex'] = bool(int(info_n['Sex'].item())) if pd.notna(info_n['Sex'].item()) else np.nan
      dict_n['length'] = info_n['Length'].item() if pd.notna(info_n['Length'].item()) else np.nan
      dict_n['device'] = info_n['Device'].item() if pd.notna(info_n['Device'].item()) else np.nan
      cv_data.append(dict_n)

  else:
    print(f"Subject {id_n} skipped (on the ignore list or missing a .dat file).")

  print(f"Subject {id_n} ({len(cv_data)}/{num_subjects_used})")
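
The windowing and blood-pressure check used in the loop above can be tried on their own before running the full collection. The sketch below is illustrative only: `crop_window` and `abp_out_of_range_fraction` are hypothetical helper names that are not part of the Baroreflex repo, the signal is synthetic, and the 50/200 mmHg limits and 5% tolerance simply mirror the values used in the script. The window is selected by sample index rather than by a time mask, so its edges may differ from the script's by one sample.

```python
import numpy as np

def crop_window(signal, fs, offset_min, length_min):
    """Return the samples from offset_min to offset_min + length_min (minutes)."""
    start = int(round(offset_min * 60 * fs))
    stop = int(round((offset_min + length_min) * 60 * fs))
    return signal[start:stop]

def abp_out_of_range_fraction(abp, low=50.0, high=200.0):
    """Return the fractions of samples below `low` and above `high`."""
    abp = np.asarray(abp, dtype=float)
    return float(np.mean(abp < low)), float(np.mean(abp > high))

# Synthetic stand-in for a pressure trace: 10 s sampled at 1000 Hz (assumed rate)
fs = 1000
t = np.arange(10 * fs) / fs
abp = 100 + 20 * np.sin(2 * np.pi * 1.0 * t)  # oscillates between 80 and 120
abp[:1000] = 0.0                              # simulate 1 s of signal dropout

frac_low, frac_high = abp_out_of_range_fraction(abp)

# Same rule as the script: reject if more than 5% of samples fall outside the limits
is_bad = (frac_low > 0.05) or (frac_high > 0.05)
print(f"below: {frac_low:.2%}, above: {frac_high:.2%}, bad: {is_bad}")

# Keep 5 s of data starting 1 s into the synthetic trace
cropped = crop_window(abp, fs, offset_min=1/60, length_min=5/60)
```

A check like this only catches gross dropouts or saturation, so plotting a few subjects is still worthwhile.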

Save the resulting list of subject dictionaries to a location of your choice.

In [None]:
file_path = "/content/cv_data.pkl"
with open(file_path, "wb") as file:
  pickle.dump(cv_data, file)
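
To confirm that a saved file can be read back, it can be reloaded with `pickle.load`. The sketch below is a minimal round trip using a small made-up list in place of `cv_data` and a temporary file in place of `/content/cv_data.pkl`; the dictionary contents are placeholders, not real subject data.

```python
import os
import pickle
import tempfile

# Placeholder with the same layout as cv_data: a list with one dictionary per subject
example_data = [{'id': 1, 'age': 3, 'sex': True,
                 't': [0.0, 0.001], 'abp': [90.0, 91.0]}]

# Write to a temporary location (hypothetical path, used only for this example)
path = os.path.join(tempfile.mkdtemp(), 'cv_data_example.pkl')
with open(path, 'wb') as f:
    pickle.dump(example_data, f)

# Read the file back and list the subject IDs it contains
with open(path, 'rb') as f:
    loaded = pickle.load(f)

ids = [subject['id'] for subject in loaded]
print(f"{len(loaded)} subject(s) loaded, IDs: {ids}")
```

Only unpickle files you created yourself or obtained from a source you trust, since loading a pickle can execute arbitrary code.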