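This dataset is a subset of the MIMIC-III dataset, prepared for non-invasive blood pressure prediction. The PPG (photoplethysmography) and ABP (arterial blood pressure) recordings are divided into 7-second windows of 875 data points each. The systolic and diastolic blood pressure values were derived from the ABP windows. Each sample consists of a PPG signal, its blood pressure values, and a unique subject identifier. The file contains three datasets: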

ppg: PPG data of size 9,054,000 x 875
label: BP data (SBP, DBP) of size 9,054,000 x 2
subject_idx: the subject each sample belongs to (size 9,054,000 x 1)
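
The window arithmetic above (875 points per 7 seconds, which implies 125 samples per second) can be sketched as follows. This is only an illustration on a synthetic wave: the non-overlapping windowing, the helper names `split_into_windows` and `crude_bp_labels`, and the max/min labeling rule are all assumptions, since the description does not say how the dataset authors cut the windows or derived SBP and DBP.

```python
import numpy as np

FS = 125                 # Hz, implied by 875 samples per 7-second window
WIN_SEC = 7
WIN_LEN = FS * WIN_SEC   # 875 samples


def split_into_windows(signal, win_len=WIN_LEN):
    """Cut a 1-D signal into non-overlapping windows, dropping the remainder."""
    signal = np.asarray(signal)
    n_windows = len(signal) // win_len
    return signal[: n_windows * win_len].reshape(n_windows, win_len)


def crude_bp_labels(abp_windows):
    """Hypothetical labeling rule: per-window maximum as SBP, minimum as DBP."""
    return np.stack([abp_windows.max(axis=1), abp_windows.min(axis=1)], axis=1)


# Synthetic ABP-like wave: 30 seconds oscillating between about 70 and 120.
t = np.arange(30 * FS) / FS
abp = 95 + 25 * np.sin(2 * np.pi * 1.2 * t)

windows = split_into_windows(abp)
labels = crude_bp_labels(windows)
print(windows.shape, labels.shape)
```

Thirty seconds of signal yields four full windows here; the last 250 samples do not fill a window and are discarded.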

In [15]:
# Numerics, data handling, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from numpy import mean, std, dstack
from pandas import read_csv
from matplotlib import pyplot
from typing import Any, Dict, List, Tuple

# TensorFlow / Keras
import tensorflow as tf
import tensorflow.keras
from tensorflow import keras
from tensorflow.keras import layers, models, Input, Model
from tensorflow.keras.models import Sequential
# Layers for the CNN models (varying kernel size)
from tensorflow.keras.layers import (
    Activation, BatchNormalization, Bidirectional, Conv1D, ConvLSTM1D,
    Dense, Dropout, Flatten, GlobalAveragePooling1D, GRU, Input, LSTM,
    MaxPooling1D, Reshape, TimeDistributed, UpSampling1D, add, multiply,
)
from tensorflow.keras.utils import to_categorical
# Private module path, not part of the public TensorFlow API
from tensorflow.python.keras.utils import np_utils
from keras import backend as K

# scikit-learn
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.linear_model import LinearRegression


In [2]:
import h5py

# Open the file without file locking
with h5py.File('MIMIC-III_ppg_dataset.h5', 'r', locking=False) as f:
    dataset_names = list(f.keys())
    print("Available datasets:", dataset_names)

Available datasets: ['label', 'ppg', 'subject_idx']
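
Before loading whole arrays, it can help to check each dataset's shape and dtype, which h5py exposes as metadata without reading the data, and to read only a slice. The sketch below builds a tiny stand-in file with the same three dataset names; the file name `tiny_ppg_demo.h5`, the array sizes, and the random values are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file with the same three dataset names as above.
tmp_path = os.path.join(tempfile.mkdtemp(), "tiny_ppg_demo.h5")
rng = np.random.default_rng(0)
with h5py.File(tmp_path, "w") as f:
    f.create_dataset("ppg", data=rng.normal(size=(20, 875)).astype("float32"))
    f.create_dataset("label", data=rng.integers(50, 180, size=(20, 2)))
    f.create_dataset("subject_idx", data=np.repeat(np.arange(4), 5).reshape(-1, 1))

with h5py.File(tmp_path, "r") as f:
    # Shape and dtype are metadata; reading them does not load the array.
    shapes = {name: f[name].shape for name in f.keys()}
    dtypes = {name: f[name].dtype for name in f.keys()}
    # Slicing a dataset reads only the requested rows from disk.
    ppg_head = f["ppg"][:5]
    label_head = f["label"][:5]

print(shapes)
print(dtypes)
print(ppg_head.shape, label_head.shape)
```

This matters for the real file: 9,054,000 x 875 is about 7.9 billion values, on the order of 30 GB if they are stored as 32-bit floats (an assumption, since the dtype is not shown here).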


In [3]:
# Open the .h5 file
with h5py.File('MIMIC-III_ppg_dataset.h5', 'r', locking=False) as f:
    # Load each dataset fully into memory (e.g., the 'label' dataset)
    data_label = f['label'][:]
    data_ppg = f['ppg'][:]
    data_subject_idx = f['subject_idx'][:]
    
    print(data_label)
    print(data_ppg)
    print(data_subject_idx)

[[ 98  57]
 [100  56]
 [115  60]
 ...
 [143  66]
 [119  63]
 [146  74]]
[[-0.5025518  -0.5326032  -0.5624843  ... -0.28123745 -0.27767116
  -0.2727553 ]
 [-0.32402784 -0.3677722  -0.41105923 ...  0.660843    0.5616221
   0.4570674 ]
 [ 0.0789502  -0.02330504 -0.12383569 ... -0.04361962 -0.03981035
  -0.03410682]
 ...
 [-0.22356206 -0.27264422 -0.32297283 ... -0.0673175  -0.06447504
  -0.06026017]
 [-0.03491007 -0.1202258  -0.20588434 ...  0.28017786  0.28330782
   0.2905485 ]
 [ 0.38536417  0.43365255  0.48704892 ... -0.30071005 -0.29895812
  -0.2964506 ]]
[[   0]
 [   0]
 [   0]
 ...
 [4554]
 [4554]
 [4554]]


In [4]:
print(data_label.shape)
print(data_ppg.shape)
print(data_subject_idx)

(9054000, 2)
(9054000, 875)
[[   0]
 [   0]
 [   0]
 ...
 [4554]
 [4554]
 [4554]]


In data_label, the first column is the systolic blood pressure (SBP) and the second column is the diastolic blood pressure (DBP).

In [5]:
# Convert each array to a DataFrame
df_label = pd.DataFrame(data_label, columns=['SBP', 'DBP'])
df_ppg = pd.DataFrame(data_ppg)
df_subject_idx = pd.DataFrame(data_subject_idx, columns=['subject_idx'])

In [6]:
# Concatenate the three DataFrames column-wise into one
df = pd.concat([df_subject_idx, df_label, df_ppg], axis=1)

# Check the result
print(df.head())

   subject_idx  SBP  DBP         0         1         2         3         4  \
0            0   98   57 -0.502552 -0.532603 -0.562484 -0.591756 -0.620017   
1            0  100   56 -0.324028 -0.367772 -0.411059 -0.453427 -0.494457   
2            0  115   60  0.078950 -0.023305 -0.123836 -0.221000 -0.313295   
3            0   83   55  0.233394  0.251590  0.267639  0.279521  0.285665   
4            0  109   58 -0.173036 -0.190641 -0.208551 -0.226650 -0.244931   

          5         6  ...       865       866       867       868       869  \
0 -0.646934 -0.672269  ... -0.257694 -0.267066 -0.274259 -0.279373 -0.282512   
1 -0.533794 -0.571164  ...  1.077976  1.055641  1.019408  0.970138  0.908652   
2 -0.399417 -0.478317  ... -0.012682 -0.025805 -0.034969 -0.040991 -0.044514   
3  0.285092  0.277481  ... -0.217937 -0.226153 -0.233065 -0.238201 -0.241181   
4 -0.263568 -0.282928  ... -0.305273 -0.304013 -0.303961 -0.304851 -0.306394   

        870       871       872       873       87

In [7]:
# Sample the data (e.g., 10%)
df_sample = df.sample(frac=0.1, random_state=42)

# Set the features and targets
features = df_sample.drop(['subject_idx', 'SBP', 'DBP'], axis=1)
target_sbp = df_sample['SBP']
target_dbp = df_sample['DBP']
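
The same 10% subsample can also be drawn directly from the NumPy arrays, without first building a wide DataFrame. The sketch below uses small random stand-ins for `data_ppg` and `data_label` (the names `ppg`, `label`, and `sample_idx` are illustrative), and it will not select the same rows as `df.sample(frac=0.1, random_state=42)`.

```python
import numpy as np

rng = np.random.default_rng(42)

# Small random stand-ins for data_ppg and data_label.
n_rows = 1000
ppg = rng.normal(size=(n_rows, 875)).astype("float32")
label = rng.integers(50, 180, size=(n_rows, 2))

# Draw 10% of the row indices without replacement, then sort them.
sample_idx = np.sort(rng.choice(n_rows, size=n_rows // 10, replace=False))

features = ppg[sample_idx]
target_sbp = label[sample_idx, 0]   # first column: SBP
target_dbp = label[sample_idx, 1]   # second column: DBP
print(features.shape, target_sbp.shape, target_dbp.shape)
```

Sorting the indices is optional for in-memory arrays; it is kept here because h5py requires increasing indices when a dataset is indexed with a list.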

In [9]:
# Set the feature and target variables (full-data version, kept commented out)
# features = df.drop(['subject_idx', 'SBP', 'DBP'], axis=1)  # everything except 'subject_idx', 'SBP', 'DBP' is a feature
# target_sbp = df['SBP']  # predict SBP
# target_dbp = df['DBP']  # predict DBP

In [11]:
# Split into training and test sets
from sklearn.model_selection import train_test_split
X_train_sbp, X_test_sbp, y_train_sbp, y_test_sbp = train_test_split(features, target_sbp, test_size=0.2, random_state=42)
X_train_dbp, X_test_dbp, y_train_dbp, y_test_dbp = train_test_split(features, target_dbp, test_size=0.2, random_state=42)
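
A row-wise random split like the one above can place windows from the same subject in both the training and the test set. If subject-independent evaluation is the goal, one possible alternative is to split by `subject_idx` with scikit-learn's `GroupShuffleSplit`. The sketch below uses synthetic data; the subject count, windows per subject, and array contents are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(42)

# Synthetic stand-in: 50 subjects with 20 windows each.
n_subjects, windows_per_subject = 50, 20
groups = np.repeat(np.arange(n_subjects), windows_per_subject)
X = rng.normal(size=(groups.size, 875))
y = rng.normal(loc=120, scale=20, size=groups.size)

# Hold out 20% of the subjects (not 20% of the rows).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# No subject should appear on both sides of the split.
shared = set(groups[train_idx]) & set(groups[test_idx])
print("train rows:", len(train_idx), "test rows:", len(test_idx))
print("subjects in both sets:", len(shared))
```

With the notebook's data, `groups` would be the `subject_idx` values of the sampled rows.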

In [16]:
# SBP regression model
from sklearn.linear_model import LinearRegression
model_sbp = LinearRegression()
model_sbp.fit(X_train_sbp, y_train_sbp)

# DBP regression model
model_dbp = LinearRegression()
model_dbp.fit(X_train_dbp, y_train_dbp)
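
Because both models share the same features, `LinearRegression` can also fit SBP and DBP together by passing a two-column target. A minimal sketch on synthetic data (array sizes and coefficients are made up) showing that the joint fit gives the same predictions as two separate fits:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic features and a two-column target (SBP-like, DBP-like).
X = rng.normal(size=(200, 10))
Y = np.column_stack([
    120 + X @ rng.normal(size=10) + rng.normal(size=200),
    70 + X @ rng.normal(size=10) + rng.normal(size=200),
])

# One model with a 2-column target ...
joint = LinearRegression().fit(X, Y)

# ... versus one model per target, as in the cell above.
sbp_only = LinearRegression().fit(X, Y[:, 0])
dbp_only = LinearRegression().fit(X, Y[:, 1])

joint_pred = joint.predict(X)
separate_pred = np.column_stack([sbp_only.predict(X), dbp_only.predict(X)])
print(joint_pred.shape, np.allclose(joint_pred, separate_pred))
```

Ordinary least squares treats each target column independently, so the choice between the two forms is one of convenience rather than accuracy.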

In [18]:
# Predict
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import confusion_matrix

predictions_sbp = model_sbp.predict(X_test_sbp)
predictions_dbp = model_dbp.predict(X_test_dbp)

# Evaluate
mse_sbp = mean_squared_error(y_test_sbp, predictions_sbp)
r2_sbp = r2_score(y_test_sbp, predictions_sbp)

mse_dbp = mean_squared_error(y_test_dbp, predictions_dbp)
r2_dbp = r2_score(y_test_dbp, predictions_dbp)

print(f"SBP Mean Squared Error: {mse_sbp}, R^2: {r2_sbp}")
print(f"DBP Mean Squared Error: {mse_dbp}, R^2: {r2_dbp}")

SBP Mean Squared Error: 580.9122290038129, R^2: -0.0003182634359701897
DBP Mean Squared Error: 156.2720063188497, R^2: -0.000467449777701745
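
To read these numbers on the scale of the labels, the square root of each MSE gives an RMSE of roughly 24.1 (SBP) and 12.5 (DBP), in the same units as the blood pressure values. An R^2 at or slightly below zero is what a model scores when it does no better than always predicting the mean. The sketch below recomputes the RMSEs from the printed MSEs and shows, on synthetic data (an assumption, not the notebook's data), the R^2 of a mean-only baseline built with `DummyRegressor`.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, r2_score

# MSE values copied from the output above.
reported_mse = {"SBP": 580.9122290038129, "DBP": 156.2720063188497}
reported_rmse = {name: float(np.sqrt(mse)) for name, mse in reported_mse.items()}
for name, rmse in reported_rmse.items():
    print(f"{name} RMSE: {rmse:.2f}")

# Synthetic stand-in: features that carry no information about the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(loc=120, scale=24, size=1000)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# Baseline that always predicts the training-set mean.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)
baseline_mse = mean_squared_error(y_test, baseline_pred)
baseline_r2 = r2_score(y_test, baseline_pred)
print(f"baseline MSE: {baseline_mse:.1f}, baseline R^2: {baseline_r2:.4f}")
```

Running the same baseline on the notebook's own train/test split would show how far the linear models are from it.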
