<a href="https://colab.research.google.com/github/KimJisanER/medical_ai/blob/main/%5Bopen%5D_02_lowbp_preprocessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Medical AI Specialist Training Program 2022
## VitalDB Tutorial <br> Low blood pressure prediction using arterial wave - preprocessing
- Date : Sep. 03, 2022
- Author : **Hyun-Lim Yang, Ph.D.**<br>
Research Assistant Professor @
Seoul National University Hospital <br>
Department of Anesthesiology and Pain Medicine
- E-mail : hlyang{_at_}snu{_dot_}ac{_dot_}kr
***

In [1]:
from IPython.display import HTML
style_warn = "<style>div.warn { background-color: #fcf2f2;border-color: #dFb5b4; border-left: 5px solid #dfb5b4; padding: 0.5em;}</style>"
HTML(style_warn)

### Import packages

<div class="warn">**Warning!** Set `download_directory` to your own directory.</div>


> **Warning** <br>
> This notebook includes Google Drive import code for the Colab environment. <br>
> When running locally, comment out the Colab-specific import lines before executing.

In [2]:
!pip install vitaldb


In [3]:
from google.colab import drive  # for colab
drive.mount('/content/gdrive/')  # for colab

Mounted at /content/gdrive/


In [4]:
import os
cloud_directory = '/content/gdrive/My Drive/KOHI_2022_CNN_data_open/'
os.listdir(cloud_directory) # for colab, check cloud directory mount

['sample_csv',
 'datasets',
 '__init__.py',
 '__pycache__',
 'kohi_CNN_model_archi.png',
 'kohi_preprocessor.py']

In [5]:
import sys
#download_directory = os.getcwd() # for local environments
#sys.path.append(download_directory) # for local environments
sys.path.append(cloud_directory) # for colab

In [6]:
import os
import sys
# download_directory = os.getcwd() # for local environments
# sys.path.append(download_directory) # for local environments

In [10]:
import numpy as np
import pandas as pd
import glob
import kohi_preprocessor as pre
import vitaldb
from tqdm import tqdm
import warnings
warnings.filterwarnings(action='ignore')

download_directory = cloud_directory # for colab

### Data loading
Download a sample file directly from the VitalDB server and load it.

> **TODO:** Convert `00010.vital` to 100 Hz and store it as a DataFrame in `chart_pd_01`

In [11]:
track_names = ["SNUADC/ART", "Solar8000/ART_MBP"]
### =========== Your code here ====================

vitalfile = vitaldb.VitalFile(10, track_names=track_names)
chart_pd_01 = vitalfile.to_pandas(track_names, interval=1/100)

### ===============================================


In [12]:
chart_pd_01

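|  | SNUADC/ART | Solar8000/ART_MBP |
|---|---|---|
| 0 | NaN | NaN |
| 1 | NaN | NaN |
| 2 | NaN | NaN |
| 3 | NaN | NaN |
| 4 | NaN | NaN |
| ... | ... | ... |
| 2099215 | NaN | NaN |
| 2099216 | NaN | NaN |
| 2099217 | NaN | NaN |
| 2099218 | NaN | NaN |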
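
Because the frame was resampled with `interval=1/100`, each row corresponds to 10 ms, so a row index converts to elapsed time by dividing by the sampling rate. A minimal sketch of that conversion follows; the helper name `rows_to_seconds` is hypothetical and is not part of `vitaldb`.

```python
def rows_to_seconds(n_rows: int, srate: int = 100) -> float:
    """Convert a row count (or row index) to seconds at `srate` Hz."""
    return n_rows / srate

# The last row index in the printed frame is 2099218, so there are 2099219 rows.
n_rows = 2099218 + 1
duration_s = rows_to_seconds(n_rows)
print(f"{duration_s:.2f} s = {duration_s / 3600:.2f} h")
```

The same division is what turns the 1 min 20 s offset used later in this notebook into 8000 rows.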


In [13]:
col_mbp = 'Solar8000/ART_MBP'
col_art = 'SNUADC/ART'

### Extract mean blood pressure data
> **TODO:** Store only the ART_MBP entries that have a value in a separate variable `mbp_data_pd`

In [14]:
# Extract MBP data
### =========== Your code here ====================

mbp_data_pd = chart_pd_01[col_mbp][chart_pd_01[col_mbp].notnull()]

### ===============================================

mbp_index = mbp_data_pd.index.values
print(mbp_data_pd.head())

708    -20.0
899    -20.0
1099   -20.0
1299   -20.0
1499   -20.0
Name: Solar8000/ART_MBP, dtype: float32
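
In the output above the surviving indices are roughly 200 rows apart, which at 100 Hz means about one MBP value every 2 s; the rows in between are empty because the frame is sampled at the waveform rate. The following is a small sketch on synthetic data (not the VitalDB file) showing that the `notnull()` mask keeps the original row index, which is what later lets each MBP value be aligned with the waveform.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a sparse track: values exist at only two rows.
mbp = pd.Series(np.nan, index=range(10), dtype="float32")
mbp.loc[[3, 7]] = [75.0, 80.0]

mbp_present = mbp[mbp.notnull()]  # same idiom as the cell above
mbp_dropna = mbp.dropna()         # equivalent shorthand

# The original row positions are preserved in the index.
print(mbp_present.index.values)
```

Note that implausible readings such as the `-20.0` values shown above are not removed at this step; they are rejected later by the MBP min-max filter.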


### Extract arterial wave data and remove NaNs
> **TODO:** Fill every missing value in the ART track with 0 and store the result in `art_full_pd`

In [15]:
# Extract the full ART track and fill NaN values
art_full_pd = chart_pd_01[col_art]

### =========== Your code here ====================

art_full_pd = art_full_pd.fillna(0)

### ===============================================
print(art_full_pd.head())

0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
Name: SNUADC/ART, dtype: float32
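
Filling gaps with 0 does not make those samples valid; it only makes them easy to detect. The sketch below uses toy values (an assumption, not the real recording) to show that a segment containing a filled 0 fails the same `> 25` and `< 250` range check that this notebook applies to arterial segments later on.

```python
import numpy as np
import pandas as pd

# Toy waveform with two missing samples.
art = pd.Series([np.nan, 80.0, 120.0, np.nan, 95.0], dtype="float32")
art_filled = art.fillna(0)

# Same range condition as the ABP range filter defined later in the notebook.
seg = art_filled.values
passes_range = bool((seg > 25.0).all() and (seg < 250.0).all())
print(art_filled.tolist(), passes_range)
```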


### Build the dataset
Define the required parameters

In [17]:
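# Define the required parameters
mdelay = 1           # blackout period between input and target, in minutes
srate = 100          # sampling rate, in Hz
length = 20          # input segment length, in seconds
max_limit_mbp = 250  # upper bound for valid MBP values (mmHg)
min_limit_mbp = 0    # lower bound for valid MBP values (mmHg)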

The input segment is 20 s long and the blackout period (`mdelay` here) is 1 min, so MBP values that occur within the first 1 min 20 s of the recording are ignored.

In [18]:
# Keep only MBP indices later than 1 min 20 s from the start
mbp_points = mbp_index[mbp_index > (mdelay*60*srate + length*srate)]
print(mbp_points)

[   8100    8300    8500 ... 2092959 2093159 2093359]
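
The cutoff in the cell above is simply the blackout and the input length expressed in rows. A minimal sketch of the arithmetic, using the parameter values defined earlier and a made-up index array (`toy_index` is illustrative only):

```python
import numpy as np

mdelay, srate, length = 1, 100, 20  # minutes, Hz, seconds

blackout_samples = mdelay * 60 * srate         # 1 min at 100 Hz
input_samples = length * srate                 # 20 s at 100 Hz
min_index = blackout_samples + input_samples   # earliest usable row offset

# Toy MBP row indices; only those strictly greater than min_index are kept.
toy_index = np.array([708, 899, 7900, 8000, 8100, 8300])
kept = toy_index[toy_index > min_index]
print(min_index, kept)
```

With these values the cutoff is 8000 rows, which is why the first index printed by the notebook is 8100.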


### Extract arterial wave segments
> **TODO:** Relative to each MBP value, extract the data from 1 min 20 s before to 1 min before (that is, 20 s of input data) as the arterial wave segment and store it in `art_seg_list`

In [19]:
# Exercise: extract the arterial segments
mbp_values_list = [mbp_data_pd[idx] for idx in mbp_points]

### =========== Your code here ====================

art_seg_list = [art_full_pd[idx-(mdelay*60*srate) - (length*srate) : 
                            idx - (mdelay*60*srate)].values 
                for idx in mbp_points]
### ===============================================
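
The slice bounds in the comprehension above can be read as "stop one blackout before the target, start one input length before that". The sketch below restates this with a hypothetical helper `segment_bounds` and a toy waveform whose value equals its row index, so the extracted window is easy to inspect.

```python
import numpy as np

mdelay, srate, length = 1, 100, 20  # minutes, Hz, seconds

def segment_bounds(idx, mdelay=mdelay, srate=srate, length=length):
    """Return (start, stop) of the input window for the MBP value at row `idx`."""
    stop = idx - mdelay * 60 * srate   # window ends 1 min before the target
    start = stop - length * srate      # and covers the 20 s before that
    return start, stop

wave = np.arange(10_000, dtype=np.float32)  # toy waveform: value == row index
start, stop = segment_bounds(8100)
seg = wave[start:stop]
print(start, stop, seg.shape)
```

Every window has `length * srate` samples, which matches the second dimension of `art_seg_np` reported by the notebook.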

In [20]:
mbp_values_np = np.array(mbp_values_list)
art_seg_np = np.array(art_seg_list)
print(mbp_values_np.shape)
print(art_seg_np.shape)

(10428,)
(10428, 2000)


### Define filters based on conditions

In [21]:
# Declare the filters
# mbp min-max filter
mbp_max_filter = mbp_values_np < max_limit_mbp
mbp_min_filter = mbp_values_np > min_limit_mbp
mbp_filter = mbp_max_filter & mbp_min_filter

# abp range filter
art_filter_list = []
for seg in art_seg_np:
    filter_value = (np.array(seg) > 25.0).all() and (np.array(seg) < 250.0).all()
    art_filter_list.append(filter_value)
art_filter = np.array(art_filter_list)

# mstds filter
mstds_values_list = []
for seg in tqdm(art_seg_np):
    if (np.array(seg) < 0.).any():
        mstds_values_list.append(float(0.))
    else:
        mstd_val, _ = pre.process_beat(seg)
        mstds_values_list.append(mstd_val)
mstds_filter = np.array(mstds_values_list) > 0.

100%|██████████| 10428/10428 [01:42<00:00, 101.67it/s]
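
The ABP range filter above loops over segments in Python. Since `art_seg_np` is a 2-D array, the same check can be written as a single vectorized expression. The sketch below compares both forms on synthetic segments; it is an alternative formulation, not the notebook's own code, and it does not cover the `pre.process_beat` step, whose implementation is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
segs = rng.uniform(40.0, 200.0, size=(5, 2000)).astype(np.float32)
segs[1, 10] = 0.0     # e.g. a gap that was filled with 0
segs[3, 500] = 300.0  # e.g. an artifact spike

# Loop version, mirroring the cell above.
loop_mask = np.array([(s > 25.0).all() and (s < 250.0).all() for s in segs])

# Vectorized version: every sample in a row must lie inside (25, 250).
vec_mask = ((segs > 25.0) & (segs < 250.0)).all(axis=1)

print(vec_mask)
```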


Combine all filters into a single mask

In [22]:
all_filters = mbp_filter & art_filter & mstds_filter

### Apply the filters to extract data
> **TODO:** Use `all_filters` to keep only the data that pass every filter and store them in `mbp_filtered` and `art_filtered`

In [23]:
### =========== Your code here ====================

mbp_filtered = mbp_values_np[all_filters]
art_filtered = art_seg_np[all_filters]

### ===============================================

print(mbp_filtered.shape)
print(art_filtered.shape)

(8110,)
(8110, 2000)
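
In the run above, 8110 of 10428 samples (about 78%) pass all filters. Because the targets and the segments are indexed with the same boolean mask, they stay aligned row by row. A small sketch with made-up values (the arrays here are illustrative, not notebook data):

```python
import numpy as np

mbp = np.array([-20.0, 72.0, 58.0, 300.0, 90.0])
art = np.arange(5 * 4, dtype=np.float32).reshape(5, 4)  # 5 toy segments

mbp_ok = (mbp > 0) & (mbp < 250)                      # MBP min-max filter
art_ok = np.array([True, True, True, True, False])   # pretend waveform filter
keep = mbp_ok & art_ok                                # element-wise AND

# One mask applied to both arrays keeps targets and inputs aligned.
mbp_kept, art_kept = mbp[keep], art[keep]
print(int(keep.sum()), mbp_kept.shape, art_kept.shape)
```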


Define the dataset

In [24]:
mbp_filtered[mbp_filtered < 60].shape

(640,)

### Define the binary label
Create `y_label` for training by assigning label 1 to samples whose MBP is below 61.

In [25]:
label_tmp = np.zeros(len(mbp_filtered))
label_tmp[mbp_filtered<61] = 1.
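
Note that the count cell earlier uses `< 60` while the label uses `< 61`, so a reading of exactly 60 is labeled 1 even though it is not included in that count. The sketch below shows an equivalent one-line way to build the labels and check the class balance on toy values; the variable names `threshold` and `pos_fraction` are assumptions for illustration.

```python
import numpy as np

mbp = np.array([55.0, 60.0, 60.5, 61.0, 75.0], dtype=np.float32)
threshold = 61  # same cutoff as the labeling cell above

# Comparison yields booleans; casting gives 1.0 for low MBP, 0.0 otherwise.
labels = (mbp < threshold).astype(np.float64)
pos_fraction = labels.mean()  # share of positive (low blood pressure) samples
print(labels, pos_fraction)
```

Checking the positive fraction is worthwhile here because low blood pressure samples are a minority: the notebook reports 640 of 8110 samples below 60.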

In [26]:
x_data = art_filtered
y_label = label_tmp

In [27]:
x_data.shape

(8110, 2000)

In [28]:
y_label.shape

(8110,)