## Intro

This notebook reads all the CSV files generated by the "02_import_mitdb_data" notebook and combines them into 3 new CSV files that can be used to train a model. More specifically, it does the following:

- Read all CSV files generated by the "02_import_mitdb_data" notebook.
- Shuffle all records randomly.
- Save 60% of the records in a training CSV file.
- Save 20% of the records in a validation CSV file.
- Save 20% of the records in a test CSV file.

In [1]:
from glob import glob
import numpy as np

## Load Data

In [3]:
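# Each CSV row holds 188 values; start with an empty array and grow it file by file.
alldata = np.empty(shape=[0, 188])
print(alldata.shape)
paths = glob('../data/processed/mitdb/*.csv')
for path in paths:
    print('Loading ', path)
    csvrows = np.loadtxt(path, delimiter=',')
    # Append this file's rows below the rows loaded so far.
    alldata = np.append(alldata, csvrows, axis=0)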

(0, 188)
Loading  ../data/processed/mitdb/213_MLII.csv
Loading  ../data/processed/mitdb/207_MLII.csv
Loading  ../data/processed/mitdb/217_V1.csv
Loading  ../data/processed/mitdb/103_MLII.csv
Loading  ../data/processed/mitdb/220_V1.csv
Loading  ../data/processed/mitdb/232_V1.csv
Loading  ../data/processed/mitdb/228_V1.csv
Loading  ../data/processed/mitdb/200_V1.csv
Loading  ../data/processed/mitdb/121_MLII.csv
Loading  ../data/processed/mitdb/208_MLII.csv
Loading  ../data/processed/mitdb/104_V5.csv
Loading  ../data/processed/mitdb/100_V5.csv
Loading  ../data/processed/mitdb/111_V1.csv
Loading  ../data/processed/mitdb/115_V1.csv
Loading  ../data/processed/mitdb/217_MLII.csv
Loading  ../data/processed/mitdb/101_MLII.csv
Loading  ../data/processed/mitdb/115_MLII.csv
Loading  ../data/processed/mitdb/112_V1.csv
Loading  ../data/processed/mitdb/101_V1.csv
Loading  ../data/processed/mitdb/203_MLII.csv
Loading  ../data/processed/mitdb/209_MLII.csv
Loading  ../data/processed/mitdb/118_V1.csv
Loa
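
Calling `np.append` inside a loop copies the whole accumulated array on every iteration. As a hedged alternative, the sketch below collects the per-file arrays in a list and concatenates them once. The helper name `stack_row_blocks` and the synthetic input blocks are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def stack_row_blocks(blocks, n_cols=188):
    """Concatenate 2-D row blocks once instead of appending inside a loop.

    `stack_row_blocks` is a hypothetical helper, not part of the notebook.
    """
    # np.loadtxt returns a 1-D array for a single-row file; promote it to 2-D
    blocks = [np.atleast_2d(b) for b in blocks]
    if not blocks:
        return np.empty((0, n_cols))
    return np.concatenate(blocks, axis=0)

# Synthetic stand-ins for the arrays np.loadtxt would return (made-up data)
rng = np.random.default_rng(0)
fake_blocks = [rng.random((5, 188)), rng.random((3, 188)), rng.random(188)]
alldata_demo = stack_row_blocks(fake_blocks)
print(alldata_demo.shape)  # (9, 188)
```

In the notebook, the list would be filled with `np.loadtxt(path, delimiter=',')` for each path returned by `glob`.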

## Shuffle and Separate

All records are shuffled randomly, then split into training, validation, and testing groups.

In [4]:
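np.random.shuffle(alldata)  # shuffles rows in place (unseeded, so the order differs on each run)
totrows = len(alldata)
trainrows = int((totrows * 3 / 5) + 0.5)  # 60%, rounded to the nearest row
testrows = int((totrows * 1 / 5) + 0.5)  # 20%, rounded to the nearest row
validaterows = totrows - trainrows - testrows  # remaining rows, about 20%
mark1 = trainrows  # end of the training slice
mark2 = mark1 + testrows  # end of the test slice; validation runs from here to the end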
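
Because the shuffle above is unseeded, each run produces a different split. The following sketch shows one way to make the split repeatable, assuming a seeded `np.random.default_rng` generator is acceptable. The function name `split_60_20_20`, the seed value, and the toy array are illustrative assumptions, while the rounding mirrors the cell above.

```python
import numpy as np

def split_60_20_20(data, seed=42):
    """Return (train, test, validate) using the same rounding as the notebook cell.

    Hypothetical helper; the seed value is an arbitrary choice.
    """
    rng = np.random.default_rng(seed)  # seeded generator, so the split is repeatable
    shuffled = rng.permutation(data)   # shuffles rows and returns a copy
    totrows = len(shuffled)
    mark1 = int(totrows * 3 / 5 + 0.5)          # end of the training slice
    mark2 = mark1 + int(totrows * 1 / 5 + 0.5)  # end of the test slice
    train, test, validate = np.split(shuffled, [mark1, mark2])
    return train, test, validate

# Toy data: 103 rows, so the 60/20/20 split does not divide evenly
demo = np.arange(103 * 4, dtype=float).reshape(103, 4)
train, test, validate = split_60_20_20(demo)
print(len(train), len(test), len(validate))  # 62 21 20
```

Since the validation set takes whatever rows remain, the three parts always add up to the full row count, even when the total is not divisible by 5.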

## Save Data

Data is saved in 3 separate CSV files: training, validation, and testing.

In [6]:
with open('../data/interim/mitdb/train.csv', "wb") as fout:
    np.savetxt(fout, alldata[:mark1], delimiter=",", fmt='%f')

In [7]:
with open('../data/interim/mitdb/test.csv', "wb") as fout:
    np.savetxt(fout, alldata[mark1:mark2], delimiter=",", fmt='%f')

In [8]:
with open('../data/interim/mitdb/validate.csv', "wb") as fout:
    np.savetxt(fout, alldata[mark2:], delimiter=",", fmt='%f')
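
A quick round-trip check can confirm that a saved file loads back with the expected shape. The sketch below writes to a temporary directory with made-up data, so the paths and array contents are assumptions rather than the notebook's real files. It also shows that `fmt='%f'` stores only six decimal places, so reloaded values match the originals to about 1e-6 rather than exactly.

```python
import os
import tempfile
import numpy as np

# Made-up data standing in for one of the saved splits
rng = np.random.default_rng(0)
demo = rng.random((10, 188))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'train.csv')  # temporary path, not the notebook's
    np.savetxt(path, demo[:6], delimiter=',', fmt='%f')
    reloaded = np.loadtxt(path, delimiter=',')

print(reloaded.shape)  # (6, 188)
# fmt='%f' keeps 6 decimal places, so values agree only to about 1e-6
print(np.allclose(reloaded, demo[:6], atol=1e-6))  # True
```

If more precision is needed downstream, a format such as `fmt='%.18e'` (the `np.savetxt` default) preserves the full double-precision value at the cost of larger files.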