# Intro
This notebook reads all the CSV files generated by the "02_import_mitdb_data" notebook and combines them into three new CSV files that can be used to train a model. Specifically, it does the following:
* Read all CSV files generated by the "02_import_mitdb_data" notebook.
* Shuffle all records randomly.
* Save 60% of the records in a training CSV file.
* Save 20% of the records in a validation CSV file.
* Save 20% of the records in a test CSV file.

# Initialize
Import dependencies.

In [None]:
from glob import glob
import numpy as np

# Load data
Read all the CSV files into memory.

In [None]:
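# Collect each file's rows in a list and concatenate once at the end;
# calling np.append inside the loop copies the whole array on every pass.
paths = sorted(glob('data_ecg/*.csv'))
chunks = []
for path in paths:
    print('Loading', path)
    # ndmin=2 keeps a single-row file 2-D so it can still be stacked
    chunks.append(np.loadtxt(path, delimiter=',', ndmin=2))
# Each record has 188 columns; fall back to an empty array if no files matched
alldata = np.concatenate(chunks, axis=0) if chunks else np.empty(shape=[0, 188])
print(alldata.shape)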

# Shuffle and separate the data
All records are shuffled randomly, then split into training, validation, and testing groups.

In [None]:
# Randomly mix rows
np.random.shuffle(alldata)
totrows = len(alldata)
trainrows = int((totrows * 3 / 5) + 0.5) # 60%
testrows = int((totrows * 1 / 5) + 0.5) # 20%
validaterows = totrows - trainrows - testrows # 20%
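# Slice boundaries: rows [0, mark1) are training, [mark1, mark2) are test,
# and [mark2, totrows) are validation
mark1 = trainrows
mark2 = mark1 + testrows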
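
The rounding arithmetic above can be checked on a toy array. The sketch below is illustrative only: `split_counts` is a hypothetical helper name, the 10-row array stands in for `alldata`, and the seeded `np.random.default_rng` generator is an assumption added so the shuffle is repeatable (the cell above uses the unseeded `np.random.shuffle`).

```python
import numpy as np

def split_counts(totrows):
    """Return (train, test, validate) row counts for a 60/20/20 split."""
    trainrows = int((totrows * 3 / 5) + 0.5)  # 60%, rounded to nearest
    testrows = int((totrows * 1 / 5) + 0.5)   # 20%, rounded to nearest
    validaterows = totrows - trainrows - testrows  # remainder, about 20%
    return trainrows, testrows, validaterows

# Toy stand-in for alldata: 10 rows, 188 columns
toy = np.arange(10 * 188, dtype=float).reshape(10, 188)

rng = np.random.default_rng(seed=0)  # assumed seed, only for repeatability
rng.shuffle(toy)  # shuffles rows in place, like np.random.shuffle

trainrows, testrows, validaterows = split_counts(len(toy))
mark1 = trainrows
mark2 = mark1 + testrows
train, test, validate = toy[:mark1], toy[mark1:mark2], toy[mark2:]
print(train.shape, test.shape, validate.shape)
```

Since the validation count is computed as the remainder, the three counts always sum to the total number of rows, so no record is dropped or duplicated by the split.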

# Save data
The data is saved in three separate CSV files: train.csv (training), test.csv (testing), and validate.csv (validation).

In [None]:
with open('train.csv', "wb") as fout:
    np.savetxt(fout, alldata[:mark1], delimiter=",", fmt='%f')

In [None]:
with open('test.csv', "wb") as fout:
    np.savetxt(fout, alldata[mark1:mark2], delimiter=",", fmt='%f')

In [None]:
with open('validate.csv', "wb") as fout:
    np.savetxt(fout, alldata[mark2:], delimiter=",", fmt='%f')
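
Because `fmt='%f'` writes each value with six decimal places, the saved files hold rounded copies of the in-memory data. A minimal sketch of a round-trip check follows; the small random array and the temporary directory are assumptions used here in place of the real splits and output files.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(seed=0)  # assumed seed, illustration only
sample = rng.random((5, 188))        # stand-in for one of the splits

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'train.csv')
    np.savetxt(path, sample, delimiter=',', fmt='%f')
    # ndmin=2 keeps the result 2-D even if the file has a single row
    reloaded = np.loadtxt(path, delimiter=',', ndmin=2)

# The shape survives the round trip; values agree to within the %f rounding
same_shape = reloaded.shape == sample.shape
close_enough = np.allclose(reloaded, sample, rtol=0, atol=1e-6)
print(same_shape, close_enough)
```

If more precision is needed downstream, a format such as `'%.10f'` could be passed to `np.savetxt` instead, at the cost of larger files.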