# Run models

In Notebook 3 we learn how to train built-in ECG models and use them for prediction. We consider the [fft_inception](https://github.com/analysiscenter/ecg/blob/unify_models/doc/fft_model.md) model as an example. This model learns to recognize atrial fibrillation (AF) from a single-lead ECG signal. The model takes an ECG signal and its meta as input and outputs the probabilities of the signal being AF and non-AF. See more on ECG models [here](https://github.com/analysiscenter/ecg/blob/unify_models/doc/models.md).



Some necessary imports before we start. Note that ```ModelEcgBatch```, which contains the models, is imported rather than plain ```EcgBatch```:

In [1]:
import sys
import os

sys.path.append(os.path.join("..", "..", ".."))

import numpy as np
from sklearn.metrics import f1_score
import ecg.dataset as ds
from ecg.batch import ModelEcgBatch

Using TensorFlow backend.


Then we create an ECG dataset (see [Notebook 1](https://github.com/analysiscenter/ecg/blob/unify_models/doc/ecg_tutorial_part_1.ipynb) for details):

In [2]:
index = ds.FilesIndex(path="/notebooks/data/ECG/training2017/*.hea", no_ext=True, sort=True)
eds = ds.Dataset(index, batch_class=ModelEcgBatch)

Now we want to divide the dataset into two parts that will be used for training and validation. The method ```cv_split``` does this job:

In [3]:
eds.cv_split(0.8, shuffle=True)

Now 80% of the dataset is in ```eds.train``` and the rest is in ```eds.test```.
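To make the split concrete, here is a minimal sketch of an 80/20 shuffle-and-split over record ids in plain Python. It only illustrates the idea and is not the code behind ```cv_split```; the helper name `split_indices`, its `seed` argument and the generated ids are hypothetical.

```python
import random


def split_indices(ids, train_share=0.8, shuffle=True, seed=0):
    """Split record ids into a train part and a test part (illustrative only)."""
    ids = list(ids)
    if shuffle:
        # A fixed seed keeps the sketch reproducible.
        random.Random(seed).shuffle(ids)
    n_train = int(round(train_share * len(ids)))
    return ids[:n_train], ids[n_train:]


# Hypothetical record ids, shaped like the names used in this notebook.
record_ids = ["A{:05d}".format(i) for i in range(1, 11)]
train_ids, test_ids = split_indices(record_ids, train_share=0.8)
print(len(train_ids), len(test_ids))  # 8 2
```

Every id lands in exactly one of the two parts, which is the property a train/validation split needs.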

Let's define a preprocess pipeline. Here we
* load signal, meta and target labels
* drop noisy signals
* replace all non-AF labels with the "NO" label
* resample signals
* drop signals that are too short
* generate a number of segments from each signal
* binarize labels to 0 and 1
* convert signals to the input format expected by the model.

In [4]:
preprocess_pipeline = (ds.Pipeline()
                       .load(fmt="wfdb", components=["signal", "meta"])
                       .load(src="/notebooks/data/ECG/training2017/REFERENCE.csv",
                             fmt="csv", components="target")
                       .drop_labels(["~"])
                       .replace_labels({"N": "NO", "O": "NO"})
                       .random_resample_signals("normal", loc=300, scale=10)
                       .drop_short_signals(4000)
                       .segment_signals(3000, 3000)
                       .binarize_labels()
                       .apply(np.transpose, [0, 2, 1])
                       .ravel())
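The label and segmentation steps of the pipeline above can be sketched with plain NumPy. This is an illustration under assumptions, not the library code: the helpers `clean_labels` and `segment` are hypothetical, the signal layout `(n_channels, n_samples)` is assumed, and the one-hot column order `[AF, non-AF]` is chosen to match the order of the model output described later in this notebook.

```python
import numpy as np


def clean_labels(labels):
    """Drop noise records ("~") and map every non-AF label to "NO".

    Returns (original_position, new_label) pairs for the kept records.
    """
    return [(i, "A" if lab == "A" else "NO")
            for i, lab in enumerate(labels) if lab != "~"]


def segment(signal, length, step):
    """Cut a (n_channels, n_samples) array into windows of `length` samples
    taken every `step` samples. Returns (n_segments, n_channels, length).
    Assumes the signal has at least `length` samples."""
    n_samples = signal.shape[1]
    starts = range(0, n_samples - length + 1, step)
    return np.stack([signal[:, s:s + length] for s in starts])


labels = ["N", "A", "~", "O"]
kept = clean_labels(labels)
# Assumed one-hot order: column 0 is AF, column 1 is non-AF.
onehot = np.array([[1, 0] if lab == "A" else [0, 1] for _, lab in kept])

# Toy single-lead signal with 9000 samples.
signal = np.arange(9000, dtype=float).reshape(1, 9000)
segments = segment(signal, 3000, 3000)            # non-overlapping windows
model_input = np.transpose(segments, [0, 2, 1])   # swap the last two axes
print(segments.shape, model_input.shape)  # (3, 1, 3000) (3, 3000, 1)
```

With window length and step both equal to 3000 the segments do not overlap; a smaller step would produce overlapping segments.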

## Train pipeline

The train pipeline is the preprocess pipeline plus the ```train_on_batch``` action. We use pipeline algebra to merge the two pipelines:

In [5]:
with ds.Pipeline() as p:
    fft_train_pipeline = (preprocess_pipeline +
                          p.train_on_batch('fft_inception', metrics=f1_score, average='macro'))
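The cell above reports ```f1_score``` with ```average='macro'``` as the training metric. As a reminder of what that number means, here is a small pure-Python sketch of macro-averaged F1: the F1 score is computed for each class separately and the per-class scores are averaged with equal weight. The function name `macro_f1` and the toy labels are hypothetical, and the sketch is not a replacement for the scikit-learn function.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2.0 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels: 0 and 1 stand for the two classes.
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]
print(macro_f1(y_true, y_pred))  # 0.625
```

Because each class contributes equally, the macro average does not let the more frequent class dominate the score, which matters when AF records are a minority.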

Then we only have to pass the dataset to the pipeline and start the calculation. Depending on your hardware, training may take a while. Reduce ```n_epochs``` if you do not want to wait long:

In [6]:
fft_trained = (eds.train >> fft_train_pipeline).run(batch_size=500, shuffle=True,
                                                    drop_last=True, n_epochs=100)



As a result we obtain ```fft_trained```, which contains the trained model. Let's make a prediction!

## Predict pipeline

The predict pipeline consists of the preprocessing actions plus the ```import_model``` and ```predict_on_batch``` actions. A model can be imported from a dump file or from the pipeline in which it was trained. We show the second option since we have ```fft_trained```:

In [102]:
fft_predict_pipeline = (ds.Pipeline()
                        .import_model('fft_inception', fft_trained)
                        .init_variable("prediction", init=list, init_on_each_run=True)   
                        .load(fmt="wfdb", components=["signal", "meta"])
                        .random_resample_signals("normal", loc=300, scale=10)
                        .drop_short_signals(4000)
                        .random_segment_signals(3000, n_segments=1)
                        .apply(np.transpose, [0, 2, 1])
                        .ravel()
                        .predict_on_batch('fft_inception'))

Note that we also add the ```init_variable``` action. It defines an empty list ```prediction``` that will store the output of the model.

Let's make a prediction on a sample ECG, say the ECG with index "A00001". Create a dataset:

In [110]:
index = ds.FilesIndex(path="/notebooks/data/ECG/training2017/A00001.hea", no_ext=True)
sample = ds.Dataset(index, batch_class=ModelEcgBatch)

To start the calculation we pass ```sample``` into the pipeline and call ```run```:

In [111]:
predicted = (sample >> fft_predict_pipeline).run(batch_size=1, shuffle=False, n_epochs=1)

To see the output we read the pipeline variable ```prediction```:

In [114]:
print(predicted.get_variable('prediction'))

[array([[  4.89614147e-04,   9.99510407e-01]], dtype=float32)]


The prediction is a list of arrays holding the probabilities of the signal being AF and non-AF (the probabilities sum to 1). If the first value exceeds 0.5, we classify the signal as AF. In our example ```A00001``` is non-AF.
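The decision rule can be written out in a few lines of NumPy. The sketch below reuses the values printed above; the variable names and the "AF" / "non-AF" strings are illustrative choices, not part of the library.

```python
import numpy as np

# Values copied from the output above: one array per predicted batch,
# with columns [P(AF), P(non-AF)].
prediction = [np.array([[4.89614147e-04, 9.99510407e-01]], dtype=np.float32)]

# Stack the per-batch arrays into a single (n_segments, 2) array.
probs = np.concatenate(prediction, axis=0)

# The first column is the AF probability; apply the 0.5 threshold.
af_prob = probs[:, 0]
labels = np.where(af_prob > 0.5, "AF", "non-AF")
print(labels.tolist())  # ['non-AF']
```

The same code works unchanged when the list holds predictions for several segments or batches.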

This is the end of Notebook 3. Here we learned:
* how to train models
* how to make predictions.

See previous topics in [Notebook 1](https://github.com/analysiscenter/ecg/blob/unify_models/doc/ecg_tutorial_part_1.ipynb) and [Notebook 2](https://github.com/analysiscenter/ecg/blob/unify_models/doc/ecg_tutorial_part_2.ipynb).