# Data access and visualization

The first step is to load the data and visualize it.

In [None]:
import core.dataloader as crloader
import core.data_plots as crplt
import core.preprocesses as crpre

In [None]:
# Load the data
data = crloader.load_data(data_path='../data/physionet.org/files/ptb-xl/1.0.2',
                          sampling_rate=100)
print("\nData loaded.\n")

# Access a specific patient's data
patient_id = 9898
ecg_ids = crloader.get_patient_id_ecg_ids(patient_id=patient_id,
                                          annotations=data['train']['annotations'])

print(f"Patient {patient_id} has {len(ecg_ids)} ECGs.")
ecg_id, ecg_date = ecg_ids[-1]  # most recent

signals = crloader.get_signal_from_ecg_id(ecg_id=ecg_id,
                                          raw_data=data['train']['data'],
                                          channel=-1)

annots = crloader.get_annotations_from_ecg_id(ecg_id=ecg_id,
                                              annotations=data['train']['annotations'])

In [None]:
# Visualize the ECG signal for all channels and annotations
data_display = crplt.plot_ecg_channels(raw_data=data['train']['data'][ecg_id],
                                       title=f"ECG ID {ecg_id} from {ecg_date}")

print(annots)

# Data preprocessing

Preprocessing the signals makes them smoother and removes outliers and noise before any analysis.

### Filtering

A common preprocessing step when working with signals is smoothing/filtering. It removes some outliers and noise from the signal, which makes the analysis easier.

Some of the most used signal filtering techniques are:
- Savitzky-Golay filter
- Gaussian filter
- Median filter
- Low-pass filter
- High-pass filter
- Butterworth filter (band-pass filter)
- Convolution filter

The biggest challenge of filtering is the manual tuning. Finding the right parameters is empirical work.
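To make the effect of a filter and of its parameters concrete, here is a minimal sketch in plain Python. It is not the `core.preprocesses` implementation: `moving_average` and `median_filter` are hypothetical helpers written only for this illustration, and the signal is synthetic. It compares a moving average (a simple convolution filter) with a median filter on a flat signal that contains one outlier.

```python
from statistics import median


def moving_average(signal, width):
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def median_filter(signal, width):
    """Centered median filter; the window shrinks at the edges."""
    half = width // 2
    return [median(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]


# Synthetic flat signal with a single outlier in the middle
noisy = [1.0] * 11
noisy[5] = 10.0

avg_3 = moving_average(noisy, 3)   # the outlier is spread over 3 samples
avg_5 = moving_average(noisy, 5)   # wider window: lower but wider bump
med_3 = median_filter(noisy, 3)    # the outlier is removed entirely

print("moving average, width 3:", avg_3[5])
print("moving average, width 5:", avg_5[5])
print("median filter,  width 3:", med_3[5])
```

The averaging filter only dilutes the outlier, and a wider window trades a lower bump for more blurring, while the median filter discards it. This is the kind of trade-off that has to be tuned by hand on real ECG data.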

In [None]:
channel = 0
original_signal = signals[:, channel]
savgol_ecg = crpre.smooth_signal_savgol(ecg_signal=original_signal,
                                        window_length=5,
                                        polyorder=2)
crplt.plot_filtered_signal(ecg_signal=original_signal,
                           smoothed_ecg=savgol_ecg,
                           title="Savitzky-Golay filter")

In [None]:
gaussian_ecg = crpre.smooth_signal_gaussian(ecg_signal=original_signal, sigma=3)
crplt.plot_filtered_signal(ecg_signal=original_signal,
                           smoothed_ecg=gaussian_ecg,
                           title="Gaussian filter")

In [None]:
median_ecg = crpre.smooth_signal_median(ecg_signal=original_signal, kernel_size=3)
crplt.plot_filtered_signal(ecg_signal=original_signal,
                           smoothed_ecg=median_ecg,
                           title="Median filter")

In [None]:
lowcut = 45
lowpass_ecg = crpre.smooth_signal_lowpass(ecg_signal=original_signal,
                                          sample_rate=100,
                                          order_filter=5,
                                          cut=lowcut)
crplt.plot_filtered_signal(ecg_signal=original_signal,
                           smoothed_ecg=lowpass_ecg,
                           title=f"Low-pass filter at {lowcut} Hz")

In [None]:
highcut = 0.5
highpass_ecg = crpre.smooth_signal_highpass(ecg_signal=original_signal,
                                            sample_rate=100,
                                            order_filter=5,
                                            cut=highcut)
# Plot the high-pass result (the original cell plotted lowpass_ecg by mistake)
crplt.plot_filtered_signal(ecg_signal=original_signal,
                           smoothed_ecg=highpass_ecg,
                           title=f"High-pass filter at {highcut} Hz")

In [None]:
lowcut = 0.5  # avoid the breathing noise
highcut = 45  # avoid power-line noise
band_ecg = crpre.smooth_signal_butterworth(ecg_signal=original_signal,
                                           sample_rate=100,
                                           order_filter=5,
                                           lowcut=lowcut,
                                           highcut=highcut)
crplt.plot_filtered_signal(ecg_signal=original_signal,
                           smoothed_ecg=band_ecg,
                           title=f"Butterworth filter ({lowcut}Hz - {highcut}Hz)")

In [None]:
kernel = 7
conv_ecg = crpre.smooth_signal_convolution(ecg_signal=original_signal,
                                           kernel=kernel)
crplt.plot_filtered_signal(ecg_signal=original_signal,
                           smoothed_ecg=conv_ecg,
                           title=f"Convolution filter (kernel width {kernel})")

As mentioned, tuning a filter is hard work. As an example, I show the influence of different cutoff frequencies on a low-pass filter.

In [None]:
lowpass_ecgs = []
cutoffs = []
for lowcut in range(40, 50, 2):
    lowpass_ecg = crpre.smooth_signal_lowpass(ecg_signal=original_signal,
                                              sample_rate=100,
                                              order_filter=5,
                                              cut=lowcut)
    lowpass_ecgs.append(lowpass_ecg)
    cutoffs.append(f"{lowcut}Hz")

crplt.plot_filtered_signals(ecg_signal=original_signal,
                            smoothed_ecgs=lowpass_ecgs,
                            labels=cutoffs,
                            title="Low-pass filter search")

One application of filtering is removing baseline wander.

Baseline wander is a typical artifact that corrupts the ECG. It can be caused by a variety of noise sources including respiration, body movements, and poor electrode contact. Its spectral content is usually confined to frequencies below 0.5 Hz.

Most baseline wander removal techniques can distort the ECG and compromise its clinical relevance. For that reason, removing it is not an easy process.
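Because the wander sits mostly below 0.5 Hz, one simple way to suppress it is a high-pass filter with a cutoff near that frequency. The sketch below uses a first-order (single-pole) recursive filter in plain Python, which is a deliberately simpler stand-in for the order-5 filters used earlier in this notebook. `highpass_first_order` is a hypothetical helper, not part of `core.preprocesses`, and both test signals are synthetic sinusoids.

```python
import math


def highpass_first_order(signal, cutoff_hz, sample_rate):
    """Single-pole RC high-pass filter (illustrative, not the notebook's filter)."""
    if not signal:
        return []
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    # Start at 0.0: assumes the filter had already settled on the first sample
    filtered = [0.0]
    for i in range(1, len(signal)):
        filtered.append(alpha * (filtered[-1] + signal[i] - signal[i - 1]))
    return filtered


sample_rate = 100
t = [n / sample_rate for n in range(20 * sample_rate)]   # 20 seconds

drift = [math.sin(2 * math.pi * 0.05 * x) for x in t]    # slow wander at 0.05 Hz
fast = [math.sin(2 * math.pi * 10.0 * x) for x in t]     # stand-in for ECG content at 10 Hz

drift_out = highpass_first_order(drift, cutoff_hz=0.5, sample_rate=sample_rate)
fast_out = highpass_first_order(fast, cutoff_hz=0.5, sample_rate=sample_rate)

print(f"0.05 Hz amplitude after filtering: {max(abs(v) for v in drift_out):.3f}")
print(f"10 Hz amplitude after filtering:   {max(abs(v) for v in fast_out):.3f}")
```

The slow component is strongly attenuated while the faster one passes almost unchanged. Real ECG content near the cutoff (for example the ST segment) is affected too, which is one way such a filter can alter the clinically relevant part of the signal.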

A very basic baseline wander estimator was implemented using a sequence of median filters with different kernel sizes. Each kernel size is derived from the sampling rate and the window duration in seconds.

In [None]:
wander = crpre.estimate_baseline_wander(ecg_signal=original_signal,
                                        durations=[0.5, 2],
                                        sample_rate=100)
rem_wander_ecg = crpre.remove_baseline_wander(ecg_signal=original_signal,
                                              durations=[0.5, 2],
                                              sample_rate=100)

crplt.plot_filtered_signals(ecg_signal=original_signal,
                            smoothed_ecgs=[wander, rem_wander_ecg],
                            labels=['estimated wander', 'filtered'],
                            title="Remove baseline wander")
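The idea behind the estimator described above can be sketched in plain Python as follows. This is an assumption about how such a cascade could work, not the code in `core.preprocesses`: the helper names are hypothetical, the kernel size is taken as duration times sampling rate rounded up to an odd number, and the input is a synthetic signal (a constant offset with one-sample spikes standing in for beats).

```python
from statistics import median


def kernel_size_from_duration(duration_s, sample_rate):
    """Convert a window duration in seconds to an odd number of samples."""
    size = int(round(duration_s * sample_rate))
    return size if size % 2 == 1 else size + 1


def median_filter(signal, kernel_size):
    """Centered median filter; the window shrinks at the edges."""
    half = kernel_size // 2
    return [median(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]


def estimate_baseline(signal, durations, sample_rate):
    """Apply median filters one after another, one per duration."""
    baseline = list(signal)
    for duration in durations:
        kernel = kernel_size_from_duration(duration, sample_rate)
        baseline = median_filter(baseline, kernel)
    return baseline


def remove_baseline(signal, durations, sample_rate):
    """Subtract the estimated baseline from the signal."""
    baseline = estimate_baseline(signal, durations, sample_rate)
    return [x - b for x, b in zip(signal, baseline)]


# Synthetic signal: constant offset of 0.3 plus a narrow spike every second
sample_rate = 100
synthetic = [0.3] * 500
for beat in range(50, 500, 100):
    synthetic[beat] += 1.0

baseline = estimate_baseline(synthetic, durations=[0.5, 2], sample_rate=sample_rate)
corrected = remove_baseline(synthetic, durations=[0.5, 2], sample_rate=sample_rate)

print("kernel sizes:", [kernel_size_from_duration(d, sample_rate) for d in [0.5, 2]])
print("baseline at a spike:", baseline[50], "corrected spike:", corrected[50])
```

The median ignores the short spikes, so the estimated baseline follows only the offset, and subtracting it leaves the spikes sitting on a zero line.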

# Data analysis

Once we have the ECG signal, there is some basic analysis that one can do.
One of the most relevant features of an ECG is the [QRS complex](https://en.wikipedia.org/wiki/QRS_complex).
In layman's terms:
- R peaks are the highest peaks
- Q peaks are the minima just before each R peak
- S peaks are the minima just after each R peak

The Q and S peaks are located relative to the R peaks, and the heart rate can be estimated from the R peaks.
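As a minimal sketch of these two ideas, assuming the R-peak sample indices are already known, the heart rate follows from the mean R-R interval, and Q and S can be searched as the minima in a small window on each side of an R peak. The helpers `heart_rate_bpm` and `find_q_and_s`, the `search` window length, and the synthetic beats are all hypothetical and chosen only for illustration.

```python
def heart_rate_bpm(r_peaks, sample_rate):
    """Mean heart rate in beats per minute from R-peak sample indices."""
    if len(r_peaks) < 2:
        raise ValueError("need at least two R peaks")
    rr_seconds = [(b - a) / sample_rate for a, b in zip(r_peaks, r_peaks[1:])]
    return 60.0 / (sum(rr_seconds) / len(rr_seconds))


def find_q_and_s(signal, r_peak, search):
    """Return (q_index, s_index): the minima within `search` samples
    before and after the R peak (None if that side has no samples)."""
    before = range(max(0, r_peak - search), r_peak)
    after = range(r_peak + 1, min(len(signal), r_peak + search + 1))
    q_index = min(before, key=lambda i: signal[i]) if before else None
    s_index = min(after, key=lambda i: signal[i]) if after else None
    return q_index, s_index


# Synthetic beats: a small dip (Q), a tall peak (R), a deeper dip (S),
# repeated every 80 samples at 100 Hz, i.e. every 0.8 s
sample_rate = 100
ecg = [0.0] * 240
r_peaks = [3, 83, 163]
for r in r_peaks:
    ecg[r - 1] = -0.1
    ecg[r] = 1.0
    ecg[r + 1] = -0.3

q, s = find_q_and_s(ecg, r_peak=83, search=2)
print(f"Q at sample {q}, S at sample {s}")
print(f"Heart rate: {heart_rate_bpm(r_peaks, sample_rate):.2f} bpm")
```

With an R-R interval of 0.8 s the estimate comes out at about 75 bpm. On real recordings the search window has to be tuned to the sampling rate and the expected QRS width.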

Some of the most used detectors are:
- Pan and Tompkins
- Hamilton
- Christov
- Stationary Wavelet Transform
- Two Moving Average

You can find an implementation [here](https://github.com/berndporr/py-ecg-detectors).
After trying it, I was not satisfied with the results: in most cases the R peaks were completely off.

I implemented my own [Pan and Tompkins detector](https://en.wikipedia.org/wiki/Pan%E2%80%93Tompkins_algorithm).

In [None]:
from core.pan_tompkins import PanTompkinsQRS
peak_detector = PanTompkinsQRS(signal=original_signal, sample_rate=100, window_size=0.15)

crplt.plot_signal(signal=peak_detector.band_pass_sgn,
                  xlabel="Samples",
                  ylabel="Amplitude",
                  title="Bandpassed signal")

crplt.plot_signal(signal=peak_detector.mov_win_sgn,
                  xlabel="Samples",
                  ylabel="Amplitude",
                  title="Moving window integrated signal")

peak_detector.find_r_peaks()

crplt.plot_signal_and_rpeaks(signal=original_signal,
                             rpeaks_loc=peak_detector.tuned_peaks,
                             xlabel="Samples",
                             ylabel="Amplitude",
                             title="R peaks")

heart_bpm = peak_detector.estimate_heartrate()
print(f"Heart rate: {heart_bpm:.2f} bpm")
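For readers unfamiliar with the algorithm, the stages that follow the band-pass filter can be sketched in plain Python. This is not the `PanTompkinsQRS` class used above but a simplified illustration under stated assumptions: the input is taken as already band-passed, a two-point difference replaces the five-point derivative of the original algorithm, a fixed threshold replaces the adaptive thresholds, and all function names are hypothetical.

```python
def derivative(signal):
    """Two-point difference; emphasizes the steep slopes of the QRS complex."""
    return [0.0] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]


def squared(signal):
    """Point-wise squaring; makes values positive and amplifies large slopes."""
    return [x * x for x in signal]


def moving_window_integration(signal, window):
    """Causal moving average over the last `window` samples."""
    integrated, running = [], 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            running -= signal[i - window]
        integrated.append(running / window)
    return integrated


def detect_peaks(integrated, threshold, refractory):
    """Local maxima above a fixed threshold, at least `refractory` samples apart."""
    peaks = []
    for i in range(1, len(integrated) - 1):
        if integrated[i] < threshold:
            continue
        if integrated[i] >= integrated[i - 1] and integrated[i] > integrated[i + 1]:
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)
    return peaks


def refine_peak(signal, index, window):
    """Map a detection back to the signal maximum in the preceding window."""
    start = max(0, index - window)
    return max(range(start, index + 1), key=lambda i: signal[i])


# Synthetic, already-filtered signal: triangular beats every 0.8 s at 100 Hz
sample_rate = 100
window = int(0.15 * sample_rate)          # 150 ms integration window
beats = [0.0] * 300
for r in (50, 130, 210):
    beats[r - 1], beats[r], beats[r + 1] = 0.5, 1.0, 0.5

integrated = moving_window_integration(squared(derivative(beats)), window)
candidates = detect_peaks(integrated,
                          threshold=0.5 * max(integrated),
                          refractory=int(0.2 * sample_rate))
r_peaks = [refine_peak(beats, c, window) for c in candidates]
print("R peaks at samples:", r_peaks)
```

The integrated signal peaks slightly after each beat because the window is causal, which is why the last step searches backwards in the signal for the actual maximum.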
