# Clustering QRS Complexes in ECG

## General shape of a cardiac cycle

<img src="./SinusRhythmLabels.png">

## Extrasystoles
Example:

<img src="./Ventricular-extrasystole-ECG.jpg">

# Assignment

Using clustering algorithms, split the QRS complexes of an ECG into groups of similar shape.
The data are provided in the files:
* ecg200_samples.csv
* ecg208_samples.csv
* ecg231_samples.csv

The labels for the complexes are provided in the files:
* ecg200_labels.csv
* ecg208: no labels available
* ecg231_labels.csv

A separate model is built for each *samples.csv* file.

# Step 1 - Loading the data

See https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html for `numpy.loadtxt`. The cell below reads the file with `pandas.read_csv` instead.
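As a minimal sketch of the `numpy.loadtxt` route from the link above: the snippet reads tab-separated values into a 2D array. To stay self-contained it reads from an in-memory string rather than from `ecg_data/ecg200_samples.csv`. The string holds only the first four values of the first three rows shown in the output below, so the resulting shape is an artifact of this illustration, not of the real file.

```python
import io

import numpy as np

# In-memory stand-in for a samples file: one complex per row, tab-separated.
# Only the first four values of the first three rows are reproduced here.
fake_file = io.StringIO(
    "-0.00745\t-0.00956\t-0.01132\t-0.00983\n"
    "-0.02034\t-0.02342\t-0.03456\t-0.03819\n"
    "0.01703\t0.01536\t0.00275\t-0.00381\n"
)

# loadtxt accepts a path or a file-like object; delimiter="\t" matches the sep used with pandas.
samples = np.loadtxt(fake_file, delimiter="\t")
print(samples.shape)  # (3, 4)
```

With the real data, the file path would be passed in place of `fake_file`.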

In [3]:
import pandas

ecg_200_data = pandas.read_csv('ecg_data/ecg200_samples.csv', sep='\t', header=None)
ecg_200_data

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,150,151,152,153,154,155,156,157,158,159
0,-0.00745,-0.00956,-0.01132,-0.00983,-0.00461,-0.00332,-0.00958,-0.01476,-0.01078,-0.00284,...,0.04990,0.05675,0.05876,0.05591,0.05242,0.04598,0.03500,0.02919,0.03800,0.05269
1,-0.02034,-0.02342,-0.03456,-0.03819,-0.03306,-0.03131,-0.03719,-0.04122,-0.03588,-0.02745,...,-0.00556,-0.01323,-0.02601,-0.02585,-0.00938,0.00555,0.00287,-0.01076,-0.01725,-0.01127
2,0.01703,0.01536,0.00275,-0.00381,0.00121,0.00735,0.00714,0.00419,0.00432,0.00774,...,0.06653,0.07234,0.06643,0.05492,0.04630,0.04414,0.04909,0.05742,0.06016,0.05280
3,-0.02700,-0.02565,-0.01400,-0.00044,0.00150,-0.00818,-0.01453,-0.01015,-0.00424,-0.00533,...,-0.04809,-0.03343,-0.02629,-0.02891,-0.03429,-0.03864,-0.04201,-0.04312,-0.04122,-0.04076
4,-0.01565,-0.01184,-0.00457,0.00589,0.01771,0.01986,0.00653,-0.00746,-0.00391,0.01316,...,0.07644,0.06071,0.05958,0.06995,0.07625,0.07368,0.07147,0.07439,0.07607,0.07075
5,-0.01442,-0.01895,-0.02782,-0.03053,-0.02795,-0.02435,-0.02003,-0.01788,-0.02319,-0.03245,...,-0.01144,-0.01486,-0.02578,-0.03012,-0.02213,-0.01222,-0.01081,-0.01589,-0.02297,-0.03216
6,-0.03485,-0.03484,-0.02619,-0.02210,-0.02519,-0.02848,-0.03041,-0.03229,-0.02928,-0.01894,...,0.03574,0.03772,0.03771,0.03240,0.02833,0.02838,0.02769,0.02525,0.02733,0.03348
7,-0.01815,-0.01594,-0.01719,-0.01385,-0.00760,-0.00572,-0.00587,-0.00072,0.00617,0.00323,...,-0.00916,0.00294,0.00104,-0.01077,-0.01585,-0.00974,-0.00316,-0.00233,-0.00239,-0.00178
8,-0.00409,-0.00220,0.00231,0.01138,0.02153,0.02398,0.01723,0.01250,0.01933,0.03015,...,0.00688,0.01109,0.02021,0.02200,0.01427,0.00532,-0.00025,-0.00157,0.00572,0.01854
9,-0.00842,-0.00374,0.00415,0.00915,0.00722,0.00127,-0.00105,0.00337,0.00764,0.00412,...,0.00630,0.00949,0.01333,0.00813,0.00016,-0.00008,0.00535,0.00921,0.01151,0.01321


### Step 1.1 (Optional) - Data preparation (normalization)

Divide each row by its maximum: `array / max(array)`, where `array` is one row of the matrix.
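A minimal sketch of this row-wise normalization with NumPy. The helper name `normalize_rows` and the demo values are made up for illustration. The sketch assumes every row has a positive maximum; a row whose maximum is zero or negative would need different handling, for example scaling by the peak absolute value.

```python
import numpy as np


def normalize_rows(X):
    """Divide every row by its own maximum (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # keepdims=True keeps shape (n_rows, 1) so the division broadcasts per row.
    row_max = X.max(axis=1, keepdims=True)
    return X / row_max


# Made-up rows standing in for two complexes.
demo = np.array([[0.1, 0.5, -0.2],
                 [0.2, 2.0, 1.0]])
normalized = normalize_rows(demo)
print(normalized.max(axis=1))  # [1. 1.]
```

For the DataFrame loaded above, `normalize_rows(ecg_200_data.values)` would return a plain NumPy array of the same shape.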

# Step 2 - Loading the labels

In [72]:
ecg_200_labels = pandas.read_csv('ecg_data/ecg200_labels.csv', sep='\t', header=None)
# ecg_200_labels
# ecg_200_data.describe()
type(ecg_200_labels.transpose().values)
# ecg_200_labels.transpose().values

numpy.ndarray

# Step 3 - Clustering the complexes

See:
* http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
* http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
* http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

In [55]:
from sklearn.cluster import KMeans
import numpy as np

kmeans = KMeans(n_clusters=2, random_state=0).fit(ecg_200_data)
# kmeans.labels_
kmeans.cluster_centers_
kmeans.get_params


<bound method BaseEstimator.get_params of KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=2, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=0, tol=0.0001, verbose=0)>
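The other two algorithms linked above follow the same `fit`/`fit_predict` pattern. The sketch below runs them on synthetic data, not on the ECG files: two made-up waveform shapes (a narrow positive peak and a wide negative one) with a little noise, 160 samples per row to mirror the width of the table in Step 1. The `eps` and `min_samples` values are assumptions tuned to this synthetic data and would have to be re-tuned for real recordings.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN

# Synthetic stand-in: 50 noisy copies of each of two made-up shapes.
rng = np.random.default_rng(0)
t = np.linspace(-1, 1, 160)
narrow = np.exp(-(t / 0.05) ** 2)   # narrow positive peak
wide = -np.exp(-(t / 0.2) ** 2)     # wide negative peak
X_demo = np.vstack([
    narrow + 0.02 * rng.standard_normal((50, 160)),
    wide + 0.02 * rng.standard_normal((50, 160)),
])

# Agglomerative clustering needs the number of clusters up front, like KMeans.
agg_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X_demo)

# DBSCAN infers the number of clusters from density; label -1 marks noise points.
db_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X_demo)

print("agglomerative clusters:", len(set(agg_labels)))
print("DBSCAN clusters (excluding noise):", len(set(db_labels) - {-1}))
```

Because DBSCAN does not take `n_clusters`, it is a natural candidate when the number of distinct complex shapes in a recording is not known in advance.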

# Step 4 - Visualizing the resulting partition

In [40]:
# import matplotlib.pyplot as plt
# from sklearn.datasets import make_blobs
# X, y_true = make_blobs(n_samples=300, centers=4,
#                        cluster_std=0.60, random_state=0)
# plt.scatter(X[:, 0], X[:, 1], s=50);

# from sklearn.cluster import KMeans
# kmeans = KMeans(n_clusters=4)
# kmeans.fit(X)
# y_kmeans = kmeans.predict(X)



In [58]:
# plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
# plt.scatter(ecg_200_data.iloc[:, 0], ecg_200_data.iloc[:, 1], c=kmeans.labels_, s=50, cmap='viridis')

# centers = kmeans.cluster_centers_
# plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
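The commented-out draft scatters the first two columns, which for a 160-sample complex are just two neighbouring samples. One possible alternative, sketched below, is to project the rows onto two principal components and to draw the cluster centers as waveforms. PCA is an addition here and not part of the original notebook. The helper `plot_clustering` and the synthetic data are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def plot_clustering(X, labels, centers):
    """Scatter the rows in 2D PCA space and draw each cluster center as a curve."""
    X = np.asarray(X, dtype=float)
    points_2d = PCA(n_components=2).fit_transform(X)

    fig, (ax_scatter, ax_wave) = plt.subplots(1, 2, figsize=(10, 4))
    ax_scatter.scatter(points_2d[:, 0], points_2d[:, 1], c=labels, s=20, cmap="viridis")
    ax_scatter.set_title("Complexes in PCA space")
    for k, center in enumerate(centers):
        ax_wave.plot(center, label=f"cluster {k}")
    ax_wave.set_title("Cluster centers as waveforms")
    ax_wave.legend()
    return fig, points_2d


# Synthetic stand-in for the ECG matrix: two made-up shapes plus noise.
rng = np.random.default_rng(0)
t = np.linspace(-1, 1, 160)
X_demo = np.vstack([
    np.exp(-(t / 0.05) ** 2) + 0.02 * rng.standard_normal((50, 160)),
    -np.exp(-(t / 0.2) ** 2) + 0.02 * rng.standard_normal((50, 160)),
])
kmeans_demo = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_demo)
fig, points_2d = plot_clustering(X_demo, kmeans_demo.labels_, kmeans_demo.cluster_centers_)
# In a notebook the figure renders automatically; in a script, call plt.show().
```

On the real data the call would be `plot_clustering(ecg_200_data.values, kmeans.labels_, kmeans.cluster_centers_)`, using the model fitted in Step 3.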

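# Step 5 - Evaluating the quality of the chosen algorithms

Choose the metrics depending on the source data. Homogeneity, completeness, and V-measure compare the clustering against reference labels, so they apply only to recordings that have a labels file. The silhouette score needs no labels and can also be used for ecg208.

See: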
* http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
* http://scikit-learn.org/stable/modules/generated/sklearn.metrics.homogeneity_score.html
* http://scikit-learn.org/stable/modules/generated/sklearn.metrics.completeness_score.html
* http://scikit-learn.org/stable/modules/generated/sklearn.metrics.v_measure_score.html

In [73]:
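from sklearn import metrics

# homogeneity_score expects 1D reference labels. Passing ecg_200_labels.transpose().values
# raised "ValueError: labels_true must be 1D: shape is (1, 2568)", so flatten with ravel().
labels_true = ecg_200_labels.values.ravel()
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, kmeans.labels_))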
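A minimal sketch that computes all four linked metrics side by side. It uses synthetic, well-separated data with made-up reference labels, so the numbers say nothing about the ECG recordings. The reference labels are deliberately stored as a single column and flattened before use, since the label-based metrics require 1D input.

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans

# Synthetic stand-in: two well-separated groups of 40 rows, 8 features each.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 0.1, (40, 8)),
    rng.normal(1.0, 0.1, (40, 8)),
])

# Reference labels stored as one column, shape (80, 1), then flattened to 1D.
labels_column = np.array([0] * 40 + [1] * 40).reshape(-1, 1)
y_true = labels_column.ravel()

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_demo)

scores = {
    # These three need reference labels.
    "homogeneity": metrics.homogeneity_score(y_true, pred),
    "completeness": metrics.completeness_score(y_true, pred),
    "v_measure": metrics.v_measure_score(y_true, pred),
    # The silhouette score uses only the data and the predicted labels.
    "silhouette": metrics.silhouette_score(X_demo, pred),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The label-based scores do not depend on how cluster ids are numbered, so a clustering that swaps ids 0 and 1 relative to the reference labels scores the same.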

# Step 6 - Tuning the parameters of the clustering algorithms
- number of clusters
- distance function
- etc.
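One way to tune the first parameter in the list is to sweep the number of clusters and keep the value with the highest silhouette score. The sketch below does this for `KMeans` on synthetic data with three made-up groups. The helper `sweep_n_clusters` and the candidate range are hypothetical, and using silhouette as the selection criterion is an assumption; for labeled recordings, V-measure could be swept the same way.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sweep_n_clusters(X, candidates=range(2, 7), random_state=0):
    """Return {k: silhouette score} for KMeans fitted with each candidate k."""
    results = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        results[k] = silhouette_score(X, labels)
    return results


# Synthetic stand-in: three well-separated groups of 40 rows each.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(c, 0.1, (40, 8)) for c in (0.0, 1.0, 2.0)])

scores_by_k = sweep_n_clusters(X_demo)
best_k = max(scores_by_k, key=scores_by_k.get)
for k, score in scores_by_k.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```

`KMeans` always uses Euclidean distance, so the distance function would have to be explored with an algorithm that exposes it, such as `DBSCAN` through its `metric` parameter.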

# Step 7 - Evaluating the final score and drawing conclusions