# **Постановка задачи и данные**

Цель задачи — **бинарная классификация**: определить, является ли человек **здоровым** по результатам ЭКГ.  
Целевая переменная — **Healthy**:  
- `1` — здоров  
- `0` — потенциально аномальный

## **Данные**

Входной датасет — табличные данные с признаками, извлечёнными из сигналов ЭКГ. В него входят:

- `rr_interval` — RR интервал (мс)
- `p_onset` — начало P-волны (мс)
- `p_end` — конец P-волны (мс)
- `qrs_onset` — начало QRS-комплекса (мс)
- `qrs_end` — конец QRS-комплекса (мс)
- `t_end` — конец T-волны (мс)
- `p_axis` — ось P (°)
- `qrs_axis` — ось QRS (°)
- `t_axis` — ось T (°)
- `Healthy` — целевая переменная (0 или 1)

In [11]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset
import torch.nn.functional as F
from tqdm import tqdm

from sklearn.metrics import accuracy_score, f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

import numpy as np
import pandas as pd

# **Загрузка и обработка данных**

In [2]:
df = pd.read_csv("ecg_data.csv")

df.replace(29990, np.nan, inplace=True)

df = df.fillna(df.median(numeric_only=True))

X = df.drop(columns=["Healthy"]).values
y = df["Healthy"].values.reshape(-1, 1)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y
)

X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)

X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# **Архитектура нейронной сети**

In [8]:
model = nn.Sequential(
    nn.Linear(X_train.shape[1], 64),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(64, 1) 
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

In [9]:
def train(model, optimizer, train_loader, val_loader, n_epochs=10):
    for epoch in range(n_epochs):
        model.train()
        total_loss = 0

        for x_batch, y_batch in tqdm(train_loader):
            y_pred = model(x_batch).squeeze(1)
            loss = F.binary_cross_entropy_with_logits(y_pred, y_batch.squeeze())
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            total_loss += loss.item()

        print(f"[{epoch+1}] Train Loss: {total_loss / len(train_loader):.4f}")

        # Валидация
        if (epoch + 1) % 2 == 0:
            model.eval()
            val_loss = []
            val_preds = []
            val_trues = []
            with torch.no_grad():
                for x_val, y_val in val_loader:
                    y_pred = model(x_val).squeeze(1)
                    loss = F.binary_cross_entropy_with_logits(y_pred, y_val.squeeze())
                    val_loss.append(loss.item())
                    probs = torch.sigmoid(y_pred)
                    preds = (probs >= 0.5).int()
                    val_preds.extend(preds.cpu().numpy())
                    val_trues.extend(y_val.cpu().numpy())

            acc = accuracy_score(val_trues, val_preds)
            f1 = f1_score(val_trues, val_preds)
            print(f"    Val Loss: {np.mean(val_loss):.4f}, Accuracy: {acc:.4f}, F1: {f1:.4f}")

In [12]:
train(model, optimizer, train_loader, test_loader, n_epochs=10)

100%|██████████| 8/8 [00:00<00:00, 500.45it/s]


[1] Train Loss: 0.6990


100%|██████████| 8/8 [00:00<00:00, 595.94it/s]


[2] Train Loss: 0.6438
    Val Loss: 0.6373, Accuracy: 0.7833, F1: 0.8785


100%|██████████| 8/8 [00:00<00:00, 784.62it/s]


[3] Train Loss: 0.6213


100%|██████████| 8/8 [00:00<00:00, 725.97it/s]


[4] Train Loss: 0.5928
    Val Loss: 0.5857, Accuracy: 0.8000, F1: 0.8889


100%|██████████| 8/8 [00:00<00:00, 731.94it/s]


[5] Train Loss: 0.5720


100%|██████████| 8/8 [00:00<00:00, 514.13it/s]


[6] Train Loss: 0.5494
    Val Loss: 0.5460, Accuracy: 0.8000, F1: 0.8889


100%|██████████| 8/8 [00:00<00:00, 638.56it/s]


[7] Train Loss: 0.5352


100%|██████████| 8/8 [00:00<00:00, 533.65it/s]


[8] Train Loss: 0.5215
    Val Loss: 0.5162, Accuracy: 0.8000, F1: 0.8889


100%|██████████| 8/8 [00:00<00:00, 796.54it/s]


[9] Train Loss: 0.5054


100%|██████████| 8/8 [00:00<00:00, 679.31it/s]

[10] Train Loss: 0.4873
    Val Loss: 0.4952, Accuracy: 0.8000, F1: 0.8889





In [15]:
df[df['p_axis'] < 0]

Unnamed: 0,Healthy,rr_interval,p_onset,p_end,qrs_onset,qrs_end,t_end,p_axis,qrs_axis,t_axis
6,1,606,40,156,166,296,516,-6,110,-52
23,1,833,40,162,286,376,682,-23,9,23
49,1,706,337,29999,505,613,876,-2,-31,51
60,1,1016,40,162,290,404,754,-82,60,7
71,1,1111,333,29999,504,589,954,-5,21,-3
83,1,937,40,150,190,268,590,-21,20,68
132,1,937,40,172,236,334,678,-1,-2,3
135,1,952,40,138,206,296,600,-18,36,29
137,1,789,40,124,182,270,566,-8,-25,-172
176,1,1224,40,152,198,296,688,-21,-39,-16
