# 🫀 Cardiac Arrhythmia Database

## 📌 Overview
This dataset is designed for the classification of cardiac arrhythmias based on ECG (Electrocardiogram) features. It aims to determine whether a patient has a normal heart condition or suffers from a specific type of arrhythmia.

---

## 📊 Dataset Summary

- **Number of Instances:** 452 patients
- **Number of Attributes:** 279 (206 numeric, the remaining 73 nominal), plus a class-label column
- **Classes:** 16 (1 = Normal, 2–15 = specific arrhythmias, 16 = others)
- **Missing Values:** Present (`?`)
- **Data Source Date:** January 1998
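
As a rough sketch of how these summary figures could be checked, the snippet below reads a tiny inline sample with `?` treated as missing, then reports the shape and the number of missing values per column. The sample rows and the `summarize` helper are illustrative assumptions, not part of the dataset or the original notebook; on the real file one would pass its path instead of the in-memory text.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same style as the raw file:
# comma-separated, no header row, '?' marking a missing value.
SAMPLE = "75,0,190,?,8\n56,1,165,64,6\n54,0,?,95,10\n"

def summarize(source):
    """Read a headerless CSV, treating '?' as missing, and summarize it."""
    df = pd.read_csv(source, header=None, na_values="?")
    return {
        "n_instances": df.shape[0],
        "n_columns": df.shape[1],
        "missing_per_column": df.isna().sum().to_dict(),
    }

summary = summarize(io.StringIO(SAMPLE))
print(summary)
```

Run against the full file, the same call would be expected to report 452 rows and 280 columns (279 attributes plus the class label), if the summary above is accurate.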

---

## 🎯 Objective

The main goals are to:
- Detect the presence or absence of arrhythmia.
- Classify each record into one of the 16 predefined classes.
- Use machine learning to minimize the disagreement between the predicted class and the cardiologist's diagnosis.
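
The first goal (presence versus absence) can be derived from the 16-way label. Here is a minimal sketch, assuming class 1 means normal as stated in the dataset summary; the `to_binary_target` helper and the example codes are hypothetical.

```python
import pandas as pd

NORMAL_CLASS = 1  # per the dataset summary: class 1 = normal

def to_binary_target(class_labels):
    """Map the 16-way class label to 0 (normal) or 1 (any arrhythmia)."""
    labels = pd.Series(class_labels)
    return (labels != NORMAL_CLASS).astype(int)

# Illustrative class codes, not a claim about any particular patient.
example = to_binary_target([8, 6, 10, 1])
print(example.tolist())  # [1, 1, 1, 0]
```

Keeping the original 16-way label alongside the binary one lets both tasks be evaluated on the same split.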

---

## 📂 Source Information

- **Original Creators:**
  - Dr. H. Altay Guvenir – Bilkent University, Dept. of Computer Engineering
  - Burak Acar, M.S. – Bilkent University, Dept. of Electrical Engineering
  - Dr. Haldun Muderrisoglu – Baskent University, School of Medicine
- **Donor:** Dr. H. Altay Guvenir
- **Location:** Ankara, Turkey

---

## 🧪 Past Research Use

> Guvenir et al. (1997), *"A Supervised Machine Learning Algorithm for Arrhythmia Analysis"*, Computers in Cardiology Conference, Lund, Sweden.

---

In [2]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

In [7]:
# The raw file has no header row, so pandas numbers the columns 0..279.
data = pd.read_csv(r"../data/arrhythmia.csv", header=None)
data.head(4)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,270,271,272,273,274,275,276,277,278,279
0,75,0,190,80,91,193,371,174,121,-16,...,0.0,9.0,-0.9,0.0,0.0,0.9,2.9,23.3,49.4,8
1,56,1,165,64,81,174,401,149,39,25,...,0.0,8.5,0.0,0.0,0.0,0.2,2.1,20.4,38.8,6
2,54,0,172,95,138,163,386,185,102,96,...,0.0,9.5,-2.4,0.0,0.0,0.3,3.4,12.3,49.0,10
3,55,0,175,94,100,202,380,179,143,28,...,0.0,12.2,-2.2,0.0,0.0,0.4,2.6,34.6,61.6,1


## 💡 Attribute Details

### ✅ Examples of General Attributes:

1. `Age`: the patient's age  
2. `Sex`: the patient's sex (0 = male, 1 = female)  
3. `Height`, `Weight`: height and weight  
4. `Heart rate`: heartbeats per minute  
5. `QRS duration`, `P-R interval`, `Q-T interval`: ECG intervals (characteristics of the heart's electrical activity)

---

### ✅ ECG Channel Attributes:

Channels such as `DI`, `DII`, `DIII`, `AVR`, `AVL`, `AVF`, `V1`, `V6` refer to different points on the body that the ECG device uses to measure the heart's electrical signals.

Each channel contains multiple attributes:

- **Duration** and **Amplitude** of the waves:  
  - `P wave`, `Q wave`, `R wave`, `S wave`, `T wave`, `R'`, `S'`, and so on

- **Morphological attributes:**  
  - Whether the wave is `diphasic`, `ragged`, `notched`, or otherwise abnormal  
  - Presence of inverted waves

- **Additional measures per channel:**  
  - `QRSA`: the area under the QRS curve  
  - `QRSTA`: the area between QRS and T  
  - `QRS axis` and `T axis`: the angle of the signal axis

> ⚠️ Some attributes are present only in some channels, not all of them, and some values are missing or unavailable (`?`).
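
One common way to deal with the missing values noted above is to drop columns that are mostly empty and fill the remaining gaps with each column's median. The sketch below shows that idea on a toy frame. The `clean_missing` helper, the 0.5 threshold, and the toy column names and values are assumptions for illustration, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

def clean_missing(df, max_missing_fraction=0.5):
    """Drop columns that are mostly missing; median-fill the rest."""
    missing_fraction = df.isna().mean()
    kept = df.loc[:, missing_fraction <= max_missing_fraction]
    return kept.fillna(kept.median(numeric_only=True))

# Toy data with made-up values; NaN stands in for a '?' already parsed as missing.
toy = pd.DataFrame({
    "heart_rate": [63.0, np.nan, 75.0, 71.0],
    "j_point": [np.nan, np.nan, np.nan, -2.0],
    "qrs_duration": [91.0, 81.0, 138.0, 100.0],
})
cleaned = clean_missing(toy)
print(cleaned)
```

Median filling suits the numeric measurements; for the nominal (0/1) attributes, filling with the most frequent value may be a better fit.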

---

In [14]:
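with open("../data/arrhythmia_columns.md", "r", encoding="utf-8") as f:
    lines = f.readlines()

# Keep only plain "- name" bullets; skip headings, emphasized text, and
# bullets that carry a description after an em dash.
columns = [
    line[2:].strip() for line in lines
    if line.startswith('- ') and not any(x in line for x in ['##', '*', '—'])
]

# Caution: truncating to 280 hides any mismatch between the parsed names and
# the data. In the output below, the last column (which appears to hold the
# class label) ends up named jj_amp_V4, so the list looks misaligned.
columns = columns[:280]

data.columns = columns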

In [15]:
data.head(4)

Unnamed: 0,age,sex,height,weight,qrs_duration,pr_interval,qt_interval,t_interval,p_interval,qrs_angle,...,q_amp_V3,r_amp_V3,s_amp_V3,r'_amp_V3,s'_amp_V3,p_amp_V3,t_amp_V3,qrsa_V3,qrsta_V3,jj_amp_V4
0,75,0,190,80,91,193,371,174,121,-16,...,0.0,9.0,-0.9,0.0,0.0,0.9,2.9,23.3,49.4,8
1,56,1,165,64,81,174,401,149,39,25,...,0.0,8.5,0.0,0.0,0.0,0.2,2.1,20.4,38.8,6
2,54,0,172,95,138,163,386,185,102,96,...,0.0,9.5,-2.4,0.0,0.0,0.3,3.4,12.3,49.0,10
3,55,0,175,94,100,202,380,179,143,28,...,0.0,12.2,-2.2,0.0,0.0,0.4,2.6,34.6,61.6,1
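
Because the names are assigned purely by position, a quick consistency check before saving can catch a misaligned list. The following is a minimal sketch, assuming the class label sits in the last column and is coded 1 to 16; `check_column_names` is a hypothetical helper, and the toy frame stands in for the real data.

```python
import pandas as pd

def check_column_names(names, df, n_classes=16):
    """Return a list of problems found when pairing `names` with `df`."""
    problems = []
    if len(names) != df.shape[1]:
        problems.append(f"expected {df.shape[1]} names, got {len(names)}")
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate names: {duplicates}")
    # The class label is assumed to be the last column, coded 1..n_classes.
    last = pd.to_numeric(df.iloc[:, -1], errors="coerce")
    if last.isna().any() or not last.between(1, n_classes).all():
        problems.append("last column does not look like a class label")
    return problems

toy = pd.DataFrame([[75, 0, 8], [56, 1, 6]])
ok = check_column_names(["age", "sex", "class"], toy)
bad = check_column_names(["age", "sex", "sex", "x"], toy)
print(ok)
print(bad)
```

An empty list means no problem was detected. It does not prove every name is attached to the right column, so spot-checking a few known attributes (for example, that `age` and `sex` hold plausible values) is still worthwhile.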


In [16]:
data.to_csv(r"../data/arrhythmia_with_columns.csv", index=False)