## Machine learning to predict heart disease

In this work we analyze a dataset containing records of patients with and without risk of heart disease. We apply machine learning to try to predict, from a set of clinical parameters, whether a person is at risk of developing heart disease.

#### Parameters to analyze

- Age: age of the patient [years]
- Sex [0: Male, 1: Female]
- ChestPainType: chest pain type [0: Typical Angina, 1: Atypical Angina, 2: Non-Anginal Pain, 3: Asymptomatic]
- RestingBP: resting blood pressure [mm Hg]
- Cholesterol: serum cholesterol [mg/dl]
- FastingBS: fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
- RestingECG: resting electrocardiogram results [0: Normal, 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), 2: showing probable or definite left ventricular hypertrophy by Estes' criteria]
- MaxHR: maximum heart rate achieved [Numeric value between 60 and 202]
- ExerciseAngina: exercise-induced angina [1: Yes, 0: No]
- Oldpeak: ST depression [numeric value]
- ST_Slope: the slope of the peak exercise ST segment [2: upsloping, 1: flat, 0: downsloping]
- HeartDisease: output class [1: heart disease, 0: Normal]
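Because every categorical column is stored as an integer code, it can help to translate the codes back into labels when inspecting the data. The following is a minimal sketch, assuming the code-to-label mappings listed above; the rows in `sample_df` and the `*_LABELS` names are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Code-to-label mappings copied from the parameter list above
SEX_LABELS = {0: "Male", 1: "Female"}
CHEST_PAIN_LABELS = {
    0: "Typical Angina",
    1: "Atypical Angina",
    2: "Non-Anginal Pain",
    3: "Asymptomatic",
}
ST_SLOPE_LABELS = {2: "upsloping", 1: "flat", 0: "downsloping"}

# Hypothetical rows, used only to demonstrate the decoding step
sample_df = pd.DataFrame(
    {"Sex": [0, 1, 0], "ChestPainType": [1, 2, 3], "ST_Slope": [2, 1, 0]}
)

# Series.map replaces each integer code with its label; assign returns a new frame
decoded_df = sample_df.assign(
    Sex=sample_df["Sex"].map(SEX_LABELS),
    ChestPainType=sample_df["ChestPainType"].map(CHEST_PAIN_LABELS),
    ST_Slope=sample_df["ST_Slope"].map(ST_SLOPE_LABELS),
)
print(decoded_df)
```

The decoded frame is meant for reading and plotting only; the model itself still needs the numeric codes.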

In [1]:
# Data manipulation
import pandas as pd
# Numerical operations
import numpy as np
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
# Support vector machine (SVM) classifier
from sklearn.svm import SVC
# Evaluation metrics: accuracy and confusion matrix
from sklearn.metrics import accuracy_score, confusion_matrix
# Plotting
import matplotlib.pyplot as plt

In [2]:
# Read the dataset and load it into enf_cardiaca_df, a pandas DataFrame
enf_cardiaca_df = pd.read_csv('heart-csv.csv')
# Show summary information about the dataset
enf_cardiaca_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 918 entries, 0 to 917
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Age             918 non-null    int64  
 1   Sex             918 non-null    int64  
 2   ChestPainType   918 non-null    int64  
 3   RestingBP       918 non-null    int64  
 4   Cholesterol     918 non-null    int64  
 5   FastingBS       918 non-null    int64  
 6   RestingECG      918 non-null    int64  
 7   MaxHR           918 non-null    int64  
 8   ExerciseAngina  918 non-null    int64  
 9   Oldpeak         918 non-null    float64
 10  ST_Slope        918 non-null    int64  
 11  HeartDisease    918 non-null    int64  
dtypes: float64(1), int64(11)
memory usage: 86.2 KB


In [3]:
# Preview the first five rows
enf_cardiaca_df.head()

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,Cholesterol,FastingBS,RestingECG,MaxHR,ExerciseAngina,Oldpeak,ST_Slope,HeartDisease
0,40,0,1,140,289,0,0,172,0,0.0,2,0
1,49,1,2,160,180,0,0,156,0,1.0,1,1
2,37,0,1,130,283,0,1,98,0,0.0,2,0
3,48,1,3,138,214,0,0,108,1,1.5,1,1
4,54,0,2,150,195,0,0,122,0,0.0,2,0


In [4]:
# Count how many patients fall into each output class
enf_cardiaca_df['HeartDisease'].value_counts()

1    508
0    410
Name: HeartDisease, dtype: int64
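The two classes are fairly balanced, which gives a useful reference point: a model that always predicts the majority class sets the accuracy floor that any classifier should beat. A small worked calculation, using only the counts printed above (the variable names are illustrative):

```python
# Class counts copied from the value_counts() output above
class_counts = {1: 508, 0: 410}

total_patients = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = class_counts[majority_class] / total_patients

print(f"Total patients: {total_patients}")
print(f"Majority class: {majority_class}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

An accuracy close to this baseline would suggest the model has learned little beyond the class frequencies.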

In [6]:
# Features we will use as model inputs
list_parametros = ['Age','Sex','ChestPainType','RestingBP','Cholesterol','FastingBS','RestingECG','MaxHR','ExerciseAngina','Oldpeak','ST_Slope']

# Select these feature columns from the original DataFrame
X = enf_cardiaca_df[list_parametros]
X.head()

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,Cholesterol,FastingBS,RestingECG,MaxHR,ExerciseAngina,Oldpeak,ST_Slope
0,40,0,1,140,289,0,0,172,0,0.0,2
1,49,1,2,160,180,0,0,156,0,1.0,1
2,37,0,1,130,283,0,1,98,0,0.0,2
3,48,1,3,138,214,0,0,108,1,1.5,1
4,54,0,2,150,195,0,0,122,0,0.0,2


In [7]:
# Build the output variable from the column that indicates whether the patient has heart disease
list_etiqueta = ['HeartDisease']
y = enf_cardiaca_df[list_etiqueta]
y.head()

Unnamed: 0,HeartDisease
0,0
1,1
2,0
3,1
4,0
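Note that indexing with a list (`['HeartDisease']`) returns a one-column DataFrame rather than a Series. scikit-learn estimators generally expect a one-dimensional target and may emit a `DataConversionWarning` when given a column vector. A minimal sketch of the difference, using a hypothetical five-row frame (`toy_df`) that mirrors the rows shown above:

```python
import pandas as pd

# Hypothetical frame mirroring the first five rows shown above
toy_df = pd.DataFrame(
    {"Age": [40, 49, 37, 48, 54], "HeartDisease": [0, 1, 0, 1, 0]}
)

# Selecting with a list keeps two dimensions: a DataFrame with one column
y_frame = toy_df[["HeartDisease"]]

# Selecting with a plain column name yields a one-dimensional Series
y_series = toy_df["HeartDisease"]

# An existing column vector can also be flattened into a 1-D NumPy array
y_flat = y_frame.to_numpy().ravel()

print(y_frame.shape)
print(y_series.shape)
print(y_flat.shape)
```

Either the Series or the flattened array can be passed as the target when fitting a model.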


In [9]:
# Split into training and test sets (scikit-learn holds out 25% for testing by default)
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [10]:
X_train.shape

(688, 11)
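The shape confirms the default split: 688 of the 918 rows go to training and the remaining 230 to testing. Since `train_test_split` shuffles the rows randomly, a different subset is selected on every run. A hedged sketch of a reproducible, stratified variant follows; `random_state=42` and `stratify` are choices not made in the original notebook, and the synthetic arrays `X_demo` and `y_demo` merely stand in for the real `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same size and class counts as the dataset
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(918, 11))
y_demo = np.array([1] * 508 + [0] * 410)

# random_state fixes the shuffle; stratify keeps the class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
# Share of positive cases overall, in the training set, and in the test set
print(round(y_demo.mean(), 3), round(y_tr.mean(), 3), round(y_te.mean(), 3))
```

Fixing the seed makes results comparable across runs, and stratifying avoids a test set whose class ratio drifts from that of the full data.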