# Heartbeat Project

## About the project

This project builds a classification model that predicts whether a person has heart disease based on physical features of that person (age, sex, cholesterol, etc.).

## About the data

This dataset contains 14 attributes per patient (13 features plus the target), derived from physical testing: blood samples are taken and the patient completes a brief exercise test. The `target` field records the presence of heart disease in the patient as an integer (0 for no presence, 1 for presence). Confirming with certainty whether a patient has heart disease can be quite an invasive process, so a model that accurately predicts the likelihood of heart disease could help avoid expensive and invasive procedures.

Attribute Information:

- age
- sex
- chest pain type (4 values)
- resting blood pressure
- serum cholesterol in mg/dl
- fasting blood sugar > 120 mg/dl
- resting electrocardiographic results (values 0,1,2)
- maximum heart rate achieved
- exercise induced angina
- oldpeak = ST depression induced by exercise relative to rest
- the slope of the peak exercise ST segment
- number of major vessels (0-3) colored by fluoroscopy
- thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
- target: 0 for no presence of heart disease, 1 for presence of heart disease

## Solution

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Part 1: Manipulating and displaying data

### Exploratory Data Analysis and Visualization

In [None]:
df = pd.read_csv('heart.csv')
df.head()

In [None]:
df['target'].unique()

In [None]:
df.info()

In [None]:
df.describe().transpose()

In [None]:
sns.countplot(data=df, x='target')
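The count plot shows how balanced the two classes are, which matters later when reading accuracy: a model is only useful if it beats always predicting the majority class. A minimal sketch of the same check as numbers, using a small made-up `demo_target` series (hypothetical values, not the real `heart.csv` column):

```python
import pandas as pd

# Hypothetical labels for illustration only; in the notebook this would be df['target']
demo_target = pd.Series([1, 1, 1, 0, 0, 1, 0, 1, 1, 0], name="target")

# Share of each class instead of raw counts
balance = demo_target.value_counts(normalize=True)
print(balance)

# Accuracy obtained by always predicting the most common class
majority_baseline = balance.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```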

In [None]:
sns.pairplot(df[['age', 'trestbps', 'chol', 'thalach', 'target']], hue='target')

Display the correlation between the columns in a heatmap:

In [None]:
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(), cmap='viridis', annot=True)
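A full heatmap can be hard to scan when the main interest is how each feature relates to the target. One possible complement is to pull out just the `target` column of the correlation matrix and sort it. The sketch below uses a synthetic `demo_df` with invented column names (assumptions for illustration, not the heart data):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the heart data (hypothetical columns and values)
rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)
demo_df = pd.DataFrame({
    "feat_pos": target + rng.normal(0, 0.5, size=n),    # tends to rise with target
    "feat_neg": -target + rng.normal(0, 0.5, size=n),   # tends to fall with target
    "feat_noise": rng.normal(0, 1, size=n),             # unrelated to target
    "target": target,
})

# Correlation of every feature with the target, sorted from most negative to most positive
target_corr = demo_df.corr()["target"].drop("target").sort_values()
print(target_corr)
```

In the notebook the same expression would be applied to `df` instead of `demo_df`.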

### Part 2: Machine Learning

### Train and Test Splits

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X = df.drop('target', axis=1)
y = df['target']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=99)
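With only 10% of the rows held out, the test set is small, so its class mix can drift from the full dataset by chance. The notebook does not stratify; the sketch below shows the optional `stratify` argument of `train_test_split` on synthetic arrays (`X_demo` and `y_demo` are made up for illustration) as one way to keep the class proportions similar in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 300 rows, 13 features, perfectly balanced binary label
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 13))
y_demo = np.array([0, 1] * 150)

# stratify=y_demo asks for the same class proportions in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=99, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print("Share of class 1 in train:", y_tr.mean())
print("Share of class 1 in test: ", y_te.mean())
```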

### Scaling the data

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler()

In [None]:
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
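The scaler is fit on the training data only and then reused on the test data, so no information from the test set leaks into preprocessing. A small sketch on synthetic arrays (`train` and `test` are made-up data) illustrates what that means: the training features end up with mean 0 and standard deviation 1, while the test features are shifted and scaled by the training statistics and so are only approximately standardized.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data on an arbitrary scale
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean/std from train, then scales it
test_scaled = demo_scaler.transform(test)        # reuses the train mean/std

print("Train mean:", train_scaled.mean(axis=0).round(6))
print("Train std: ", train_scaled.std(axis=0).round(6))
print("Test mean: ", test_scaled.mean(axis=0).round(3))

# The same transformation written out by hand
manual_test_scaled = (test - train.mean(axis=0)) / train.std(axis=0)
```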

### Creating the model

Let's create a logistic regression model and use cross-validation to search for a well-performing value of the regularization hyperparameter C.

In [None]:
from sklearn.linear_model import LogisticRegressionCV 

In [None]:
log_model = LogisticRegressionCV()

In [None]:
log_model.fit(scaled_X_train, y_train)

In [None]:
log_model.C_

In [None]:
log_model.get_params()
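The notebook relies on the default settings of `LogisticRegressionCV`. As a sketch of how the search could be made explicit, the example below passes a hand-picked grid through the `Cs` parameter and sets `cv` and `max_iter`. The grid values, the fold count, and the synthetic data from `make_classification` are all assumptions chosen for illustration, not the notebook's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled training data
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

# Candidate inverse-regularization strengths: smaller C means stronger regularization
candidate_Cs = np.logspace(-3, 3, 7)

demo_model = LogisticRegressionCV(Cs=candidate_Cs, cv=5, max_iter=1000)
demo_model.fit(X_demo, y_demo)

# C_ holds the selected value; np.ravel copes with both scalar and array forms
best_C = float(np.ravel(demo_model.C_)[0])
print("Selected C:", best_C)
```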

#### Coefficients

In [None]:
log_model.coef_

Let's also visualize the coefficients

In [None]:
coefs = pd.Series(data=log_model.coef_[0], index=X.columns)
coefs = coefs.sort_values()

In [None]:
plt.figure(figsize=(10, 6))
sns.barplot(x=coefs.index, y=coefs.values);
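Because the features were standardized, each coefficient can be read as the change in the log-odds of heart disease per one-standard-deviation increase in that feature, holding the others fixed. Exponentiating gives an odds ratio, which is often easier to interpret. The values below are hypothetical placeholders, not coefficients from this model:

```python
import numpy as np
import pandas as pd

# Hypothetical coefficients for illustration only (not results from the fitted model)
demo_coefs = pd.Series({"feature_a": -0.7, "feature_b": 0.0, "feature_c": 0.4})

# exp(coefficient) = multiplicative change in the odds per 1 standard deviation increase
odds_ratios = np.exp(demo_coefs)
print(odds_ratios.round(3))
```

An odds ratio below 1 means the feature lowers the predicted odds, 1 means no effect, and above 1 means it raises them. In the notebook the same call would be `np.exp(coefs)`.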

### Model Performance

In [None]:
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay

#### Predictions on the test data

In [None]:
y_pred = log_model.predict(scaled_X_test)

#### Confusion matrix

In [None]:
cnf_matrix = confusion_matrix(y_test, y_pred)
cnf_matrix

In [None]:
disp = ConfusionMatrixDisplay(confusion_matrix=cnf_matrix)
disp.plot()

#### Classification report

In [None]:
print(classification_report(y_test, y_pred))
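The precision and recall shown in the classification report can be derived directly from the four cells of the confusion matrix. The sketch below does this by hand on a small set of made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical) and checks the results against scikit-learn's own metric functions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical labels and predictions for illustration only
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
```

In a screening setting like this one, recall for the positive class is usually the number to watch, since a false negative is a missed case of heart disease.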

### Part 3: Future prediction results

A patient with the following features has come into the medical office. What does our model predict for this patient? Do they have heart disease? How "sure" is the model of this prediction?

In [None]:
patient = [[54., 1., 0., 122., 286., 0., 0., 116., 1., 3.2, 1., 2., 2.]]

In [None]:
X_test.iloc[-1]

In [None]:
y_test.iloc[-1]

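In [None]:
# The model was fit on scaled features, so scale the patient the same way first
scaled_patient = scaler.transform(patient)
log_model.predict(scaled_patient)

In [None]:
log_model.predict_proba(scaled_patient)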
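How "sure" the model is comes from `predict_proba`, which for binary logistic regression is the sigmoid of the linear score returned by `decision_function`. Since the model was trained on standardized features, a new observation has to go through the same fitted scaler before prediction. The following sketch demonstrates both points on synthetic data; `demo_scaler`, `demo_model`, and `new_row` are illustrative names, and a plain `LogisticRegression` is used here for brevity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=0
)
demo_scaler = StandardScaler().fit(X_demo)
demo_model = LogisticRegression(max_iter=1000).fit(demo_scaler.transform(X_demo), y_demo)

# A "new patient": one raw (unscaled) row, scaled with the already-fitted scaler
new_row = X_demo[:1]
new_row_scaled = demo_scaler.transform(new_row)

proba = demo_model.predict_proba(new_row_scaled)[0]   # [P(class 0), P(class 1)]
z = demo_model.decision_function(new_row_scaled)[0]   # linear score w.x + b
manual_p1 = 1.0 / (1.0 + np.exp(-z))                  # sigmoid of the score

print("predict_proba:", proba.round(4))
print("sigmoid(decision_function):", round(float(manual_p1), 4))
```

The predicted class is simply whichever of the two probabilities is larger, so a probability close to 0.5 signals a prediction the model is not very sure about.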