Heart Disease Prediction

-------------

## **Objective**

The objective of this project is to develop a machine learning model to predict the presence of heart disease in patients based on various health parameters.

## **Data Source**

The dataset used for this project is the Heart Disease dataset from the UCI Machine Learning Repository.



## **Import Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score


## **Import Data**

In [None]:
data = pd.read_csv('heart.csv')


## **Describe Data**

In [None]:
data.info()  # info() prints its summary directly and returns None, so no print() is needed
print(data.describe())
print(data.head())


## **Data Visualization**

In [None]:
# Correlation heatmap
plt.figure(figsize=(10,8))
sns.heatmap(data.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

# Pairplot
sns.pairplot(data, hue='target')
plt.show()


## **Data Preprocessing**

In [None]:
# Check for missing values
print(data.isnull().sum())

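# Separate the feature columns from the target
features = data.drop('target', axis=1)

# Standardize the feature variables
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Create a DataFrame with the scaled features, reusing the original feature names
# (this works regardless of where the 'target' column sits in the file)
scaled_data = pd.DataFrame(scaled_features, columns=features.columns, index=data.index)
scaled_data['target'] = data['target']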
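
As a rough illustration of what the standardization step computes: each column is shifted to zero mean and divided by its standard deviation. The sketch below reproduces this with NumPy on a small made-up array (the values are hypothetical, not taken from `heart.csv`). `StandardScaler` uses the population standard deviation (`ddof=0`) by default, and the manual version assumes the same.

```python
import numpy as np

# Hypothetical toy values for two features (e.g. age and cholesterol); not real data
toy = np.array([[50.0, 200.0],
                [60.0, 240.0],
                [70.0, 280.0]])

# Column-wise mean and population standard deviation (ddof=0)
col_mean = toy.mean(axis=0)
col_std = toy.std(axis=0, ddof=0)

# z = (x - mean) / std, applied column by column
toy_scaled = (toy - col_mean) / col_std

print(toy_scaled.mean(axis=0))  # approximately 0 for each column
print(toy_scaled.std(axis=0))   # approximately 1 for each column
```

After scaling, features measured in very different units contribute on a comparable scale, which generally helps the logistic regression solver converge.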


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
X = scaled_data.drop('target', axis=1)
y = scaled_data['target']


## **Train Test Split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


## **Modeling**

In [None]:
# Initialize the model
model = LogisticRegression()

# Fit the model
model.fit(X_train, y_train)


## **Model Evaluation**

In [None]:
# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print('Accuracy:', accuracy_score(y_test, y_pred))
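
To make the printed confusion matrix easier to read: for binary labels 0 and 1, scikit-learn arranges it as `[[TN, FP], [FN, TP]]`, with rows for the true class and columns for the predicted class. The sketch below derives accuracy, precision, and recall from such a matrix. The counts are invented for illustration and are not results from this model.

```python
import numpy as np

# Made-up counts in scikit-learn's layout: rows = true class, columns = predicted class
cm = np.array([[40, 10],
               [5, 45]])

# Flatten row by row: true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # of the predicted positives, how many are truly positive
recall = tp / (tp + fn)           # of the actual positives, how many the model found

print(f'Accuracy:  {accuracy:.3f}')
print(f'Precision: {precision:.3f}')
print(f'Recall:    {recall:.3f}')
```

For a screening task such as heart disease detection, recall on the positive class is often worth checking alongside accuracy, since a false negative means a missed case.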


## **Prediction**

In [None]:
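# Predict on new data (example); values follow the column order of X
new_data = pd.DataFrame([[57, 1, 0, 140, 241, 0, 1, 123, 1, 0.2, 1, 0, 3]], columns=X.columns)

# Reuse the fitted scaler and keep the feature names the model was trained with,
# which avoids scikit-learn's "X does not have valid feature names" warning
new_data_scaled = pd.DataFrame(scaler.transform(new_data), columns=X.columns)
prediction = model.predict(new_data_scaled)
print('Prediction:', prediction)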


## **Explanation**

A logistic regression model was used to predict the presence of heart disease. The features were standardized using statistics computed on the full dataset, and the data was then split into training (70%) and test (30%) sets. After training, the model's performance was evaluated on the test set using a confusion matrix, a classification report, and the accuracy score. Finally, the model's prediction was demonstrated on a single example patient record.
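
One variation on this workflow, offered as a sketch rather than as the method used in this notebook: because the scaler here is fitted before the split, test-set statistics influence the scaling. Wrapping `StandardScaler` and `LogisticRegression` in a scikit-learn pipeline and fitting it on the training split avoids that. The example uses synthetic data from `make_classification` as a stand-in for `heart.csv`, and all variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 13 features; this is NOT the real heart disease dataset
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=42)

# Split first, keeping class proportions similar in both parts (stratify is an added assumption)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on the training split only, then fits the classifier
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_tr, y_tr)

# predict() applies the training-set scaling to the test data automatically
demo_pred = pipe.predict(X_te)
demo_accuracy = accuracy_score(y_te, demo_pred)
print('Demo accuracy:', demo_accuracy)
```

With a pipeline, new records can be passed to `pipe.predict` unscaled, since the scaling step is applied internally.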