# Project - Heart Disease Prediction

![dataset-cover.jpg](attachment:dataset-cover.jpg)

# Heart Disease Prediction

## Introduction

Heart disease is one of the leading causes of death worldwide. Early detection and prevention can save many lives. Machine learning can be used to predict the presence of heart disease in patients based on various health parameters.

## Objective

The objective of this project is to build a machine learning model that can predict whether a patient has heart disease based on their medical attributes.

## Dataset

We will use the processed Cleveland subset of the Heart Disease dataset from the UCI Machine Learning Repository. It contains 14 attributes: 13 predictors (age, sex, chest pain type, resting blood pressure, cholesterol level, fasting blood sugar, and others) plus the diagnosis column used as the target.

## Steps to Build the Model

1. Importing the libraries
2. Loading the dataset
3. Data exploration and preprocessing
4. Splitting the data into training and test sets
5. Building and training the model
6. Evaluating the model
7. Making predictions
8. Conclusion



# 1. Importing the Libraries

First, we need to import the necessary libraries.

In [3]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Domain idea

In the raw file, the target takes integer values from 0 to 4. The dataset documentation only distinguishes absence (0) from presence (1-4) of disease; reading the higher values as increasing severity, as in the list below, is a common but informal interpretation.

- Class 0: No heart disease.
- Class 1: Mild or early-stage heart disease.
- Class 2: Moderate heart disease, with more significant symptoms or risk.
- Class 3: Severe heart disease, often with advanced symptoms.
- Class 4: Critical heart disease, requiring immediate medical attention.
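
If a binary label is wanted (disease vs. no disease), the five classes can be collapsed into two. The following is a minimal sketch, assuming a pandas Series of 0-4 labels; the values are made up for illustration and `binary_target` is a hypothetical name.

```python
import pandas as pd

# Made-up labels in the 0-4 coding (not taken from the real file)
target = pd.Series([0, 2, 1, 0, 4, 3, 0], name="target")

# Collapse classes 1-4 into a single "disease present" class
binary_target = (target > 0).astype(int)

print(binary_target.tolist())
```

Keeping all five classes is also possible, but the rarer classes have few examples, so a binary target is often easier to model.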

# Features Specification


- age: Age of the patient.

- sex: Sex of the patient (0 = female, 1 = male).

- cp: Type of chest pain experienced (1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic).

- trestbps: Resting blood pressure when admitted to the hospital (in mm Hg).

- chol: Cholesterol levels in the blood (in mg/dl).

- fbs: Whether fasting blood sugar exceeds 120 mg/dl (1 = yes, 0 = no).

- restecg: Results of resting electrocardiogram (ECG) (0 = normal, 1 = abnormal ST-T wave, 2 = probable or definite left ventricular hypertrophy).

- thalach: Maximum heart rate achieved during exercise.

- exang: Exercise-induced angina (1 = yes, 0 = no).

- oldpeak: ST depression induced by exercise relative to rest.

- slope: Slope of the peak exercise ST segment (1 = upsloping, 2 = flat, 3 = downsloping).

- ca: Number of major blood vessels colored by fluoroscopy (0-3).

- thal: Result of the thallium stress test (3 = normal, 6 = fixed defect, 7 = reversible defect).

- target: Diagnosis of heart disease (0 = no disease, 1-4 = disease present). It is often collapsed to a binary label (0 = no disease, 1 = disease present).
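
Several of the features above (cp, restecg, slope, thal) are category codes rather than quantities, so a linear model may benefit from one-hot encoding them. This is an optional step that the notebook itself does not prescribe. The following is a minimal sketch on a made-up three-row frame, using `pandas.get_dummies`; the names `toy` and `encoded` are hypothetical.

```python
import pandas as pd

# Made-up rows; only the column names come from the feature list
toy = pd.DataFrame({
    "age": [63, 41, 57],
    "cp": [1, 2, 3],
    "thal": [6, 3, 7],
})

# Replace each coded column with one 0/1 indicator column per category
encoded = pd.get_dummies(toy, columns=["cp", "thal"], dtype=int)

print(encoded.columns.tolist())
```

Numeric features such as age or chol are left untouched by this call.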

# 2. Loading the Dataset
Load the dataset into a pandas DataFrame. The file has no header row, so the column names are supplied explicitly.

In [4]:
# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
column_names = [
    'age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 
    'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'
]
data = pd.read_csv(url, names=column_names)


In [6]:
data.sample()

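|     | age  | sex | cp  | trestbps | chol  | fbs | restecg | thalach | exang | oldpeak | slope | ca  | thal | target |
|-----|------|-----|-----|----------|-------|-----|---------|---------|-------|---------|-------|-----|------|--------|
| 283 | 35.0 | 1.0 | 2.0 | 122.0    | 192.0 | 0.0 | 0.0     | 174.0   | 0.0   | 0.0     | 1.0   | 0.0 | 3.0  | 0      |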


In [21]:
data.target.value_counts()

target
0.0    160
1.0     54
2.0     35
3.0     35
4.0     13
Name: count, dtype: int64

# 3. Data Exploration and Preprocessing
Explore the dataset to understand its structure and handle missing values.

In [1]:
# Display the first few rows of the dataset


In [2]:
# Display dataset information


In [3]:
# Handle missing values represented by '?'


In [4]:
# Convert columns to appropriate data types
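
The four cells above can be sketched as follows. This is one possible approach, not necessarily the one originally intended: it runs on a small made-up frame (`toy`) that mimics how columns containing `?` arrive as strings, and it drops incomplete rows rather than imputing them.

```python
import numpy as np
import pandas as pd

# Made-up rows: 'ca' and 'thal' are strings because some entries are '?'
toy = pd.DataFrame({
    "age": [63.0, 67.0, 41.0, 56.0],
    "ca": ["0.0", "3.0", "?", "1.0"],
    "thal": ["6.0", "?", "3.0", "7.0"],
    "target": [0, 2, 0, 1],
})

print(toy.head())  # first few rows
toy.info()         # column dtypes and non-null counts

# Turn '?' into NaN, then drop every row that has a missing value
clean = toy.replace("?", np.nan).dropna()

# Convert the string columns to numbers and keep the target as an integer
clean = clean.apply(pd.to_numeric)
clean["target"] = clean["target"].astype(int)

print(clean.dtypes)
```

Dropping rows is the simplest option when only a handful are incomplete; imputing the missing values is an alternative when more data would otherwise be lost.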


# 4. Splitting the Data into Training and Test Sets
Split the dataset into training and test sets, so that the model is evaluated on data it did not see during training.
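
A minimal sketch of the split, using `train_test_split`. The data here is a synthetic stand-in from `make_classification` with 13 features, not the real heart data, and the 80/20 ratio and `random_state` are assumptions rather than values given in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in data: 200 synthetic rows with 13 features and a binary label
X, y = make_classification(n_samples=200, n_features=13, random_state=42)

# Hold out 20% for testing; stratify keeps class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Stratifying is worth considering here because the disease classes are imbalanced, and an unstratified split could leave the test set with very few positive cases.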

# 5. Building and Training the Model
We will use logistic regression for this classification task, standardizing the features first so that they share a common scale.

In [6]:
# Standardize the features

# Build the model
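
A minimal sketch of these two steps, again on synthetic stand-in data rather than the real heart data. The scaler is fitted on the training set only and then reused on the test set, so no information from the test set leaks into training. `max_iter=1000` is an assumption chosen to avoid convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data: 200 synthetic rows with 13 features and a binary label
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize: learn mean and variance from the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Build and train the model
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

print(model.coef_.shape)
```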



# 6. Evaluating the Model
Evaluate the model using accuracy, confusion matrix, and classification report.

In [7]:
# Predict the target on the test set


In [8]:
# Calculate accuracy


In [9]:
# Display the confusion matrix


In [10]:
# Display the classification report
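
The evaluation cells above can be sketched as follows. The model is trained on synthetic stand-in data, so the printed numbers say nothing about performance on the real heart data; the point is only to show how the three metrics are computed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data and a fitted model, as in the earlier steps
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)

# Predict the target on the test set
y_pred = model.predict(scaler.transform(X_test))

# Accuracy: fraction of test rows predicted correctly
acc = accuracy_score(y_test, y_pred)

# Confusion matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)

# Per-class precision, recall, and F1
report = classification_report(y_test, y_pred)

print(f"Accuracy: {acc:.3f}")
print(cm)
print(report)
```

For a medical screening task, recall on the disease class usually deserves as much attention as overall accuracy, because a missed case is typically costlier than a false alarm.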


# 7. Making Predictions
Make predictions for new data points.

In [11]:
# Example new data point
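
A minimal sketch of scoring one new row. The training data is a synthetic stand-in and the new row is a placeholder of zeros, so the output has no clinical meaning. What carries over to real use is that the new row needs the same 13 columns in the same order and has to pass through the same fitted scaler as the training data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

feature_names = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

# Stand-in training data with the notebook's column names (values are synthetic)
X_raw, y = make_classification(n_samples=200, n_features=13, random_state=42)
X = pd.DataFrame(X_raw, columns=feature_names)

scaler = StandardScaler().fit(X)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X), y)

# Placeholder new row; real use would fill in one patient's measurements
new_patient = pd.DataFrame([{name: 0.0 for name in feature_names}])[feature_names]

# Apply the already fitted scaler, then predict
new_scaled = scaler.transform(new_patient)
prediction = model.predict(new_scaled)[0]
probability = model.predict_proba(new_scaled)[0, 1]

print(f"Predicted class: {prediction}, probability of class 1: {probability:.3f}")
```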


# 8. Conclusion
In this project, we built a machine learning model to predict heart disease using logistic regression. The model was evaluated using accuracy, confusion matrix, and classification report. The final model can be used to make predictions on new patient data.