# Heart Failure Prediction using Support Vector Machine Classifier Algorithm

Citation:

fedesoriano. (September 2021). Heart Failure Prediction Dataset. Retrieved 11th October, 2024 from https://www.kaggle.com/fedesoriano/heart-failure-prediction.

## Problem Definition

**Objective**: 
This project aims to develop a machine learning model using a Support Vector Machine (SVM) classifier to predict heart failure in patients based on clinical and demographic data from the Kaggle dataset. By analyzing features such as age, cholesterol, and blood pressure, the model will help identify individuals at high risk for heart failure. The project will evaluate the model's performance using metrics like accuracy, precision, recall, F1-score, and ROC-AUC, and will optimize it through hyperparameter tuning and cross-validation to improve predictive accuracy and generalizability.

**Target Variable**: The target variable is `HeartDisease` (1: heart disease, 0: normal).

## Load Required Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```

## Load the Data

```python
df_heart = pd.read_csv('heart.csv')
df_heart.head(3)
```

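|   | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
|---|-----|-----|---------------|-----------|-------------|-----------|------------|-------|----------------|---------|----------|--------------|
| 0 | 40 | M | ATA | 140 | 289 | 0 | Normal | 172 | N | 0.0 | Up | 0 |
| 1 | 49 | F | NAP | 160 | 180 | 0 | Normal | 156 | N | 1.0 | Flat | 1 |
| 2 | 37 | M | ATA | 130 | 283 | 0 | ST | 98 | N | 0.0 | Up | 0 |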


Attribute Information:

* **Age**: age of the patient [years]
* **Sex**: sex of the patient [M: Male, F: Female]
* **ChestPainType**: chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]
* **RestingBP**: resting blood pressure [mm Hg]
* **Cholesterol**: serum cholesterol [mg/dl]
* **FastingBS**: fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
* **RestingECG**: resting electrocardiogram results [Normal: Normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria]
* **MaxHR**: maximum heart rate achieved [Numeric value between 60 and 202]
* **ExerciseAngina**: exercise-induced angina [Y: Yes, N: No]
* **Oldpeak**: ST depression induced by exercise relative to rest [numeric value]
* **ST_Slope**: the slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]
* **HeartDisease**: output class [1: heart disease, 0: Normal]


## Initial Data Preprocessing

* Handling missing values
* Removing duplicates
* Converting categorical data into numerical form 
* Basic feature selection (removing irrelevant columns)
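
The first three steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual preprocessing: the `toy` frame holds made-up values that only borrow column names from `heart.csv`, and median imputation and the specific encodings are assumptions chosen for the example. Feature selection is not shown.

```python
import numpy as np
import pandas as pd

# Made-up rows that reuse heart.csv column names (illustrative values only).
toy = pd.DataFrame({
    "Age": [40, 49, 37, 49, 54],
    "Sex": ["M", "F", "M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA", "NAP", "ASY"],
    "RestingBP": [140, 160, 130, 160, 150],
    "Cholesterol": [289.0, 180.0, np.nan, 180.0, 195.0],
    "ExerciseAngina": ["N", "N", "N", "N", "Y"],
    "HeartDisease": [0, 1, 0, 1, 1],
})

# 1. Missing values: count them, then fill the numeric gap with the column median
#    (median imputation is an assumption; other strategies are possible).
missing_per_column = toy.isna().sum()
clean = toy.fillna({"Cholesterol": toy["Cholesterol"].median()})

# 2. Duplicates: drop rows that repeat an earlier row exactly.
clean = clean.drop_duplicates().reset_index(drop=True)

# 3. Categorical to numeric: map two-level columns to 0/1, one-hot encode the rest.
clean["Sex"] = clean["Sex"].map({"M": 1, "F": 0})
clean["ExerciseAngina"] = clean["ExerciseAngina"].map({"Y": 1, "N": 0})
clean = pd.get_dummies(clean, columns=["ChestPainType"], dtype=int)

print(missing_per_column)
print(clean)
```

On the real data the same calls would be applied to `df_heart` instead of `toy`.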

## Exploratory Data Analysis (EDA)

* Visualize the data using histograms, scatter plots, etc.
* Identify patterns, relationships, or outliers in the data.
* Understand the distribution of features, correlations, etc.
* Feature engineering might be done based on insights from EDA (e.g., creating new features or transforming existing ones).
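
A numeric counterpart to the plots listed above might look like the sketch below. The `toy` frame again contains made-up values under real column names, and the 1.5 × IQR rule is one common outlier convention assumed here, not something the notebook prescribes.

```python
import pandas as pd

# Made-up rows that reuse heart.csv column names (illustrative values only).
toy = pd.DataFrame({
    "Age": [40, 49, 37, 48, 54, 39, 45, 58],
    "MaxHR": [172, 156, 98, 138, 122, 170, 130, 99],
    "Oldpeak": [0.0, 1.0, 0.0, 1.5, 0.0, 0.0, 2.0, 1.5],
    "HeartDisease": [0, 1, 0, 1, 0, 0, 1, 1],
})

# Distribution of each feature: count, mean, spread, quartiles.
summary = toy.describe()

# Class balance of the target.
class_counts = toy["HeartDisease"].value_counts()

# Pearson correlation of each feature with the target.
target_corr = toy.corr()["HeartDisease"].drop("HeartDisease")

# Feature means per outcome, a quick check for separation between classes.
means_by_class = toy.groupby("HeartDisease").mean()

# Flag MaxHR values more than 1.5 * IQR outside the quartiles.
q1, q3 = toy["MaxHR"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = toy[(toy["MaxHR"] < q1 - 1.5 * iqr) | (toy["MaxHR"] > q3 + 1.5 * iqr)]

print(summary)
print(target_corr)
print(means_by_class)
```

For the visual side, `DataFrame.hist()` and `DataFrame.plot.scatter()` draw histograms and scatter plots through matplotlib.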

## Further Preprocessing

* Dealing with outliers found during EDA.
* Feature engineering
* Scaling or normalizing the features and wrapping the steps in a preprocessing pipeline.
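
One way to combine scaling and encoding in a single reusable step is scikit-learn's `ColumnTransformer`. The sketch below is an assumption about how the pipeline could be built; the column split and the made-up `toy` values are illustrative only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows that reuse heart.csv column names (illustrative values only).
toy = pd.DataFrame({
    "Age": [40, 49, 37, 48],
    "RestingBP": [140, 160, 130, 138],
    "ST_Slope": ["Up", "Flat", "Up", "Flat"],
})

numeric_cols = ["Age", "RestingBP"]
categorical_cols = ["ST_Slope"]

# Standardize numeric columns, one-hot encode categorical ones.
# sparse_threshold=0 forces a dense array so the result is easy to inspect.
preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

X_ready = preprocessor.fit_transform(toy)
print(X_ready.shape)
```

Scaling matters for SVMs in particular because the kernel depends on distances between samples, so a feature with a large numeric range would otherwise dominate.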

## Train-Test Split

* Split the dataset into training and test sets.
* Training set: 70-80% of the dataset.
* Test set: 20-30% of the dataset.
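
A minimal sketch of an 80/20 split with `train_test_split`. The arrays are synthetic stand-ins for the real features and target, and stratifying on the target plus fixing `random_state` are assumptions made for reproducibility rather than choices stated in the outline.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 3 features, balanced binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)

# test_size=0.2 keeps 80% for training; stratify=y preserves the class ratio
# in both parts; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```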

## Choose the Model (Support Vector Machine)

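* Kernel: linear, rbf, or poly, depending on whether the classes can be separated by a linear boundary.
* Regularization Parameter (C): Controls the trade-off between a wide margin and training classification errors; a larger C penalizes misclassified training points more heavily.
* Gamma: Kernel coefficient for non-linear kernels such as rbf; larger values make the influence of each training point more local.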
* Model Pipeline Setup
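
The choices above map directly onto `SVC` arguments. The sketch below wires a scaler and the classifier into one `Pipeline`; the step names and the starting values (`rbf`, `C=1.0`, `gamma="scale"`, which are scikit-learn's defaults) are assumptions, not tuned settings.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scaling lives inside the pipeline so it is fitted on training data only.
svm_model = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf", C=1.0, gamma="scale")),
])

print(svm_model)
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training and lets later steps address parameters as `svc__C`, `svc__gamma`, and so on.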

## Model Training
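
Training amounts to fitting the pipeline on the training split. The sketch below assumes a scaler plus `SVC` pipeline and uses two synthetic, well-separated clusters in place of the real training data.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic training data: two Gaussian clusters, one per class.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(-2, 1, size=(40, 2)),
    rng.normal(2, 1, size=(40, 2)),
])
y_train = np.array([0] * 40 + [1] * 40)

svm_model = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf", C=1.0, gamma="scale")),
])

# fit() scales the features, then solves for the SVM decision boundary.
svm_model.fit(X_train, y_train)

# Accuracy on the training data itself, only a sanity check.
train_accuracy = svm_model.score(X_train, y_train)
print(train_accuracy)
```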

## Model Evaluation

* Accuracy Score: Measure the overall accuracy.
* Confusion Matrix: Evaluate the number of correct/incorrect predictions for each class.
* Classification Report: Get precision, recall, and F1-score metrics.
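
The three metrics can be computed as in the sketch below. The label lists are hand-written for illustration and are not output from any model; in the notebook they would be `y_test` and the model's predictions on `X_test`.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hand-written labels for illustration (not real model output).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Fraction of predictions that match the true label: 7 of 10 here.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes:
# [[true negatives, false positives], [false negatives, true positives]].
cm = confusion_matrix(y_true, y_pred)

# Per-class precision, recall, and F1-score as formatted text.
report = classification_report(
    y_true, y_pred, target_names=["Normal", "HeartDisease"]
)

print(accuracy)  # 0.7
print(cm)        # [[3 1], [2 4]]
print(report)
```

In a screening setting the false negatives (bottom-left cell) deserve particular attention, since they are patients with heart disease that the model missed.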

## Hyperparameter Tuning

* Use grid search or random search to find the best hyperparameters (e.g., C, gamma, kernel) to improve model performance.
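
A minimal grid search sketch with `GridSearchCV`, using 5-fold cross-validation. The data are synthetic, and the grid values are arbitrary starting points assumed for the example rather than recommended settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: two overlapping Gaussian clusters, one per class.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-1, 1, size=(50, 2)),
    rng.normal(1, 1, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

pipeline = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# The "svc__" prefix routes each parameter to the pipeline step named "svc".
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.1],
}

# Every combination is scored with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

`gamma` has no effect on the linear kernel, so this grid repeats some linear fits; passing a list of separate grids per kernel would avoid that. For larger search spaces, `RandomizedSearchCV` samples a fixed number of combinations instead of trying them all.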

## Model Interpretation and Conclusion

* Summarize the model's performance and interpret results in the context of the problem you're solving. Reflect on accuracy, misclassifications, and ways to improve the model (e.g., collecting more data, feature selection).

## Deployment (Optional)

* If the SVM classifier performs well, consider saving the model using joblib or pickle for deployment purposes.
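
Saving and reloading with joblib could look like the sketch below. The tiny model and the file name `svm_heart.joblib` are hypothetical; a temporary directory is used so the example leaves no files behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVC

# Tiny synthetic model standing in for the trained classifier.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
model = SVC(kernel="linear").fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "svm_heart.joblib")  # hypothetical file name
    joblib.dump(model, model_path)      # serialize the fitted model to disk
    restored = joblib.load(model_path)  # load it back as a new object

# The restored model should reproduce the original predictions.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

If preprocessing is part of a `Pipeline`, saving the whole pipeline keeps the scaler and the classifier together. Files of this kind should only be loaded from trusted sources, since unpickling can execute arbitrary code.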