<a href="https://colab.research.google.com/github/OsirisEscaL/Machine_Learning_Projects/blob/main/Ensemble_Learning_Voting_Classifier_for_Heart_Disease_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ensemble Learning: Voting Classifier for Heart Disease Prediction Using Scikit-Learn

Heart disease is a primary health concern globally, and accurate diagnosis is essential for early intervention and treatment. Ensemble learning, a potent machine learning technique, can enhance the precision and dependability of heart disease prediction models. This article will investigate how to build an ensemble model in Scikit-Learn using the Voting Classifier on the heart disease dataset. This model will demonstrate the effectiveness of ensemble techniques in analyzing medical data.

**Understanding Ensemble Learning**

Ensemble learning is a machine learning paradigm that integrates the predictions of multiple individual models to generate a more precise final prediction. The underlying concept is to utilize the collective knowledge of diverse models to reduce bias and variance and enhance overall performance.

**Voting Classifier**

The Voting Classifier is an adaptable ensemble method that combines the predictions of multiple base classifiers to produce a final prediction. It operates in two distinct modes:

**Hard Voting:** Each base classifier in the ensemble predicts a majority vote, determines a class label, and the final prediction.

**Soft Voting:** Each base classifier in soft voting provides a probability distribution over the classes, and the final prediction is based on the type with the highest average probability.

**Dataset**

We will use the Framingham Cardiovascular Disease Dataset from [Kaggle](https://www.kaggle.com/datasets/captainozlem/framingham-chd-preprocessed-data) for this project. This dataset typically includes attributes like demographic, behavioral, and medical risk factors.

**Step 1: Importing Essential Libraries**

Importing the essential Python libraries for the project will be our initial step:

In [9]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

**Step 2: Loading and Preprocessing the Dataset**

Once the dataset has been downloaded and extracted, it will be loaded and preprocessed.

In [2]:
# Load the dataset
data = pd.read_csv('CHD.csv')
data.head()

Unnamed: 0,male,age,education,currentSmoker,cigsPerDay,BPMeds,prevalentStroke,prevalentHyp,diabetes,totChol,sysBP,diaBP,BMI,heartRate,glucose,TenYearCHD
0,1,39,1,0,0.0,0.0,0,0,0,195.0,106.0,70.0,26.97,80.0,77.0,0
1,0,46,0,0,0.0,0.0,0,0,0,250.0,121.0,81.0,28.73,95.0,76.0,0
2,1,48,0,1,20.0,0.0,0,0,0,245.0,127.5,80.0,25.34,75.0,70.0,0
3,0,61,1,1,30.0,0.0,0,1,0,225.0,150.0,95.0,28.58,65.0,103.0,1
4,0,46,1,1,23.0,0.0,0,0,0,285.0,130.0,84.0,23.1,85.0,85.0,0


In [3]:
# Split the data into features and labels
X = data.drop('TenYearCHD', axis=1)
y = data['TenYearCHD']

# Split the data into training and testing sets, stratifying on the 'Class' label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

**Step 3: Train the Voting Classifier**

We'll use Decision Trees, Support Vector Machines (SVM), and Logical Regression to build our Voting Classifier in hard voting.

In [5]:
# Define the base classifiers
classifier1 = DecisionTreeClassifier()
classifier2 = LogisticRegression()
classifier3 = SVC()

# Create the Voting Classifier
ensemble = VotingClassifier(
    estimators=[('dt', classifier1), ('lr', classifier2), ('svc', classifier3)], voting='hard')

# Train the model
ensemble.fit(X_train, y_train)  # X_train is the training data, y_train is the corresponding labels

**Step 4: Evaluate the Voting Classifier**

Use the testing dataset to evaluate the ensemble's performance

In [10]:
# Model evaluation
y_pred = ensemble.predict(X_test)
report = classification_report(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print("Voting Classifier:")
print(f'Accuracy: {accuracy}')
print(report)

Voting Classifier:
Accuracy: 0.8561064087061668
              precision    recall  f1-score   support

           0       0.86      1.00      0.92       701
           1       0.82      0.07      0.13       126

    accuracy                           0.86       827
   macro avg       0.84      0.53      0.53       827
weighted avg       0.85      0.86      0.80       827



**Conclusion**

Combining the predictions of multiple base classifiers, ensemble learning techniques such as the Voting Classifier provides a robust method to improve heart disease prediction models. By leveraging the collective wisdom of diverse models, you can make more accurate predictions in critical medical applications. Experiment with various base classifiers and ensemble configurations to determine the optimal setup for predicting cardiac disease.