<a href="https://colab.research.google.com/github/Samurarahman/CSE475/blob/main/Ensemble_learning_and_XAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Ensemble learning and XAI**
**Ensemble learning** is a machine learning approach that combines the predictions of multiple models to improve accuracy and reliability. Common methods are bagging (training models on bootstrap samples and aggregating them, which mainly reduces variance), boosting (training models sequentially so that each one corrects its predecessors' errors, which mainly reduces bias), and stacking (training a meta-model on the predictions of diverse base models).

**Explainable AI (XAI)** aims to make AI systems transparent and understandable by showing how models arrive at their decisions. Techniques such as SHAP, LIME, and feature importance analysis help build trust, support debugging, and check that AI solutions align with ethical standards and user expectations.
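
The ideas above can be illustrated with a minimal sketch. It is not part of the original notebook: it assumes synthetic data from `make_classification` in place of the real dataset, uses a random forest as a bagging-style ensemble, and uses scikit-learn's `permutation_importance` as one simple, model-agnostic form of feature importance analysis. All variable names (`X_demo`, `models`, `scores`, and so on) are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: NOT the notebook's dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

models = {
    # Bagging-style: trees fitted on bootstrap samples, predictions aggregated.
    "bagging (random forest)": RandomForestClassifier(
        n_estimators=50, random_state=0
    ),
    # Boosting: trees fitted sequentially on the errors of earlier trees.
    "boosting (gradient boosting)": GradientBoostingClassifier(
        n_estimators=50, random_state=0
    ),
    # Stacking: a logistic-regression meta-model learns from base predictions.
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)  # mean accuracy on held-out data
    print(f"{name}: test accuracy = {scores[name]:.3f}")

# XAI step: shuffle one feature at a time and measure the drop in accuracy.
result = permutation_importance(
    models["stacking"], X_te, y_te, n_repeats=5, random_state=0
)
importances = result.importances_mean
for idx in importances.argsort()[::-1]:
    print(f"feature {idx}: mean importance = {importances[idx]:.3f}")
```

Features whose shuffling causes the largest accuracy drop are the ones the stacked model relies on most. SHAP and LIME give finer-grained, per-prediction explanations but require additional packages.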

# Loading Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Loading the Dataset

In [2]:
df = pd.read_csv('/content/drive/MyDrive/CSE475/Ensemble learning and XAI/cw_22_23_24.csv')
df.head()

| Unnamed: 0 | adm_type | shift_from | ssc | yr_nae | m_no | mrn | pt_name | sex | disease | D.O.A | D.O.D | status | consultant | L.O.S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Shift From | ER | No | 1 | 1 | 21845698 | Hara Bibi | F | STEMI | 1-Jan-22 | 1-Jan-22 | Discharge | Imran Khan | 0 |
| 1 | Shift From | ER | No | 2 | 2 | 22000071 | Taj Rehman | M | ADHF | 1-Jan-22 | 5-Jan-22 | Discharge | Malik Faisal | 4 |
| 2 | Shift From | ER | No | 3 | 3 | 21838760 | Bakhtawar Shah | M | ihd | 1-Jan-22 | 10-Jan-22 | Discharge | Asif Iqbal | 9 |
| 3 | Shift From | ER | No | 4 | 4 | 22000251 | Arasal Jan Bibi | F | | 1-Jan-22 | 7-Jan-22 | Discharge | Sher Bahadar | 6 |
| 4 | Shift From | Neu | No | 5 | 5 | 21825110 | Khad Mewa | F | | 1-Jan-22 | 2-Jan-22 | Discharge | Tariq Nawaz | 1 |


In [None]:
# Separate the features (X) from the target (y).
# NOTE: 'loan_status' does not appear among the columns in the preview above;
# replace it with the dataset's actual target column before running this cell.
X = df.drop(['loan_status'], axis=1)
print(X)

# Some libraries expect integer class labels to start at 0. If the labels
# were numbered 1, 2, 3, ..., subtracting 1 (commented out below) would shift
# them to 0, 1, 2, .... scikit-learn classifiers accept arbitrary labels,
# so the target is used unchanged here.
# y = df['loan_status'] - 1
y = df['loan_status']

#Identifying Feature Types:

# Split the features into categorical and numerical

#Selects the columns containing categorical data
categorical_features = X.select_dtypes(include=['object']).columns

#Selects columns containing numerical data
numerical_features = X.select_dtypes(include=['int64', 'float64']).columns


#Data Preprocessing:

# Standard scaling for numerical features only

# StandardScaler rescales each feature to a mean of 0 and
# a standard deviation of 1.
scaler = StandardScaler()

# fit_transform first learns the mean and standard deviation of each
# numerical feature in X, then applies the transformation. Putting all
# numerical features on a similar scale often improves model performance
# by preventing a feature with a larger range from dominating.
# Note: the scaler is fitted on all of X here; fitting it on the training
# split only would avoid leaking test-set statistics.
scaled_numerical_data = scaler.fit_transform(X[numerical_features])

# One-hot encoding for categorical features only

# OneHotEncoder converts each categorical variable into binary indicator
# columns; drop='first' omits the first category of each feature to avoid
# redundant (perfectly collinear) columns.
encoder = OneHotEncoder(drop='first')
encoded_categorical_data = encoder.fit_transform(X[categorical_features])

# Concatenate the scaled numerical and encoded categorical arrays
# horizontally into a single dataset. The encoder returns a sparse matrix,
# so it is converted to a dense array first.
processed_data = np.hstack([scaled_numerical_data, encoded_categorical_data.toarray()])

# Convert to DataFrame with appropriate column names
final_columns = numerical_features.tolist() + encoder.get_feature_names_out(categorical_features).tolist()
final_df = pd.DataFrame(processed_data, columns=final_columns)

#Stores the preprocessed features in data_X for modeling
data_X = final_df.copy()