
**Problem:** Using **stacking**, an ensemble method, predict whether a student will pass (1) or fail (0) from five features: 'Assignments Completed', 'Attendance Rate', 'Previous GPA', 'Study Hours', and 'Previous Passes'.


**Dataset:** https://github.com/amrahmani/ML/blob/main/student_data.csv

**Implementing Base Models**:

- Select at least three different machine learning algorithms (e.g., Decision Tree, Logistic Regression, Naïve Bayes) to serve as base models.
- Train each model on the dataset and evaluate its performance with appropriate metrics (e.g., accuracy for classification, RMSE for regression).
- Record the predictions made by each base model.
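The classification metrics used throughout this notebook all derive from confusion-matrix counts. A minimal worked sketch, using five hypothetical labels and predictions (illustrative values, not rows from `student_data.csv`):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels and predictions for five students (1 = pass, 0 = fail);
# these are illustrative values, not results from student_data.csv.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]

# Confusion counts: TP = 2, FN = 1, TN = 1, FP = 1
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 3/5
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2/3
f1 = f1_score(y_true, y_pred)                # 2PR / (P + R) = 2/3

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: accuracy=0.600 precision=0.667 recall=0.667 f1=0.667
```

Precision and recall treat label 1 (pass) as the positive class by default, which matches the 0/1 encoding of the target.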

**Implementing Stacking Ensemble**:

- Select **a meta-learner** (e.g., a simple model like Logistic Regression, or a more complex model) to combine the predictions of the base models.
- Train the meta-learner using the predictions from the base models as input features.
- Evaluate the performance of the stacking model using cross-validation.
- Use metrics such as accuracy, precision, recall, and F1-score (for classification) or RMSE and MAE (for regression) to evaluate and compare the models.
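Training a meta-learner on base-model predictions can also be written out by hand. The sketch below roughly mirrors what `StackingClassifier(cv=5)` does for a binary target: out-of-fold `predict_proba` outputs from each base model become the meta-features. It uses synthetic data from `make_classification` as a stand-in for the five student features (an assumption, so it runs offline); with the notebook's data, substitute its `X` and `y`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the five student features (assumption: not the real dataset)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "naive_bayes": GaussianNB(),
}

# Level 0: out-of-fold P(pass) for every training row, so no row's meta-feature
# comes from a model that was trained on that same row.
meta_train = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in base_models.values()
])

# Refit each base model on the full training set to build test-time meta-features.
meta_test = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for model in base_models.values()
])

# Level 1: the meta-learner sees only the base models' predictions.
meta_learner = LogisticRegression().fit(meta_train, y_train)
stacked_pred = meta_learner.predict(meta_test)
stacked_acc = accuracy_score(y_test, stacked_pred)

print("meta-feature shape:", meta_train.shape)
print("stacked test accuracy:", stacked_acc)
```

The out-of-fold step matters because an unpruned decision tree typically fits its own training rows almost perfectly; a meta-learner trained on those in-sample predictions would learn to over-trust it.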


```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import StackingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: Load and prepare the dataset
url = 'https://raw.githubusercontent.com/amrahmani/ML/main/student_data.csv'
df = pd.read_csv(url)

# Feature selection
features = ['Assignments Completed', 'Attendance Rate', 'Previous GPA', 'Study Hours', 'Previous Passes']
X = df[features]
y = df['Pass/Fail']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 2: Train base models and evaluate performance
# Base model 1: Decision Tree
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
dt_pred = dt_model.predict(X_test)

# Base model 2: Logistic Regression (features standardized first)
lr_model = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
lr_model.fit(X_train, y_train)
lr_pred = lr_model.predict(X_test)

# Base model 3: Naive Bayes
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_pred = nb_model.predict(X_test)

# Compute accuracy, precision, recall and F1 for one set of predictions
def evaluate_model(predictions, y_true):
    accuracy = accuracy_score(y_true, predictions)
    precision = precision_score(y_true, predictions)
    recall = recall_score(y_true, predictions)
    f1 = f1_score(y_true, predictions)
    return accuracy, precision, recall, f1

# Record the performance of each base model
dt_metrics = evaluate_model(dt_pred, y_test)
lr_metrics = evaluate_model(lr_pred, y_test)
nb_metrics = evaluate_model(nb_pred, y_test)

# Step 3: Implement Stacking Ensemble
# Meta-learner: Logistic Regression
estimators = [
    ('decision_tree', dt_model),
    ('logistic_regression', lr_model),
    ('naive_bayes', nb_model)
]

# StackingClassifier fits clones of the base models; with cv=5 the
# meta-learner is trained on their out-of-fold predictions
stacking_model = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=5
)

# Train the Stacking model
stacking_model.fit(X_train, y_train)

# Evaluate the Stacking model on the held-out test set
stacking_pred = stacking_model.predict(X_test)
stacking_metrics = evaluate_model(stacking_pred, y_test)

# Display the results
results = pd.DataFrame({
    'Model': ['Decision Tree', 'Logistic Regression', 'Naive Bayes', 'Stacking Ensemble'],
    'Accuracy': [dt_metrics[0], lr_metrics[0], nb_metrics[0], stacking_metrics[0]],
    'Precision': [dt_metrics[1], lr_metrics[1], nb_metrics[1], stacking_metrics[1]],
    'Recall': [dt_metrics[2], lr_metrics[2], nb_metrics[2], stacking_metrics[2]],
    'F1-Score': [dt_metrics[3], lr_metrics[3], nb_metrics[3], stacking_metrics[3]]
})

print(results)
```


```
                 Model  Accuracy  Precision  Recall  F1-Score
0        Decision Tree       1.0        1.0     1.0       1.0
1  Logistic Regression       1.0        1.0     1.0       1.0
2          Naive Bayes       1.0        1.0     1.0       1.0
3    Stacking Ensemble       1.0        1.0     1.0       1.0
```
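On this single 80/20 split every model scores 1.0 on every metric, so the held-out test set cannot tell the models apart. The task also asks for cross-validated evaluation of the stacking model, which the code above does not perform (`cross_val_score` is imported but never called). A hedged sketch follows using `cross_validate`, which, unlike `cross_val_score`, accepts several scoring metrics in one call. Synthetic data again stands in for the student dataset (an assumption, so the block runs offline); to evaluate the notebook's model, pass its `stacking_model`, `X`, and `y` instead.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not student_data.csv)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

stacking_model = StackingClassifier(
    estimators=[
        ("decision_tree", DecisionTreeClassifier(random_state=42)),
        ("logistic_regression", make_pipeline(StandardScaler(), LogisticRegression())),
        ("naive_bayes", GaussianNB()),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # inner CV used to build the meta-learner's training features
)

# Outer 5-fold CV: each fold refits the whole stack, including its inner cv=5.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
metrics = ["accuracy", "precision", "recall", "f1"]
scores = cross_validate(stacking_model, X, y, cv=outer_cv, scoring=metrics)

# cross_validate stores each metric's per-fold scores under "test_<metric>".
for metric in metrics:
    fold_scores = scores[f"test_{metric}"]
    print(f"{metric:>9}: mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```

Reporting the mean and standard deviation across folds gives a less split-dependent comparison than a single test score; the same call can be repeated for each base model to compare them with the stack on equal terms.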
