<a href="https://colab.research.google.com/github/Remonah-3/Github_Assignment/blob/master/Ensemble_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.utils import resample
from sklearn.ensemble import StackingRegressor

df = pd.read_csv('train.csv')

# Target variable: SalePrice
# Features: GrLivArea, YearBuilt
data = df[['GrLivArea', 'YearBuilt', 'SalePrice']].dropna()

X = data[['GrLivArea', 'YearBuilt']]
y = data['SalePrice']

# Split data (80% training, 20% validation)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

# Single Models for Comparison
lr = LinearRegression()
svm = SVR(kernel='rbf', C=100)
dt = DecisionTreeRegressor(max_depth=5, random_state=42)

# Train models
lr.fit(X_train_scaled, y_train)
svm.fit(X_train_scaled, y_train)
dt.fit(X_train_scaled, y_train)

# Predict
lr_pred = lr.predict(X_val_scaled)
svm_pred = svm.predict(X_val_scaled)
dt_pred = dt.predict(X_val_scaled)

# Evaluate single model performance
print("=== Single Model Performance ===")
print("Linear Regression MSE:", mean_squared_error(y_val, lr_pred))
print("SVM MSE:", mean_squared_error(y_val, svm_pred))
print("Decision Tree MSE:", mean_squared_error(y_val, dt_pred))

# Blending: weighted average of the three predictions with hand-picked weights
blend_pred = (0.4 * lr_pred) + (0.3 * svm_pred) + (0.3 * dt_pred)

blend_mse = mean_squared_error(y_val, blend_pred)
print("\n=== Blending ===")
print("Blended Model MSE:", blend_mse)

# Bagging: average Linear Regression models fit on bootstrap samples of the training set
n_estimators = 5
bagged_preds = np.zeros_like(y_val, dtype=float)

for i in range(n_estimators):
    # Bootstrap sample
    X_resampled, y_resampled = resample(X_train_scaled, y_train, random_state=42 + i)
    model = LinearRegression()
    model.fit(X_resampled, y_resampled)
    preds = model.predict(X_val_scaled)
    bagged_preds += preds

# Average predictions
bagged_preds /= n_estimators

bagging_mse = mean_squared_error(y_val, bagged_preds)
print("\n=== Bagging ===")
print("Bagged Model (Linear Regression) MSE:", bagging_mse)

# Stacking
estimators = [
    ('lr', LinearRegression()),
    ('svm', SVR(kernel='rbf', C=100)),
    ('dt', DecisionTreeRegressor(max_depth=5, random_state=42))
]

stack_model = StackingRegressor(
    estimators=estimators,
    final_estimator=LinearRegression()
)

stack_model.fit(X_train_scaled, y_train)
stack_pred = stack_model.predict(X_val_scaled)
stacking_mse = mean_squared_error(y_val, stack_pred)

print("\n=== Stacking ===")
print("Stacked Model MSE:", stacking_mse)

# Summary of All Methods


print("\n=== Summary ===")
print(f"Linear Regression MSE: {mean_squared_error(y_val, lr_pred):.2f}")
print(f"SVM MSE: {mean_squared_error(y_val, svm_pred):.2f}")
print(f"Decision Tree MSE: {mean_squared_error(y_val, dt_pred):.2f}")
print(f"Blending MSE: {blend_mse:.2f}")
print(f"Bagging MSE: {bagging_mse:.2f}")
print(f"Stacking MSE: {stacking_mse:.2f}")


=== Single Model Performance ===
Linear Regression MSE: 2495554898.6683207
SVM MSE: 6418570975.686604
Decision Tree MSE: 1844304720.6577315

=== Blending ===
Blended Model MSE: 2597232396.43489

=== Bagging ===
Bagged Model (Linear Regression) MSE: 2448340185.536358

=== Stacking ===
Stacked Model MSE: 1968679182.1550841

=== Summary ===
Linear Regression MSE: 2495554898.67
SVM MSE: 6418570975.69
Decision Tree MSE: 1844304720.66
Blending MSE: 2597232396.43
Bagging MSE: 2448340185.54
Stacking MSE: 1968679182.16
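
MSE is expressed in squared units of SalePrice, which makes the figures above hard to compare at a glance. The following is a minimal sketch that converts the reported values to RMSE (same units as SalePrice) and ranks the methods. The dictionary simply copies the numbers printed above; the names `reported_mse`, `rmse`, and `ranking` are introduced here for illustration only.

```python
import math

# MSE values copied from the summary output above
reported_mse = {
    "Linear Regression": 2495554898.67,
    "SVM": 6418570975.69,
    "Decision Tree": 1844304720.66,
    "Blending": 2597232396.43,
    "Bagging": 2448340185.54,
    "Stacking": 1968679182.16,
}

# RMSE is the square root of MSE, so it is in the same units as SalePrice
rmse = {name: math.sqrt(mse) for name, mse in reported_mse.items()}

# Rank methods from lowest to highest error
ranking = sorted(rmse, key=rmse.get)
for name in ranking:
    print(f"{name:<18} RMSE: {rmse[name]:>10,.0f}")
```

Sorting by RMSE gives the same order as sorting by MSE; the conversion only changes the scale on which the gaps are read.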


We explored ensemble learning methods for regression on the House Prices dataset, predicting SalePrice from two features, GrLivArea and YearBuilt.
We trained Linear Regression, Support Vector Regression, and a Decision Tree Regressor, combined the three using blending and stacking, and applied bagging to Linear Regression.
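
The blend in the code above uses hand-picked weights (0.4, 0.3, 0.3). One alternative, sketched here as an assumption rather than as what this notebook does, is to fit the weights by least squares on a separate holdout set. The data below are synthetic stand-ins for three models' predictions, and `fit_blend_weights` and `mse` are hypothetical helpers, not part of the original code.

```python
import numpy as np

def fit_blend_weights(preds, y):
    """Least-squares weights for combining the columns of `preds` to approximate `y`."""
    weights, *_ = np.linalg.lstsq(preds, y, rcond=None)
    return weights

def mse(a, b):
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
y = rng.normal(loc=180_000, scale=80_000, size=400)  # synthetic target, not real prices

# Synthetic "model predictions": the target plus noise of different sizes
preds = np.column_stack([
    y + rng.normal(scale=50_000, size=y.size),  # stand-in for a mid-quality model
    y + rng.normal(scale=80_000, size=y.size),  # stand-in for a weak model
    y + rng.normal(scale=43_000, size=y.size),  # stand-in for the strongest model
])

# Fit the weights on one half (holdout) and evaluate on the other half
fit_idx, eval_idx = np.arange(200), np.arange(200, 400)
fixed = np.array([0.4, 0.3, 0.3])
weights = fit_blend_weights(preds[fit_idx], y[fit_idx])

fixed_mse = mse(preds[eval_idx] @ fixed, y[eval_idx])
learned_mse = mse(preds[eval_idx] @ weights, y[eval_idx])
print("learned weights:", np.round(weights, 3))
print("fixed-weight MSE:", fixed_mse)
print("learned-weight MSE:", learned_mse)
```

Fitting the weights on data that is later used for evaluation would leak information, which is why the sketch keeps the two halves separate.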

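Among the ensembles, stacking achieved the lowest Mean Squared Error (about 1.97e9), ahead of bagging (about 2.45e9) and blending (about 2.60e9). The single Decision Tree scored lower still (about 1.84e9), so on this validation split no ensemble beat the best individual model.

These results show that an ensemble is not automatically better than its strongest member. The fixed-weight blend ended up with a higher error than both Linear Regression and the Decision Tree, most likely because it gives 30% of the weight to the much weaker SVR (about 6.42e9). Stacking, which learns how much to trust each base model, came close to the Decision Tree, and bagging improved only slightly on a single Linear Regression (about 2.45e9 versus 2.50e9).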
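
A single 80/20 split gives one noisy estimate per model, so small gaps between methods may not be meaningful. The following is a hedged sketch of k-fold cross-validation with scikit-learn's `cross_val_score`. The data are synthetic stand-ins generated with `make_regression`, not the House Prices file, and placing the scaler inside a `Pipeline` is an assumption about how one might keep it from being fit on validation folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic two-feature regression data (stand-in for GrLivArea / YearBuilt)
X, y = make_regression(n_samples=300, n_features=2, noise=10.0, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "Stacking": StackingRegressor(
        estimators=[
            ("lr", LinearRegression()),
            ("svm", SVR(kernel="rbf", C=100)),
            ("dt", DecisionTreeRegressor(max_depth=5, random_state=42)),
        ],
        final_estimator=LinearRegression(),
    ),
}

cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_mse = {}
for name, model in models.items():
    # The scaler is refit on each training fold, so validation folds stay unseen
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[name] = -scores  # scikit-learn reports negated MSE for this scorer
    print(f"{name:<18} mean MSE: {cv_mse[name].mean():.2f} (std {cv_mse[name].std():.2f})")
```

Reporting the spread across folds alongside the mean makes it easier to judge whether a difference such as the one between stacking and the Decision Tree is larger than the split-to-split variation.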