
# Introduction to Ensemble Learning

**Concept of Ensemble Learning**

- What is Ensemble Learning?

  - Machine Learning technique that combines the predictions of multiple models to produce a final output

- Why does Ensemble Learning improve performance?

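  - Reduces variance (averaging many models smooths out the errors of any single one, as in bagging)

  - Reduces bias (later models correct the mistakes of earlier ones, as in boosting)

  - Improves robustness (no single model's failure dominates the final prediction)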

- Applications

  - Fraud Detection, Medical Diagnoses, Recommendation Systems, and Predictive Analysis
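
To make the performance argument above concrete, here is a minimal sketch of why combining models can help. It assumes a binary task and fully independent models that are each correct with the same probability `p`, which real models rarely satisfy; the helper name `majority_vote_accuracy` is hypothetical and exists only for this illustration.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumes a binary task and that each model is right with probability p,
    independently of the others (an idealized assumption).
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Compare a single model with small majority-vote ensembles
for n in (1, 3, 11):
    print(f"{n} model(s): {majority_vote_accuracy(n, 0.7):.3f}")
```

With three such models the majority is correct about 78.4% of the time, compared with 70% for one model. The gain shrinks when the models make correlated errors, which is why ensembles benefit from diverse base models.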
  

**Types of Ensemble Methods**

- Stacking

  - Combines predictions from multiple base models (often of different types), using a meta-model that learns how best to combine their outputs

  - Strengths: can utilize diverse model types to maximize performance
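
The stacking idea described above can be sketched with scikit-learn's `StackingClassifier`. The choice of base models (a decision tree and k-NN), the logistic-regression meta-model, and the Iris dataset are assumptions made for illustration, not a recommended configuration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Base models of different types; the meta-model (final_estimator) is trained
# on their cross-validated predictions rather than on the raw features.
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

stacking_accuracy = stack.score(X_test, y_test)
print(f"Stacking accuracy: {stacking_accuracy:.2f}")
```

Using `cv=5` means the meta-model only sees predictions the base models made on held-out folds, which limits the risk of the meta-model overfitting to base-model training error.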

**Overview of commonly used Ensemble methods**

- Random Forest

  - Combines multiple decision trees using bagging

  - Reduces overfitting common in individual decision trees

- Gradient Boosting

  - Sequentially builds models, each one reducing the errors of the previous ones

  - Suitable for both regression and classification tasks

- AdaBoost

  - Weights each weak learner according to its performance

  - Reweights training instances so that later learners focus on misclassified ones

- XGBoost

  - Optimized implementation of Gradient Boosting known for speed and accuracy

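- Voting Classifier

  - Combines the predictions of several models by majority vote (hard voting) or by averaging predicted class probabilities (soft voting)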
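
As a rough illustration of the methods listed above, the following sketch fits three of them with scikit-learn on the Iris dataset. The dataset, the split, and the hyperparameters are assumptions chosen for brevity. XGBoost is left out because it lives in the separate `xgboost` package.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# One bagging-based method and two boosting-based methods
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split
    print(f"{name} Accuracy: {scores[name]:.2f}")
```

On a dataset this small the three methods are likely to score close to one another, so the sketch is mainly useful for showing that they share the same `fit` / `score` interface.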

**1. Build a basic Ensemble model combining predictions from Logistic Regression, Decision Trees, and k-NN to observe the impact on accuracy**

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load Dataset
data = load_iris()
X, y = data.data, data.target

# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale Features (fit the scaler on the training split only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Individual Models
log_model = LogisticRegression()
dt_model = DecisionTreeClassifier()
knn_model = KNeighborsClassifier()

log_model.fit(X_train, y_train)
dt_model.fit(X_train, y_train)
knn_model.fit(X_train, y_train)

# Create Voting Classifier (hard voting: majority vote over predicted labels)
ensemble_model = VotingClassifier(
    estimators=[
        ("log_reg", log_model),
        ("decision_tree", dt_model),
        ("knn", knn_model),
    ],
    voting="hard",
)

# Train ensemble model (VotingClassifier fits its own clones of the estimators)
ensemble_model.fit(X_train, y_train)

# Predict with ensemble
y_pred_ensemble = ensemble_model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred_ensemble)

# Evaluate Individual Models
y_pred_log = log_model.predict(X_test)
y_pred_dt = dt_model.predict(X_test)
y_pred_knn = knn_model.predict(X_test)

print(f"Logistic Regression Accuracy: {accuracy_score(y_test, y_pred_log):.2f}")
print(f"Decision Tree Accuracy: {accuracy_score(y_test, y_pred_dt):.2f}")
print(f"k-NN Accuracy: {accuracy_score(y_test, y_pred_knn):.2f}")
print(f"Ensemble Model Accuracy: {accuracy:.2f}")
```

```text
Logistic Regression Accuracy: 1.00
Decision Tree Accuracy: 1.00
k-NN Accuracy: 1.00
Ensemble Model Accuracy: 1.00
```
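
Every model scores 1.00 on this 30-sample test split, so a single split cannot show whether the ensemble helps. One possible way to get a finer comparison is cross-validation, sketched below. The soft-voting variant, the pipeline layout, and the 5-fold setting are assumptions added for illustration and are not part of the exercise above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Soft voting averages predicted class probabilities instead of counting labels.
# Scaling sits inside the pipeline so it is re-fit on each training fold.
soft_ensemble = make_pipeline(
    StandardScaler(),
    VotingClassifier(
        estimators=[
            ("log_reg", LogisticRegression(max_iter=1000)),
            ("decision_tree", DecisionTreeClassifier(random_state=42)),
            ("knn", KNeighborsClassifier()),
        ],
        voting="soft",
    ),
)

# 5-fold cross-validated accuracy over the full dataset
cv_scores = cross_val_score(soft_ensemble, X, y, cv=5)
print(f"Soft-voting 5-fold accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Running the same `cross_val_score` call on each individual model would give per-model means and spreads to compare against the ensemble.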
