# 📚 What is Ensemble Learning?

**Ensemble learning** is a technique where multiple models (also called **learners** or **estimators**) are combined to solve a problem and improve performance. The main idea is that a group of **weak models**, when combined properly, can create a **stronger overall model**.
- Ensemble learning combines the predictions of several models, which often gives better predictive performance than any single one of them. The basic idea is to train a set of classifiers ("experts") and let them vote on the final prediction.



E.g., majority voting, averaging, and weighted mean are the basic combination techniques.

---
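The three basic combination rules can be sketched in a few lines of NumPy. The label and probability values below are made-up toy numbers (an assumption for illustration), not outputs of real models.

```python
import numpy as np

# Hypothetical class labels predicted by three models for four samples
label_preds = np.array([
    [0, 1, 1, 0],   # model A
    [0, 1, 0, 0],   # model B
    [1, 1, 1, 0],   # model C
])

# Majority voting: the most frequent label in each column (one column per sample)
majority = np.array([np.bincount(col).argmax() for col in label_preds.T])

# Hypothetical probability of class 1 from each model for the same four samples
proba_preds = np.array([
    [0.2, 0.9, 0.6, 0.1],
    [0.4, 0.8, 0.3, 0.2],
    [0.7, 0.7, 0.8, 0.3],
])

# Averaging: unweighted mean probability across the three models
average = proba_preds.mean(axis=0)

# Weighted mean: trust model C twice as much as A and B (weights sum to 1)
weights = np.array([0.25, 0.25, 0.5])
weighted = weights @ proba_preds

print(majority)   # → [0 1 1 0]
print(average.round(3))
print(weighted.round(3))
```

Voting works on hard labels, while averaging and weighted mean need probabilities (or scores) from each model.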

## 🔧 Advanced Types of Ensemble Methods

### 1. 🧺 Bagging (Bootstrap Aggregating)

- **How it works**: Trains several instances of the same model, each on a **bootstrap sample** of the training data (rows drawn at random **with replacement**), then **averages** the results (for regression) or uses **majority voting** (for classification).
- **Goal**: Reduce **variance** and prevent **overfitting**.
- **Example**: `Random Forest` (bagged decision trees that also consider a random subset of features at each split)

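A minimal sketch of bagging with scikit-learn's `BaggingClassifier`, compared with a single decision tree and a `RandomForestClassifier`. The synthetic dataset from `make_classification` and all parameter values are assumptions chosen for illustration; on real data the gap between the models can be larger, smaller, or absent.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (illustrative only)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A single, fully grown tree: low bias but high variance
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Bagging: 100 trees, each fit on a bootstrap sample (bootstrap=True is the default).
# Predictions average the trees' class probabilities; for fully grown trees this
# is essentially a majority vote. The base estimator is passed positionally.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
bagging.fit(X_train, y_train)

# Random Forest: bagged trees plus a random feature subset at each split
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

for name, clf in [("single tree", tree), ("bagging", bagging), ("random forest", forest)]:
    print(f"{name:>13}: {clf.score(X_test, y_test):.3f}")
```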
---

### 2. 🚀 Boosting

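- **How it works**: Models are trained **sequentially**. Each new model focuses on **correcting the errors** made by the models trained before it (AdaBoost re-weights misclassified samples; gradient boosting fits the remaining errors of the current ensemble).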
- **Goal**: Reduce **bias** and improve **accuracy**.
- **Popular boosting methods**:
  - AdaBoost
  - Gradient Boosting
  - XGBoost, LightGBM, CatBoost

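To make the sequential error correction visible, here is a sketch using scikit-learn's `GradientBoostingClassifier` and `AdaBoostClassifier` on synthetic data (an assumption for illustration). `staged_predict` yields the ensemble's predictions after each added tree, so the training accuracy can be tracked as the ensemble grows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Gradient boosting: each new shallow tree (max_depth=3 by default) is fit to the
# errors (negative gradients of the loss) of the ensemble built so far
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
gb.fit(X_train, y_train)

# Training accuracy of the ensemble after each added tree
train_acc = [accuracy_score(y_train, p) for p in gb.staged_predict(X_train)]
print(f"after 1 tree:    {train_acc[0]:.3f}")
print(f"after 10 trees:  {train_acc[9]:.3f}")
print(f"after 100 trees: {train_acc[-1]:.3f}")
print(f"GB test accuracy: {gb.score(X_test, y_test):.3f}")

# AdaBoost: re-weights misclassified samples before fitting the next weak learner
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"AdaBoost test accuracy: {ada.score(X_test, y_test):.3f}")
```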
---

### 3. 🧠 Stacking (Stacked Generalization)

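- **How it works**: Combines the predictions of several different base models using a **meta-model** (e.g., logistic regression) that learns how best to combine their outputs. The meta-model is usually trained on out-of-fold (cross-validated) predictions of the base models, so it does not learn from predictions the base models made on their own training rows.
- **Common use**: Frequently used in machine learning competitions, such as those on **Kaggle**.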

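A sketch of stacking with scikit-learn's `StackingClassifier`; the choice of base models, the synthetic data, and `cv=5` are assumptions for illustration. With `cv=5`, the logistic-regression meta-model is fit on 5-fold out-of-fold predictions of the base models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Base models: their predictions become the input features of the meta-model
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
]

# Meta-model: logistic regression trained on cross-validated base-model predictions
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print(f"stacking test accuracy: {stack.score(X_test, y_test):.3f}")
```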
---

## 🎯 Why Use Ensemble Learning?

- ✅ Often more accurate than any single base model
- ✅ More robust, and tends to generalize better to new data
- ✅ Can reduce overfitting (bagging in particular)
- ✅ Helps capture more complex patterns in the data (boosting in particular)


# Voting Classifier

In [1]:
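```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Raisin dataset (the path is local to the author's machine)
df = pd.read_excel("D:\\utils\\DataSets\\Raisin_Dataset.xlsx")

# Seven morphological features; the target is the raisin variety
X = df[["Area", "MajorAxisLength", "MinorAxisLength", "Eccentricity", "ConvexArea", "Extent", "Perimeter"]]
y = df["Class"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
```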


In [2]:
```python
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Baseline: a single RBF-kernel SVM.
# Note: the features are not scaled, and RBF SVMs are sensitive to feature scale.
model = SVC(kernel="rbf")
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

report = classification_report(y_test, y_pred)
print(report)

# Number of iterations the solver ran to fit the model
model.n_iter_
```

```text
              precision    recall  f1-score   support

       Besni       0.86      0.75      0.80        83
     Kecimen       0.81      0.90      0.85        97

    accuracy                           0.83       180
   macro avg       0.83      0.82      0.82       180
weighted avg       0.83      0.83      0.83       180

array([229], dtype=int32)
```

In [3]:
```python
# Alternatively: a hand-rolled hard-voting ensemble of three different models
from sklearn.ensemble import VotingClassifier   # used in the next cell
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
import numpy as np
import statistics as st
import warnings
warnings.filterwarnings('ignore')  # e.g. hides LogisticRegression convergence warnings

# MODELS CREATION
model1 = DecisionTreeClassifier()   # no random_state, so results can vary between runs
model2 = SVC(probability=True)
model3 = LogisticRegression()

model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
model3.fit(X_train, y_train)

# PREDICTION: each model predicts a class label for every test row
pred1 = model1.predict(X_test)
pred2 = model2.predict(X_test)
pred3 = model3.predict(X_test)

# FINAL PREDICTION: majority vote (mode) of the three labels for each row
final_pred = np.array([st.mode(votes) for votes in zip(pred1, pred2, pred3)])

report = classification_report(y_test, final_pred)
print(report)
```

```text
              precision    recall  f1-score   support

       Besni       0.92      0.83      0.87        83
     Kecimen       0.87      0.94      0.90        97

    accuracy                           0.89       180
   macro avg       0.89      0.88      0.89       180
weighted avg       0.89      0.89      0.89       180
```



In [4]:
```python
# Alternatively, use VotingClassifier from sklearn.ensemble to do the same vote
# (the imports are in the previous cell)
log_model = LogisticRegression()
dt_model = DecisionTreeClassifier()
svm_model = SVC(probability=True)

# Hard voting: each model casts one vote for a class label
voting_clf = VotingClassifier(
    estimators=[('lr', log_model), ('dt', dt_model), ('svm', svm_model)],
    voting='hard')  # use 'soft' to average predicted probabilities instead

voting_clf.fit(X_train, y_train)
```

In [5]:
```python
y_pred = voting_clf.predict(X_test)

report = classification_report(y_test, y_pred)
print(report)
```

```text
              precision    recall  f1-score   support

       Besni       0.92      0.83      0.87        83
     Kecimen       0.87      0.94      0.90        97

    accuracy                           0.89       180
   macro avg       0.89      0.88      0.89       180
weighted avg       0.89      0.89      0.89       180
```

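The hand-rolled loop and `VotingClassifier(voting='hard')` produce identical reports in this notebook, consistent with hard voting being a per-row majority vote. A sketch that checks the equivalence on synthetic data; the dataset and the `random_state` values are assumptions added so the run is reproducible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(random_state=0)),
]

# Manual hard voting: one row of labels per model, then the most frequent label per column
votes = np.array([clf.fit(X_train, y_train).predict(X_test) for _, clf in models])
manual = np.array([np.bincount(col).argmax() for col in votes.T])

# The same vote via scikit-learn (it fits its own clones of the estimators)
voting = VotingClassifier(estimators=models, voting="hard").fit(X_train, y_train)

print("identical predictions:", np.array_equal(manual, voting.predict(X_test)))
```

scikit-learn's hard voting breaks ties in favour of the class that comes first in sorted order; with three voters and two classes, as here, a tie cannot occur.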


In [6]:
```python
voting_clf.score(X_test, y_test)
```

```text
0.8888888888888888
```

In [7]:

```python
# Using Average (soft voting by hand): average the predicted class probabilities.
# The models and NumPy are imported in the earlier cells.

# MODELS CREATION
model1 = DecisionTreeClassifier()
model2 = SVC(probability=True)   # probability=True enables predict_proba
model3 = LogisticRegression()

model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
model3.fit(X_train, y_train)

# PREDICTION: one column per class, in the order of model.classes_
pred1 = model1.predict_proba(X_test)
pred2 = model2.predict_proba(X_test)
pred3 = model3.predict_proba(X_test)

# Unweighted mean of the three probability matrices
finalpred = (pred1 + pred2 + pred3) / 3

# Class balance of the test set
y_test.value_counts()
```

```text
Class
Kecimen    97
Besni      83
Name: count, dtype: int64
```

In [8]:
```python
finalpred
```

```text
array([[0.33739471, 0.66260529],
       [0.11136191, 0.88863809],
       [0.11553733, 0.88446267],
       [0.27178544, 0.72821456],
       [0.08742303, 0.91257697],
       [0.63396009, 0.36603991],
       [0.06018293, 0.93981707],
       [0.75513605, 0.24486395],
       [0.43104939, 0.56895061],
       [0.76637619, 0.23362381],
       [0.93942216, 0.06057784],
       [0.59617324, 0.40382676],
       [0.99533512, 0.00466488],
       [0.05024578, 0.94975422],
       [0.07970276, 0.92029724],
       [0.9952966 , 0.0047034 ],
       [0.27065168, 0.72934832],
       [0.1398723 , 0.8601277 ],
       [0.08110255, 0.91889745],
       [0.91779843, 0.08220157],
       [0.96476181, 0.03523819],
       [0.12501966, 0.87498034],
       [0.14643192, 0.85356808],
       [0.92858025, 0.07141975],
       [0.24760448, 0.75239552],
       [0.08278502, 0.91721498],
       [0.72984408, 0.27015592],
       [0.04770964, 0.95229036],
       [0.98634758, 0.01365242],
       [0.27719216, 0.72280784],
       [0.
```

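`finalpred` above still holds probabilities; to get class labels, take the column with the highest average probability in each row and map it through the fitted model's `classes_`. A self-contained sketch on synthetic data (an assumption, since the Raisin file is not bundled here), which also checks the result against `VotingClassifier(voting='soft')`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

models = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),  # probability=True enables predict_proba
    ("lr", LogisticRegression(max_iter=1000)),
]

# Manual soft voting: average the (n_samples, n_classes) probability matrices
probas = [clf.fit(X_train, y_train).predict_proba(X_test) for _, clf in models]
avg_proba = np.mean(probas, axis=0)

# Convert to labels: index of the most probable class, mapped through classes_
classes = models[0][1].classes_
manual_labels = classes[avg_proba.argmax(axis=1)]

soft = VotingClassifier(estimators=models, voting="soft").fit(X_train, y_train)
print("same probabilities:", np.allclose(avg_proba, soft.predict_proba(X_test)))
print("same labels:", np.array_equal(manual_labels, soft.predict(X_test)))
```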
In [17]:
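```python
# Using Weighted average: give LogisticRegression (model3) a bit more say.
# The weights sum to 1, so each row is still a valid probability distribution.

# PREDICTION
pred1 = model1.predict_proba(X_test)
pred2 = model2.predict_proba(X_test)
pred3 = model3.predict_proba(X_test)

finalpred_after_wt = pred1 * 0.3 + pred2 * 0.3 + pred3 * 0.4

print(finalpred_after_wt[0])
```

```text
[0.34952208 0.65047792]
```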

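`VotingClassifier` supports the same weighted average through its `weights` parameter; with `voting='soft'` it averages the `predict_proba` outputs using those weights. A sketch on synthetic data (an assumption) with the same 0.3 / 0.3 / 0.4 weights as above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

models = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
]
weights = [0.3, 0.3, 0.4]

# Manual weighted mean of the probability matrices (weights sum to 1)
manual = sum(w * clf.fit(X_train, y_train).predict_proba(X_test)
             for w, (_, clf) in zip(weights, models))

weighted_vote = VotingClassifier(estimators=models, voting="soft", weights=weights)
weighted_vote.fit(X_train, y_train)

print("same probabilities:", np.allclose(manual, weighted_vote.predict_proba(X_test)))
```

If the weights did not sum to 1, `VotingClassifier` would still return valid probabilities because it divides by the total weight, whereas the manual sum would need explicit normalization.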

# What is Bagging?
- Bagging (Bootstrap Aggregating) is an ensemble learning technique that improves the accuracy and stability of machine learning models by:
  - Creating multiple versions of a model, each trained on a different bootstrap sample (rows drawn at random with replacement) of the training data.
  - Combining their predictions, usually by averaging (regression) or majority voting (classification).

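The two steps above (bootstrap sampling, then combining) can be sketched from scratch with NumPy and scikit-learn decision trees. The helper names `fit_bagging` and `predict_bagging` are hypothetical, and the synthetic data is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(X, y, n_models=25, seed=0):
    """Hypothetical helper: fit one tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n row indices drawn with replacement
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models

def predict_bagging(models, X):
    """Hypothetical helper: majority vote over the trees' integer labels."""
    votes = np.array([m.predict(X) for m in models])   # shape (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

ensemble = fit_bagging(X_train, y_train)
accuracy = (predict_bagging(ensemble, X_test) == y_test).mean()
print(f"bagged accuracy: {accuracy:.3f}")
```

Each tree sees roughly 63% of the distinct training rows, since a bootstrap sample of size n leaves out about (1 - 1/n)^n ≈ 1/e ≈ 37% of them; this is what makes the trees differ from one another.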
# Why use Bagging?

- A single model can be:
  - Too sensitive to noise in the training data (overfitting)
  - Strongly influenced by outliers
  - Limited in what it can learn

- Bagging helps by:
  - Reducing variance, which makes the model more stable
  - Improving accuracy
  - Preventing overfitting, especially for high-variance models such as deep decision trees

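Why averaging reduces variance can be made precise with a standard result. Suppose each of the $B$ models' predictions $f_b(x)$ has variance $\sigma^2$ and every pair of models has correlation $\rho$. Summing the $B$ variances and the $B(B-1)$ covariances gives

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right)
= \frac{1}{B^2}\Big(B\sigma^2 + B(B-1)\rho\sigma^2\Big)
= \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2
$$

With uncorrelated models ($\rho = 0$) the variance falls as $\sigma^2 / B$. As $B$ grows, the second term vanishes and only $\rho\sigma^2$ remains, which is why bagging tries to keep the models weakly correlated (bootstrap samples, and in Random Forests also random feature subsets).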
