
# Ensemble Techniques for ML

## What is Ensemble Learning

Ensemble learning combines multiple machine learning models to solve a single machine learning problem. It has three main components:

1. Random Data Samples - Subsets of the training data in which each data point has an equal probability of being chosen
2. Weak Learners - ML models that perform relatively poorly on their own
3. Final Model - A combination of the weak learners
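To make the three components concrete, here is a minimal from-scratch sketch. It is an illustration, not the notebook's method: it assumes depth-2 decision trees as stand-in weak learners, 25 bootstrap samples, and plain majority voting as the final model.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_models = 25  # assumed ensemble size

learners = []
for _ in range(n_models):
    # 1. Random data sample: row indices drawn uniformly, with replacement
    idx = rng.integers(0, len(X_iris), size=len(X_iris))
    # 2. Weak learner: a shallow tree fit on that sample
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    learners.append(tree.fit(X_iris[idx], y_iris[idx]))

# 3. Final model: majority vote over all learners' predictions
def ensemble_predict(X_new):
    votes = np.stack([m.predict(X_new) for m in learners])  # shape (n_models, n_samples)
    return np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

accuracy = (ensemble_predict(X_iris) == y_iris).mean()
print(f"Training accuracy of the voted ensemble: {accuracy:.3f}")
```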

## Bias

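Sometimes called algorithm bias or AI bias, bias is a phenomenon that occurs when an algorithm produces results that are systematically skewed due to erroneous assumptions in the learning process. High bias can cause a model to miss the relevant relationships between the features and the target (underfitting).

e.g. All predictions are shifted `x` units to the left of the true values.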

## Variance

Variance is the error that comes from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs (overfitting).
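Both error sources can be estimated numerically. The sketch below is an illustrative assumption, not part of the original notebook: it repeatedly refits polynomials to noisy samples of a known function, $\sin(2\pi x)$, then measures bias² (how far the average fit is from the truth) and variance (how much individual fits scatter around that average). A straight line should show high bias and low variance, and a degree-9 polynomial the reverse.

```python
import numpy as np

def true_fn(x):
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_trials=500, noise=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, 20)        # fixed inputs; only the noise changes per trial
    x_test = np.linspace(0.05, 0.95, 50)   # evaluation grid, kept away from the edges
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        y_train = true_fn(x_train) + rng.normal(0, noise, x_train.size)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[t] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)  # systematic error
    variance = np.mean(preds.var(axis=0))                  # sensitivity to the sample
    return bias_sq, variance

for degree in (1, 9):
    b, v = bias_variance(degree)
    print(f"degree={degree}: bias^2={b:.4f}, variance={v:.4f}")
```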

## Different types of Ensemble Methods

1. Bagging - Also known as bootstrap aggregation. We randomly sample `m` observations, with replacement, from a population of `n`, where $m \le n$ (often $m = n$). A model is trained on each bootstrap sample, and the final output is usually the result of soft voting across the models.

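2. Stacking - Closely related to voting. Two or more (often different) models are trained on the same dataset, and a final model aggregates the results from each of the weak learners. In simple voting the aggregation is a fixed, optionally weighted, vote; in stacking, a meta-model learns how to weight each model's output.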

3. Boosting - A sequential learning technique where subsequent models are fit to the residuals of the previous models. Incorrect classifications or high loss data points are prioritized in subsequent models.
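Each family above has a ready-made estimator in scikit-learn's `sklearn.ensemble` module. The sketch below compares them with 5-fold cross-validation on the iris data; the particular base models, voting weights, and hyperparameters are illustrative assumptions. `GradientBoostingClassifier` is used as one boosting variant (fitting each new tree to the current errors); AdaBoost is another.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# Heterogeneous base models shared by the voting and stacking ensembles
base_models = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("logreg", LogisticRegression(max_iter=1000)),
]

ensembles = {
    # Bagging: one model type trained on bootstrap samples (estimator passed positionally)
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, random_state=0),
    # Voting: fixed aggregation of predicted probabilities, with per-model weights
    "voting": VotingClassifier(base_models, voting="soft", weights=[1, 2, 1]),
    # Stacking: a meta-model learns how to combine the base models' outputs
    "stacking": StackingClassifier(base_models, final_estimator=LogisticRegression(max_iter=1000)),
    # Boosting: trees added sequentially, each fit to the ensemble's current errors
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

results = {name: cross_val_score(model, X_iris, y_iris, cv=5).mean()
           for name, model in ensembles.items()}
for name, score in results.items():
    print(f"{name:>9}: {score:.3f}")
```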

# Ensemble Learning in Action

In [1]:
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np


In [4]:
data = load_iris()

In [14]:
# Create the dataframe

df = pd.DataFrame(data = data.data, columns=data.feature_names)
df.insert(4, "Class", data.target, True)
df

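|     | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | Class |
|-----|-------------------|------------------|-------------------|------------------|-------|
| 0   | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1   | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2   | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3   | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4   | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |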


## Bagging

https://machinelearningmastery.com/bagging-ensemble-with-python/

In [16]:
# Create our features and targets

X = df.drop("Class", axis="columns")
y = df["Class"]

In [17]:
X.head(5)

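|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|-------------------|------------------|-------------------|------------------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |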


Let's scale the data. We can use the `StandardScaler` from `sklearn`.

In [21]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled[:5], X[:5].to_numpy()

(array([[-0.90068117,  1.01900435, -1.34022653, -1.3154443 ],
        [-1.14301691, -0.13197948, -1.34022653, -1.3154443 ],
        [-1.38535265,  0.32841405, -1.39706395, -1.3154443 ],
        [-1.50652052,  0.09821729, -1.2833891 , -1.3154443 ],
        [-1.02184904,  1.24920112, -1.34022653, -1.3154443 ]]),
 array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2]]))
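`StandardScaler` standardizes each column as $z = (x - \mu) / \sigma$, where $\mu$ is the column mean and $\sigma$ its population standard deviation. As a quick sanity check (not part of the original notebook), the sketch below reproduces the scaler's output with plain NumPy; it uses separate variable names so it does not overwrite `X` or `X_scaled`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_raw = load_iris().data

# z = (x - mean) / std per column; ndarray.std uses ddof=0 (population std) by default
X_manual = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X_sklearn = StandardScaler().fit_transform(X_raw)

print(np.allclose(X_manual, X_sklearn))
print(X_manual[0])
```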

Next, we split the dataset into train and test sets.

The `stratify` parameter takes the class labels (here `y`) and keeps the same proportion of classes in the training and test sets. For example, if 80% of the data belongs to class A, then roughly 80% of both the train and test sets will also belong to class A.


In [22]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=42, stratify=y)

In [24]:
X_train.shape, X_test.shape

((112, 4), (38, 4))
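With the default `test_size=0.25`, `train_test_split` puts $\lceil 0.25 \times 150 \rceil = 38$ rows in the test set, which explains the shapes above. The short check below re-creates the same split independently and counts the classes on each side; since iris has 50 samples per class, stratification should leave each split as close to one third per class as its size allows.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)
_, _, y_tr, y_te = train_test_split(X_iris, y_iris, random_state=42, stratify=y_iris)

# Class counts per split; stratification keeps them as even as the split sizes allow
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print("train:", train_counts, "test:", test_counts)
```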

Let's create our weak learner and check its accuracy. We will use a `DecisionTreeClassifier` and K-Fold cross-validation.

K-Fold cross-validation splits the dataset into K subsets, or folds. In each of K iterations, the model is trained on K-1 folds and evaluated on the remaining fold, producing one accuracy score per iteration. Here `cv=5`, so we get five scores, which we then average.

In [29]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

score = cross_val_score(DecisionTreeClassifier(), X_scaled, y, cv=5)
print(f"Scores : {score}")
print(f"Average Score: {score.mean()}")


Scores : [0.96666667 0.96666667 0.9        1.         1.        ]
Average Score: 0.9666666666666668
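For a classifier and an integer `cv`, `cross_val_score` uses stratified K-fold splitting. The loop below is an illustrative re-implementation of what happens in each iteration, using `StratifiedKFold` without shuffling on the unscaled data (trees do not need scaling). Individual fold scores may differ slightly from the output above, because a `DecisionTreeClassifier` without a fixed `random_state` can break ties between equally good splits differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=5)

fold_scores = []
for train_idx, test_idx in skf.split(X_iris, y_iris):
    # Train on K-1 folds, then evaluate on the held-out fold
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_iris[train_idx], y_iris[train_idx])
    fold_scores.append(model.score(X_iris[test_idx], y_iris[test_idx]))

fold_scores = np.array(fold_scores)
print(f"Scores : {fold_scores}")
print(f"Average Score: {fold_scores.mean()}")
```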


Implementing the bagging algorithm.

We start by building a `BaggingClassifier`, which fits weak learners (here `DecisionTreeClassifier`) on randomly sampled subsets of the training data and aggregates them into a final model that uses a voting mechanism to produce the final class.

- **n_estimators** : The number of weak learners used.
- **max_samples** : The number (int) or fraction (float) of training samples drawn to train each weak learner.
- **bootstrap** : Whether samples are drawn with replacement (`True`) or without replacement (`False`).
- **oob_score** : Whether to compute an accuracy score after training using the out-of-bag samples, i.e. the data points that were not part of each learner's training subset. Requires `bootstrap=True`.
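With `bootstrap=True` and `max_samples=0.8`, each tree sees roughly `int(0.8 * 112) = 89` draws from the 112 training rows, made with replacement. A given row is missed by every draw with probability $(1 - 1/112)^{89} \approx 0.45$, so close to half the training set is out-of-bag for each tree, and that is the data `oob_score` evaluates on. The simulation below is an illustration of that estimate, not scikit-learn's internal sampling code.

```python
import numpy as np

n_train, n_draws = 112, int(0.8 * 112)  # training rows, draws per tree
rng = np.random.default_rng(0)

oob_fractions = []
for _ in range(2000):
    drawn = rng.integers(0, n_train, size=n_draws)  # sample with replacement
    in_bag = np.unique(drawn).size
    oob_fractions.append(1 - in_bag / n_train)      # share of rows never drawn

simulated = np.mean(oob_fractions)
expected = (1 - 1 / n_train) ** n_draws
print(f"simulated OOB fraction: {simulated:.3f}, expected: {expected:.3f}")
```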


In [31]:
from sklearn.ensemble import BaggingClassifier

# scikit-learn >= 1.2 names this parameter `estimator`; `base_estimator` was removed in 1.4
bag_model = BaggingClassifier(estimator=DecisionTreeClassifier(),
                              n_estimators=20,
                              max_samples=0.8,
                              bootstrap=True,
                              oob_score=True,
                              random_state=42)

In [32]:
bag_model.fit(X_train, y_train)

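Notice, in the results below, that the accuracy actually decreased compared with the single decision tree: the out-of-bag score (0.946) and the test accuracy (0.921) are both below the tree's average cross-validation score (0.967). The comparison is not exact, since the tree was evaluated with 5-fold cross-validation on the full dataset rather than on the same test split, but it could be that bagging was unnecessary here: the dataset is small, and bagging added an unnecessary layer of complexity.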

In [35]:
# Check results

print(bag_model.oob_score_)
print(bag_model.score(X_test, y_test))

0.9464285714285714
0.9210526315789473
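To compare the single tree and the bagged model on equal footing, both can be scored on identical cross-validation folds. The sketch below does this; its hyperparameters mirror the `bag_model` cell above, the shuffled `StratifiedKFold` is an assumption, and the estimator is passed positionally so the call works whether the installed scikit-learn names that parameter `base_estimator` or `estimator`. The resulting scores depend on the library version, so none are quoted here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # identical folds for both models

tree_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X_iris, y_iris, cv=cv)
bag_scores = cross_val_score(
    BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, max_samples=0.8,
                      bootstrap=True, random_state=42),
    X_iris, y_iris, cv=cv,
)
print(f"Single tree:  {tree_scores.mean():.3f}")
print(f"Bagged trees: {bag_scores.mean():.3f}")
```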


### More Complex Data

## Boosting