
In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import DecisionTreeClassifier

from sklearn.svm import SVC

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import RandomForestClassifier

from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression


from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import f1_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_absolute_error

# 1. The strength of "weak" models
Hello learners! Welcome to the second chapter of our course! In this chapter, you'll learn about a popular and widely used ensemble method: Bagging. In this first lesson, we'll see what a "weak" model is and how to identify one by its properties.

## 2. "Weak" model
Voting and averaging, which you learned about in the previous chapter, work by combining the predictions of already trained models.

**Usually, these use a small number of estimators that are fine-tuned and individually optimized for the problem.
In fact, these estimators are so well trained that, in some cases, they produce decent results on their own. We'll refer to these estimators as fine-tuned.**

This approach is appropriate when you already have optimized models and want to improve performance further by combining them. But what happens when you don't have these estimators trained beforehand? Well, that's when "weak" estimators come into play.

## 3. Fine-tuned vs "weak" models
You may ask yourself: what's the difference between a weak and a fine-tuned model? First, let's see what a "weak" model is. The idea of "weak" doesn't mean that it is a bad model, just that it is not as strong as a highly optimized, fine-tuned model.

## 4. Properties of "weak" models
A weak estimator, or model, is one which is just slightly better than random guessing.

Therefore, the error rate is less than 50% but close to it. A weak model should be light in terms of space and computational requirements, and fast during training and evaluation.
One good example is a decision tree. Imagine that we fit a decision tree to the data, but instead of optimizing it completely, we limit it to a depth of two. This restricts how much the model can learn, but ensures that it has the three desired properties: low performance (just above random guessing), light weight (we only need two levels of decisions), and, as a result, fast predictions.
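To make the "light" property concrete, here is a minimal sketch that compares the size of a depth-limited tree with an unconstrained one. The synthetic dataset from `make_classification` is an assumption made for illustration, not the course data.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption; any dataset would do)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# A depth-limited "weak" tree and a tree that is free to grow
weak_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)
full_tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# A binary tree of depth 2 has at most 1 + 2 + 4 = 7 nodes
print("weak tree:", weak_tree.get_depth(), "levels,", weak_tree.tree_.node_count, "nodes")
print("full tree:", full_tree.get_depth(), "levels,", full_tree.tree_.node_count, "nodes")
```

The exact numbers for the full tree depend on the data, but the weak tree can never exceed seven nodes, which is why it is cheap to store and fast to evaluate.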

## 5. Examples of "weak" models
These are some common examples of weak models. As we stated before, a decision tree constrained with small depth could be used as a weak model.

Some weak models:

```python
model = DecisionTreeClassifier(max_depth=3)
model = LogisticRegression(max_iter=50, C=100.0)
model = LinearRegression()
```

Another example is logistic regression, which makes the assumption that the classes are linearly separable. This is not always true, in which case, logistic regression would be wrong, but potentially still useful as a weak estimator.

We could also limit the number of training iterations, or set a high value for the parameter C so that only weak regularization is applied.

For regression problems, we have linear regression. Linear regression, like logistic regression, makes the assumption that the output is a linear function of the input features. In addition, it relies on the independence of these features. Because of the simple assumptions of the model, we can use it as a weak estimator. As we are more interested in the properties of a weak model, any other estimator which has the three desired properties can be used as well.

# 6. Let's practice!
Now, let's get some practice with weak models!



# Exercise
## Restricted and unrestricted decision trees
For this exercise, we will revisit the Pokémon dataset from the last chapter. Recall that the goal is to predict whether or not a given Pokémon is legendary.

Here, you will build two separate decision tree classifiers. In the first, you will specify the parameters min_samples_leaf and min_samples_split, but not a maximum depth, so that the tree can grow without any depth restriction.

In the second, you will specify some constraints by limiting the depth of the decision tree. By then comparing the two models, you'll better understand the notion of a "weak" learner.

In [2]:
import os

file_path2 = "/content/drive/MyDrive/Colab Notebooks/data/EnsembleLearning/Pokemon.csv"

if not os.path.isfile(file_path2):
    raise FileNotFoundError(f"File not found: {file_path2}")

# Read the file using pandas
pokemon = pd.read_csv(file_path2)

In [3]:
pokemon = pd.DataFrame(pokemon)

In [4]:
pokemon.head()

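|   | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|------|--------|--------|-------|----|--------|---------|---------|---------|-------|------------|-----------|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |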


In [5]:
# Define features (X) and target (y)
X = pokemon.drop(["#","Name","Type 1","Type 2","Legendary"], axis=1).values
y = pokemon["Legendary"].values


In [6]:
# Split into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [7]:
# Build unrestricted decision tree
clf = DecisionTreeClassifier(min_samples_leaf=3, min_samples_split=9, random_state=500)
clf.fit(X_train, y_train)

# Predict the labels
pred = clf.predict(X_test)

# Print the confusion matrix
cm = confusion_matrix(y_test, pred)
print('Confusion matrix:\n', cm)

# Print the F1 score
score = f1_score(y_test, pred)
print('F1-Score: {:.3f}'.format(score))

Confusion matrix:
 [[143   7]
 [  1   9]]
F1-Score: 0.692


In [8]:
# Build a restricted decision tree: replace min_samples_leaf and
# min_samples_split with max_depth=4 and max_features=2
clf = DecisionTreeClassifier(max_depth=4, max_features=2, random_state=500)
clf.fit(X_train, y_train)

# Predict the labels
pred = clf.predict(X_test)

# Print the confusion matrix
cm = confusion_matrix(y_test, pred)
print('Confusion matrix:\n', cm)

# Print the F1 score
score = f1_score(y_test, pred)
print('F1-Score: {:.3f}'.format(score))


Confusion matrix:
 [[141   9]
 [  3   7]]
F1-Score: 0.538


Well done! Notice how the *restricted decision tree performs worse* (an F1 score of 0.538 versus 0.692), which is the modest performance we expect from a "weak" model.
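If you want to see where those two F1 scores come from, the sketch below recomputes them by hand from the confusion matrices printed above. It assumes the layout that scikit-learn's `confusion_matrix` uses for a binary problem, `[[TN, FP], [FN, TP]]`; the helper name `f1_from_confusion` is made up for this example.

```python
def f1_from_confusion(cm):
    """F1 for the positive class from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)   # share of predicted legendaries that are correct
    recall = tp / (tp + fn)      # share of true legendaries that were found
    return 2 * precision * recall / (precision + recall)

unrestricted = [[143, 7], [1, 9]]
restricted = [[141, 9], [3, 7]]

print("{:.3f}".format(f1_from_confusion(unrestricted)))  # 0.692
print("{:.3f}".format(f1_from_confusion(restricted)))    # 0.538
```

The restricted tree loses on both fronts: precision falls from 9/16 to 7/16 and recall from 9/10 to 7/10.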

# Bootstrap aggregating
Having learned about weak models, you're now ready to learn about Bootstrap Aggregating, also known as "**Bagging**".

## 2. Heterogeneous vs Homogeneous Ensembles
Until now, you've only seen heterogeneous ensemble methods, which use different types of fine-tuned algorithms.

Therefore, they tend to work well with a small number of estimators.

For example, we could combine a decision tree, a logistic regression, and a support vector machine using voting to improve the results.

These methods include Voting, Averaging, and Stacking.

**Homogeneous ensemble methods such as bagging**, on the other hand, work by applying the same algorithm on all the estimators, and this algorithm must be a "weak" model.


Heterogeneous Ensemble:
* Voting, averaging, and stacking

Homogeneous Ensemble:
* Bagging and Boosting


In practice, we end up working with a large number of "weak" estimators in order to achieve better performance than a single model. Bagging and Boosting are among the most popular methods of this kind.
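The contrast between the two families can be sketched with scikit-learn's `VotingClassifier` and `BaggingClassifier`. This is only an illustration under assumptions: the data is synthetic, and the choice of estimators and of `n_estimators=50` is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption made for this sketch)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# Heterogeneous: a few different algorithms combined by majority vote
hetero = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC()),
    ],
    voting="hard",
)

# Homogeneous: many copies of the same "weak" algorithm
homo = BaggingClassifier(DecisionTreeClassifier(max_depth=2), n_estimators=50, random_state=0)

for name, model in [("voting", hetero), ("bagging", homo)]:
    model.fit(X_tr, y_tr)
    print(name, "test accuracy: {:.3f}".format(model.score(X_te, y_te)))
```

After fitting, `hetero.estimators_` holds three models of different types, while `homo.estimators_` holds fifty shallow decision trees.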

## 3. Condorcet's Jury Theorem
You might be wondering how a large group of "weak" models can achieve good performance.

Once again, this is the wisdom of the crowd at work. Do homogeneous ensemble methods also have that potential? That is what Condorcet showed with his theorem, known as "Condorcet's Jury Theorem". The theorem has three requirements:

* All the models must be independent.
* Each model must perform better than random guessing.
* All individual models must have similar performance.

If these three conditions are met, then adding more models increases the probability that the ensemble is correct, and this probability tends to 1, that is, 100%!

The second and third requirements can be fulfilled by using the same "weak" model for all the estimators, as then all will have a similar performance and be better than random guessing.
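The theorem can be checked numerically. The sketch below computes the probability that a majority of `n` independent voters is correct when each one is right with probability `p`, using the binomial distribution. The function name `majority_correct` and the value `p = 0.55` are choices made for this illustration.

```python
from math import comb

def majority_correct(n, p):
    """Probability that more than half of n independent voters, each correct
    with probability p, give the correct answer (n is assumed to be odd)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

# Each "weak" voter is right only 55% of the time
for n in (1, 11, 101, 1001):
    print(n, round(majority_correct(n, 0.55), 4))
```

The printed probability grows with `n` and approaches 1. Try a value of `p` below 0.5 to see why the "better than random guessing" requirement matters: the majority then becomes less reliable as more voters are added.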

## 4. Bootstrapping
To address the first requirement of the theorem, the bagging algorithm trains each individual model on its own random subsample, drawn with replacement from the training data.

This is known as **bootstrapping**, and it provides some of the characteristics of a wise crowd. If you recall, a wise crowd needs to be diverse, either through using different algorithms or different datasets. Here we're using the same weak model for all the estimators, but the dataset for each is a different subsample, which provides diversity. Other properties of a wise crowd are independence and lack of correlation, which bootstrapping encourages because each sample is drawn separately. After the individual models are trained with their respective samples, they are aggregated using voting or averaging.
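The bootstrap-then-aggregate procedure can be sketched by hand in a few lines. This is a simplified illustration, not how scikit-learn's `BaggingClassifier` is implemented: the data is synthetic, the labels are assumed to be 0/1, and the number of estimators (25) is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data with 0/1 labels (an assumption made for this sketch)
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

rng = np.random.RandomState(0)
n_estimators = 25  # odd, so a binary majority vote cannot tie
models = []
for _ in range(n_estimators):
    # Bootstrap: draw len(X_tr) row indices with replacement
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    models.append(tree.fit(X_tr[idx], y_tr[idx]))

# Aggregate: majority vote over the 0/1 predictions of all models
votes = np.array([m.predict(X_te) for m in models])  # shape (n_estimators, n_test)
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)

print("ensemble accuracy: {:.3f}".format((ensemble_pred == y_te).mean()))
```

Every tree uses the same algorithm and the same settings; the only source of diversity is the bootstrap sample each one was trained on.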

## 5. Pros and cons of bagging
Why is bagging a useful technique? First, **it helps reduce variance**, as the sampling is truly random. Bias can also be reduced since we use voting or averaging to combine the models. Because of the high number of estimators used, bagging provides stability and robustness. However, bagging is computationally expensive in terms of space and time.
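The variance effect of averaging can be illustrated with a toy simulation. It assumes fully independent estimators, each predicting a true value of 1.0 plus standard normal noise. Models trained on bootstrap samples of the same data are correlated, so the reduction in a real bagging ensemble is smaller than what this sketch shows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_estimators = 10_000, 50

# Each row is one trial; each column is one estimator's noisy prediction of 1.0
predictions = 1.0 + rng.normal(0.0, 1.0, size=(n_trials, n_estimators))

single_var = predictions[:, 0].var()            # variance of a single estimator
averaged_var = predictions.mean(axis=1).var()   # variance of the 50-model average

print("variance of one estimator: {:.3f}".format(single_var))
print("variance of the average:   {:.3f}".format(averaged_var))
```

With independent estimators, the variance of the average shrinks roughly in proportion to the number of estimators, which is also why bagging needs many models and pays for it in computation.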

## 6. It's time to practice!
Let's now get some practice!

# Exercise
## Training with bootstrapping
Let's now build a "weak" decision tree classifier and train it on a sample of the training set drawn with replacement. This will help you understand what happens on every iteration of a bagging ensemble.

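To take a sample, you'll use pandas' .sample() method, which has a replace parameter. For example, `df.sample(frac=1.0, replace=True, random_state=42)` samples with replacement from the whole DataFrame df. Note that .sample() is only available on pandas objects; for NumPy arrays, sample the row indices instead.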


In [10]:
# Take a sample with replacement. X_train and y_train are NumPy arrays
# (created with .values), so sample row indices instead of calling .sample()
rng = np.random.RandomState(42)
sample_idx = rng.choice(len(X_train), size=len(X_train), replace=True)
X_train_sample = X_train[sample_idx]
y_train_sample = y_train[sample_idx]

# Build a "weak" Decision Tree classifier
clf = DecisionTreeClassifier(max_depth=4, random_state=500)

# Fit the model to the training sample
clf.fit(X_train_sample, y_train_sample)
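Keep in mind that `.sample()` and `.loc` are pandas methods, so they need a DataFrame and a Series rather than NumPy arrays. Here is a minimal sketch of the pandas-based approach on a tiny, hypothetical dataset; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for a training set kept as pandas objects
X_df = pd.DataFrame({"feature_a": [1, 2, 3, 4, 5], "feature_b": [10, 20, 30, 40, 50]})
y_sr = pd.Series([0, 0, 1, 1, 0], name="target")

# Sample rows with replacement; frac=1.0 keeps the sample as large as the original
X_sample = X_df.sample(frac=1.0, replace=True, random_state=42)

# Select the matching labels; index values drawn more than once are repeated
y_sample = y_sr.loc[X_sample.index]

print(X_sample.index.tolist())
print(y_sample.tolist())
```

Because the sampling is done with replacement, some rows usually appear more than once and others not at all, which is what makes each bootstrap sample different.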