# 1. Introduction
## 1.1 Definition
## Bias
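**Bias**, in the context of the bias-variance trade-off, is the systematic error a model makes because of the simplifying assumptions built into the learning algorithm. It measures how far the model's average prediction, taken over many possible training sets, lies from the true value. (This statistical sense is distinct from bias in the fairness sense, where skewed training data leads to unfair or discriminatory outcomes.)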

* **High bias:** When a model is too simple, it may make overly strong assumptions about the data. This often leads to underfitting, where the model fails to capture the underlying patterns in the data. For example, a linear model might have high bias if it's used to approximate a non-linear relationship.

* **Low bias:** A more complex model usually has low bias, meaning it can capture intricate patterns in the data. This often comes at the cost of higher variance: the model may overfit the training data and perform poorly on unseen data.
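To make the high-bias case concrete, here is a minimal sketch that uses only NumPy. The synthetic data (a noisy sine curve), the seed, and the two polynomial degrees are assumptions chosen for illustration; they are not part of the Iris example used later.

```python
import numpy as np

# Synthetic non-linear data (an assumption for illustration): y = sin(x) + noise
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)


def training_mse(degree):
    """Fit a polynomial of the given degree and return its MSE on the training data."""
    coeffs = np.polyfit(x, y, deg=degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


mse_linear = training_mse(1)   # straight line: too simple for a sine curve
mse_quintic = training_mse(5)  # flexible enough to follow the curve

print(f"Training MSE, degree 1: {mse_linear:.4f}")
print(f"Training MSE, degree 5: {mse_quintic:.4f}")
```

The straight line cannot bend to follow the curve, so its error stays high even on the data it was trained on. That is the signature of underfitting.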

## Variance
**Variance** refers to the model's sensitivity to small fluctuations in the training data. A model with high variance pays too much attention to the noise or details of the training data, which can lead to **overfitting.**

**Overfitting** occurs when the model performs well on the training data but poorly on new, unseen data because it has memorized the specific patterns (and noise) in the training set rather than learning generalizable patterns.

Here's a breakdown of how variance works:

* **High variance:** When a model is too complex, it can fit the training data almost perfectly, capturing not only the true underlying patterns but also the noise or random fluctuations. This results in **overfitting**, where the model has poor **generalization** to new data because it is too tuned to the specificities of the training data.

* **Low variance:** A model with low variance makes smoother, more stable predictions that change little when the training data changes. Low variance alone is not enough, though: a model that is too rigid tends to have high bias and may **underfit** the data, failing to capture important details.
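Variance can be observed directly by refitting the same kind of model on many different training sets and watching how much its predictions move. The sketch below does this with NumPy only; the sine-shaped data, noise level, sample sizes, and polynomial degrees are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(0.5, 5.5, 20)  # fixed points where predictions are compared


def prediction_variance(degree, n_repeats=200, n_train=30):
    """Refit a polynomial on fresh noisy samples and measure how much its predictions move."""
    preds = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        # Each repeat draws a new training set from the same underlying process
        x_train = rng.uniform(0, 6, size=n_train)
        y_train = np.sin(x_train) + rng.normal(scale=0.3, size=n_train)
        coeffs = np.polyfit(x_train, y_train, deg=degree)
        preds[i] = np.polyval(coeffs, x_test)
    # Variance across refits at each test point, averaged over the test points
    return float(preds.var(axis=0).mean())


var_simple = prediction_variance(degree=1)
var_complex = prediction_variance(degree=9)

print(f"Prediction variance, degree 1: {var_simple:.4f}")
print(f"Prediction variance, degree 9: {var_complex:.4f}")
```

The degree-9 fit reacts much more strongly to which points happened to be sampled, so its predictions at the same test points vary more from one training set to the next.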

## Bias Variance Tradeoff
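The **bias-variance trade-off** is a fundamental concept in machine learning. It describes the tension between two sources of prediction error: error from overly simple assumptions that keep the model from fitting the data (**bias**) and error from sensitivity to the particular training set, which hurts generalization to new data (**variance**). Reducing one typically increases the other.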
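For squared-error loss the trade-off can be written down exactly. The notation is introduced here for illustration and is an assumption, not something taken from the notebook: $f$ is the true function, $\hat{f}$ is the model fitted on a randomly drawn training set, expectations are taken over training sets and noise, and $\sigma^2$ is the variance of the irreducible noise in $y = f(x) + \varepsilon$.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

The classification example later in this notebook uses 0-1 loss instead, where bias and variance are defined through the most frequent prediction and do not combine in this simple additive way.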

### Key Tradeoff:
* **High bias, low variance** (*predictions are far from the target but close to each other*): A simple model with **high bias** might not fit the training data well (**underfitting**), but its predictions are stable because *it’s less sensitive to changes in the training data.*

* **Low bias, high variance** (*predictions are centered on the target but scattered far from each other*): A complex model with **low bias** might fit the training data almost perfectly (**overfitting**), but it will perform poorly on new data because *it’s too sensitive to variations in the training data.*

**Thanks to:** [Bias and Variance in Machine Learning](https://www.bmc.com/blogs/bias-variance-machine-learning/)

# Reference:
- [mlxtend](https://rasbt.github.io/mlxtend/)
- [bias_variance_decomp](https://rasbt.github.io/mlxtend/api_subpackages/mlxtend.evaluate/#bias_variance_decomp)
- [iris_data](https://rasbt.github.io/mlxtend/api_subpackages/mlxtend.data/#iris_data)
- [sklearn.model_selection.**train_test_split**](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)
- [sklearn.tree.**DecisionTreeClassifier**](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html)
- [sklearn.ensemble.**BaggingClassifier**](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)

To understand the **bias-variance trade-off**, let's use a library called [**mlxtend**](https://rasbt.github.io/mlxtend/) (*machine learning extensions*), which provides utilities for everyday data science tasks. This library offers a function called **bias_variance_decomp** that we can use to estimate bias and variance.

Let's use the **Iris dataset** included in **mlxtend** as the base dataset and run **bias_variance_decomp** with two algorithms:
1. Decision Tree
2. Bagging

# 2. Import libraries

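### [Issue while training: AttributeError: module 'numpy' has no attribute 'int'](https://github.com/WongKinYiu/yolov7/issues/1280)

Older code that still refers to the `np.int` alias fails on NumPy 1.24 and later, where the alias was removed, so NumPy is pinned below 1.24 here.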

In [1]:
!pip install "numpy<1.24.0"



* **mlxtend.evaluate.bias_variance_decomp:** A function from the **mlxtend** library that estimates the **expected loss, bias,** and **variance** of a model.
* **iris_data:** A function to load the popular *Iris dataset from the mlxtend* library.
* **train_test_split:** A function to *split the dataset into training and testing sets.*
* **DecisionTreeClassifier:** The decision tree classifier from sklearn, used as the first model and as the base estimator for bagging.
* **BaggingClassifier:** The bagging ensemble classifier from sklearn, used as the second model.

In [2]:
# Import the required libraries
from mlxtend.evaluate import bias_variance_decomp
from mlxtend.data import iris_data
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# 3. Load the dataset
* **X:** Contains the features of the Iris dataset (*sepal length, sepal width, petal length, petal width*).
* **y:** Contains the labels (*iris species categories*).

In [3]:
# Get the iris flower data set
X, y = iris_data()

# 4. Split the dataset
* **train_test_split:** Splits the dataset into training and testing sets.
* **test_size=0.3:** Allocates 30% of the data for testing and 70% for training.
* **random_state=123:** Ensures the reproducibility of the split.
* **shuffle=True:** Shuffles the data before splitting.
* **stratify=y:** Ensures that the class distribution in the training and test sets matches that of the original dataset.

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                              random_state=123, shuffle=True, stratify=y)
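The effect of `stratify=y` can be checked by counting the classes on each side of the split. This is a small sketch under one assumption: it loads Iris through scikit-learn's `load_iris` rather than mlxtend's `iris_data`, so that it runs with scikit-learn alone. The `_sk` variable names are hypothetical and only keep the sketch from overwriting the notebook's own variables.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for mlxtend's iris_data()
X_sk, y_sk = load_iris(return_X_y=True)

X_train_sk, X_test_sk, y_train_sk, y_test_sk = train_test_split(
    X_sk, y_sk, test_size=0.3, random_state=123, shuffle=True, stratify=y_sk
)

# np.bincount returns the number of samples per class label (0, 1, 2)
print(X_train_sk.shape, X_test_sk.shape)
print("Train class counts:", np.bincount(y_train_sk))
print("Test class counts: ", np.bincount(y_test_sk))
```

With 50 flowers per species, the split yields 105 training rows and 45 test rows, with 35 and 15 samples per class respectively.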

# 5. Define the Decision Tree Algorithm
This section sets up a **Decision Tree classifier**, the first model for the **bias-variance decomposition**, *to evaluate the trade-off between bias and variance.*

In [5]:
# Initializes a Decision Tree classifier. The random_state=123 ensures that the results are reproducible.
tree = DecisionTreeClassifier(random_state=123)

# 6. Bias-Variance Decomposition
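`bias_variance_decomp` estimates the **average expected loss (error), bias,** and **variance** of a model using bootstrap resampling: in each round the model is refit on a bootstrap sample of the training set and evaluated on the same test set.

* **loss='0-1_loss':** The **0-1 loss** is used for classification tasks, where loss is **0 for a correct prediction** and **1 for an incorrect one.**
* **random_seed=123:** Seeds the bootstrap sampling so the results are reproducible.
* **num_rounds=1000:** The model is refit on **1000 bootstrap samples** to get stable estimates for **bias and variance.**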
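As a rough sketch of the resampling idea behind the decomposition, the loop below refits a tree on bootstrap samples of the training set and scores every round on the same test set. It is an illustration under stated assumptions, not mlxtend's implementation: Iris is loaded from scikit-learn, the number of rounds is cut to 50 for speed, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's Iris copy stands in for mlxtend's iris_data()
X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=123, shuffle=True, stratify=y_all
)

n_rounds = 50  # kept small so the sketch runs quickly
rng = np.random.RandomState(123)
round_preds = np.zeros((n_rounds, y_te.shape[0]), dtype=int)

for i in range(n_rounds):
    # Bootstrap sample: same size as the training set, drawn with replacement
    idx = rng.choice(X_tr.shape[0], size=X_tr.shape[0], replace=True)
    model = DecisionTreeClassifier(random_state=123).fit(X_tr[idx], y_tr[idx])
    # Every round is scored on the same, untouched test set
    round_preds[i] = model.predict(X_te)

# Expected 0-1 loss: misclassification rate averaged over rounds and test points
expected_loss = float(np.mean(round_preds != y_te))
print(round_preds.shape, round(expected_loss, 4))
```

Each row of `round_preds` holds one round's predictions, so comparing rows shows how much the model's output depends on which training samples it happened to see.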



In [6]:
# Get Bias and Variance - bias_variance_decomp function
avg_exp_loss, avg_bias, avg_var = bias_variance_decomp(tree, X_train, y_train,
            X_test, y_test, loss='0-1_loss', random_seed=123, num_rounds=1000)

# 7. Displaying Bias and Variance

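* **avg_exp_loss:** The average misclassification rate on the test set across all rounds.
* **avg_bias:** Measures the bias, the error introduced by approximating the true function with the model. With 0-1 loss it is the error rate of the *main prediction*, the class predicted most often for each test point (low bias means the model's typical prediction is correct).
* **avg_var:** Measures the variance, which reflects how sensitive the model is to variations in the training data. With 0-1 loss it is how often an individual round's prediction differs from the main prediction (high variance is a sign of overfitting).

Under 0-1 loss, bias and variance need not sum exactly to the expected loss.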
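A tiny worked example may help to see how the three numbers relate under 0-1 loss. The prediction matrix and labels below are made up for illustration, and the computation is a sketch of the usual 0-1 definitions (main prediction = most frequent class), not a copy of the library code.

```python
import numpy as np

# Hypothetical predictions: 5 bootstrap rounds (rows) x 4 test points (columns)
toy_preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 2, 2],
    [0, 2, 2, 1],
    [0, 1, 1, 1],
    [0, 1, 2, 1],
])
toy_true = np.array([0, 1, 2, 2])

# Main prediction: the most frequent class predicted for each test point
main_pred = np.array([np.bincount(col).argmax() for col in toy_preds.T])

toy_loss = float(np.mean(toy_preds != toy_true))   # error rate over all rounds
toy_bias = float(np.mean(main_pred != toy_true))   # error rate of the main prediction
toy_var = float(np.mean(toy_preds != main_pred))   # disagreement with the main prediction

print(main_pred, toy_loss, toy_bias, toy_var)
```

Here the expected loss is 0.30, the bias is 0.25, and the variance is 0.15. The last test point is consistently misclassified, which counts as bias, while the occasional stray predictions elsewhere count as variance.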


In [7]:
# Display Bias and Variance
print(f'Average Expected Loss: {round(avg_exp_loss, 4)}')
print(f'Average Bias: {round(avg_bias, 4)}')
print(f'Average Variance: {round(avg_var, 4)}')

Average Expected Loss: 0.0607
Average Bias: 0.0222
Average Variance: 0.0393


# 8. Define Bagging Classifier model
This section measures the bias and variance of a Bagging ensemble that uses a decision tree as the base estimator.

* **bag:** A bagging classifier that uses the ***decision tree as the base estimator.***
* **n_estimators=100:** Specifies that **100 decision trees** will be trained, each on a *bootstrap sample of the training data*.
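To see what "different subsets" means in practice, a fitted `BaggingClassifier` can be inspected through its `estimators_` and `estimators_samples_` attributes. The sketch below makes a few assumptions for brevity: it uses scikit-learn's copy of Iris, trains on the full dataset, and uses only 10 trees. The names `bag_demo` and `first_sample` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's Iris copy, full dataset, only 10 trees for speed
X_iris, y_iris = load_iris(return_X_y=True)

# The base estimator is passed positionally so the call does not depend on
# whether the keyword is `estimator` (newer scikit-learn) or `base_estimator` (older)
bag_demo = BaggingClassifier(
    DecisionTreeClassifier(random_state=123), n_estimators=10, random_state=123
).fit(X_iris, y_iris)

# Row indices drawn (with replacement) for the first tree
first_sample = bag_demo.estimators_samples_[0]

print("Fitted trees:", len(bag_demo.estimators_))
print("Rows drawn for tree 0:", len(first_sample))
print("Distinct rows for tree 0:", len(np.unique(first_sample)))
```

Because each tree's rows are drawn with replacement, some rows appear more than once and others not at all. The trees therefore differ from one another, and averaging their votes is what reduces variance.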

In [8]:
# Get the iris flower data set
# X, y = iris_data()
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
#                               random_state=123, shuffle=True, stratify=y)

# Define Algorithm
tree = DecisionTreeClassifier(random_state=123)
bag = BaggingClassifier(estimator=tree, n_estimators=100, random_state=123)

In [None]:
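# Get Bias and Variance - bias_variance_decomp function
# Note: each of the 10000 rounds refits all 100 trees, so this cell is slow to run;
# the single tree above used num_rounds=1000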
avg_exp_loss, avg_bias, avg_var = bias_variance_decomp(bag, X_train, y_train,
            X_test, y_test, loss='0-1_loss', random_seed=123, num_rounds=10000)

# Display Bias and Variance
print(f'Average Expected Loss: {round(avg_exp_loss, 4)}')
print(f'Average Bias: {round(avg_bias, 4)}')
print(f'Average Variance: {round(avg_var, 4)}')