# Bias and Variance in Machine Learning

This notebook explores four central problems in machine learning: bias, variance, underfitting, and overfitting. Machine learning is a branch of artificial intelligence that enables machines to analyze data and make predictions. When a model is not accurate, its prediction errors can be described in terms of bias and variance. Some error is always present, because there is always a difference between the model's predictions and the actual values. The main goal is to reduce these errors to obtain more accurate results. The sections below cover bias and variance, the bias-variance trade-off, and underfitting and overfitting.

## Errors in Machine Learning

In machine learning, an error is a measure of how accurately an algorithm makes predictions on a previously unseen dataset. These errors are the basis for selecting the model that performs best on a particular dataset. There are two main types of error in machine learning:

* **Reducible errors:** These errors can be reduced to improve the model accuracy. They can be further classified into bias and variance.

* **Irreducible errors:** These errors will always be present in the model regardless of which algorithm is used. They are caused by unknown variables whose influence on the output cannot be removed.

![bias-and-variance-in-machine-learning2.png](attachment:bias-and-variance-in-machine-learning2.png)
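
For squared-error loss, the split into reducible and irreducible error can be written out explicitly. The notation here is an assumption introduced for illustration: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model fitted on a randomly drawn training set, so the expectations are taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$

The first two terms depend on the choice of model and are the reducible part; the last term does not depend on the model at all.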



## What is Bias? 
Machine learning models learn patterns from a dataset during the training phase and apply those patterns to new data to make predictions. However, **there is often a systematic difference between the values the model predicts on average and the actual values**, which is known as bias error. Bias errors occur when a machine learning algorithm, such as Linear Regression, **cannot capture the true relationship between the data points**.

Bias errors arise from assumptions built into the model that make the target function simpler to learn. The bias of a model can be either low or high:
* **Low bias**: the model makes fewer assumptions about the target function.
* **High bias**: the model makes more assumptions, which can cause it to miss important features of the dataset. A high-bias model also performs poorly on new data.

![bias-and-variance-in-machine-learning4.png](attachment:bias-and-variance-in-machine-learning4.png)

Linear algorithms generally have high bias because they assume a linear relationship between the inputs and the output; the same assumption makes them fast to train. More generally, simpler algorithms tend to have higher bias, while nonlinear algorithms often have low bias. Examples of machine learning algorithms with low bias include Decision Trees, k-Nearest Neighbours, and Support Vector Machines. Linear Regression, Linear Discriminant Analysis, and Logistic Regression are examples of algorithms with high bias.

### Ways to reduce High Bias:
High bias mainly occurs when the model is too simple. Below are some ways to reduce it:

1. Add input features, since a high-bias model is underfitted.
2. Decrease the regularization strength.
3. Use a more complex model, for example by including polynomial features.
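
The third point can be sketched on synthetic data. The quadratic relationship, the noise level, and the variable names below are assumptions chosen for illustration; the sketch compares a plain linear fit with the same linear model given degree-2 polynomial features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): y depends on x quadratically.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# A straight line cannot follow the curve, so it underfits (high bias).
linear = LinearRegression().fit(X, y)

# Adding squared features makes the model flexible enough for the curve.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

linear_r2 = linear.score(X, y)
poly_r2 = poly.score(X, y)
print(f"Linear training R^2:     {linear_r2:.3f}")
print(f"Polynomial training R^2: {poly_r2:.3f}")
```

A low score on the training data itself is the typical symptom of high bias: the model is too rigid to fit even the data it has seen.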

## What is Variance?

In machine learning, **variance refers to the amount by which the predictions would change if a different training dataset were used.** A good model should not vary too much from one training dataset to another, which indicates that it has captured the underlying relationship between inputs and outputs.
* **Low variance**: the prediction of the target function changes only slightly when the training dataset changes.
* **High variance**: the prediction changes a lot when the training dataset changes. High variance leads to overfitting: the model learns too much from the training dataset, including its noise, and does not generalize well to unseen data. High variance usually goes hand in hand with high model complexity.

Nonlinear algorithms, which have a lot of flexibility to fit the training data, usually have high variance.

Examples of machine learning algorithms with low variance are Linear Regression, Logistic Regression, and Linear Discriminant Analysis. Examples of algorithms with high variance are Decision Trees, Support Vector Machines, and k-Nearest Neighbours.
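
The definition of variance can be checked empirically by retraining a model on many different training sets and measuring how much its predictions move. The sketch below assumes a hypothetical sine-shaped relationship with Gaussian noise; `sample_training_set` and `prediction_variance` are helper names made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def sample_training_set(n=100):
    """Draw a fresh noisy training set from a hypothetical sine relationship."""
    X = rng.uniform(0, 6, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)
    return X, y

def prediction_variance(make_model, X_eval, n_repeats=50):
    """Average variance of predictions at X_eval across retrained models."""
    preds = []
    for _ in range(n_repeats):
        X, y = sample_training_set()
        preds.append(make_model().fit(X, y).predict(X_eval))
    # Variance over repeats for each evaluation point, then averaged over points.
    return np.var(np.array(preds), axis=0).mean()

X_eval = np.linspace(0.5, 5.5, 20).reshape(-1, 1)
var_linear = prediction_variance(LinearRegression, X_eval)
var_tree = prediction_variance(lambda: DecisionTreeRegressor(random_state=0), X_eval)
print(f"Linear regression prediction variance: {var_linear:.4f}")
print(f"Decision tree prediction variance:     {var_tree:.4f}")
```

The fully grown decision tree follows the noise of each individual training set, so its predictions at the same inputs change much more between runs than those of the linear model.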


![bias-and-variance-in-machine-learning4%20%281%29.png](attachment:bias-and-variance-in-machine-learning4%20%281%29.png)



### Ways to Reduce High Variance:
1. Reduce the number of input features or parameters, since a high-variance model is overfitted.
2. Avoid overly complex models.
3. Increase the amount of training data.
4. Increase the regularization strength.
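
As a minimal sketch of the second point, the code below compares an unrestricted decision tree with one limited to `max_depth=3`. The data are synthetic and `train_test_gap` is a hypothetical helper, both assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def train_test_gap(model):
    """Fit the model and return (training R^2, test R^2, gap between them)."""
    model.fit(X_train, y_train)
    train_r2 = model.score(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    return train_r2, test_r2, train_r2 - test_r2

# An unrestricted tree memorises the training set; a shallow tree cannot.
deep = train_test_gap(DecisionTreeRegressor(random_state=0))
shallow = train_test_gap(DecisionTreeRegressor(max_depth=3, random_state=0))
print("Unrestricted tree (train, test, gap):", np.round(deep, 3))
print("max_depth=3 tree  (train, test, gap):", np.round(shallow, 3))
```

A large gap between the training and test scores is the usual sign of high variance; limiting the depth trades a lower training score for a smaller gap.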

## Different Combinations of Bias-Variance
There are four possible combinations of bias and variance, shown in the diagram below:


![biasVVariance.png](attachment:biasVVariance.png)


* **Low-Bias, Low-Variance:**
The combination of low bias and low variance describes an **ideal machine learning model**. However, it is rarely achievable in practice.
* **Low-Bias, High-Variance:**
With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model has a large number of parameters, and it leads to **overfitting**.
* **High-Bias, Low-Variance:**
With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn the training dataset well or has too few parameters. It leads to **underfitting**.
* **High-Bias, High-Variance:**
With high bias and high variance, predictions are inconsistent and also inaccurate on average.
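
"Accurate on average" and "consistent" can be turned into numbers when the true function is known. The simulation below is a sketch under stated assumptions: the ground truth `true_f`, the noise level, and the polynomial degrees are all made up for illustration, and the training inputs are kept fixed so that only the noise changes between training sets.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    """Hypothetical ground-truth relationship used only for this simulation."""
    return np.sin(3 * x)

x_train = np.linspace(-1, 1, 30)          # fixed inputs; only the noise is redrawn
x_eval = np.linspace(-0.9, 0.9, 25)

def bias_variance(degree, n_repeats=200, noise=0.3):
    """Estimate squared bias and variance of a polynomial fit of a given degree."""
    preds = np.empty((n_repeats, x_eval.size))
    for i in range(n_repeats):
        y = true_f(x_train) + rng.normal(scale=noise, size=x_train.size)
        preds[i] = np.polyval(np.polyfit(x_train, y, degree), x_eval)
    # Bias: how far the average prediction is from the truth.
    bias_sq = np.mean((preds.mean(axis=0) - true_f(x_eval)) ** 2)
    # Variance: how much individual predictions scatter around their average.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

results = {degree: bias_variance(degree) for degree in (1, 12)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree={degree:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

The degree-1 fit corresponds to the high-bias, low-variance case, and the degree-12 fit to the low-bias, high-variance case.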

## What is an Overfitted Model?

An overfitted model is too complex: it fits the training data too well, so it has high accuracy on the training set but does not perform well on the test set. Such a model captures noise and random fluctuations in the training data and does not generalize well to new data. An overfitted model has low bias and high variance.

![overfit.png](attachment:overfit.png)


## What is an Underfitted Model?

An underfitted model is too simple: it does not capture the underlying patterns in the data, so it performs poorly on both the training set and the test set. An underfitted model has high bias and low variance.

![underfit.png](attachment:underfit.png)

Now that we know what underfitted and overfitted models are, let's look at a balanced model.

## What is a Balanced Model?
A balanced model performs well on both the training set and the test set. It may not reach the training accuracy of an overfitted model, but unlike that model it also performs well on the test set. A balanced model has relatively low bias and relatively low variance.


![balanced.jpeg](attachment:balanced.jpeg)
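
One way to look for a balanced model is to vary a single complexity setting and compare training and validation scores at each value. The sketch below uses scikit-learn's `validation_curve` with a decision tree on synthetic data; the data and the range of depths are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

depths = np.arange(1, 13)
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5,
)

# Average the per-fold scores for each depth.
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
best_depth = depths[np.argmax(mean_val)]
for d, tr, va in zip(depths, mean_train, mean_val):
    print(f"max_depth={d:2d}  train R^2={tr:.3f}  validation R^2={va:.3f}")
print("Depth with the best validation score:", best_depth)
```

Very small depths score poorly on both sets (underfitting), very large depths score well only on the training folds (overfitting), and the balanced model sits where the validation score peaks.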



### Example of Overfitting

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

In [2]:
dataset = pd.read_csv('dataset.csv')

In [3]:
X = dataset.drop('Price', axis=1)
y = dataset['Price']

In [4]:
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=2)

In [5]:
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(train_X, train_y)

In [7]:
# Training Coefficient of Determination score (R^2)
reg.score(train_X, train_y)

0.6827792395792723

In [8]:
# Testing Coefficient of Determination score (R^2)
reg.score(test_X, test_y)

0.13853683161589492

> **Here the training $R^2$ score is about 0.68, but the test $R^2$ score is only about 0.14, which is very low. (Low Bias and High Variance)**

<h4 style='color:purple'>Plain linear regression is clearly overfitting the data, so let's try other models</h4>

# Hyper-parameter Optimization
Hyperparameter optimization is a crucial step in building a machine learning model. **Hyperparameters are parameters of a machine learning algorithm that are not learned from the data, but set prior to training the model.** Examples of hyperparameters include the learning rate, the number of hidden layers in a neural network, and the regularization strength. The goal of hyperparameter optimization is to find, within a range of possible values, the set of hyperparameters that gives the best performance of the model on a given dataset.

## Hyperparameter Optimization Techniques:
**1. Grid Search**:

Grid search is a brute-force approach: define a set of candidate values for each hyperparameter, then exhaustively evaluate every possible combination. This approach is simple to understand and implement, but it can be computationally expensive when the search space is large (many hyperparameters or many values per hyperparameter).
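
The cost of an exhaustive search can be counted before running it. The sketch below uses scikit-learn's `ParameterGrid` with the same grid that is used for the random forest later in this notebook; the variable names `combinations` and `n_folds` are introduced only for this example.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as used for the random forest later in this notebook.
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

combinations = list(ParameterGrid(param_grid))
n_folds = 5
print("Combinations to evaluate:", len(combinations))             # 3 * 4 * 3 * 3 = 108
print("Model fits with 5-fold CV:", len(combinations) * n_folds)  # 108 * 5 = 540
print("First combination:", combinations[0])
```

Adding one more hyperparameter with three candidate values would triple both numbers, which is why grid search scales poorly.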

**2. Random Search**:

Random search is an alternative to grid search that works by randomly sampling hyperparameters from a specified range. Unlike grid search, it does not test all possible combinations of hyperparameters, but instead evaluates a random subset. This approach is less computationally expensive than grid search, and it can be more effective when the search space is large and the impact of individual hyperparameters on performance is unclear.
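
The sampling step on its own can be sketched with scikit-learn's `ParameterSampler`, which draws a fixed number of settings from lists or from `scipy.stats` distributions. The ranges below are illustrative assumptions, not recommended values.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Distributions and lists can be mixed; these ranges are illustrative assumptions.
param_distributions = {
    'n_estimators': randint(100, 301),      # integers from 100 to 300 inclusive
    'max_depth': [5, 10, 15, None],
    'min_samples_split': randint(2, 11),    # integers from 2 to 10 inclusive
    'min_samples_leaf': randint(1, 5),      # integers from 1 to 4 inclusive
}

# Draw only 10 candidate settings instead of enumerating every combination.
candidates = list(ParameterSampler(param_distributions, n_iter=10, random_state=0))
for params in candidates[:3]:
    print(params)
print("Candidates drawn:", len(candidates))
```

The number of model fits is controlled by `n_iter` rather than by the size of the search space, so the budget stays fixed however many hyperparameters are added.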


**3. Bayesian Optimization**:

Bayesian optimization is a probabilistic approach that models the relationship between hyperparameters and model performance using a surrogate model, commonly a Gaussian process. It uses an acquisition function to select the next set of hyperparameters to evaluate, based on the results observed so far. Bayesian optimization can be more efficient than grid search and random search, especially when the search space is large and complex.
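
The loop of "fit a surrogate, maximize an acquisition function, evaluate, repeat" can be sketched in one dimension. Everything specific here is an assumption for illustration: `validation_loss` is a made-up stand-in for an expensive training run, the acquisition function is expected improvement, and the search is restricted to a fixed grid of candidate values.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def validation_loss(x):
    """Hypothetical loss as a function of one hyperparameter (minimum at 0.3)."""
    return (x - 0.3) ** 2

candidates = np.linspace(0, 1, 101)
tried = [0, 50, 100]                      # indices of the initial evaluations
losses = [validation_loss(candidates[i]) for i in tried]

for _ in range(10):
    # Surrogate model: a Gaussian process fitted to the evaluations so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp.fit(candidates[tried].reshape(-1, 1), losses)
    mu, sigma = gp.predict(candidates.reshape(-1, 1), return_std=True)
    sigma = np.maximum(sigma, 1e-9)       # avoid division by zero

    # Acquisition function: expected improvement over the best loss seen so far.
    improvement = min(losses) - mu
    z = improvement / sigma
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    ei[tried] = -np.inf                   # do not re-evaluate known points

    nxt = int(np.argmax(ei))
    tried.append(nxt)
    losses.append(validation_loss(candidates[nxt]))

best_x = candidates[tried[int(np.argmin(losses))]]
print(f"Best hyperparameter value found: {best_x:.2f}")
```

Only 13 of the 101 candidate values are ever evaluated; the surrogate decides where each new evaluation is most likely to pay off.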

**4. Genetic Algorithms**:

Genetic algorithms are inspired by natural selection and involve the evolution of a population of potential solutions over multiple generations. In hyperparameter optimization, the genetic algorithm works by creating a population of hyperparameter sets and then using selection, crossover, and mutation to create new populations. This can be effective in finding optimal hyperparameters, especially when the search space is large and complex.
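
Selection, crossover, and mutation can be sketched with the standard library alone. The `fitness` function below is a hypothetical stand-in for a cross-validated score, and the population size, mutation rate, and value ranges are arbitrary assumptions.

```python
import random

random.seed(0)
DEPTHS = list(range(1, 21))       # candidate max_depth values
LEAVES = list(range(1, 11))       # candidate min_samples_leaf values

def fitness(individual):
    """Hypothetical stand-in for a validation score (best at depth=8, leaf=3)."""
    depth, leaf = individual
    return -((depth - 8) ** 2) - (leaf - 3) ** 2

def crossover(a, b):
    """Child takes one hyperparameter from each parent."""
    return (a[0], b[1])

def mutate(individual, rate=0.2):
    """Randomly replace each hyperparameter with a small probability."""
    depth, leaf = individual
    if random.random() < rate:
        depth = random.choice(DEPTHS)
    if random.random() < rate:
        leaf = random.choice(LEAVES)
    return (depth, leaf)

population = [(random.choice(DEPTHS), random.choice(LEAVES)) for _ in range(12)]
best_per_generation = []
for _ in range(15):
    # Selection: keep the fitter half of the population as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[:6]
    best_per_generation.append(fitness(parents[0]))
    # Crossover and mutation create children; parents survive unchanged (elitism).
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(6)]
    population = parents + children

best = max(population, key=fitness)
print("Best (max_depth, min_samples_leaf):", best, "fitness:", fitness(best))
```

Because the parents are carried over unchanged, the best fitness in the population can never get worse from one generation to the next.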

**5. Gradient-Based Optimization**:

Gradient-based optimization involves using gradients of the objective function with respect to hyperparameters to update the hyperparameters iteratively. This approach is typically used for optimizing deep neural networks, where the objective function is non-convex and high dimensional. This method can be effective, but can also be computationally expensive, especially for large models.
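
The idea can be sketched on a much smaller problem than a neural network. Ridge regression is used here as a stand-in because its weights have a closed form, which makes the gradient of the validation loss with respect to the regularization strength easy to write down. The data, the step-size rule, and the helper name `val_loss_and_grad` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic regression data (an assumption for illustration).
X = rng.normal(size=(80, 20))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=80)
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

def val_loss_and_grad(log_lam):
    """Validation MSE of ridge regression and its gradient w.r.t. log(lambda)."""
    lam = np.exp(log_lam)
    A = X_tr.T @ X_tr + lam * np.eye(X_tr.shape[1])
    w = np.linalg.solve(A, X_tr.T @ y_tr)          # closed-form ridge weights
    residual = X_val @ w - y_val
    loss = np.mean(residual ** 2)
    dw_dlam = -np.linalg.solve(A, w)               # from differentiating A w = X^T y
    dloss_dlam = 2.0 / len(y_val) * residual @ (X_val @ dw_dlam)
    return loss, dloss_dlam * lam                  # chain rule for log(lambda)

log_lam, step = 0.0, 1.0
initial_loss, _ = val_loss_and_grad(log_lam)
for _ in range(50):
    loss, grad = val_loss_and_grad(log_lam)
    proposal = log_lam - step * grad
    # Accept the step only if it lowers the validation loss; otherwise shrink it.
    if val_loss_and_grad(proposal)[0] < loss:
        log_lam = proposal
    else:
        step *= 0.5

final_loss, _ = val_loss_and_grad(log_lam)
print(f"lambda = {np.exp(log_lam):.3f}, validation MSE {initial_loss:.3f} -> {final_loss:.3f}")
```

Optimizing `log(lambda)` instead of `lambda` keeps the regularization strength positive without any explicit constraint.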

### 1. Grid Search:

In [13]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score, confusion_matrix

In [17]:
# Load the dataset
dataset = pd.read_csv('diabetes.csv')
x = dataset.drop('Outcome',axis=1)
y = dataset['Outcome']

In [30]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=0)

In [33]:
rf=RandomForestClassifier(random_state=0)
rf.fit(x_train, y_train)
y_pred=rf.predict(x_test)

print('* Training score: %.3f' %(rf.score(x_train, y_train)*100))
print('* Testing score (Accuracy): %.3f' %(rf.score(x_test, y_test)*100)) #accuracy_score(y_test,y_pred)
print('* Recall score: %.3f' %(recall_score(y_test,y_pred)*100))
print('* Precision score: %.3f' %(precision_score(y_test,y_pred)*100))
print('* F1 score: %.3f' %(f1_score(y_test,y_pred)*100))

* Training score: 100.000
* Testing score (Accuracy): 81.169
* Recall score: 62.963
* Precision score: 79.070
* F1 score: 70.103


In [34]:
# Define the parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

In [35]:
# Create a grid search object
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5) 

# Fit the grid search object to the data
grid_search.fit(x_train, y_train)

In [38]:
# Print the best hyper-parameters
print("Best hyper-parameters: ", grid_search.best_params_)
print("Best estimator: ", grid_search.best_estimator_)
print("Best score: ", grid_search.best_score_)

Best hyper-parameters:  {'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 200}
Best estimator:  RandomForestClassifier(min_samples_leaf=2, min_samples_split=5,
                       n_estimators=200, random_state=0)
Best score:  0.7622151139544182


In [39]:
# Rebuild the model using the optimal hyperparameter values
rf_best=RandomForestClassifier(min_samples_leaf=2, min_samples_split=5, n_estimators=200, random_state=0)
rf_best.fit(x_train, y_train)
y_pred=rf_best.predict(x_test)

print('* Training score: %.3f' %(rf_best.score(x_train, y_train)*100))
print('* Testing score (Accuracy): %.3f' %(rf_best.score(x_test, y_test)*100)) #accuracy_score(y_test,y_pred)
print('* Recall score: %.3f' %(recall_score(y_test,y_pred)*100))
print('* Precision score: %.3f' %(precision_score(y_test,y_pred)*100))
print('* F1 score: %.3f' %(f1_score(y_test,y_pred)*100))

* Training score: 97.231
* Testing score (Accuracy): 80.519
* Recall score: 61.111
* Precision score: 78.571
* F1 score: 68.750


### 2. Random Search:

In [40]:
from sklearn.model_selection import RandomizedSearchCV

In [41]:
rf=RandomForestClassifier(random_state=0)
rf.fit(x_train, y_train)
y_pred=rf.predict(x_test)

print('* Training score: %.3f' %(rf.score(x_train, y_train)*100))
print('* Testing score (Accuracy): %.3f' %(rf.score(x_test, y_test)*100)) #accuracy_score(y_test,y_pred)
print('* Recall score: %.3f' %(recall_score(y_test,y_pred)*100))
print('* Precision score: %.3f' %(precision_score(y_test,y_pred)*100))
print('* F1 score: %.3f' %(f1_score(y_test,y_pred)*100))

* Training score: 100.000
* Testing score (Accuracy): 81.169
* Recall score: 62.963
* Precision score: 79.070
* F1 score: 70.103


In [42]:
# Define the candidate values to sample from
param_dist = {
    'n_estimators': [100, 200, 300],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

In [43]:
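# Create a random search object (no random_state is passed here, so the sampled
# combinations and the best parameters reported below can differ between runs)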
random_search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist, n_iter=10, cv=5)

# Fit the random search object to the data
random_search.fit(x_train, y_train)

In [44]:
# Print the best hyper-parameters
print("Best hyper-parameters: ", random_search.best_params_)
print("Best estimator: ", random_search.best_estimator_)
print("Best score: ", random_search.best_score_)

Best hyper-parameters:  {'n_estimators': 100, 'min_samples_split': 2, 'min_samples_leaf': 4, 'max_depth': 15}
Best estimator:  RandomForestClassifier(max_depth=15, min_samples_leaf=4, random_state=0)
Best score:  0.7589630814340931


In [45]:
# Rebuild the model using the optimal hyperparameter values
rf_best2=RandomForestClassifier(max_depth=15, min_samples_leaf=4, random_state=0)
rf_best2.fit(x_train, y_train)
y_pred=rf_best2.predict(x_test)

print('* Training score: %.3f' %(rf_best2.score(x_train, y_train)*100))
print('* Testing score (Accuracy): %.3f' %(rf_best2.score(x_test, y_test)*100)) #accuracy_score(y_test,y_pred)
print('* Recall score: %.3f' %(recall_score(y_test,y_pred)*100))
print('* Precision score: %.3f' %(precision_score(y_test,y_pred)*100))
print('* F1 score: %.3f' %(f1_score(y_test,y_pred)*100))

* Training score: 90.717
* Testing score (Accuracy): 79.870
* Recall score: 59.259
* Precision score: 78.049
* F1 score: 67.368
