# Understanding the Bias-Variance Tradeoff


* https://medium.com/swlh/the-bias-variance-tradeoff-f24253c0ab45
* https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229
* https://elitedatascience.com/bias-variance-tradeoff
* https://searchenterpriseai.techtarget.com/feature/6-ways-to-reduce-different-types-of-bias-in-machine-learning
* https://towardsdatascience.com/contents-9b2e49f49fe9
* https://machinelearningmastery.com/how-to-reduce-model-variance/
* https://www.section.io/engineering-education/ensemble-bias-var/
* https://www.cs.cmu.edu/~wcohen/10-601/bias-variance.pdf

# The types of errors in the prediction of a model are:
* Bias error
* Variance error
* Irreducible error

The expected prediction error of a model, under squared-error loss, can be decomposed as follows:

`Prediction error = (Bias error)^2 + Variance error + Irreducible error`
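
For squared-error loss, the decomposition can be written out explicitly. The notation is an assumption and does not come from the sources above: $f$ is the true relationship, $\hat{f}$ is the model fitted on a randomly drawn training set, and $y = f(x) + \varepsilon$ where the noise $\varepsilon$ has mean zero, variance $\sigma^2$, and is independent of the training set. Expectations are taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```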

![](./i/1_hb8F9jMk0UyYcS3jsyD8sg.png)
If you guessed A has high Bias and B has high Variance, you’re right.


![](./i/image43.png)


# Bias Error:
Bias is the error that arises when a Machine Learning model is unable to capture the true relationship between the features and the target of the data. A high-bias model pays very little attention to the training data and oversimplifies the relationship.

A Machine Learning model makes assumptions based on the available data. If the assumptions are too simple, the model may not be able to account accurately for the relationship between the features and the target, and so it produces inaccurate predictions.

Mathematically, bias can be defined as the difference between the model's expected (average) prediction and the true value being predicted.

![](./i/1_1R4Btn9TRhksPuIwjZTZ0g.png)

Linear models such as Linear Regression and Logistic Regression, which make simple assumptions, tend to have high bias, while more flexible models such as Decision Trees and Support Vector Machines tend to have low bias.

## Underfitting
Underfitting occurs when the model cannot fit the training data accurately and therefore performs poorly even on the training data.

## Reducing Bias
**1. Change the model (choose the correct learning model):**
One of the first steps in reducing bias is simply to change the model. As stated above, some models have high bias while others do not. Do not use a linear model if the features and target of your data do not in fact have a linear relationship.

**2. Ensure the data is truly representative (use the right training dataset):**

Ensure that the training data is diverse and represents all possible groups or outcomes. In the event of an imbalanced dataset, use weighting or penalized models. There has been discussion on the poor accuracy of facial recognition models in identifying people of color. One possible source of such error is that the training dataset was not diverse and the model did not have enough training data to clearly identify persons of color.
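
One common way to apply the weighting mentioned above is to give each class a weight inversely proportional to its frequency. The sketch below uses the heuristic `n_samples / (n_classes * class_count)`; the function name and the label values are hypothetical, chosen only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced labels: 90 examples of one class, 10 of the other.
labels = ["majority"] * 90 + ["minority"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each class contributes the same total weight to a weighted loss, so the rare class is not drowned out by the common one.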

**3. Parameter tuning:**

This requires an understanding of the model and its parameters. The algorithm's documentation is a good place to start. Every model has a list of parameters that it takes as inputs, and tweaking them may give you the desired results. You can also build your own algorithms from scratch.

**4. Perform data processing mindfully** 

Machine intelligence involves three types of data processing: pre-processing, in-processing, and post-processing. When you prepare datasets in pre-processing, bias can creep in during formatting before the data is fed into the neural network. Any data that could introduce a bias should be excluded in this step. With in-processing, the data is manipulated as it passes through the neural network itself, so the weighting of the neural nodes must be correct to prevent a biased output. Finally, ensure there is no bias when interpreting data for human-readable consumption in the post-processing stage.

**5. Monitor real-world performance across the ML lifecycle:**

No matter how carefully you choose the learning model or vet the training data, the real world can throw up unexpected challenges. It is important not to treat any ML model as “trained” and finalized, with no need for further monitoring. Also, try to use real-world data for testing wherever possible, so that bias can be detected and corrected before it negatively affects human lives.

**6. Make sure that there are no infrastructural issues:**

Apart from data and the human factor, the infrastructure itself could cause bias. For example, if you rely on data collected via electronic or mechanical sensors, then equipment problems can introduce bias. This is often the hardest type of bias to detect and needs careful consideration, with investment in up-to-date digital and technology infrastructure. These six best practices should form the starting point in the discussion around bias in machine learning.

## Bias is the result of oversimplified model assumptions.

# Variance
A high-variance model pays too much attention to the training data and does not generalize to data it has not seen.
Variance is the variability of a model's prediction for a given data point. If we build the model multiple times on different training samples, the variance is how much the predictions for that point vary between the different realizations of the model.
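
Both quantities can be estimated by simulation when the true relationship is known. The sketch below is an illustration under assumptions, not a general recipe: the ground truth `true_f` (a parabola), the noise level, and the sample sizes are all made up. It fits a straight line to many freshly drawn training sets and looks at the predictions at a single point `x0`. Bias is the mean prediction minus the true value, and variance is the spread of the predictions.

```python
import random
import statistics


def true_f(x):
    """Assumed ground truth (hypothetical): a parabola."""
    return x * x


def sample_training_set(rng, n=30, noise_sd=0.1):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bias_and_variance_at(x0, n_realizations=500, seed=0):
    """Refit the line on fresh training sets and collect predictions at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_realizations):
        a, b = fit_line(*sample_training_set(rng))
        preds.append(a + b * x0)
    bias = statistics.fmean(preds) - true_f(x0)
    variance = statistics.pvariance(preds)
    return bias, variance


bias, variance = bias_and_variance_at(x0=0.0)
print(f"bias={bias:.3f}, variance={variance:.5f}")
```

Because a straight line cannot follow a parabola, the bias stays large no matter how many times the model is rebuilt, while the variance is comparatively small.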

## Overfitting
Overfitting occurs when a model has high variance and low bias. When a model fits the training dataset so closely that it captures noise, it is said to have overfit the training data. This negatively impacts the predictive power of the model on new data.

If our model returns a high accuracy on training data but performs poorly on testing data, we can conclude that the model has fit too closely to the training data and therefore cannot generalize to new data.
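
The gap between training and testing error is easy to see with a model that memorizes its training data. The sketch below uses a 1-nearest-neighbor regressor on made-up data (a noisy straight line). The data generator and all names are assumptions for illustration.

```python
import random


def make_data(rng, n, noise_sd=0.3):
    """Hypothetical data: y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def predict_1nn(train_xs, train_ys, x):
    """Return the target of the single closest training point."""
    nearest = min(range(len(train_xs)), key=lambda j: abs(train_xs[j] - x))
    return train_ys[nearest]


def mse_1nn(train_xs, train_ys, xs, ys):
    errors = [(predict_1nn(train_xs, train_ys, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)
train_xs, train_ys = make_data(rng, 50)
test_xs, test_ys = make_data(rng, 200)

train_mse = mse_1nn(train_xs, train_ys, train_xs, train_ys)
test_mse = mse_1nn(train_xs, train_ys, test_xs, test_ys)
print(f"train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

The training error is zero because every training point is its own nearest neighbor, yet the test error is clearly positive. The model has reproduced the noise rather than the trend.
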

## You can measure two types of variance in your specific model using your training data: the variance from the algorithm's own randomness and the variance from the training data.

### Measure Algorithm Variance: 
The variance introduced by the stochastic nature of the algorithm can be measured by repeating the evaluation of the algorithm on the same training dataset and calculating the variance or standard deviation of the model skill.
### Measure Training Data Variance: 
The variance introduced by the training data can be measured by repeating the evaluation of the algorithm on different samples of training data while keeping the seed for the pseudorandom number generator fixed, then calculating the variance or standard deviation of the model skill.
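
The two measurements can be sketched as follows. Everything here is an assumption for illustration: the learner is a tiny stochastic gradient descent fit of `y = w * x` whose randomness comes from its initial weight and sample order, the data is synthetic, and "model skill" is taken to be mean squared error on a fixed test set.

```python
import random
import statistics


def make_data(rng, n, noise_sd=0.5):
    """Hypothetical data: y = 3x plus Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0.0, noise_sd)) for x in xs]


def train_sgd(data, seed, epochs=5, lr=0.1):
    """Stochastic learner: random initial weight and random sample order."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            w -= lr * 2.0 * (w * x - y) * x  # gradient step on squared error
    return w


def skill(w, test):
    """Model skill, here the mean squared error on the test set."""
    return statistics.fmean([(w * x - y) ** 2 for x, y in test])


data_rng = random.Random(42)
test = make_data(data_rng, 500)
train = make_data(data_rng, 40)

# Algorithm variance: same training data, different seeds.
algo_scores = [skill(train_sgd(train, seed=s), test) for s in range(30)]

# Training data variance: different training samples, fixed seed.
data_scores = [skill(train_sgd(make_data(data_rng, 40), seed=0), test) for _ in range(30)]

algo_sd = statistics.stdev(algo_scores)
data_sd = statistics.stdev(data_scores)
print(f"algorithm sd={algo_sd:.5f}, training data sd={data_sd:.5f}")
```

Which of the two standard deviations dominates depends on the algorithm and the dataset, so the printed numbers should be read as specific to this toy setup.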

## Reducing Variance Error

**1. Ensemble Learning:**

A good way to tackle high variance is to train multiple models on your data and combine their predictions. Ensemble learning can leverage both weak and strong learners to improve model prediction. In fact, many winning solutions in Machine Learning competitions make use of Ensemble Learning.
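
Bagging (bootstrap aggregating) is one ensemble technique that targets variance directly. The sketch below is a toy illustration under assumptions: the base learner is a 1-nearest-neighbor regressor, the data is synthetic, and the ensemble size of 25 is arbitrary. It compares how much the prediction at one point varies across training sets for a single model and for a bagged ensemble.

```python
import random
import statistics


def make_data(rng, n=30, noise_sd=0.5):
    """Hypothetical data: y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, noise_sd)) for x in xs]


def predict_1nn(train, x0):
    """High-variance base learner: target of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x0))[1]


def predict_bagged(train, x0, rng, n_models=25):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    preds = []
    for _ in range(n_models):
        bootstrap = [rng.choice(train) for _ in train]
        preds.append(predict_1nn(bootstrap, x0))
    return statistics.fmean(preds)


def prediction_variance(predict, n_realizations=300, seed=0, x0=0.5):
    """Variance of the prediction at x0 across freshly drawn training sets."""
    rng = random.Random(seed)
    preds = [predict(make_data(rng), x0, rng) for _ in range(n_realizations)]
    return statistics.pvariance(preds)


single_var = prediction_variance(lambda train, x0, rng: predict_1nn(train, x0))
bagged_var = prediction_variance(predict_bagged)
print(f"single model variance={single_var:.3f}, bagged variance={bagged_var:.3f}")
```

Averaging over bootstrap resamples spreads the prediction over several nearby training points instead of one, which is why the bagged variance comes out lower.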

**2. Ensemble Parameters from Final Models.**

**3. Increase Training Dataset Size:**

This may sound counterintuitive: why add more data when the variance is high? With more data, the noise in individual samples averages out, so the fitted model depends less on which particular sample it was trained on, and its variance goes down. More data also makes it easier for the model to find a general rule that applies to new data.
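
The effect of sample size can be checked with a small simulation. The sketch below fits a least-squares slope to synthetic data many times and compares the variance of the fitted slope for a small and a large training set. The data generator and the sizes 20 and 200 are assumptions for illustration.

```python
import random
import statistics


def fit_slope(rng, n, noise_sd=1.0):
    """Draw n noisy points from y = 2x and return the least-squares slope."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, noise_sd) for x in xs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_variance(n, n_realizations=300, seed=0):
    """Variance of the fitted slope across independent training sets of size n."""
    rng = random.Random(seed)
    return statistics.pvariance([fit_slope(rng, n) for _ in range(n_realizations)])


var_small = slope_variance(n=20)
var_large = slope_variance(n=200)
print(f"n=20: {var_small:.3f}, n=200: {var_large:.3f}")
```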

**4. Increase regularization (without overdoing it):**

Regularization is the process of adding information (an additional penalty) in order to solve an ill-posed problem or to prevent overfitting. The penalty discourages large parameter values, which limits how closely the model can follow noise in the training data and therefore reduces variance. Common methods include L1 (Lasso) regularization, L2 (Ridge) regularization, and dropout. The tradeoff runs in both directions: with too much regularization the model can no longer identify the dominant trend and underfits, and in that case decreasing the amount of regularization reintroduces complexity into the model and lowers its bias.
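
The effect of an L2 (Ridge) penalty on variance can be seen in the simplest possible case, a one-parameter model `y = w * x`, where the ridge estimate has the closed form `sum(x*y) / (sum(x*x) + lam)`. The sketch below is an illustration under assumptions: the true slope of 3, the noise level, and the penalty strength `lam=5.0` are made up.

```python
import random
import statistics


def fit_ridge_slope(data, lam):
    """Closed-form ridge estimate for y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)


def slope_mean_and_variance(lam, n=15, n_realizations=500, seed=0):
    """Mean and variance of the fitted slope across fresh training sets."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_realizations):
        data = []
        for _ in range(n):
            x = rng.uniform(-1.0, 1.0)
            data.append((x, 3.0 * x + rng.gauss(0.0, 1.0)))  # assumed true slope: 3
        slopes.append(fit_ridge_slope(data, lam))
    return statistics.fmean(slopes), statistics.pvariance(slopes)


results = {lam: slope_mean_and_variance(lam) for lam in (0.0, 5.0)}
for lam, (mean_w, var_w) in results.items():
    print(f"lambda={lam}: mean slope={mean_w:.2f}, variance={var_w:.3f}")
```

The penalized fit varies less from one training set to the next, but its average slope is pulled away from the true value. That shift is the bias paid for the lower variance.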

## Variance is the result of overly complex model assumptions.



## Best: low Variance, low bias

## High bias (no attention to detail):
1. Underfitting
2. Overly-simplified Model
3. High error on both test and train data

![](./i/image12.png)

## High Variance (too much attention to train):
1. Overfitting
2. Low error on train data and high error on test data
3. Starts modelling the noise in the input




## Why is there a Bias-Variance Tradeoff?

If our model is too simple and has very few parameters, it may have high bias and low variance. On the other hand, if our model has a large number of parameters, it is likely to have high variance and low bias. So we need to find the right balance, without overfitting or underfitting the data.

**This tradeoff in complexity is why there is a tradeoff between bias and variance. An algorithm can’t be more complex and less complex at the same time.**

![](./i/image30.png)


### Reason 1:
Low variance (high bias) algorithms tend to be less complex, with simple or rigid underlying structure.

They train models that are consistent, but inaccurate on average.
These include linear or parametric algorithms such as regression and naive Bayes.



### Reason 2:
On the other hand, low bias (high variance) algorithms tend to be more complex, with flexible underlying structure.

They train models that are accurate on average, but inconsistent.
These include non-linear or non-parametric algorithms such as decision trees and nearest neighbors. 
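
The two reasons can be seen in a single algorithm by turning its complexity knob. In k-nearest-neighbors regression, a small `k` gives a flexible model and a large `k` gives a rigid one. The sketch below is a simulation under assumptions: the sine-shaped ground truth, the noise level, and the evaluation point `x0=0.25` are made up. It estimates squared bias and variance at that point for several values of `k`.

```python
import math
import random
import statistics


def true_f(x):
    """Assumed ground truth (hypothetical): one period of a sine wave."""
    return math.sin(2.0 * math.pi * x)


def make_data(rng, n=40, noise_sd=0.3):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, noise_sd)) for x in xs]


def predict_knn(train, x0, k):
    """Average the targets of the k training points closest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return statistics.fmean([y for _, y in nearest])


def bias_sq_and_variance(k, x0=0.25, n_realizations=300, seed=0):
    """Squared bias and variance of the prediction at x0 across training sets."""
    rng = random.Random(seed)
    preds = [predict_knn(make_data(rng), x0, k) for _ in range(n_realizations)]
    bias = statistics.fmean(preds) - true_f(x0)
    return bias ** 2, statistics.pvariance(preds)


results = {k: bias_sq_and_variance(k) for k in (1, 5, 40)}
for k, (bias_sq, variance) in results.items():
    print(f"k={k:2d}: bias^2={bias_sq:.3f}, variance={variance:.3f}")
```

With `k=1` the predictions are accurate on average but scattered. With `k=40`, which here averages the whole training set, they are consistent but far from the true value.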

## For example:

Suppose we survey 50 people about how they plan to vote and get the following responses:

| Response | Count |
| --- | --- |
| Voting Republican | 13 |
| Voting Democratic | 16 |
| Non-Respondent | 21 |
| Total | 50 |

Among those who responded, the probability of voting Republican is 13/(13+16), or 44.8%. We put out our press release that the Democrats are going to win by over 10 points, but when the election comes around, it turns out they lose by 10 points. That certainly reflects poorly on us. Where did we go wrong in our model?

## Bias scenarios:
Using a phonebook to select participants in our survey is one of our sources of bias. By only surveying certain classes of people, it skews the results in a way that would be consistent if we repeated the entire model building exercise. Similarly, not following up with respondents is another source of bias, as it consistently changes the mixture of responses we get. On our bulls-eye diagram, these move us away from the center of the target, but they would not result in an increased scatter of estimates.

## Variance scenarios:
The small sample size is a source of variance. If we increased our sample size, the results would be more consistent each time we repeated the survey and prediction. The results still might be highly inaccurate due to our large sources of bias, but the variance of the predictions would be reduced.
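
The arithmetic of the example, and the effect of sample size on the scatter of the estimate, can be sketched as follows. The standard error formula `sqrt(p * (1 - p) / n)` assumes simple random sampling, which the phonebook survey does not satisfy, so it describes the variance component only. The hundredfold larger sample is a hypothetical added for comparison.

```python
import math

republican, democratic, non_respondent = 13, 16, 21
respondents = republican + democratic

p_republican = republican / respondents                   # share among respondents
democratic_lead = (democratic - republican) / respondents


def standard_error(p, n):
    """Standard error of a sample proportion, assuming simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


se_survey = standard_error(p_republican, respondents)
se_larger = standard_error(p_republican, 100 * respondents)  # hypothetical larger survey

print(f"Republican share: {p_republican:.1%}, Democratic lead: {democratic_lead:.1%}")
print(f"standard error: n={respondents}: {se_survey:.3f}, n={100 * respondents}: {se_larger:.4f}")
```

A larger sample shrinks the standard error, but it does nothing about who is missing from the phonebook or who declined to answer, so the bias remains.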

## A proper machine learning workflow includes:

1. Separate training and test sets
2. Trying appropriate algorithms (No Free Lunch)
3. Fitting model parameters
4. Tuning impactful hyperparameters
5. Proper performance metrics
6. Systematic cross-validation
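
Steps 1, 5, and 6 of the workflow can be sketched with a small k-fold cross-validation helper. This is a minimal illustration, not a replacement for a library implementation. The function names, the synthetic data, and the baseline model (always predict the training mean) are assumptions, and mean squared error stands in for whatever metric suits the problem.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(data, fit, predict, k=5, seed=0):
    """Return the held-out mean squared error for each of the k folds."""
    scores = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        model = fit(train)
        scores.append(statistics.fmean([(predict(model, x) - y) ** 2 for x, y in test]))
    return scores


# Hypothetical data and a deliberately simple baseline model.
rng = random.Random(1)
data = [(i / 50, 2.0 * (i / 50) + rng.gauss(0.0, 0.1)) for i in range(50)]
scores = cross_validate(
    data,
    fit=lambda train: statistics.fmean([y for _, y in train]),
    predict=lambda model, x: model,
)
print(f"MSE per fold: {[round(s, 3) for s in scores]}")
print(f"mean={statistics.fmean(scores):.3f}, sd={statistics.stdev(scores):.3f}")
```

The spread of the per-fold scores gives a rough view of how sensitive the model's skill is to the particular training sample.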