# Intro to Error

### Introduction

As data scientists, we are constantly making mistakes.  Of course we are.  We have a tough job.  It is our job to predict the future or explain underlying causes in the present world, and both of these tasks are hard.  

We do our best by training our machine learning model to find the parameters that best predict future outcomes.  And while this technique is the best that we've got, it still has problems.  In the sections that follow, we'll explore some of the problems that occur.  

### Just an estimation

The outcomes we observe in the real world do exist for a reason.  They are generated by some underlying features, and there *are* parameters associated with these features.  

These features and the associated parameters make up our *true model*.  They are the real-world processes that generate the outcomes that we observe.  As data scientists, we try to discover the features and estimate the parameters of this true model.  We represent our true model like so:

$y = \theta_1x_1 + ... + \theta_nx_n + \epsilon $

> Where $n$ is the number of features of our underlying model.

When we train a machine learning model, we are finding estimates of these parameters:

$h_\theta(x) = \hat\theta_1x_1 + ... + \hat\theta_nx_n $

> We use $\hat{\theta}$ to emphasize that these are just estimates.

Here, $h_\theta(x) \approx y$.

That last line, $h_\theta(x) \approx y$, recognizes that our model's hypothesis function, $h_\theta(x)$, will never perfectly predict our outcomes.  And the hats over the parameters recognize that the parameters of our hypothesis function are only estimates, so they too will differ from the true parameters.  Our models are wrong because they suffer from various types of error.
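To make the distinction between true parameters and estimates concrete, here is a minimal sketch.  It assumes a made-up true model with two features and parameters 3 and 2, and it uses NumPy's least-squares solver as a stand-in for whatever training procedure we would actually use.  None of these numbers come from real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true model: y = 3*x1 + 2*x2 + epsilon
true_theta = np.array([3.0, 2.0])

X = rng.normal(size=(50, 2))               # 50 observations, 2 features
epsilon = rng.normal(scale=1.0, size=50)   # randomness in the outcomes
y = X @ true_theta + epsilon

# Training: find the least-squares estimates (the "theta hats")
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("true parameters:     ", true_theta)
print("estimated parameters:", theta_hat)
```

The estimates should land close to the true parameters, but not exactly on them.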

### Let us count the ways

As we know, the difference between what our model predicts and what we observe is our error.  And as we'll see, there are three different sources of this error.
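For squared error, these three sources are commonly written as a single decomposition.  The version below is a sketch under a few assumptions that are not stated in this lesson: $f(x) = \theta_1x_1 + ... + \theta_nx_n$ is the true model without its noise, $\epsilon$ has mean zero and variance $\sigma^2$ and is independent of the training data, and the expectation is taken over both the training data and the noise.

$$E\big[(y - h_\theta(x))^2\big] = \underbrace{\big(E[h_\theta(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{E\big[(h_\theta(x) - E[h_\theta(x)])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible error}}$$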

#### 1. Irreducible Error
Irreducible error arises because our *future outcomes* have a degree of randomness to them.  This randomness prevents our machine learning models from making predictions with one hundred percent accuracy, no matter how well we estimate the parameters.
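One way to see irreducible error is to predict with the *true* parameters themselves.  The sketch below assumes a made-up true model (parameters 3 and 2) and noise with a standard deviation of 1; both are illustrative choices.  Even with perfect parameters, the mean squared error stays near the variance of the noise.

```python
import numpy as np

rng = np.random.default_rng(1)

true_theta = np.array([3.0, 2.0])   # hypothetical true parameters
noise_sd = 1.0                      # assumed spread of the randomness

# Future outcomes generated by the true model
X_future = rng.normal(size=(10_000, 2))
y_future = X_future @ true_theta + rng.normal(scale=noise_sd, size=10_000)

# Predict with the true parameters, so there is no variance and no bias
predictions = X_future @ true_theta
mse = np.mean((y_future - predictions) ** 2)

print("mean squared error with the true parameters:", mse)
print("variance of the noise:                      ", noise_sd ** 2)
```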

#### 2. Variance
Variance occurs because *the data we train our models with also has a degree of randomness in it*.  The randomness in our training data affects the parameters that our machine learning algorithm arrives at.  You can imagine that if you trained the model many times, it would be fed different variations of data, and thus arrive at different parameters each time.  This variation in our parameters based on randomness is called variance.

But as we'll see, while this randomness may bring variation to each of our models, and make each of them wrong, if we averaged our parameters, this variation should cancel out.  Thus we could average our parameters over many models to approach the true parameters of the underlying model (so long as there are no other errors). 
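The idea of training many times can be simulated.  The sketch below assumes the same kind of made-up true model (parameters 3 and 2), draws a fresh training set of 30 observations for each training run, and uses least squares for training.  The helper name `train_once` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
true_theta = np.array([3.0, 2.0])   # hypothetical true parameters

def train_once(n_obs=30):
    """Draw a fresh training set and return the least-squares estimates."""
    X = rng.normal(size=(n_obs, 2))
    y = X @ true_theta + rng.normal(scale=1.0, size=n_obs)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta_hat

# Train 2,000 models, each on different data
estimates = np.array([train_once() for _ in range(2_000)])

print("spread of the estimates (std):", estimates.std(axis=0))
print("average of the estimates:     ", estimates.mean(axis=0))
```

Each individual model misses the true parameters by a noticeable amount, while the average across models sits very close to them.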

#### 3. Bias

Bias occurs when, even if we were to train our model many different times, with different subsets of data, we still would not approach the true underlying model.  This occurs because we are not including all of the influences on our target variable.  Unlike the other sources of error, bias doesn't come from randomness, but from our model not being fed the data it needs to determine what causes the different outcomes.
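Bias can be simulated by leaving a feature out.  In the sketch below, the made-up true model uses two features (parameters 3 and 2), but the model is only trained on the first one.  The sketch also assumes the omitted feature is correlated with the included one, which is what pushes the estimate away from its true value here.  The helper name `train_without_x2` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_1, theta_2 = 3.0, 2.0   # hypothetical true parameters

def train_without_x2(n_obs=30):
    """Train on x1 only, even though x2 also influences the outcome."""
    x1 = rng.normal(size=n_obs)
    x2 = 0.5 * x1 + rng.normal(size=n_obs)   # x2 is correlated with x1
    y = theta_1 * x1 + theta_2 * x2 + rng.normal(size=n_obs)
    theta_hat, *_ = np.linalg.lstsq(x1.reshape(-1, 1), y, rcond=None)
    return theta_hat[0]

# Train 2,000 models, each on different data
estimates = np.array([train_without_x2() for _ in range(2_000)])
average_estimate = estimates.mean()

print("true parameter for x1:", theta_1)
print("average estimate:     ", average_estimate)
```

Averaging removes the randomness, but the average settles near 4 rather than 3, because the model credits `x1` with part of the effect of the missing `x2`.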

### Summary

In this lesson, we saw that there are three sources of error when we train a machine learning model: irreducible error, variance, and bias.  Irreducible error is simply the error that comes from our machine learning model's inability to predict the randomness expressed in *future* outcomes.  

Variance occurs because our machine learning model trains on data that also suffers from randomness, and so the randomness affects the parameters of our model.  If we trained our model many times, we would see variations of the parameters.  However, if we averaged the parameters we see from these multiple trainings (with each trained model being fed different data), we would expect the randomness to cancel out and thus the variance of the averaged parameters to approach zero.  

Finally, our model is biased when, even if we were to train it on our selected features multiple times, it still would not approach the true underlying model.  This occurs when our model does not train on all of the features that exist in the underlying model.