# The bias-variance tradeoff

### Introduction

We have now seen the three sources of error that can arise in a model:

* **bias**, which arises when our model omits one or more features that contribute to variation in the outcomes of our data.
* **variance**, which arises when we include too many features, making our model so flexible that it picks up on randomness in the data.
* **irreducible error**, which occurs due to a degree of randomness in our target variable.

As we know, we cannot eliminate irreducible error.  But we can develop techniques to avoid including too few parameters, which introduces bias, or too many parameters, which introduces variance.
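
For squared error, these three sources are often written as a single decomposition. The version below is the standard textbook statement rather than something derived in this lesson, and the symbols are assumptions introduced here: $f$ is the true relationship, $\hat{f}$ is the fitted model, and the target is $y = f(x) + \varepsilon$ with noise of mean zero and variance $\sigma^2$.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$

The expectations are taken over different training sets (and the noise), so bias measures how far the average fitted model is from the truth, while variance measures how much the fitted model changes from one training set to the next.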

### Setting up our data

So far we have seen one model that suffers from bias, one that suffers from variance, and one that has the correct number of variables.  In this lesson, we'll take a look at each of these types of models together and see how they compare.

To create these three models, we'll use the same process of initializing our model, fitting it to the data, and looking at some scores.  The only difference between the models will be the features that we pass into them.  Luckily, we already have these features loaded in a separate file.
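
As a rough illustration of that process, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the data-generating process, the helper names (`make_data`, `design`, `r_squared`), and the use of NumPy's least-squares solver in place of `LinearRegression`. It fits the same three nested feature sets and compares R² on the training data with R² on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical data-generating process (not the lesson's actual data):
    # customers depend on temperature and weekends, but not on age.
    temps = rng.uniform(30, 100, n)
    is_weekend = rng.integers(0, 2, n)
    ages = rng.uniform(18, 70, n)
    noise = rng.normal(0, 20, n)  # stands in for irreducible error
    customers = 10 + 3 * temps + 40 * is_weekend + noise
    return temps, is_weekend, ages, customers

def design(*columns):
    # Stack feature columns and prepend a column of ones for the intercept.
    return np.column_stack([np.ones(len(columns[0])), *columns])

def r_squared(X, y, weights):
    # Coefficient of determination: 1 - (residual SS / total SS).
    residuals = y - X @ weights
    return 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

train = make_data(50)
test = make_data(1000)

# Indices into the tuples returned by make_data (0: temps, 1: weekend, 2: ages).
feature_sets = {
    "temps": [0],
    "temps + weekends": [0, 1],
    "temps + weekends + ages": [0, 1, 2],
}

scores = {}
for name, idx in feature_sets.items():
    X_train = design(*[train[i] for i in idx])
    X_test = design(*[test[i] for i in idx])
    # Ordinary least squares fit on the training data only.
    weights, *_ = np.linalg.lstsq(X_train, train[3], rcond=None)
    scores[name] = (r_squared(X_train, train[3], weights),
                    r_squared(X_test, test[3], weights))

for name, (train_r2, test_r2) in scores.items():
    print(f"{name}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

With ordinary least squares, adding a feature can never lower the training R², so the training score alone cannot tell us when we have added too many features. The held-out score is the one to watch.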

So let's load up the data, and then fit three different models with the following features:

1. temperatures,
2. temperatures and weekends, and
3. temperatures, weekends and ages

```python
from sklearn.linear_model import LinearRegression
from data import input_temps, temps_and_is_weekends, temps_weekends_and_ages, customers_with_errors

# One feature dataset per model, from fewest features to most.
feature_datasets = [input_temps, temps_and_is_weekends, temps_weekends_and_ages]

# Fit a separate linear regression on each feature dataset against the same target.
models = []
for dataset in feature_datasets:
    model = LinearRegression()
    model.fit(dataset, customers_with_errors)
    models.append(model)
models
```

```python
intercepts = [model.intercept_ for model in models]
# [35.62031572335471, 9.854773197812762, 12.155548281106803]

coefs = [model.coef_ for model in models]
coefs
# [array([2.87988515]),
#  array([ 3.07299452, 38.61313304]),
#  array([ 3.07698899, 38.62306381, -0.05584566])]
```
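
To make these numbers concrete, here is a small sketch of how the second model (temperatures and weekends) turns its intercept and coefficients into a prediction. The helper name `predict_customers` and the 80-degree input are hypothetical, chosen only for illustration. The fitted values are copied from the output above.

```python
# Intercept and coefficients of the second model (temperatures and weekends),
# copied from the fitted values shown in the lesson.
intercept = 9.854773197812762
temp_coef, weekend_coef = 3.07299452, 38.61313304

def predict_customers(temperature, is_weekend):
    # A linear regression prediction is the intercept plus each feature
    # multiplied by its coefficient.
    return intercept + temp_coef * temperature + weekend_coef * is_weekend

# Hypothetical inputs chosen for illustration: an 80-degree day.
weekday_prediction = predict_customers(80, 0)
weekend_prediction = predict_customers(80, 1)
print(weekday_prediction, weekend_prediction)
```

The two predictions differ by exactly the weekend coefficient, about 38.6 customers. By comparison, the third model's coefficient for ages is about -0.056 per unit of age, which suggests that ages adds little to the predictions.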

Ok, let's take a look at these models.

```python
from data import data_trace, prediction_traces
from graph import plot
# plot([data_trace, *prediction_traces])
```