## Bias-Variance

In order to minimize the expected test error, we need to select a statistical learning method that simultaneously achieves
low variance and low bias.


### Variance

Variance refers to the amount by which our estimate of f would change if we
estimated it using a different training data set. Since the training data
are used to fit the statistical learning method, different training data sets
will result in a different estimate of f. Ideally, the estimate for f should not vary
too much between training sets. However, if a method has high variance,
then small changes in the training data can result in large changes in the estimated f. In
general, more flexible statistical methods have higher variance.
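
The effect can be checked with a small simulation. The sketch below is illustrative only: it assumes the cosine function from the example later in this notebook as the ground truth, keeps the inputs fixed while redrawing the noise for each simulated training set, and uses a hypothetical helper named `mean_prediction_variance`. It refits a rigid (degree 1) and a flexible (degree 10) polynomial many times and measures how much the fitted curves move.

```python
import numpy as np

def true_fun(x):
    # Assumed ground truth (the cosine used in the example later in this notebook)
    return np.cos(1.5 * np.pi * x)

def mean_prediction_variance(degree, n_sets=200, n_samples=30, noise=0.1, seed=0):
    """Average variance of the fitted curve across many simulated training sets."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, n_samples)  # inputs held fixed; only the noise is redrawn
    x_grid = np.linspace(0, 1, 50)          # points at which the fits are compared
    preds = np.empty((n_sets, x_grid.size))
    for k in range(n_sets):
        y_train = true_fun(x_train) + rng.normal(scale=noise, size=n_samples)
        model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
        preds[k] = model(x_grid)
    # Variance over training sets at each grid point, averaged over the grid
    return preds.var(axis=0).mean()

var_rigid = mean_prediction_variance(degree=1)
var_flexible = mean_prediction_variance(degree=10)
print(f"degree 1: {var_rigid:.2e}, degree 10: {var_flexible:.2e}")
```

Under these assumptions the degree 10 fit should show the larger variance, in line with the statement that more flexible methods vary more between training sets.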


### Bias

Bias refers to the error that is introduced by approximating
a real-life problem, which may be extremely complicated, by a much
simpler model. For example, linear regression assumes that there is a linear
relationship between Y and X1, X2, ..., Xp. It is unlikely that any real-life
problem truly has such a simple linear relationship, and so performing linear
regression will undoubtedly result in some bias in the estimate of f.
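
A minimal sketch of this idea, assuming the same cosine ground truth as the example later in this notebook: a straight line is fitted to dense, noise-free data, so whatever error remains cannot be blamed on noise or on a small sample. It is the bias caused by the linear assumption itself.

```python
import numpy as np

def true_fun(x):
    # Assumed nonlinear ground truth (the cosine used in the example later in this notebook)
    return np.cos(1.5 * np.pi * x)

# Noise-free, densely sampled data: any remaining error comes from the model form alone
x = np.linspace(0, 1, 10_000)
y = true_fun(x)

# polyfit returns coefficients from lowest to highest degree
intercept, slope = np.polynomial.polynomial.polyfit(x, y, 1)
linear_fit = intercept + slope * x

squared_bias = np.mean((linear_fit - y) ** 2)
print(f"mean squared bias of the best straight line: {squared_bias:.3f}")
```

Collecting more data would not shrink this number, because the best possible line still cannot bend to follow the curve.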


### Bias-Variance Trade-off

In the diagram below, as the model complexity increases, the variance increases and the bias decreases.
As a result, the total error (the expected test error, made up of the squared bias, the variance, and the irreducible error) follows a U-shaped curve. It first decreases as the complexity grows, until it reaches the point where the accuracy of the model is optimal. Beyond that point the model starts overfitting the training data, and the total error increases again.

The point where the total error is minimal is the trade-off point between bias and variance. It corresponds to the model that is expected to predict unseen data most accurately.
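
For squared-error loss, the trade-off can be written out explicitly. The decomposition below follows the one given in the reference cited at the end of this section. The notation is assumed here, since the text above does not fix any: $\hat{f}$ is the fitted model, $(x_0, y_0)$ is a previously unseen observation, and $\epsilon$ is the noise term.

$$
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\epsilon)
$$

The last term is the irreducible error, which no choice of model can remove. Only the first two terms depend on model complexity, which is why lowering one at the cost of the other can still reduce the total.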

<img src="../images/complexity-error.png" style="width: 700px;">


Reference - An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie, Tibshirani)

In [6]:
import plotly.graph_objs as go
from plotly import tools
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

In [7]:
#Data
np.random.seed(0)

n_samples = 30
degrees = [1, 4, 15]
title_msg = ['High Bias', 'Trade-Off', 'High Variance']
true_fun = lambda X: np.cos(1.5 * np.pi * X)
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1
titles = []
data = [] 

In [8]:
for i in range(len(degrees)):
    polynomial_features = PolynomialFeatures(degree=degrees[i],
                                             include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([("polynomial_features", polynomial_features),
                         ("linear_regression", linear_regression)])
    pipeline.fit(X[:, np.newaxis], y)

    # Evaluate the models using crossvalidation
    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
                             scoring="neg_mean_squared_error", cv=10)

    X_test = np.linspace(0, 1, 100)
    
    leg = (i == 1)  # show the legend only once, on the middle subplot
    data.append([])
    trace1 = go.Scatter(x=X_test, y=pipeline.predict(X_test[:, np.newaxis]), 
                        name="Model", mode='lines',
                        line=dict(color='blue', width=1),
                        showlegend=leg)
    data[i].append(trace1)
    trace2 = go.Scatter(x=X_test, y=true_fun(X_test), 
                        name="True function", mode='lines',
                        line=dict(color='green', width=1),
                        showlegend=leg)
    data[i].append(trace2)
    
    trace3 = go.Scatter(x=X, y=y, 
                        name="Samples", mode='markers',
                        marker=dict(color='blue', 
                                    line=dict(color='black', width=1)),
                        showlegend=leg)
    data[i].append(trace3)
    
    titles.append("{} <br>MSE = {:.2e}(+/- {:.2e})".format(title_msg[i],
                    -scores.mean(), scores.std()))

In [9]:
fig = tools.make_subplots(rows=1, cols=3,
                          subplot_titles=tuple(titles[:3]),
                          print_grid=False)

for i in range(0, len(data)):
    for j in range(0, len(data[i])):
        fig.append_trace(data[i][j], 1, i+1)
        
for i in map(str, range(1, 4)):
    # Use dedicated names so the data array `y` from the earlier cell is not overwritten
    y_axis = 'yaxis' + i
    x_axis = 'xaxis' + i
    fig['layout'][y_axis].update(title='y', showgrid=False,
                                 showticklabels=False, ticks='')
    fig['layout'][x_axis].update(title='x', showgrid=False,
                                 showticklabels=False, ticks='')

### Interactive Example
The visualization below contains three subplots.

#### High Bias
The first plot (leftmost) shows a model with high bias. The model is a simple linear regression, which is the same as a polynomial of degree 1. A straight line is too simplistic for the data points in question: because the model is too rigid to follow the underlying curve, it has high bias (it underfits).
The blue line represents the linear regression fit.

#### High Variance
The last plot (rightmost) shows a model with high variance. The model is a polynomial regression of degree 15. The fitted curve follows the training data points as closely as possible, but that does not make it the best model for this dataset. Although it fits the training data well, it is complex enough to fit the noise in the data too, so it predicts poorly for unseen data points. Hence it has high variance (it overfits).
The complex blue curve represents the polynomial regression of degree 15.

#### Trade-Off
The plot in the center shows a model with a reasonable bias-variance trade-off. The model is again a polynomial regression, but this time of degree 4, which follows the training data points reasonably closely without compromising its ability to predict unseen values.
The blue curve represents the polynomial regression of degree 4 (trade-off).
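
The three cases can also be compared numerically. The sketch below is an illustration under stated assumptions, not the code that produces the plots: it uses plain NumPy polynomial fits instead of the scikit-learn pipeline, a hypothetical helper named `train_and_test_mse`, and a separate held-out sample in place of cross-validation. The errors are averaged over repeated random draws so that a single lucky or unlucky sample does not dominate.

```python
import numpy as np

def true_fun(x):
    # Assumed ground truth, matching the function used for the plots
    return np.cos(1.5 * np.pi * x)

def train_and_test_mse(degree, n_repeats=50, n_train=30, n_test=500, noise=0.1, seed=0):
    """Mean training and held-out MSE of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)  # same seed, so every degree sees the same data
    train_mse, test_mse = [], []
    for _ in range(n_repeats):
        x_tr, x_te = rng.random(n_train), rng.random(n_test)
        y_tr = true_fun(x_tr) + rng.normal(scale=noise, size=n_train)
        y_te = true_fun(x_te) + rng.normal(scale=noise, size=n_test)
        model = np.polynomial.Polynomial.fit(x_tr, y_tr, degree)
        train_mse.append(np.mean((model(x_tr) - y_tr) ** 2))
        test_mse.append(np.mean((model(x_te) - y_te) ** 2))
    return np.mean(train_mse), np.mean(test_mse)

results = {d: train_and_test_mse(d) for d in (1, 4, 15)}
for d, (tr, te) in results.items():
    print(f"degree {d:2d}: train MSE = {tr:.2e}, test MSE = {te:.2e}")
```

The training error keeps falling as the degree grows, while the held-out error should be lowest for degree 4. That gap between training and held-out error is the numerical signature of the overfitting described above.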





In [15]:
from IPython.display import IFrame
url = '//plot.ly/~ManiRangaswamy/1.embed'
IFrame(url, width=900, height=800)


In [10]:
iplot(fig)