# Jupyter Notebook for Ridge

Because there are only a few points in each dimension, and linear regression fits a straight line that follows those points as closely as it can, noise in the observations causes large variance, as shown in the first plot. The slope of the fitted line can vary considerably from one fit to the next because of that noise.

Ridge regression minimises a penalised version of the least-squares objective. The penalty shrinks the regression coefficients toward zero. Despite the few data points in each dimension, the slope of the prediction is much more stable, and the variance of the fitted line is greatly reduced compared with standard linear regression.
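The variance claim can be checked numerically. The following is a minimal sketch, not part of the original notebook: the two training points, the noise level of 0.1, and `alpha=0.1` are illustrative assumptions. It refits both models on repeatedly perturbed observations and compares how much the fitted slope moves.

```python
import numpy as np
from sklearn import linear_model

# Two training points, standing in for "few points in each dimension"
# (illustrative values, assumed for this sketch).
X_train = np.array([[0.5], [1.0]])
y_clean = np.array([0.5, 1.0])

rng = np.random.RandomState(0)
ols_slopes, ridge_slopes = [], []

for _ in range(200):
    # Perturb the observations with Gaussian noise and refit both models
    y_noisy = y_clean + rng.normal(0, 0.1, size=y_clean.shape)
    ols = linear_model.LinearRegression().fit(X_train, y_noisy)
    ridge = linear_model.Ridge(alpha=0.1).fit(X_train, y_noisy)
    ols_slopes.append(ols.coef_[0])
    ridge_slopes.append(ridge.coef_[0])

# Spread of the fitted slope across the noisy refits
ols_std = np.std(ols_slopes)
ridge_std = np.std(ridge_slopes)
print("OLS slope std:   %.3f" % ols_std)
print("Ridge slope std: %.3f" % ridge_std)
```

With the inputs held fixed, the ridge slope is the least-squares slope scaled by a constant factor below one, so its spread across refits is smaller. The price is a bias toward zero.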

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/

$$\min_{w} || X w - y||_2^2 + \alpha ||w||_2^2$$
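Setting the gradient of this objective to zero gives the linear system $(X^T X + \alpha I)\,w = X^T y$. As a rough sketch of what that means in practice, the snippet below solves the system with NumPy on synthetic data. The data, the helper names `ridge_closed_form` and `ridge_objective`, and the absence of an intercept term are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))          # hypothetical design matrix
w_true = np.array([1.0, -2.0, 0.5])   # hypothetical "true" weights
y = X @ w_true + rng.normal(0, 0.1, size=20)

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha I) w = X^T y (no intercept term)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def ridge_objective(X, y, w, alpha):
    """Evaluate ||Xw - y||_2^2 + alpha * ||w||_2^2."""
    return np.sum((X @ w - y) ** 2) + alpha * np.sum(w ** 2)

alpha = 1.0
w_ols = ridge_closed_form(X, y, alpha=0.0)     # alpha = 0 is plain least squares
w_ridge = ridge_closed_form(X, y, alpha=alpha)

# Gradient of the objective, 2 X^T (Xw - y) + 2 alpha w, vanishes at the minimiser
grad = 2 * X.T @ (X @ w_ridge - y) + 2 * alpha * w_ridge
print("gradient norm at w_ridge:", np.linalg.norm(grad))
print("||w_ols||   =", np.linalg.norm(w_ols))
print("||w_ridge|| =", np.linalg.norm(w_ridge))
```

The penalised solution has a smaller norm than the least-squares one, which is the shrinkage described above.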

In [1]:
from sklearn import linear_model

import numpy as np
import pandas as pd
import plotly
import os
import sys
# Put the parent directory on sys.path so the local `erudition` package can be imported
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from erudition.learning.helpers.plots.plotly_render import render, scatter
from plotly import graph_objs as go

# Fix the seed so the noisy sample is reproducible
np.random.seed(42)

In [2]:
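# Angles from 60 to 296 degrees in steps of 4, converted to radians
x = np.array([i*np.pi/180 for i in range(60,300,4)])
# Sine curve plus Gaussian noise with standard deviation 0.15
y = np.sin(x) + np.random.normal(0,0.15,len(x))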

df = pd.DataFrame(np.column_stack([x,y]),columns=['x','y'])

fig = go.Figure(
    data=[scatter(df.x, df.y, 'Raw', mode='markers', color='yellow')]
)

render(fig)
fig.show()

This resembles a sine curve, but not exactly, because of the noise. We’ll use it as an example to test different scenarios in this notebook. Let’s try to estimate the sine function using polynomial regression with powers of x from 1 to 15. To do that, we add a column for each power up to 15 to our dataframe, which can be accomplished using the following code:

In [4]:
# Add columns x_2 ... x_15 holding the corresponding powers of x
for i in range(2,16):
    colname = 'x_%d'%i
    df[colname] = df['x']**i

In [10]:
linreg = linear_model.LinearRegression()

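def linear_regression(df, power):
    """Fit a polynomial of degree `power` and return a scatter of its predictions."""
    # Predictors are x plus the precomputed columns x_2 ... x_<power>
    predictors = ['x']
    if power >= 2:
        predictors.extend(['x_%d'%i for i in range(2,power+1)])

    linreg.fit(df[predictors], df['y'])
    return scatter(df.x, linreg.predict(df[predictors]), 'x_%d'%power)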

data = []
data.append(scatter(df.x, df.y, 'Data', mode='markers', color='yellow'))

# Overlay the fitted curves for polynomial degrees 2 through 14
for i in range(2,15):
    data.append(linear_regression(df, i))

fig = go.Figure(data=data)
render(fig)
fig.show()
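The same polynomial features can be fed to `linear_model.Ridge` to see the penalty at work. This is a hedged sketch rather than the notebook's own method: it rebuilds the noisy sine sample so it runs on its own, standardises the columns (an assumption, made because the raw powers differ by many orders of magnitude and the penalty is scale-sensitive), and uses illustrative values of `alpha`.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Rebuild the noisy sine sample with the same seed so the sketch is self-contained
rng = np.random.RandomState(42)
x = np.array([i * np.pi / 180 for i in range(60, 300, 4)])
y = np.sin(x) + rng.normal(0, 0.15, len(x))

df = pd.DataFrame({'x': x, 'y': y})
for i in range(2, 16):
    df['x_%d' % i] = df['x'] ** i

predictors = ['x'] + ['x_%d' % i for i in range(2, 16)]

# Assumption: standardise each column so the penalty treats all powers alike
Z = (df[predictors] - df[predictors].mean()) / df[predictors].std()

coef_norms = {}
for alpha in [1e-4, 1e-2, 1.0]:
    ridge = linear_model.Ridge(alpha=alpha)
    ridge.fit(Z, df['y'])
    # Record the L2 norm of the fitted coefficient vector for this alpha
    coef_norms[alpha] = np.linalg.norm(ridge.coef_)
    print("alpha=%g  ||w||_2=%.3f" % (alpha, coef_norms[alpha]))
```

Larger values of `alpha` give a smaller coefficient norm, which in turn yields smoother fitted curves than the unpenalised high-degree fits. The predictions from `ridge.predict(Z)` could be passed to the notebook's `scatter` helper for plotting, assuming it accepts them as it does for the linear fits.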