# Jupyter Notebook for Ridge

With only a few points in each dimension, ordinary linear regression fits a straight line through them as closely as it can, so noise in the observations causes high variance, as shown in the first plot. The slope of the fitted line can change considerably from one fit to the next because of that noise.

Ridge regression minimizes a penalised version of the least-squares objective. The penalty shrinks the values of the regression coefficients. Even with few data points in each dimension, the slope of the fitted line is much more stable, and the variance of the line itself is greatly reduced compared with standard linear regression.
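
The variance claim can be checked with a small experiment. The sketch below is not part of the original notebook: it assumes two training points on the line y = x, Gaussian noise on y, and an arbitrary penalty `alpha=0.1`. The helper name `slope_spread` is made up for illustration. It refits each model many times on the same noise draws and compares how much the fitted slope moves around.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two training points on the line y = x (illustrative assumption)
X_train = np.array([[0.5], [1.0]])
y_clean = np.array([0.5, 1.0])

def slope_spread(model, n_repeats=200, noise=0.1, seed=0):
    """Refit `model` on noisy copies of y and return the std of the fitted slope."""
    rng = np.random.default_rng(seed)  # same seed => same noise draws for every model
    slopes = []
    for _ in range(n_repeats):
        y_noisy = y_clean + rng.normal(0.0, noise, size=y_clean.shape)
        model.fit(X_train, y_noisy)
        slopes.append(model.coef_[0])
    return float(np.std(slopes))

ols_spread = slope_spread(LinearRegression())
ridge_spread = slope_spread(Ridge(alpha=0.1))  # alpha is an arbitrary illustrative value

print(f"std of OLS slope:   {ols_spread:.3f}")
print(f"std of ridge slope: {ridge_spread:.3f}")
```

Because both models see identical noise draws, the smaller spread of the ridge slope comes from the penalty alone. The price is bias: the ridge slope is pulled towards zero on every fit.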

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/

In [22]:
from sklearn import linear_model

import numpy as np
import pandas as pd
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from erudition.learning.helpers.plots.plotly_render import render, scatter

np.random.seed(42) 



In [2]:
import plotly.graph_objects as go

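# 60 angles from 60° to 296° in 4° steps, converted to radians
x = np.array([i*np.pi/180 for i in range(60,300,4)])
# Sine of each angle plus Gaussian noise (mean 0, standard deviation 0.15)
y = np.sin(x) + np.random.normal(0,0.15,len(x))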

df = pd.DataFrame(np.column_stack([x,y]),columns=['x','y'])

fig = go.Figure(
    data=[
        go.Scatter(
            x=df.x, 
            y=df.y, 
            mode='markers',
            marker=dict(
                size=3,
                color='yellow'
            )),
        ]
)
render(fig)
fig.show()

This resembles a sine curve, but not exactly, because of the noise. We’ll use it as the running example to test different scenarios. Let’s try to estimate the sine function using polynomial regression with powers of x from 1 to 19. The x column already holds the first power, so we add a column for each power from 2 up to 19 to our dataframe. This can be accomplished using the following code:

In [11]:
for i in range(2,20):
    colname = 'x_%d'%i
    df[colname] = df['x']**i
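
A quick look at the scale of these columns helps when reading the coefficients later on. The sketch below is an aside, not part of the original notebook. It rebuilds the same `x` grid and prints the largest value in a few of the power columns. The names `powers` and `col_max` are introduced here for illustration only.

```python
import numpy as np

# Same grid as above: 60 angles from 60° to 296°, in radians
x = np.array([i * np.pi / 180 for i in range(60, 300, 4)])

powers = np.arange(1, 20)
# Largest entry of each power column x**p
col_max = np.array([np.max(x ** p) for p in powers])

for p, m in zip(powers, col_max):
    if p in (1, 5, 10, 15, 19):
        print(f"max of x^{p}: {m:.3g}")
```

The columns span many orders of magnitude, so an unpenalised fit can only balance them with coefficients that also differ widely in size.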

Now that we have the powers up to 19, let’s fit a separate linear model for each maximum power from 2 to 19 (18 models in total).

In [18]:
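def linear_regression(df, power):
    """Fit y on x, x_2, ..., x_<power>; return the plot trace and the fit summary."""
    linreg = linear_model.LinearRegression()

    # Predictor columns: 'x' plus the precomputed power columns up to `power`
    predictors = ['x']
    if power >= 2:
        predictors.extend(['x_%d' % i for i in range(2, power + 1)])

    linreg.fit(df[predictors], df['y'])

    # Predict once and reuse for both the plot and the RSS
    y_pred = linreg.predict(df[predictors])

    plot = go.Scatter(
        x=df.x,
        y=y_pred,
        mode='lines',
        opacity=0.5,
        name='x_%d' % power,
        marker=dict(
            size=3,
        )
    )

    # Return the result in pre-defined format:
    # [rss, intercept, coef_x_1, ..., coef_x_<power>]
    rss = sum((y_pred - df.y) ** 2)
    ret = [rss]
    ret.extend([linreg.intercept_])
    ret.extend(linreg.coef_)

    return plot, ret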

data = []

data.append(
    go.Scatter(
        x=df.x, 
        y=df.y, 
        mode ='markers',
        name = 'Data',
        marker=dict(
            size=3,
            )
        )
)

#Initialize a dataframe to store the results:
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,20)]
ind = ['model_pow_%d'%i for i in range(1,20)]
coef_matrix_simple = pd.DataFrame(index=ind, columns=col)

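# Fit one model per maximum power from 2 to 19. The loop starts at 2,
# so the model_pow_1 row is never filled and stays NaN.
for i in range(2,20):
    plot, res = linear_regression(df, i)
    data.append(plot)
    # Row i-1 is model_pow_i; res holds rss, intercept and i coefficients
    coef_matrix_simple.iloc[i-1,0:i+2] = res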



fig = go.Figure(data=data)

render(fig, 1000,500)
fig.show()

    

In [20]:
coef_matrix_simple

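| model | rss | intercept | coef_x_1 | coef_x_2 | coef_x_3 | coef_x_4 | coef_x_5 | coef_x_6 | coef_x_7 | coef_x_8 | ... | coef_x_10 | coef_x_11 | coef_x_12 | coef_x_13 | coef_x_14 | coef_x_15 | coef_x_16 | coef_x_17 | coef_x_18 | coef_x_19 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| model_pow_1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_2 | 2.72871 | 2.09032 | -0.742934 | 0.0188043 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_3 | 0.987025 | -0.590594 | 2.49244 | -1.12667 | 0.122905 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_4 | 0.963904 | 0.196718 | 1.19347 | -0.398498 | -0.0441913 | 0.0134465 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_5 | 0.963896 | 0.234744 | 1.11403 | -0.337133 | -0.0662893 | 0.0171918 | -0.000241112 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_6 | 0.928821 | -6.1919 | 17.349 | -16.354 | 7.87087 | -2.07911 | 0.281264 | -0.0151021 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_7 | 0.911529 | -17.8833 | 51.9778 | -57.9649 | 34.2424 | -11.6368 | 2.27071 | -0.236262 | 0.0101697 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_8 | 0.90884 | -5.89386 | 11.2552 | -0.257822 | -10.4021 | 9.04177 | -3.61937 | 0.774528 | -0.0856789 | 0.00385655 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_9 | 0.90833 | 7.73199 | -40.9344 | 84.9481 | -88.3244 | 53.1136 | -19.642 | 4.52794 | -0.633346 | 0.0491304 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| model_pow_10 | 0.885511 | 246.662 | -1059.46 | 1967.25 | -2076.02 | 1383.1 | -609.807 | 180.736 | -35.6561 | 4.49254 | ... | 0.0104793 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |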
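
In the rows shown, the RSS falls from about 2.73 to about 0.89 as the power rises from 2 to 10, while the largest coefficients grow from below 1 to the thousands. The sketch below illustrates how a ridge penalty counters that growth. It is a rough illustration under stated assumptions, not the notebook's own method: the data are regenerated with a different random generator, so the numbers will not match the table; `max_power = 10` and `alpha = 1e-2` are arbitrary choices; and `solver="svd"` is picked here only because the power columns are badly scaled.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Regenerate a noisy sine sample similar to the one used in this notebook
rng = np.random.default_rng(42)
x = np.array([i * np.pi / 180 for i in range(60, 300, 4)])
y = np.sin(x) + rng.normal(0, 0.15, len(x))

# Design matrix with columns x^1 ... x^max_power (max_power is an illustrative choice)
max_power = 10
X = np.column_stack([x ** p for p in range(1, max_power + 1)])

# Unpenalised least squares versus ridge on the same features
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1e-2, solver="svd").fit(X, y)  # alpha chosen arbitrarily

# Compare the overall size of the coefficient vectors and the fit quality
ols_norm = float(np.linalg.norm(ols.coef_))
ridge_norm = float(np.linalg.norm(ridge.coef_))
ols_rss = float(np.sum((ols.predict(X) - y) ** 2))
ridge_rss = float(np.sum((ridge.predict(X) - y) ** 2))

print(f"OLS:   coefficient norm = {ols_norm:.4g}, rss = {ols_rss:.4g}")
print(f"Ridge: coefficient norm = {ridge_norm:.4g}, rss = {ridge_rss:.4g}")
```

The ridge coefficient vector is smaller than the least-squares one, at the cost of a somewhat higher RSS on the training data. Larger values of `alpha` shrink the coefficients further.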
