```plaintext
Understanding the impact of model complexity on the magnitude of the coefficients.
We take a sine curve (between 60° and 300°) and add some random noise to it.


In [1]:
import pandas as pd
import numpy as np 
import random

import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression
from plotly.subplots import make_subplots
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge, Lasso

```plaintext
[i * np.pi/180 for i in range(60, 300, 4)]: a list comprehension that converts each angle in the range from degrees to radians.

range(60, 300, 4) generates the sequence 60, 64, ..., 296 (step size 4; the end value 300 is excluded).
i * np.pi/180 multiplies each number i by π/180 to convert degrees to radians.

radians = degrees × π/180
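
As a quick check of the conversion above, the sketch below compares the manual formula with NumPy's built-in `np.deg2rad`. The variable names are illustrative and not part of the notebook.

```python
import numpy as np

# Angles 60, 64, ..., 296 degrees (300 is excluded, as with range()).
degrees = np.arange(60, 300, 4)

# Manual conversion, as in the list comprehension above.
manual = degrees * np.pi / 180

# NumPy's built-in conversion should give the same values.
builtin = np.deg2rad(degrees)

print(len(degrees), degrees[0], degrees[-1])
print(np.allclose(manual, builtin))
```

Either form works; the built-in simply avoids the explicit Python loop.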

In [2]:
x = np.array([i * np.pi/180 for i in range(60, 300, 4)])    # angles from 60° to 296° in steps of 4°, converted to radians
np.random.seed(10)                      
                    
y_proper = np.sin(x)
y_noise = np.sin(x) + np.random.normal(0, 0.15, len(x))     

data = pd.DataFrame({'x' : x, 'y_noise' : y_noise, 'y_proper' : y_proper})
fig = go.Figure()

fig.add_traces(go.Scatter(x = data['x'], y = data['y_proper'], mode='markers', name='Sine Curve'))
fig.add_traces(go.Scatter(x = data['x'], y = data['y_noise'], mode='markers', name='Sine Curve with noise'))

fig.update_layout(
    title = 'Proper Sine Curve and Curve with Noise',
    xaxis_title='x (radians)',
    yaxis_title='y',
    legend=dict(title='Legend'),
    template='plotly_white'
)

fig.show()

In [3]:
for i in range(2,16):
    colname = 'x_%d'%i
    data[colname] = data['x']**i
data.head()

Unnamed: 0,x,y_noise,y_proper,x_2,x_3,x_4,x_5,x_6,x_7,x_8,x_9,x_10,x_11,x_12,x_13,x_14,x_15
0,1.047198,1.065763,0.866025,1.096623,1.148381,1.202581,1.25934,1.318778,1.381021,1.446202,1.514459,1.585938,1.66079,1.739176,1.82126,1.907219,1.997235
1,1.117011,1.006086,0.898794,1.247713,1.393709,1.556788,1.738948,1.942424,2.169709,2.423588,2.707173,3.023942,3.377775,3.773011,4.214494,4.707635,5.258479
2,1.186824,0.695374,0.927184,1.408551,1.671702,1.984016,2.354677,2.794587,3.316683,3.936319,4.671717,5.544505,6.580351,7.809718,9.26876,11.000386,13.055521
3,1.256637,0.949799,0.951057,1.579137,1.984402,2.493673,3.133642,3.93785,4.948448,6.218404,7.814277,9.81971,12.339811,15.506664,19.486248,24.487142,30.77145
4,1.32645,1.063496,0.970296,1.75947,2.33385,3.095735,4.106339,5.446854,7.224981,9.583578,12.712139,16.86202,22.36663,29.668222,39.35342,52.200353,69.24117


### why we see nonlinear curves

we will obtain nonlinear curves in the plot. the reason for it is explained below.

### the core issue - polynomial features in linear regression

the basic model y = mx + c describes a straight line. but in the code, we add polynomial features x^2, x^3, ..., x^15 to the model, which turns it into polynomial regression.

polynomial curves can wiggle and bend to fit the data. the equation becomes:

y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n

where n is the power. even though the regression is linear in coefficients (b0, b1, b2, ...), the relationship between x and y is nonlinear because of the polynomial terms.
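
To make "linear in the coefficients" concrete, here is a minimal sketch, assuming a degree-3 fit and a freshly generated noisy sine (not the notebook's exact data). The polynomial fit is an ordinary least-squares solve on a matrix whose columns are powers of x.

```python
import numpy as np

# Assumed toy data in the spirit of the notebook: noisy sine on 60°..296°.
rng = np.random.default_rng(0)
x = np.deg2rad(np.arange(60, 300, 4))
y = np.sin(x) + rng.normal(0, 0.15, x.size)

degree = 3  # illustrative choice

# Design matrix with columns 1, x, x^2, x^3.
X = np.vander(x, degree + 1, increasing=True)

# The fit is a plain linear least-squares problem in b0..b3.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted curve is nonlinear in x but is just X @ coef.
y_hat = X @ coef

# Cross-check against np.polyfit (which returns the highest power first).
reference = np.polyfit(x, y, degree)[::-1]
print(np.allclose(coef, reference, rtol=1e-5))
```

The nonlinearity lives entirely in the columns of `X`; the solver only ever sees a linear problem.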

### higher-order polynomials

as power increases, the model gains more flexibility to fit the data.

### examples to imagine:

- power = 1:  
  y = 0.5x + 0.1  
  this is a straight line. it’s a poor fit to sin(x).

- power = 3:  
  y = 0.1 + 0.8x - 0.2x^2 + 0.05x^3  
  this is a cubic curve. it starts to resemble a sine wave.

- power = 15:  
  a high-degree polynomial with many terms. it wiggles to match sin(x) + noise.

when you plot these against x, only the power = 1 case is a straight line. the rest are curves because they’re polynomials of degree greater than 1.

In [4]:
def linear_regression(data, power):
    predictor = ['x']                
    
    if power >= 2:
        predictor.extend('x_%d'% i for i in range(2, power + 1))    # add x_2 ... x_<power>; the data frame itself is unchanged
        
    scaler = StandardScaler()                              
    x_train = scaler.fit_transform(data[predictor])     # standardized feature matrix
    y_train = data['y_noise']                           # target column
    
    model = LinearRegression()                          
    model.fit(x_train, y_train)
    
    y_pred = model.predict(x_train)                     # predictions on the training data itself
    
    rss = sum((y_pred - y_train)**2)    # residual sum of squares (RSS)
    ret = [rss]                         # collect results in a list: [rss, intercept, coefficients]
    
    ret.extend([model.intercept_])      
    ret.extend([model.coef_])
    
    return y_pred, ret

models_to_plot = [1,3,6,9,12,15]

fig = make_subplots(rows=3, cols=2, subplot_titles=[f'Power: {p}' for p in models_to_plot])

row, col = 1,1
for i, power in enumerate(models_to_plot, start=1):
    y_pred, _ = linear_regression(data, power)
    
    fig.add_trace(go.Scatter(x=data['x'], y=y_pred,          mode='lines',   name=f'Predicted (Power {power})', showlegend=False), row=row, col=col)
    fig.add_trace(go.Scatter(x=data['x'], y=data['y_noise'], mode='markers', name=f'Actual (Power {power})', showlegend=False),    row=row, col=col)
    
    col += 1
    if col > 2:  # Move to the next row after 2 columns
        col = 1
        row += 1

fig.update_layout(
    height      =    900,  
    width       =    1500,   
    title       =    'Polynomial Regression Models',
    template    =    'plotly_white'
)  

fig.show()

In [5]:
# Initialize a dataframe to store the results:
col = ['rss', 'intercept'] + ['coef_x_%d' % i for i in range(1, 16)]
ind = ['model_pow_%d' % i for i in range(1, 16)]
coef_matrix_simple = pd.DataFrame(index=ind, columns=col)

for i in range(1,16):
    preds, ret = linear_regression(data, power=i)
    # ret = [rss, intercept, coefficient array]; flatten it into a single row
    insert_list = [ret[0], ret[1], *ret[2].tolist()]
    coef_matrix_simple.iloc[i-1, :len(insert_list)] = insert_list  
    
coef_matrix_simple.head(15)

Unnamed: 0,rss,intercept,coef_x_1,coef_x_2,coef_x_3,coef_x_4,coef_x_5,coef_x_6,coef_x_7,coef_x_8,coef_x_9,coef_x_10,coef_x_11,coef_x_12,coef_x_13,coef_x_14,coef_x_15
model_pow_1,3.280316,0.038032,-0.749085,,,,,,,,,,,,,,
model_pow_2,3.276677,0.038032,-0.704322,-0.045435,,,,,,,,,,,,,
model_pow_3,1.103352,0.038032,3.665259,-9.802146,5.50628,,,,,,,,,,,,
model_pow_4,1.078576,0.038032,2.039542,-4.054606,-1.430933,2.813007,,,,,,,,,,,
model_pow_5,1.018574,0.038032,-6.187537,36.025149,-77.347587,67.648438,-20.7948,,,,,,,,,,
model_pow_6,0.990074,0.038032,11.505926,-74.063131,209.599315,-314.236679,234.696654,-68.150543,,,,,,,,,
model_pow_7,0.928265,0.038032,-67.648136,525.790581,-1790.022779,3337.57283,-3552.3136,2025.054366,-479.106216,,,,,,,,
model_pow_8,0.91761,0.038032,-165.652786,1401.667375,-5354.169422,11656.187251,-15357.179105,12097.679865,-5233.371142,954.165835,,,,,,,
model_pow_9,0.874941,0.038032,-742.966092,7345.942803,-33947.478809,93146.729114,-162957.723499,184016.396946,-130094.965971,52439.96296,-9206.539208,,,,,,
model_pow_10,0.874578,0.038032,-587.740733,5536.760041,-23898.602115,59265.584821,-88055.674864,72820.882645,-20086.177409,-17178.033947,16296.127264,-4113.780783,,,,,


```plaintext
From the table above, we can see that the RSS decreases as the complexity of the model increases. The model starts to fit ever smaller deviations in the training data set, which leads to overfitting.
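
The training RSS alone cannot reveal overfitting, because it can only go down as terms are added. A hedged sketch of one way to check: hold out every other point and compare the RSS on both halves. The split, the degrees, and the use of `numpy.polynomial.Polynomial.fit` are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Assumed toy data in the spirit of the notebook.
rng = np.random.default_rng(10)
x = np.deg2rad(np.arange(60, 300, 4))
y = np.sin(x) + rng.normal(0, 0.15, x.size)

# Illustrative split: even-indexed points for training, odd-indexed held out.
train = np.arange(x.size) % 2 == 0
held_out = ~train

def rss(model, xs, ys):
    """Residual sum of squares of a fitted polynomial on (xs, ys)."""
    return float(np.sum((model(xs) - ys) ** 2))

results = {}
for degree in (1, 3, 6, 9):
    model = Polynomial.fit(x[train], y[train], degree)
    results[degree] = (rss(model, x[train], y[train]),
                       rss(model, x[held_out], y[held_out]))

for degree, (rss_train, rss_held_out) in results.items():
    print(f"degree {degree}: train RSS {rss_train:.3f}, held-out RSS {rss_held_out:.3f}")
```

The training RSS never increases with the degree, since each model contains the previous one. The held-out RSS is the number to watch for signs of overfitting.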

In [187]:
#Set the display format to be scientific for ease of analysis
# pd.options.display.float_format = '{:,.2g}'.format
# pd.reset_option('display.float_format')
# coef_matrix_simple

It is clearly evident that the magnitude of the coefficients grows rapidly with model complexity: from below 1 for the power-1 model to the order of 10^5 for the power-9 model.

From this, we can understand why it is necessary to put a **constraint** on the magnitude of the coefficients to reduce model complexity.

### What does a large coefficient signify?
- A large coefficient means that the model is putting a lot of emphasis on that particular feature.
- In other words, the model treats that feature as a strong predictor of the outcome.

However, when the coefficients become too large, the model might:
- Overcomplicate things by fitting the training data too closely.
- Lose its ability to generalize well to new, unseen data.
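
One standard way to impose such a constraint is an L2 penalty on the coefficients, which is what ridge regression does. The sketch below is a minimal illustration, assuming standardized features x^1..x^5 and alpha = 1 (both arbitrary choices). It solves the penalized normal equations by hand and compares the result with scikit-learn's `Ridge`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Assumed toy data in the spirit of the notebook.
rng = np.random.default_rng(10)
x = np.deg2rad(np.arange(60, 300, 4))
y = np.sin(x) + rng.normal(0, 0.15, x.size)

# Standardized polynomial features x^1 .. x^5 (illustrative degree).
X = StandardScaler().fit_transform(x[:, None] ** np.arange(1, 6))

alpha = 1.0                 # illustrative penalty strength
y_centered = y - y.mean()   # the intercept is not penalized

# Unpenalized least squares.
beta_ols = np.linalg.lstsq(X, y_centered, rcond=None)[0]

# Ridge: minimize ||y - X b||^2 + alpha * ||b||^2
# => b = (X^T X + alpha * I)^(-1) X^T y
beta_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]),
                             X.T @ y_centered)

sk_ridge = Ridge(alpha=alpha).fit(X, y)

print("||beta_ols||   =", np.linalg.norm(beta_ols))
print("||beta_ridge|| =", np.linalg.norm(beta_ridge))
print("matches sklearn:", np.allclose(beta_ridge, sk_ridge.coef_, atol=1e-6))
```

The only difference from ordinary least squares is the `alpha * I` term, which pulls the coefficient vector toward zero.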

---

#### Function for ridge regression

In [6]:
def ridge_regression(data, alpha, predictors):
    scaler = StandardScaler()
    x_train = scaler.fit_transform(data[predictors])
    y_train = data['y_noise']
    
    ridge = Ridge(alpha=alpha)
    ridge.fit(x_train, y_train)
    
    y_pred = ridge.predict(x_train)
    
    rss = sum((y_pred - y_train)**2)
    ret = [rss]
    ret.extend([ridge.intercept_])
    ret.extend(ridge.coef_)
    
    return y_pred, ret

predictors = ['x']
predictors.extend(['x_%d'%i for i in range(2,16)])

alpha_ridge = [1e-15, 1e-10, 1e-8, 1e-4, 1e-3,1e-2, 1, 5, 10, 20]
fig = make_subplots(rows=5, cols=2, subplot_titles=[f'alpha: {a}' for a in alpha_ridge])
row, col = 1, 1

for i, alpha in enumerate(alpha_ridge):  
    y_pred, _ = ridge_regression(data, alpha, predictors)
    
    fig.add_trace(go.Scatter(x=data['x'], y=y_pred,          mode='lines',   name=f'Predicted (Alpha {alpha})', showlegend=False), row=row, col=col)
    fig.add_trace(go.Scatter(x=data['x'], y=data['y_noise'], mode='markers', name=f'Actual (Alpha {alpha})',    showlegend=False), row=row, col=col)
    
    col += 1
    if col > 2:  # Move to the next row after 2 columns
        col = 1
        row += 1

fig.update_layout(
    height      =    1500,  
    width       =    1500,   
    title       =    'Ridge Regression Models',
    template    =    'plotly_white'
)  

fig.show()

In [7]:
predictors = ['x']
predictors.extend(['x_%d'%i for i in range(2,16)])

col = ['rss', 'intercept'] + ['coef_x_%d' % i for i in range(1, 16)]
alpha_ridge = [1e-15, 1e-10, 1e-8, 1e-4, 1e-3,1e-2, 1, 5, 10, 20]
ind = ['alpha_%.2g' % i for i in alpha_ridge]

coef_matrix_ridge = pd.DataFrame(index=ind, columns=col)

for i in range(len(alpha_ridge)):
    preds, rss = ridge_regression(data, alpha_ridge[i], predictors)
    
    # Extract the relevant values from the rss list
    coef_matrix_ridge.iloc[i, 0] = rss[0]  # Assign RSS
    coef_matrix_ridge.iloc[i, 1] = rss[1]  # Assign intercept
    coef_matrix_ridge.iloc[i, 2:2 + len(rss[2:])] = rss[2:]  # Assign coefficients

In [8]:
coef_matrix_ridge.head(20)

Unnamed: 0,rss,intercept,coef_x_1,coef_x_2,coef_x_3,coef_x_4,coef_x_5,coef_x_6,coef_x_7,coef_x_8,coef_x_9,coef_x_10,coef_x_11,coef_x_12,coef_x_13,coef_x_14,coef_x_15
alpha_1e-15,0.869617,0.038032,477.958096,-7316.830716,48462.232069,-177682.145357,375728.376133,-403774.428409,32340.799707,387829.759394,-210431.527974,-324548.998441,309707.776996,234238.920185,-479766.384855,267844.562642,-53110.821794
alpha_1e-10,0.887998,0.038032,-150.895504,1121.935809,-3390.824819,4548.528619,-1037.405968,-3087.115419,620.809027,2561.79686,559.219255,-1771.609342,-1358.236832,819.332776,1375.095404,-838.501239,27.187746
alpha_1e-08,0.929883,0.038032,-25.50625,170.686246,-431.210853,395.169423,132.898017,-302.812704,-195.325661,153.37786,267.7831,62.119811,-201.435452,-224.043012,50.261938,302.429095,-155.060277
alpha_0.0001,0.953741,0.038032,1.247141,-2.165186,-1.475943,0.465587,1.106084,0.515348,-0.34863,-0.786577,-0.594612,0.070117,0.867111,1.395835,1.263714,0.118885,-2.335929
alpha_0.001,0.958268,0.038032,0.897594,-1.474605,-1.071473,-0.23712,0.187358,0.245465,0.18315,0.164649,0.232861,0.349233,0.438375,0.417756,0.213187,-0.234881,-0.970055
alpha_0.01,0.963726,0.038032,0.604839,-0.89069,-1.001363,-0.602379,-0.156591,0.178452,0.382048,0.474821,0.481383,0.4203,0.303634,0.138796,-0.069644,-0.318664,-0.605987
alpha_1,1.567195,0.038032,-0.182977,-0.34891,-0.333408,-0.23317,-0.11442,-0.010541,0.06589,0.112916,0.133203,0.130996,0.110711,0.076357,0.031351,-0.021496,-0.07992
alpha_5,1.902341,0.038032,-0.272239,-0.263824,-0.214899,-0.152285,-0.092146,-0.041566,-0.002442,0.025754,0.044468,0.055347,0.059942,0.059595,0.055421,0.04833,0.03905
alpha_10,2.161507,0.038032,-0.262556,-0.236765,-0.18975,-0.137731,-0.089705,-0.049218,-0.016902,0.007886,0.026222,0.039228,0.047929,0.053195,0.055747,0.056173,0.054942
alpha_20,2.799428,0.038032,-0.235567,-0.206561,-0.16577,-0.123549,-0.085205,-0.0526,-0.025872,-0.004455,0.012431,0.025556,0.03561,0.043175,0.048732,0.052674,0.055317


```plaintext
Compare any coefficient in the alpha_1e-15 row with the same coefficient in the alpha_20 row: the coefficients shrink as alpha increases, while the RSS rises.
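
The same trade-off can be summarized in two numbers per alpha: the training RSS and the L2 norm of the coefficient vector. This is a rough sketch on freshly generated data with a shorter, assumed alpha list, so the values will not match the table above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Assumed toy data in the spirit of the notebook.
rng = np.random.default_rng(10)
x = np.deg2rad(np.arange(60, 300, 4))
y = np.sin(x) + rng.normal(0, 0.15, x.size)

# Standardized polynomial features x^1 .. x^15.
X = StandardScaler().fit_transform(x[:, None] ** np.arange(1, 16))

alphas = [1e-4, 1e-2, 1, 20]   # illustrative subset
summary = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    rss = float(np.sum((model.predict(X) - y) ** 2))
    coef_norm = float(np.linalg.norm(model.coef_))
    summary.append((alpha, rss, coef_norm))

for alpha, rss, coef_norm in summary:
    print(f"alpha={alpha:<6g} rss={rss:.3f} ||coef||={coef_norm:.3f}")
```

A larger alpha buys smaller coefficients at the price of a worse fit on the training data.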

---

#### Function for lasso regression

In [9]:
def lasso_regression(data, predictors, alpha):
    scaler = StandardScaler()
    
    x_train = scaler.fit_transform(data[predictors])
    y_train = data['y_noise']
    
    lassoreg = Lasso(alpha=alpha, max_iter=int(1e5))
    
    lassoreg.fit(x_train, y_train)
    y_pred = lassoreg.predict(x_train)
    
    rss = sum((y_pred - y_train)**2)
    ret = [rss]
    ret.extend([lassoreg.intercept_])
    ret.extend(lassoreg.coef_)
    
    return y_pred, ret
    
predictors = ['x']
predictors.extend(['x_%d'%i for i in range(2,16)])

alpha_lasso = [1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]

fig = make_subplots(rows=5, cols=2, subplot_titles=[f'alpha: {a}' for a in alpha_lasso])
row, col = 1, 1

for i, alpha in enumerate(alpha_lasso):  
    y_pred, _ = lasso_regression(data, predictors, alpha)
    
    fig.add_trace(go.Scatter(x=data['x'], y=y_pred, mode='lines', name=f'Predicted (Alpha {alpha})', showlegend=False), row=row, col=col)
    fig.add_trace(go.Scatter(x=data['x'], y=data['y_noise'], mode='markers', name=f'Actual (Alpha {alpha})', showlegend=False), row=row, col=col)

    col += 1
    if col > 2:  # Move to the next row after 2 columns
        col = 1
        row += 1
        
fig.update_layout(
    height      =    1500,  
    width       =    1500,   
    title       =    'Lasso Regression Models',
    template    =    'plotly_white'
)  

fig.show()


Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.784e-01, tolerance: 3.695e-03


Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.783e-01, tolerance: 3.695e-03


Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.774e-01, tolerance: 3.695e-03



We can see that the coefficients of some features are brought all the way down to 0.

In [10]:
col = ['rss', 'intercept'] + ['coef_x_%d' % i for i in range(1, 16)]
ind = ['alpha_%.2g' % i for i in alpha_lasso]

coef_matrix_lasso = pd.DataFrame(index=ind, columns=col)

for i, alpha in enumerate(alpha_lasso):
    preds, rss = lasso_regression(data, predictors, alpha)
    
    coef_matrix_lasso.iloc[i, 0] = rss[0]  # Assign RSS
    coef_matrix_lasso.iloc[i, 1] = rss[1]  # Assign intercept
    coef_matrix_lasso.iloc[i, 2:2 + len(rss[2:])] = rss[2:]  # Assign coefficients


Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.784e-01, tolerance: 3.695e-03


Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.783e-01, tolerance: 3.695e-03


Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 4.774e-01, tolerance: 3.695e-03



In [13]:
coef_matrix_ridge

Unnamed: 0,rss,intercept,coef_x_1,coef_x_2,coef_x_3,coef_x_4,coef_x_5,coef_x_6,coef_x_7,coef_x_8,coef_x_9,coef_x_10,coef_x_11,coef_x_12,coef_x_13,coef_x_14,coef_x_15
alpha_1e-15,0.869617,0.038032,477.958096,-7316.830716,48462.232069,-177682.145357,375728.376133,-403774.428409,32340.799707,387829.759394,-210431.527974,-324548.998441,309707.776996,234238.920185,-479766.384855,267844.562642,-53110.821794
alpha_1e-10,0.887998,0.038032,-150.895504,1121.935809,-3390.824819,4548.528619,-1037.405968,-3087.115419,620.809027,2561.79686,559.219255,-1771.609342,-1358.236832,819.332776,1375.095404,-838.501239,27.187746
alpha_1e-08,0.929883,0.038032,-25.50625,170.686246,-431.210853,395.169423,132.898017,-302.812704,-195.325661,153.37786,267.7831,62.119811,-201.435452,-224.043012,50.261938,302.429095,-155.060277
alpha_0.0001,0.953741,0.038032,1.247141,-2.165186,-1.475943,0.465587,1.106084,0.515348,-0.34863,-0.786577,-0.594612,0.070117,0.867111,1.395835,1.263714,0.118885,-2.335929
alpha_0.001,0.958268,0.038032,0.897594,-1.474605,-1.071473,-0.23712,0.187358,0.245465,0.18315,0.164649,0.232861,0.349233,0.438375,0.417756,0.213187,-0.234881,-0.970055
alpha_0.01,0.963726,0.038032,0.604839,-0.89069,-1.001363,-0.602379,-0.156591,0.178452,0.382048,0.474821,0.481383,0.4203,0.303634,0.138796,-0.069644,-0.318664,-0.605987
alpha_1,1.567195,0.038032,-0.182977,-0.34891,-0.333408,-0.23317,-0.11442,-0.010541,0.06589,0.112916,0.133203,0.130996,0.110711,0.076357,0.031351,-0.021496,-0.07992
alpha_5,1.902341,0.038032,-0.272239,-0.263824,-0.214899,-0.152285,-0.092146,-0.041566,-0.002442,0.025754,0.044468,0.055347,0.059942,0.059595,0.055421,0.04833,0.03905
alpha_10,2.161507,0.038032,-0.262556,-0.236765,-0.18975,-0.137731,-0.089705,-0.049218,-0.016902,0.007886,0.026222,0.039228,0.047929,0.053195,0.055747,0.056173,0.054942
alpha_20,2.799428,0.038032,-0.235567,-0.206561,-0.16577,-0.123549,-0.085205,-0.0526,-0.025872,-0.004455,0.012431,0.025556,0.03561,0.043175,0.048732,0.052674,0.055317


```plaintext
In the lasso results (coef_matrix_lasso), unlike the ridge table above, many coefficients have become exactly 0.

```plaintext
Counting the number of zeros in each row, i.e. for each alpha:

In [14]:
coef_matrix_lasso.apply(lambda x: sum(x.values==0), axis = 1)

alpha_1e-15      0
alpha_1e-10      0
alpha_1e-08      0
alpha_0.0001     9
alpha_0.001     10
alpha_0.01      12
alpha_1         15
alpha_5         15
alpha_10        15
alpha_20        15
dtype: int32
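
The lambda above scans every column, including `rss` and `intercept`. A slightly more explicit variant restricts the count to the coefficient columns. The sketch below uses a made-up two-row frame (the values are invented for illustration), since the real `coef_matrix_lasso` only exists in the notebook session.

```python
import pandas as pd

# Hypothetical miniature of a coefficient matrix; the numbers are invented.
toy = pd.DataFrame(
    {
        "rss": [0.9, 1.5],
        "intercept": [0.04, 0.04],
        "coef_x_1": [0.5, 0.0],
        "coef_x_2": [0.0, 0.0],
    },
    index=["alpha_0.01", "alpha_1"],
)

# Only look at the coefficient columns.
coef_cols = [c for c in toy.columns if c.startswith("coef_")]

# Vectorized count of exact zeros per row.
zeros_per_alpha = (toy[coef_cols] == 0).sum(axis=1)
print(zeros_per_alpha)
```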

```plaintext

The phenomenon where most coefficients become exactly zero is called sparsity. Because the features with zero coefficients drop out of the model, lasso also performs feature selection.
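
As a small illustration of sparsity acting as feature selection, the sketch below fits lasso and ridge on the same standardized features and lists which features lasso keeps. The data, the degree (8), and the alpha values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Assumed toy data in the spirit of the notebook.
rng = np.random.default_rng(10)
x = np.deg2rad(np.arange(60, 300, 4))
y = np.sin(x) + rng.normal(0, 0.15, x.size)

# Standardized polynomial features x^1 .. x^8, named like the notebook's columns.
names = ["x"] + [f"x_{p}" for p in range(2, 9)]
X = StandardScaler().fit_transform(x[:, None] ** np.arange(1, 9))

lasso = Lasso(alpha=0.01, max_iter=100_000).fit(X, y)
ridge = Ridge(alpha=0.01).fit(X, y)

# Features whose lasso coefficient is not exactly zero are the "selected" ones.
selected = [n for n, c in zip(names, lasso.coef_) if c != 0]
print("lasso keeps:", selected)
print("ridge non-zero coefficients:", np.count_nonzero(ridge.coef_))

# With a strong enough penalty, lasso removes every feature.
lasso_strong = Lasso(alpha=1.0).fit(X, y)
print("lasso (alpha=1) non-zero coefficients:", np.count_nonzero(lasso_strong.coef_))
```

Ridge shrinks coefficients but leaves them non-zero, whereas lasso can set them exactly to zero, which is why only lasso yields a reduced feature set.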