In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import GridSearchCV

from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

1- Lasso and Ridge regression are both regularization techniques used to reduce overfitting in linear regression models.

2- Both add a penalty on the size of the coefficients to the least-squares loss, which shrinks the coefficients. They differ in the form of that penalty: Lasso uses the L1 norm (absolute values) and Ridge uses the L2 norm (squared values).
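For reference, these are the two objectives as documented for scikit-learn's `Lasso` and `Ridge`, which this notebook uses. The penalty strength λ is the estimators' `alpha` parameter, and the intercept β₀ is not penalized. Note that `Lasso` scales the squared-error term by 1/(2n) while `Ridge` uses the plain sum, so the same `alpha` value does not mean the same penalty strength in the two models.

```latex
\begin{aligned}
\text{Lasso:}\quad & \min_{\beta_0,\,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \\
\text{Ridge:}\quad & \min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2
\end{aligned}
```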

In [2]:

from sklearn.datasets import fetch_california_housing

# Load the California housing dataset

data = fetch_california_housing(as_frame=True)

In [3]:
data

{'data':        MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
 0      8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
 1      8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
 2      7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
 3      5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
 4      3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   
 ...       ...       ...       ...        ...         ...       ...       ...   
 20635  1.5603      25.0  5.045455   1.133333       845.0  2.560606     39.48   
 20636  2.5568      18.0  6.114035   1.315789       356.0  3.122807     39.49   
 20637  1.7000      17.0  5.205543   1.120092      1007.0  2.325635     39.43   
 20638  1.8672      18.0  5.329513   1.171920       741.0  2.123209     39.43   
 20639  2.3886      16.0  5.254717   1.162264      1387.0  2.616981     39.37   
 
        Longitude 

In [4]:
data.DESCR

'.. _california_housing_dataset:\n\nCalifornia Housing dataset\n--------------------------\n\n**Data Set Characteristics:**\n\n:Number of Instances: 20640\n\n:Number of Attributes: 8 numeric, predictive attributes and the target\n\n:Attribute Information:\n    - MedInc        median income in block group\n    - HouseAge      median house age in block group\n    - AveRooms      average number of rooms per household\n    - AveBedrms     average number of bedrooms per household\n    - Population    block group population\n    - AveOccup      average number of household members\n    - Latitude      block group latitude\n    - Longitude     block group longitude\n\n:Missing Attribute Values: None\n\nThis dataset was obtained from the StatLib repository.\nhttps://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html\n\nThe target variable is the median house value for California districts,\nexpressed in hundreds of thousands of dollars ($100,000).\n\nThis dataset was derived from the 1990 U.S

In [5]:
data.feature_names

['MedInc',
 'HouseAge',
 'AveRooms',
 'AveBedrms',
 'Population',
 'AveOccup',
 'Latitude',
 'Longitude']

In [6]:
x = data.data
y = data.target

print(f'x shape:{x.shape}')

print(f'y shape:{y.shape}')

x shape:(20640, 8)
y shape:(20640,)


In [7]:
data.target_names

['MedHouseVal']

In [8]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=True, random_state=123)

print(f'x train shape:{x_train.shape}')

print(f'y train shape:{y_train.shape}')

print(f'x test shape:{x_test.shape}')

print(f'y test shape:{y_test.shape}')

x train shape:(16512, 8)
y train shape:(16512,)
x test shape:(4128, 8)
y test shape:(4128,)


In [9]:
Scaler_standard = StandardScaler()

x_train_scaled = Scaler_standard.fit_transform(x_train)
x_test_scaled  = Scaler_standard.transform(x_test)
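Because both the L1 and L2 penalties act on coefficient magnitudes, a feature's scale changes how strongly its coefficient is penalized, which is why the features are standardized first. The scaler is fitted on the training split only and the test split is transformed with the training mean and standard deviation, so no test information leaks into training. A small sketch on hypothetical synthetic features (not the California data) shows what `fit_transform` and `transform` compute:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features on very different scales (e.g. income vs. population)
x_train = np.column_stack([rng.normal(5, 2, 1000), rng.normal(1500, 800, 1000)])
x_test = np.column_stack([rng.normal(5, 2, 250), rng.normal(1500, 800, 250)])

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learns mean_ and scale_ from train only
x_test_scaled = scaler.transform(x_test)        # reuses the training statistics

# Train columns now have mean ~0 and std ~1
print(x_train_scaled.mean(axis=0).round(6), x_train_scaled.std(axis=0).round(6))
# transform() is just (x - train mean) / train std
print(np.allclose(x_test_scaled, (x_test - scaler.mean_) / scaler.scale_))
```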

# Lasso regression

    Lasso Regression (Least Absolute Shrinkage and Selection Operator)

    Penalty: L1 Norm (sum of the absolute values of the coefficients).

    Objective Function: Minimize the sum of squared errors plus a penalty proportional to the sum of the absolute values of the coefficients.


![alt text](<lasso eqn.PNG>)


    where λ is the regularization parameter (called alpha in scikit-learn) and β_j are the model coefficients.

    Effect on Coefficients: Lasso regression can shrink some coefficients to exactly zero, effectively performing feature selection. 

    This makes it useful when you have many features and suspect that some of them are not useful.

    Feature Selection: Yes, Lasso can select a subset of features by shrinking some coefficients to zero.

    Use Case: Useful when you believe only a few features are important, or when you want to perform feature selection.
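Why L1 produces exact zeros is easiest to see in the special case of an orthonormal design (XᵀX = I), where both problems have closed-form solutions in terms of the ordinary least-squares coefficients. Assuming the objectives ½‖y − Xβ‖² + λ‖β‖₁ (Lasso) and ½‖y − Xβ‖² + (λ/2)‖β‖² (Ridge), which are scaled differently from scikit-learn's `alpha`, Lasso soft-thresholds each OLS coefficient while Ridge divides each one by (1 + λ). The function names below are hypothetical helpers for illustration:

```python
import numpy as np

def lasso_orthonormal(beta_ols, lam):
    # Soft-thresholding: move each coefficient toward zero by lam, clip at zero
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

def ridge_orthonormal(beta_ols, lam):
    # Proportional shrinkage: every coefficient is divided by (1 + lam)
    return beta_ols / (1.0 + lam)

beta_ols = np.array([3.0, -0.5, 0.2, -2.0])
print(lasso_orthonormal(beta_ols, 1.0))  # the two small coefficients become exactly 0
print(ridge_orthonormal(beta_ols, 1.0))  # all coefficients shrink, none reach 0
```

With λ = 1, Lasso removes the two features whose OLS coefficients are smaller than 1 in magnitude, while Ridge keeps all four at half their size.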

In [10]:
lasso_model = Lasso()

lasso_model.fit(x_train_scaled, y_train)

In [11]:
y_pred = lasso_model.predict(x_test_scaled)

In [12]:
mean_absolute_error(y_test, y_pred)

0.9136917868018702

In [13]:
mean_squared_error(y_test, y_pred)

1.3298460219228103

In [14]:
r2_score(y_test, y_pred)

-2.3968123705309097e-05
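An R² of essentially zero means the default model (`Lasso()` uses `alpha=1.0`) is doing no better than predicting a constant. A likely explanation is that at this `alpha` every coefficient has been driven to zero, so the model predicts the training mean; `lasso_model.coef_` can be inspected to confirm this. For scikit-learn's Lasso objective, all coefficients are zero once `alpha` is at least α_max = max_j |x_jᵀ(y − ȳ)| / n. A sketch on synthetic standardized data (not the California data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

# Smallest alpha at which sklearn's Lasso sets every coefficient to zero
# (objective: (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1)
n = X.shape[0]
alpha_max = np.max(np.abs(X.T @ (y - y.mean()))) / n

above = Lasso(alpha=1.01 * alpha_max).fit(X, y)
below = Lasso(alpha=0.5 * alpha_max).fit(X, y)

print(np.count_nonzero(above.coef_))  # no features: predictions equal the training mean
print(np.count_nonzero(below.coef_))  # at least one feature enters the model
print(above.intercept_, y.mean())
```

On the test split, a constant prediction equal to the training mean gives an R² slightly below zero, which is consistent with the value above.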

In [None]:
param_grid = {

    'alpha' : [0.1,0.001,0.0001,1,100]
}

lasso_cv = GridSearchCV(lasso_model, param_grid , cv=3 , n_jobs=-1)

lasso_cv.fit(x_train_scaled, y_train)
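`GridSearchCV` scores each `alpha` with 3-fold cross-validation on the training set, using the estimator's own `score` method (R² for regressors) when no `scoring` is given. With the default `refit=True`, `best_estimator_` is then refit on the whole training set with the winning `alpha`. A sketch on synthetic data (not the California data; `n_jobs` omitted) checks that equivalence:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=1)

param_grid = {"alpha": [0.0001, 0.001, 0.1, 1, 100]}
search = GridSearchCV(Lasso(), param_grid, cv=3)  # default scoring for a regressor: R^2
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
# With refit=True (the default), best_estimator_ is refit on all of X, y
refit_by_hand = Lasso(alpha=best_alpha).fit(X, y)

print(best_alpha)
print(search.cv_results_["mean_test_score"])  # mean CV R^2 for each alpha in the grid
print(np.allclose(search.best_estimator_.coef_, refit_by_hand.coef_))
```

This is why `lasso_cv.predict` can be used directly on the test set without refitting.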

In [16]:
y_pred_1 = lasso_cv.predict(x_test_scaled)

r2_score(y_test, y_pred_1)

0.6104009951335302

In [17]:
lasso_cv.best_estimator_

In [18]:
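# alpha=0.001 is taken from the grid-search result above (lasso_cv.best_estimator_);
# lasso_cv.best_estimator_ itself is already refit on the full training set
lasso_model_best = Lasso(alpha=0.001)
lasso_model_best.fit(x_train_scaled, y_train)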

In [19]:
lasso_model_best.coef_

array([ 0.83059373,  0.11611372, -0.26746166,  0.30390965, -0.00603978,
       -0.04069929, -0.87950102, -0.84427344])

In [20]:
lasso_model_best.intercept_

2.0696872953003855

In [21]:
list_of_features = data.feature_names
list_of_features

['MedInc',
 'HouseAge',
 'AveRooms',
 'AveBedrms',
 'Population',
 'AveOccup',
 'Latitude',
 'Longitude']

In [22]:
list_of_coef_features = list(lasso_model_best.coef_)
list_of_coef_features

[0.8305937325863574,
 0.11611371757725981,
 -0.2674616608245845,
 0.3039096477497728,
 -0.00603978173527093,
 -0.040699294799306875,
 -0.8795010247562031,
 -0.8442734410342193]

In [23]:
df = pd.DataFrame({

'Feature_Names' : list_of_features , 
'coef' : list_of_coef_features

})

df

|   | Feature_Names | coef |
|---|---|---|
| 0 | MedInc | 0.830594 |
| 1 | HouseAge | 0.116114 |
| 2 | AveRooms | -0.267462 |
| 3 | AveBedrms | 0.30391 |
| 4 | Population | -0.00604 |
| 5 | AveOccup | -0.040699 |
| 6 | Latitude | -0.879501 |
| 7 | Longitude | -0.844273 |
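Note that at `alpha=0.001` none of the eight Lasso coefficients is exactly zero, so on this dataset the tuned model keeps every feature. Because the features were standardized, each coefficient can be read, roughly, as the change in predicted value (in units of $100,000) for a one-standard-deviation change in that feature, holding the others fixed. A sketch that ranks the features by absolute coefficient, using the values copied from the table above:

```python
import pandas as pd

# Lasso (alpha=0.001) coefficients copied from the table above
df = pd.DataFrame({
    "Feature_Names": ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
                      "Population", "AveOccup", "Latitude", "Longitude"],
    "coef": [0.830594, 0.116114, -0.267462, 0.303910,
             -0.006040, -0.040699, -0.879501, -0.844273],
})

# Sort by magnitude, ignoring sign
ranked = df.sort_values("coef", key=lambda s: s.abs(), ascending=False)
print(ranked)
print("coefficients exactly zero:", int((df["coef"] == 0).sum()))
```

Latitude, Longitude and MedInc carry the largest weights, while Population contributes almost nothing.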


# Ridge regression

    Penalty: L2 Norm (sum of the squared values of the coefficients).

    Objective Function: Minimize the sum of squared errors with an added penalty proportional to the square of the coefficients.

![alt text](<ridge regression.PNG>)

    where λ is the regularization parameter and β_j are the model coefficients.

    Effect on Coefficients: Ridge regression shrinks the coefficients towards zero but, in general, does not set any of them exactly to zero.

    This means it keeps all features but with reduced importance.

    Feature Selection: No, Ridge keeps all features but shrinks their coefficients.

    Use Case: Useful when you believe most features are important but you want to regularize their impact to avoid overfitting.
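The shrink-but-keep behaviour can be seen by fitting `Ridge` over a range of `alpha` values: the overall size of the coefficient vector falls steadily as `alpha` grows, yet no coefficient becomes exactly zero. A sketch on synthetic data (not the California data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=2)

alphas = [0.1, 1.0, 10.0, 100.0, 1000.0]
coefs = np.array([Ridge(alpha=a).fit(X, y).coef_ for a in alphas])

# L2 norm of the coefficient vector for each alpha
norms = np.linalg.norm(coefs, axis=1)
for a, c, nrm in zip(alphas, coefs, norms):
    print(f"alpha={a:>7}: ||w||={nrm:8.3f}, zero coefs={int(np.sum(c == 0))}")
```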

In [24]:
ridge_model = Ridge()

ridge_model.fit(x_train_scaled, y_train)

In [25]:
y_pred_ridge = ridge_model.predict(x_test_scaled)

In [26]:
mean_absolute_error(y_test,y_pred_ridge)

0.5255418304839452

In [27]:
mean_squared_error(y_test,y_pred_ridge)

0.5180292670999758

In [28]:
r2_score(y_test,y_pred_ridge)

0.6104498755874748
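`LinearRegression` is imported in the first cell but never used; comparing against it is a useful baseline here. With standardized features, XᵀX has diagonal entries equal to the number of training rows (16,512), so the default `alpha=1.0` is a very small penalty and Ridge should land very close to ordinary least squares. A sketch on synthetic data of a similar size (not the California data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Many rows relative to the penalty, similar to the 16,512-row training split
X, y = make_regression(n_samples=16000, n_features=8, noise=20.0, random_state=3)
X = StandardScaler().fit_transform(X)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Largest absolute difference between OLS and Ridge coefficients
print(np.max(np.abs(ols.coef_ - ridge.coef_)))
# Training R^2 of both models
print(ols.score(X, y), ridge.score(X, y))
```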

In [29]:
param_grid = {

    'alpha' : [0.1,0.001,0.0001,0.00001,1,100]
}

ridge_cv = GridSearchCV(ridge_model, param_grid , cv=3 , n_jobs=-1)

ridge_cv.fit(x_train_scaled, y_train)

In [30]:
y_pred_ridge_1 = ridge_cv.predict(x_test_scaled)

r2_score(y_test, y_pred_ridge_1)

0.6094607599231214

In [31]:
ridge_cv.best_estimator_

In [32]:
ridge_cv.best_estimator_.coef_ 

array([ 0.83069427,  0.12245118, -0.26252515,  0.29479362, -0.00476412,
       -0.04165376, -0.82174222, -0.78629717])

In [33]:
ridge_cv.best_estimator_.intercept_

2.069687295300386

In [34]:
list_of_coef_features_ridge = list(ridge_cv.best_estimator_.coef_)
list_of_coef_features_ridge

[0.8306942739762467,
 0.1224511824653377,
 -0.26252514610715827,
 0.2947936153726388,
 -0.00476412130043625,
 -0.04165375912893464,
 -0.8217422228176069,
 -0.7862971717232279]

In [35]:
df_ridge = pd.DataFrame({

'Feature_Names' : list_of_features , 
'coef' : list_of_coef_features_ridge

})

df_ridge

|   | Feature_Names | coef |
|---|---|---|
| 0 | MedInc | 0.830694 |
| 1 | HouseAge | 0.122451 |
| 2 | AveRooms | -0.262525 |
| 3 | AveBedrms | 0.294794 |
| 4 | Population | -0.004764 |
| 5 | AveOccup | -0.041654 |
| 6 | Latitude | -0.821742 |
| 7 | Longitude | -0.786297 |


In [36]:
df

|   | Feature_Names | coef |
|---|---|---|
| 0 | MedInc | 0.830594 |
| 1 | HouseAge | 0.116114 |
| 2 | AveRooms | -0.267462 |
| 3 | AveBedrms | 0.30391 |
| 4 | Population | -0.00604 |
| 5 | AveOccup | -0.040699 |
| 6 | Latitude | -0.879501 |
| 7 | Longitude | -0.844273 |
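Rather than displaying `df_ridge` and `df` one after the other, the two coefficient tables can be merged on `Feature_Names` for a side-by-side comparison. A sketch that rebuilds both tables from the values shown above:

```python
import pandas as pd

features = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
            "Population", "AveOccup", "Latitude", "Longitude"]
# Coefficients copied from the Lasso and Ridge tables above
lasso_coef = [0.830594, 0.116114, -0.267462, 0.303910,
              -0.006040, -0.040699, -0.879501, -0.844273]
ridge_coef = [0.830694, 0.122451, -0.262525, 0.294794,
              -0.004764, -0.041654, -0.821742, -0.786297]

df = pd.DataFrame({"Feature_Names": features, "coef": lasso_coef})
df_ridge = pd.DataFrame({"Feature_Names": features, "coef": ridge_coef})

# The shared "coef" column gets a suffix per model
comparison = df.merge(df_ridge, on="Feature_Names", suffixes=("_lasso", "_ridge"))
comparison["diff"] = comparison["coef_lasso"] - comparison["coef_ridge"]
print(comparison)
```

The two models agree closely; the largest gaps are on Latitude and Longitude, where the tuned Ridge coefficients are somewhat smaller in magnitude.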


# Key Differences


1. Penalty Type:

    Lasso (L1): Adds the sum of the absolute values of the coefficients as a penalty, which can set some coefficients exactly to zero.

    Ridge (L2): Adds the sum of the squared coefficients as a penalty, which shrinks coefficients but generally does not set any of them to zero.

2. Feature Selection:

    Lasso: Can select features by setting some coefficients to zero.

    Ridge: Does not perform feature selection; all coefficients are shrunk but none are eliminated.

3. Effect on Model:

    Lasso: Tends to produce sparse models (fewer features with non-zero coefficients).

    Ridge: Tends to produce models where all features are included but with smaller coefficients.

4. Computational Complexity:

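    Lasso: The L1 penalty is not differentiable at zero, so there is no closed-form solution in general; scikit-learn's Lasso solves it iteratively with coordinate descent, which can be more expensive.

    Ridge: The L2 penalty is differentiable, and the problem has a closed-form solution, β = (XᵀX + λI)⁻¹Xᵀy (with centered data and an unpenalized intercept), so it is generally cheaper to fit.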

5. Choosing Between Lasso and Ridge:

    If you have many features and expect only a few to be relevant:  Lasso is typically preferred as it performs feature selection.

    If you believe that all features contribute to the target variable and do not want to discard any: Ridge is usually the better choice as it retains all features with reduced effect.
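The closed-form Ridge solution from point 4 can be checked directly. A minimal sketch, assuming dense NumPy arrays and an unpenalized intercept handled by centering X and y (which is how scikit-learn's `Ridge` treats `fit_intercept=True` for dense input); `ridge_closed_form` is a hypothetical helper name, and the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import Ridge

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - Xw - b||^2 + alpha * ||w||^2 in closed form.

    Centers X and y, solves (Xc^T Xc + alpha I) w = Xc^T yc,
    then recovers the intercept b from the means.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve the linear system instead of forming an explicit inverse
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.3, size=200)

w, b = ridge_closed_form(X, y, alpha=10.0)
ref = Ridge(alpha=10.0).fit(X, y)
print(np.allclose(w, ref.coef_), np.isclose(b, ref.intercept_))
```

`np.linalg.solve` is used rather than `np.linalg.inv` because solving the system directly is cheaper and numerically more stable. Lasso has no equivalent one-line formula, which is why it needs an iterative solver.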

# First Task in Machine Learning: Linear Regression

1- You have two files: train.csv and test.csv.

2- Read both files and preprocess the data as shown above.

3- Compute the correlation between each feature and the output, and find the feature with the strongest correlation with the output.

4- Fit a simple linear regression between this feature and the output.

5- Fit a multiple linear regression.

6- Fit Lasso and Ridge regression.

7- Report the results, compare the different models, and state which one gives the best result.

8- Evaluation metrics: MSE, RMSE, R2 score, MAE.

9- Advanced (last step): deploy the model with the best results using the Streamlit library, so that users can enter inputs on a web page and see the prediction.

10- Note: the model's input may be one feature or more than one feature. Which one? Choose based on the model that gives the best r2_score.
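Steps 3 and 8 can be wrapped in two small helpers. This is a sketch only: `evaluate` and `most_correlated_feature` are hypothetical names, the column names of train.csv and test.csv are not given here, so the target column name is an assumption, and the toy data below are made up for illustration. RMSE is computed as the square root of MSE, so it is in the same units as the target.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """Return the four metrics requested in step 8 as a dict."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # square root of MSE, in target units
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
    }

def most_correlated_feature(df, target):
    """Feature with the largest absolute Pearson correlation with `target` (step 3)."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.abs().idxmax()

# Toy example: "b" is perfectly (negatively) correlated with "y"
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 1, 4, 3, 6]})
toy["y"] = -2 * toy["b"]
print(most_correlated_feature(toy, "y"))

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])
print(evaluate(y_true, y_pred))
```

Using the absolute correlation means a strong negative relationship counts as much as a strong positive one.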
