<a href="https://colab.research.google.com/github/alemnew97de/Census-income-DATA-EDA/blob/main/_Python_Code_Example_Of_the_Regularization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Code Example of Regularization
Below is an example of implementing L1 regularization using Python and scikit-learn on the California housing dataset.


> In this example, we load the California Housing dataset using the fetch_california_housing function from scikit-learn. We then split the data into training and testing sets using train_test_split. We initialize a Lasso regression model with an alpha value of 0.1, which controls the strength of the L1 regularization. We fit the model to the training data using the fit method, then evaluate it on the testing data using predict and mean_squared_error.

# Conclusion
Regularization is a fundamental machine learning technique for reducing overfitting to the training data and improving generalization to unseen data. It works by adding a penalty term to the model's loss function, which encourages simpler solutions that generalize better to new data.
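To make the penalty term concrete, here is a minimal NumPy sketch of the L1-penalized objective in the form scikit-learn documents for Lasso: half the mean squared residual plus alpha times the sum of absolute weights. The helper name `lasso_objective` and the toy data are made up for illustration and are not part of the notebook.

```python
import numpy as np


def lasso_objective(X, y, w, b, alpha):
    """Squared-error loss plus an L1 penalty on the weights.

    The intercept b is not penalized.
    """
    n_samples = X.shape[0]
    residuals = y - (X @ w + b)
    data_loss = (residuals @ residuals) / (2 * n_samples)
    l1_penalty = alpha * np.sum(np.abs(w))
    return data_loss + l1_penalty


# Toy data (hypothetical): y depends only on the first feature.
X_toy = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
y_toy = np.array([2.0, 4.0, 6.0, 8.0])

w_sparse = np.array([2.0, 0.0])  # fits the data exactly with one feature
w_dense = np.array([1.5, 1.0])   # spreads weight across both features

loss_sparse = lasso_objective(X_toy, y_toy, w_sparse, 0.0, alpha=0.1)
loss_dense = lasso_objective(X_toy, y_toy, w_dense, 0.0, alpha=0.1)
print(loss_sparse, loss_dense)
```

On this toy data the sparse weight vector has zero data loss, so its objective is just the penalty, while the dense one pays for both a worse fit and a larger L1 norm.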



In [2]:
# Import required modules
from sklearn.linear_model import Lasso
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


In [3]:
# Load the California Housing dataset as a feature matrix X and target vector y
X, y = fetch_california_housing(return_X_y=True)
X, y

(array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
           37.88      , -122.23      ],
        [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
           37.86      , -122.22      ],
        [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
           37.85      , -122.24      ],
        ...,
        [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
           39.43      , -121.22      ],
        [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
           39.43      , -121.32      ],
        [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
           39.37      , -121.24      ]]),
 array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894]))

In [5]:
# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_test, y_train, y_test

(array([[   3.2596    ,   33.        ,    5.0176565 , ...,    3.6918138 ,
           32.71      , -117.03      ],
        [   3.8125    ,   49.        ,    4.47354497, ...,    1.73809524,
           33.77      , -118.16      ],
        [   4.1563    ,    4.        ,    5.64583333, ...,    2.72321429,
           34.66      , -120.48      ],
        ...,
        [   2.9344    ,   36.        ,    3.98671727, ...,    3.33206831,
           34.03      , -118.38      ],
        [   5.7192    ,   15.        ,    6.39534884, ...,    3.17889088,
           37.58      , -121.96      ],
        [   2.5755    ,   52.        ,    3.40257649, ...,    2.10869565,
           37.77      , -122.42      ]]),
 array([[   1.6812    ,   25.        ,    4.19220056, ...,    3.87743733,
           36.06      , -119.01      ],
        [   2.5313    ,   30.        ,    5.03938356, ...,    2.67979452,
           35.14      , -119.46      ],
        [   3.4801    ,   52.        ,    3.97715472, ...,    1.36033229,

In [6]:
# Initialize the Lasso regression model (linear regression with L1 regularization)
model = Lasso(alpha=0.1)
model
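The effect of alpha is easiest to see by fitting the same model at several strengths and counting how many coefficients remain non-zero. The sketch below is an illustration only: it uses synthetic data from `make_regression` rather than the California Housing data, and the alpha grid and the name `nonzero_counts` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): 10 features, 3 informative.
X_syn, y_syn = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

nonzero_counts = {}
for alpha in [0.01, 1.0, 10.0, 1000.0]:
    sweep_model = Lasso(alpha=alpha, max_iter=10000)
    sweep_model.fit(X_syn, y_syn)
    # Count coefficients that the L1 penalty has not driven to exactly zero.
    nonzero_counts[alpha] = int(np.count_nonzero(sweep_model.coef_))
    print(f"alpha={alpha}: {nonzero_counts[alpha]} non-zero coefficients")
```

A very large alpha typically forces every coefficient to zero, leaving only the intercept, while a small alpha behaves much like ordinary least squares.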

In [9]:
# Fit the model to the training data
model.fit(X_train, y_train)
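Because the L1 penalty acts on coefficient magnitudes, the scale of each feature influences how strongly it is shrunk. A common practice, not used in this notebook, is to standardize features before fitting. The following is a sketch on synthetic data; the variable names and the rescaled column are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical), with one feature put on a much larger scale.
X_syn, y_syn = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
X_syn[:, 0] *= 1000.0

# Standardize each feature, then fit Lasso on the standardized values.
scaled_model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
scaled_model.fit(X_syn, y_syn)

# make_pipeline names each step after its lowercased class name.
scaled_coefs = scaled_model.named_steps["lasso"].coef_
print(scaled_coefs)
```

With the scaler in the pipeline, the same transformation is applied automatically when calling `predict` on new data.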

In [12]:
# Evaluate the model on the testing data: predict targets for X_test
y_pred = model.predict(X_test)
y_pred

array([1.04628114, 1.61196314, 2.30822511, ..., 4.17618895, 1.64031173,
       1.81210646])

In [13]:
# Compute the mean squared error between true and predicted test targets
mse = mean_squared_error(y_test, y_pred)
mse

0.6135115198058131
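The MSE is in squared target units, so its square root is often easier to interpret. The short sketch below converts the value reported above; it assumes the target is the median house value in units of $100,000, as the scikit-learn dataset description states. The names `reported_mse` and `rmse` are illustrative.

```python
import math

reported_mse = 0.6135115198058131  # test-set MSE from the cell above
rmse = math.sqrt(reported_mse)     # back in the target's own units
print(f"RMSE: {rmse:.3f}")
```

Under that assumption, an RMSE of about 0.78 corresponds to a typical prediction error of roughly $78,000.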