### **Regularization**

* Regularization is a technique used in machine learning to prevent overfitting by penalizing large coefficients. It adds a regularization term to the loss function, which discourages overly complex models.
* Two common types of regularization are:

1. L1 Regularization (Lasso).
2. L2 Regularization (Ridge).
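To make the difference between the two penalties concrete, here is a minimal sketch that computes both for one coefficient vector. The weights and the value of `alpha` are made up for illustration and do not come from the housing data.

```python
# Hypothetical coefficient vector and regularization strength (illustrative values only).
weights = [0.5, -2.0, 0.0, 3.0]
alpha = 0.1

# L1 (Lasso) penalty: alpha times the sum of absolute values of the coefficients.
l1_penalty = alpha * sum(abs(w) for w in weights)

# L2 (Ridge) penalty: alpha times the sum of squared coefficients.
l2_penalty = alpha * sum(w ** 2 for w in weights)

print(f"L1 penalty: {l1_penalty:.3f}")
print(f"L2 penalty: {l2_penalty:.3f}")
```

The L2 penalty grows much faster for large coefficients, while the L1 penalty charges the same rate per unit for small and large coefficients alike.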

### **Lasso Regularization**

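* L1 Regularization, also known as Lasso (Least Absolute Shrinkage and Selection Operator) Regularization, is a technique used in machine learning to prevent overfitting by adding to the loss function a penalty proportional to the sum of the absolute values of the model's coefficients. Because of this penalty, some coefficients can shrink to exactly zero, which is why Lasso also acts as a feature selector.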
* A penalty function in machine learning is an additional term added to the loss function to discourage complex models by imposing constraints on the model's parameters (weights). The goal of using a penalty function is to prevent overfitting, encourage simpler models, and improve generalization to unseen data.
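One way to see how an L1 penalty simplifies a model is the soft-thresholding operator. In the idealized case of orthonormal features, the Lasso solution is the unpenalized coefficient shrunk toward zero by the penalty strength and set to zero if it is smaller than that strength. The sketch below assumes that simplified setting; the coefficient values and `alpha` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


# Hypothetical unpenalized coefficients and penalty strength.
unpenalized = [2.5, -0.05, 0.3, -1.2]
alpha = 0.1

shrunk = [soft_threshold(w, alpha) for w in unpenalized]
print(shrunk)
```

The second coefficient is smaller in magnitude than `alpha`, so it is dropped entirely, while the others are only reduced.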

**Step 1 : Import Necessary Libraries**

In [44]:
import pandas as pd 
import numpy as np 
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import Lasso

**Step 2 : Load the Dataset**

In [45]:
df = pd.read_csv("E:\\Machine Learning\\Datasets\\House_Price_train.csv")
df.head()

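|  | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |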


In [46]:
df.shape

(1460, 81)

In [47]:
df.columns

Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive

In [48]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallC

**Step 3 : Data Processing**

In [49]:
# Copy the selections so that later column assignments do not act on a view of df
df_num = df.select_dtypes(include = [np.number]).copy()
df_cat = df.select_dtypes(include = ['object']).copy()

print(df_num.columns)
print(df_cat.columns)

Index(['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual',
       'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1',
       'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd',
       'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF',
       'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea',
       'MiscVal', 'MoSold', 'YrSold', 'SalePrice'],
      dtype='object')
Index(['MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities',
       'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st',
       'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation',
       'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2',
       'Heating', '

In [50]:
df_num.isnull().sum().sort_values(ascending = False)

LotFrontage      259
GarageYrBlt       81
MasVnrArea         8
LotArea            0
MSSubClass         0
Id                 0
OverallCond        0
OverallQual        0
YearRemodAdd       0
YearBuilt          0
BsmtFinSF2         0
BsmtUnfSF          0
TotalBsmtSF        0
BsmtFinSF1         0
2ndFlrSF           0
LowQualFinSF       0
GrLivArea          0
BsmtFullBath       0
BsmtHalfBath       0
FullBath           0
HalfBath           0
1stFlrSF           0
BedroomAbvGr       0
KitchenAbvGr       0
TotRmsAbvGrd       0
Fireplaces         0
GarageCars         0
GarageArea         0
WoodDeckSF         0
OpenPorchSF        0
EnclosedPorch      0
3SsnPorch          0
ScreenPorch        0
PoolArea           0
MiscVal            0
MoSold             0
YrSold             0
SalePrice          0
dtype: int64

In [51]:
print(df_num['LotFrontage'].median())
df_num['LotFrontage'] = df_num['LotFrontage'].fillna(df_num['LotFrontage'].median())

69.0


In [52]:
print(df_num['GarageYrBlt'].median())
df_num['GarageYrBlt'] = df_num['GarageYrBlt'].fillna(df_num['GarageYrBlt'].median())

1980.0


In [53]:
print(df_num['MasVnrArea'].median())
df_num['MasVnrArea'] = df_num['MasVnrArea'].fillna(df_num['MasVnrArea'].median())

0.0


In [54]:
df_num.isnull().sum().sort_values(ascending = False)

Id               0
MSSubClass       0
LotFrontage      0
LotArea          0
OverallQual      0
OverallCond      0
YearBuilt        0
YearRemodAdd     0
MasVnrArea       0
BsmtFinSF1       0
BsmtFinSF2       0
BsmtUnfSF        0
TotalBsmtSF      0
1stFlrSF         0
2ndFlrSF         0
LowQualFinSF     0
GrLivArea        0
BsmtFullBath     0
BsmtHalfBath     0
FullBath         0
HalfBath         0
BedroomAbvGr     0
KitchenAbvGr     0
TotRmsAbvGrd     0
Fireplaces       0
GarageYrBlt      0
GarageCars       0
GarageArea       0
WoodDeckSF       0
OpenPorchSF      0
EnclosedPorch    0
3SsnPorch        0
ScreenPorch      0
PoolArea         0
MiscVal          0
MoSold           0
YrSold           0
SalePrice        0
dtype: int64
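The three median fills above can also be written as a single call, because `DataFrame.fillna` accepts a Series of per-column values. The sketch below uses a small made-up frame (the name `toy` and its values are illustrative, not the real dataset).

```python
import numpy as np
import pandas as pd

# Small illustrative frame with missing values in two numeric columns.
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 60.0],
    "MasVnrArea": [196.0, 0.0, np.nan, 0.0],
})

# toy.median() is a Series indexed by column name, so each column
# is filled with its own median.
toy_filled = toy.fillna(toy.median())
print(toy_filled)
```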

In [55]:
df_cat.isnull().sum().sort_values(ascending = False)

PoolQC           1453
MiscFeature      1406
Alley            1369
Fence            1179
MasVnrType        872
FireplaceQu       690
GarageFinish       81
GarageQual         81
GarageType         81
GarageCond         81
BsmtFinType2       38
BsmtExposure       38
BsmtFinType1       37
BsmtQual           37
BsmtCond           37
Electrical          1
Condition1          0
Condition2          0
LotShape            0
Street              0
MSZoning            0
Neighborhood        0
LandSlope           0
LotConfig           0
Utilities           0
LandContour         0
RoofStyle           0
Heating             0
ExterCond           0
Foundation          0
HouseStyle          0
RoofMatl            0
Exterior1st         0
Exterior2nd         0
ExterQual           0
BldgType            0
HeatingQC           0
CentralAir          0
KitchenQual         0
Functional          0
PavedDrive          0
SaleType            0
SaleCondition       0
dtype: int64

In [56]:
df_cat.drop(columns = ['MiscFeature', 'Fence','PoolQC','FireplaceQu','MasVnrType','Alley'], inplace = True)

In [57]:
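print(df_cat['GarageFinish'].mode())
# Note: mode() returns a Series, and fillna() aligns a Series argument by index,
# so only a missing value at index 0 would be filled and the rest stay missing
# (see the counts in In [67]). Passing the scalar df_cat['GarageFinish'].mode()[0]
# would fill every missing value.
df_cat['GarageFinish'] = df_cat['GarageFinish'].fillna(df_cat['GarageFinish'].mode())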

0    Unf
Name: GarageFinish, dtype: object


In [58]:
print(df_cat['GarageQual'].mode())
df_cat['GarageQual'] = df_cat['GarageQual'].fillna(df_cat['GarageQual'].mode())

0    TA
Name: GarageQual, dtype: object


In [59]:
print(df_cat['GarageType'].mode())
df_cat['GarageType'] = df_cat['GarageType'].fillna(df_cat['GarageType'].mode())

0    Attchd
Name: GarageType, dtype: object


In [60]:
print(df_cat['GarageCond'].mode())
df_cat['GarageCond'] = df_cat['GarageCond'].fillna(df_cat['GarageCond'].mode()) 

0    TA
Name: GarageCond, dtype: object


In [61]:
print(df_cat['BsmtFinType2'].mode())
df_cat['BsmtFinType2'] = df_cat['BsmtFinType2'].fillna(df_cat['BsmtFinType2'].mode())

0    Unf
Name: BsmtFinType2, dtype: object


In [62]:
print(df_cat['BsmtFinType1'].mode())
df_cat['BsmtFinType1'] = df_cat['BsmtFinType1'].fillna(df_cat['BsmtFinType1'].mode())

0    Unf
Name: BsmtFinType1, dtype: object


In [63]:
print(df_cat['BsmtExposure'].mode())
df_cat['BsmtExposure'] = df_cat['BsmtExposure'].fillna(df_cat['BsmtExposure'].mode())   

0    No
Name: BsmtExposure, dtype: object


In [64]:
print(df_cat['BsmtQual'].mode())
df_cat['BsmtQual'] = df_cat['BsmtQual'].fillna(df_cat['BsmtQual'].mode())

0    TA
Name: BsmtQual, dtype: object


In [65]:
print(df_cat['BsmtCond'].mode())
df_cat['BsmtCond'] = df_cat['BsmtCond'].fillna(df_cat['BsmtCond'].mode())

0    TA
Name: BsmtCond, dtype: object


In [66]:
print(df_cat['Electrical'].mode())  
df_cat['Electrical'] = df_cat['Electrical'].fillna(df_cat['Electrical'].mode())

0    SBrkr
Name: Electrical, dtype: object


In [67]:
df_cat.isnull().sum().sort_values(ascending = False)

GarageType       81
GarageCond       81
GarageQual       81
GarageFinish     81
BsmtFinType2     38
BsmtExposure     38
BsmtFinType1     37
BsmtCond         37
BsmtQual         37
Electrical        1
MSZoning          0
LotShape          0
Street            0
Utilities         0
LandContour       0
LotConfig         0
LandSlope         0
Condition2        0
BldgType          0
Neighborhood      0
Condition1        0
ExterCond         0
Foundation        0
Exterior2nd       0
ExterQual         0
Exterior1st       0
RoofMatl          0
HouseStyle        0
RoofStyle         0
KitchenQual       0
CentralAir        0
HeatingQC         0
Heating           0
Functional        0
PavedDrive        0
SaleType          0
SaleCondition     0
dtype: int64
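The missing-value counts above are unchanged because `mode()` returns a Series rather than a scalar, and `fillna` aligns a Series argument by index label. The following sketch reproduces the effect on a small made-up Series and shows the scalar form that does fill every gap. In this notebook the remaining gaps are handled afterwards by `SimpleImputer`.

```python
import numpy as np
import pandas as pd

# Illustrative categorical column with two missing values.
s = pd.Series(["Unf", np.nan, "RFn", "Unf", np.nan])

# mode() returns a Series with index 0, so fillna only looks at row 0.
aligned = s.fillna(s.mode())

# Taking the first mode as a scalar fills every missing row.
scalar = s.fillna(s.mode()[0])

print("missing after Series fill:", aligned.isnull().sum())
print("missing after scalar fill:", scalar.isnull().sum())
```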

In [68]:
from sklearn.impute import SimpleImputer
imputer2 = SimpleImputer(strategy = 'most_frequent')
df_cat_imputed = pd.DataFrame(imputer2.fit_transform(df_cat))
df_cat_imputed.columns = df_cat.columns
print(df_cat_imputed.columns)

Index(['MSZoning', 'Street', 'LotShape', 'LandContour', 'Utilities',
       'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st',
       'Exterior2nd', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional',
       'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
       'SaleType', 'SaleCondition'],
      dtype='object')


In [69]:
df_cat_imputed.isnull().sum().sort_values(ascending = False)

MSZoning         0
Street           0
LotShape         0
LandContour      0
Utilities        0
LotConfig        0
LandSlope        0
Neighborhood     0
Condition1       0
Condition2       0
BldgType         0
HouseStyle       0
RoofStyle        0
RoofMatl         0
Exterior1st      0
Exterior2nd      0
ExterQual        0
ExterCond        0
Foundation       0
BsmtQual         0
BsmtCond         0
BsmtExposure     0
BsmtFinType1     0
BsmtFinType2     0
Heating          0
HeatingQC        0
CentralAir       0
Electrical       0
KitchenQual      0
Functional       0
GarageType       0
GarageFinish     0
GarageQual       0
GarageCond       0
PavedDrive       0
SaleType         0
SaleCondition    0
dtype: int64

**Step 4 : Correlation between columns**

In [70]:
df_num_corr = df_num.corr()
df_num_list = []

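df_num_list.extend(df_num_corr[(df_num_corr['SalePrice'] > 0.3)].index.values)
# Note: "< 0.3" also matches every weakly correlated column, so together the two
# filters keep all numeric columns here (SalePrice included); "< -0.3" would keep
# only the strongly negatively correlated ones.
df_num_list.extend(df_num_corr[(df_num_corr['SalePrice'] < 0.3)].index.values)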

In [71]:
df_num_list

['LotFrontage',
 'OverallQual',
 'YearBuilt',
 'YearRemodAdd',
 'MasVnrArea',
 'BsmtFinSF1',
 'TotalBsmtSF',
 '1stFlrSF',
 '2ndFlrSF',
 'GrLivArea',
 'FullBath',
 'TotRmsAbvGrd',
 'Fireplaces',
 'GarageYrBlt',
 'GarageCars',
 'GarageArea',
 'WoodDeckSF',
 'OpenPorchSF',
 'SalePrice',
 'Id',
 'MSSubClass',
 'LotArea',
 'OverallCond',
 'BsmtFinSF2',
 'BsmtUnfSF',
 'LowQualFinSF',
 'BsmtFullBath',
 'BsmtHalfBath',
 'HalfBath',
 'BedroomAbvGr',
 'KitchenAbvGr',
 'EnclosedPorch',
 '3SsnPorch',
 'ScreenPorch',
 'PoolArea',
 'MiscVal',
 'MoSold',
 'YrSold']
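The list above contains every numeric column, including `SalePrice` itself, so no feature was actually filtered out. If the intent is to keep only features whose correlation with the target is strong in either direction, one possible approach is to threshold the absolute correlation and drop the target first. The sketch below uses a made-up frame; the column values are illustrative and the 0.3 cutoff is carried over from the cell above.

```python
import pandas as pd

# Illustrative data: one positively correlated, one negatively correlated,
# and one weakly correlated feature.
toy = pd.DataFrame({
    "GrLivArea": [1000, 1500, 2000, 2500, 3000],
    "Age": [50, 40, 30, 20, 10],
    "MoSold": [3, 7, 3, 7, 5],
    "SalePrice": [100, 150, 210, 240, 310],
})

# Correlation of each feature with the target, excluding the target itself.
corr_with_target = toy.corr()["SalePrice"].drop("SalePrice")

# Keep features whose absolute correlation exceeds the cutoff.
selected = corr_with_target[corr_with_target.abs() > 0.3].index.tolist()
print(selected)
```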

In [72]:
df_cat_imputed['SP'] = df_num['SalePrice']

In [73]:
from scipy.stats import f_oneway

In [74]:
influence_list = []
noninfluence_list = []
for influence1 in list (df_cat_imputed.columns):
    if influence1 == 'SP':
        continue
    else:
        groups = [df_cat_imputed['SP'][df_cat_imputed[influence1] == category] for category in df_cat_imputed[influence1].unique()]
        f_stat, p_value = f_oneway(*groups)
        print(f"column : {influence1}, F-statistic: {f_stat}, P-value: {p_value}")
        if p_value < 0.05:
            influence_list.append(influence1)
        else:
            noninfluence_list.append(influence1)

column : MSZoning, F-statistic: 43.84028167245718, P-value: 8.817633866272648e-35
column : Street, F-statistic: 2.4592895583691994, P-value: 0.11704860406782483
column : LotShape, F-statistic: 40.13285166226295, P-value: 6.447523852011766e-25
column : LandContour, F-statistic: 12.850188333283924, P-value: 2.7422167521379096e-08
column : Utilities, F-statistic: 0.29880407484898486, P-value: 0.584716773968938
column : LotConfig, F-statistic: 7.809954123467792, P-value: 3.163167473604189e-06
column : LandSlope, F-statistic: 1.9588170374149438, P-value: 0.1413963584114019
column : Neighborhood, F-statistic: 71.78486512058272, P-value: 1.558600282771154e-225
column : Condition1, F-statistic: 6.118017137125925, P-value: 8.904549416138854e-08
column : Condition2, F-statistic: 2.0738986215227877, P-value: 0.043425658360948464
column : BldgType, F-statistic: 13.011077169620851, P-value: 2.0567364604967015e-10
column : HouseStyle, F-statistic: 19.595000995981223, P-value: 3.376776535121222e-25
c

In [75]:
influence_list

['MSZoning',
 'LotShape',
 'LandContour',
 'LotConfig',
 'Neighborhood',
 'Condition1',
 'Condition2',
 'BldgType',
 'HouseStyle',
 'RoofStyle',
 'RoofMatl',
 'Exterior1st',
 'Exterior2nd',
 'ExterQual',
 'ExterCond',
 'Foundation',
 'BsmtQual',
 'BsmtCond',
 'BsmtExposure',
 'BsmtFinType1',
 'BsmtFinType2',
 'Heating',
 'HeatingQC',
 'CentralAir',
 'Electrical',
 'KitchenQual',
 'Functional',
 'GarageType',
 'GarageFinish',
 'GarageQual',
 'GarageCond',
 'PavedDrive',
 'SaleType',
 'SaleCondition']

In [76]:
noninfluence_list

['Street', 'Utilities', 'LandSlope']
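The F-statistic reported by `f_oneway` compares the variation between category means with the variation inside each category. As a rough illustration of what is being computed, here is a plain-Python sketch of the one-way ANOVA F-statistic on made-up price groups (the helper name `one_way_f` and the numbers are hypothetical).

```python
def one_way_f(groups):
    """Return the one-way ANOVA F-statistic for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)          # total number of observations
    k = len(groups)              # number of categories
    grand_mean = sum(all_values) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Variation of the observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Three illustrative categories with clearly different average prices.
groups = [[200, 210, 190], [150, 160, 140], [100, 110, 90]]
f_stat = one_way_f(groups)
print(f_stat)
```

A large F-statistic, as here, means the category means differ by much more than the spread within each category, which corresponds to a small p-value.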

In [77]:
df_num1 = df_num[df_num_list]
df_num1.columns

Index(['LotFrontage', 'OverallQual', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
       'BsmtFinSF1', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'GrLivArea',
       'FullBath', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars',
       'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'SalePrice', 'Id',
       'MSSubClass', 'LotArea', 'OverallCond', 'BsmtFinSF2', 'BsmtUnfSF',
       'LowQualFinSF', 'BsmtFullBath', 'BsmtHalfBath', 'HalfBath',
       'BedroomAbvGr', 'KitchenAbvGr', 'EnclosedPorch', '3SsnPorch',
       'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold'],
      dtype='object')

In [78]:
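# Note: this selects from df_cat, which still contains the missing values counted
# in In [67]; df_cat_imputed[influence_list] would use the imputed columns instead.
df_cat1 = df_cat[influence_list]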
df_cat1.columns

Index(['MSZoning', 'LotShape', 'LandContour', 'LotConfig', 'Neighborhood',
       'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle',
       'RoofMatl', 'Exterior1st', 'Exterior2nd', 'ExterQual', 'ExterCond',
       'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1',
       'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical',
       'KitchenQual', 'Functional', 'GarageType', 'GarageFinish', 'GarageQual',
       'GarageCond', 'PavedDrive', 'SaleType', 'SaleCondition'],
      dtype='object')

**Step 5 : Encoding categorical columns**

In [79]:
le = LabelEncoder()
df_cat2 = df_cat1.apply(le.fit_transform)
df_cat2.head()

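|  | MSZoning | LotShape | LandContour | LotConfig | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | RoofStyle | ... | Electrical | KitchenQual | Functional | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 3 | 3 | 4 | 5 | 2 | 2 | 0 | 5 | 1 | ... | 4 | 2 | 6 | 1 | 1 | 4 | 4 | 2 | 8 | 4 |
| 1 | 3 | 3 | 3 | 2 | 24 | 1 | 2 | 0 | 2 | 1 | ... | 4 | 3 | 6 | 1 | 1 | 4 | 4 | 2 | 8 | 4 |
| 2 | 3 | 0 | 3 | 4 | 5 | 2 | 2 | 0 | 5 | 1 | ... | 4 | 2 | 6 | 1 | 1 | 4 | 4 | 2 | 8 | 4 |
| 3 | 3 | 0 | 3 | 0 | 6 | 2 | 2 | 0 | 5 | 1 | ... | 4 | 2 | 6 | 5 | 2 | 4 | 4 | 2 | 8 | 0 |
| 4 | 3 | 0 | 3 | 2 | 15 | 2 | 2 | 0 | 5 | 1 | ... | 4 | 2 | 6 | 1 | 1 | 4 | 4 | 2 | 8 | 4 |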


In [80]:
df_cat2.shape

(1460, 34)
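Label encoding replaces each category with an integer assigned in sorted order of the category names. The plain-Python sketch below mimics that behaviour on a few `MSZoning`-style values (the helper `label_encode` is hypothetical, written only to show the mapping).

```python
def label_encode(values):
    """Map each distinct value to an integer, in sorted order of the values."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


codes, mapping = label_encode(["RL", "RM", "FV", "RL"])
print(mapping)
print(codes)
```

The resulting integers carry an order (here `FV` < `RL` < `RM`) that has no meaning for nominal categories, and a linear model such as Lasso will treat them as numeric magnitudes. One-hot encoding is a common alternative when that is a concern.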

**Step 6 : Outlier Removal by IQR method**

In [81]:
def remove_outliers_iqr(df, columns):
    for col in columns:
        # Ensure the column is numeric
        if pd.api.types.is_numeric_dtype(df[col]):
            Q1 = df[col].quantile(0.25)
            Q3 = df[col].quantile(0.75)
            IQR = Q3 - Q1

            # Define the lower and upper bounds for outliers
            lower_bound = Q1 - 1.5 * IQR
            upper_bound = Q3 + 1.5 * IQR

            # Remove rows where the values are outliers
            df = df[(df[col] >= lower_bound) & (df[col] <= upper_bound)]

    return df

columns_to_check = ['LotFrontage', 'OverallQual', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
       'BsmtFinSF1', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'GrLivArea',
       'FullBath', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars',
       'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'SalePrice', 'Id',
       'MSSubClass', 'LotArea', 'OverallCond', 'BsmtFinSF2', 'BsmtUnfSF',
       'LowQualFinSF', 'BsmtFullBath', 'BsmtHalfBath', 'HalfBath',
       'BedroomAbvGr', 'KitchenAbvGr', 'EnclosedPorch', '3SsnPorch',
       'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']
df_no_outliers = remove_outliers_iqr(df_num1, columns_to_check)

print("DataFrame after removing outliers:")
print(df_no_outliers.shape)


DataFrame after removing outliers:
(516, 38)
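Because the function filters one column after another, each column removes further rows, which is how 1460 rows shrink to 516 here. To see the rule for a single column in isolation, the sketch below computes the IQR bounds for a small made-up list using only the standard library (the helper `iqr_bounds` and the values are illustrative).

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) outlier bounds based on the interquartile range."""
    # method="inclusive" uses linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


values = [10, 12, 13, 14, 15, 16, 18, 95]
lower, upper = iqr_bounds(values)

# Keep only the values inside the bounds.
kept = [v for v in values if lower <= v <= upper]
print(lower, upper)
print(kept)
```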


**Step 7 : Scaling the numerical columns**

In [82]:
from sklearn.preprocessing import StandardScaler
columns_to_scale =  ['LotFrontage', 'OverallQual', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
       'BsmtFinSF1', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'GrLivArea',
       'FullBath', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars',
       'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'SalePrice', 'Id',
       'MSSubClass', 'LotArea', 'OverallCond', 'BsmtFinSF2', 'BsmtUnfSF',
       'LowQualFinSF', 'BsmtFullBath', 'BsmtHalfBath', 'HalfBath',
       'BedroomAbvGr', 'KitchenAbvGr', 'EnclosedPorch', '3SsnPorch',
       'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']
standard_scaler = StandardScaler()
df_standard_scaled = df_no_outliers.copy()
df_standard_scaled[columns_to_scale] = standard_scaler.fit_transform(df_no_outliers[columns_to_scale])
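Standard scaling subtracts the column mean and divides by the population standard deviation, so each scaled column has mean 0 and standard deviation 1. The sketch below applies that formula by hand to the five `LotArea` values shown in the first rows of the dataset (the helper `standardize` is hypothetical).

```python
import statistics


def standardize(values):
    """Return z-scores using the population standard deviation (ddof = 0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


lot_area = [8450, 9600, 11250, 9550, 14260]
z = standardize(lot_area)
print([round(v, 3) for v in z])
```

Note that `columns_to_scale` above also contains the target `SalePrice`, so the model in this notebook is trained on, and predicts, standardized prices rather than prices in the original units.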

In [83]:
final_df = pd.concat([df_standard_scaled, df_cat2], axis = 1)
final_df.head()

|  | LotFrontage | OverallQual | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | GrLivArea | ... | Electrical | KitchenQual | Functional | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.240322 | 0.719488 | 0.814365 | 0.731859 | 1.295518 | 0.765264 | -0.655809 | -0.87408 | 1.353269 | 0.823882 | ... | 4 | 2 | 6 | 1 | 1 | 4 | 4 | 2 | 8 | 4 |
| 2 | -0.007519 | 0.719488 | 0.734401 | 0.681359 | 0.95113 | 0.191107 | -0.431401 | -0.641539 | 1.382616 | 1.02407 | ... | 4 | 2 | 6 | 1 | 1 | 4 | 4 | 2 | 8 | 4 |
| 4 | 1.234096 | 1.567105 | 0.694419 | 0.580359 | 2.855392 | 0.632164 | 0.357536 | 0.175991 | 1.839946 | 2.109301 | ... | 4 | 2 | 6 | 1 | 1 | 4 | 4 | 2 | 8 | 4 |
| 10 | 0.147682 | -0.975744 | -0.704957 | -1.187142 | -0.689777 | 1.287224 | -0.010635 | -0.205523 | -0.735287 | -0.940934 | ... | 4 | 3 | 6 | 5 | 2 | 4 | 4 | 2 | 8 | 4 |
| 13 | 1.777302 | 0.719488 | 0.934312 | 0.933859 | 2.409714 | -1.077256 | 1.581263 | 1.444069 | -0.735287 | 0.254926 | ... | 4 | 2 | 6 | 1 | 1 | 4 | 4 | 2 | 6 | 5 |


In [85]:
final_df.isnull().sum().sort_values(ascending = False)

LotFrontage      944
OverallQual      944
YearBuilt        944
YearRemodAdd     944
MasVnrArea       944
                ... 
GarageQual         0
GarageCond         0
PavedDrive         0
SaleType           0
SaleCondition      0
Length: 72, dtype: int64
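The 944 missing values appear because `pd.concat` with `axis = 1` aligns rows by index and, by default, keeps the union of both indexes: the numeric frame has 516 rows after outlier removal, while the encoded categorical frame still has all 1460. The sketch below reproduces this on two small made-up frames and shows `join="inner"` as one way to keep only the shared rows.

```python
import pandas as pd

# Numeric frame after some rows were removed (indexes 1 and 3 are gone).
num = pd.DataFrame({"GrLivArea": [0.8, 1.0, 2.1]}, index=[0, 2, 4])

# Categorical frame that still has every row.
cat = pd.DataFrame({"MSZoning": [3, 3, 3, 3, 3]}, index=[0, 1, 2, 3, 4])

# Default outer join: rows missing from num get NaN.
outer = pd.concat([num, cat], axis=1)

# Inner join: only rows present in both frames are kept.
inner = pd.concat([num, cat], axis=1, join="inner")

print(outer["GrLivArea"].isnull().sum())
print(inner.shape)
```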

**Step 8 : Splitting the data into training and testing sets**

In [86]:
# Drop rows with any missing values
df_clean = final_df.dropna()

# Separate features and target again
X = df_clean.drop(columns=['SalePrice'])
y = df_clean['SalePrice']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

**Step 9 : Training the model**

In [87]:
lasso = Lasso(alpha=0.1, random_state=42)  # alpha is the regularization strength
lasso.fit(X_train, y_train)

# Predict on the test set
y_pred = lasso.predict(X_test)


r2 = r2_score(y_test, y_pred)
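The `alpha` parameter controls how strongly coefficients are pushed toward zero. As a rough illustration on synthetic data (not the housing dataset), the sketch below fits `Lasso` with a weak and a strong penalty and counts how many coefficients end up exactly zero. The data-generating setup, in which only the first feature matters, is an assumption made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: five features, but the target depends only on the first one.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Fit once with a weak penalty and once with a strong penalty.
weak = Lasso(alpha=0.01).fit(X, y)
strong = Lasso(alpha=1.0).fit(X, y)

for name, model in (("alpha=0.01", weak), ("alpha=1.0", strong)):
    n_zero = int(np.sum(model.coef_ == 0))
    print(name, "zero coefficients:", n_zero)
```

With the stronger penalty the irrelevant features are removed entirely and the remaining coefficient is shrunk below its true value, which is the trade-off to keep in mind when choosing `alpha`.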

**Step 10 : Evaluating the model score**

In [88]:
r2

0.8406969598812093
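The R² score is the fraction of the variance in the test targets that the predictions explain, computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes R² and the mean squared error by hand on made-up values (the helper names and numbers are illustrative).

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mse_manual(y_true, y_pred):
    """Mean of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.5, 5.5, 7.0, 8.0]

print("R2 :", r2_manual(y_true_demo, y_pred_demo))
print("MSE:", mse_manual(y_true_demo, y_pred_demo))
```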