# Problem 1

## Boston Housing Data

**Title:** Boston housing data

**Creator:** Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.

**Purpose:** To predict housing prices in the Boston area

**Features:**

| Feature | Description |
|---------|-------------|
| CRIM    | per capita crime rate by town |
| ZN      | proportion of residential land zoned for lots over 25,000 sq.ft. |
| INDUS   | proportion of non-retail business acres per town |
| CHAS    | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) |
| NOX     | nitric oxides concentration (parts per 10 million) |
| RM      | average number of rooms per dwelling |
| AGE     | proportion of owner-occupied units built prior to 1940 |
| DIS     | weighted distances to five Boston employment centres |
| RAD     | index of accessibility to radial highways |
| TAX     | full-value property-tax rate per $10,000 |
| PTRATIO | pupil-teacher ratio by town |
| B       | 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town |
| LSTAT   | % lower status of the population |

**Target:**

| Target | Description |
|--------|-------------|
| MEDV   | Median value of owner-occupied homes in $1000s |


Repository: <https://raw.githubusercontent.com/sesillim/ai/main/Housing.csv>

## Questions

1. How many observations are in the data?
2. Build a linear regression model. Print the coefficients of the model and assess the fit.
3. Examine whether the scaling improves the fit of the linear regression model. You can choose any scaling method for the examination.
4. Examine whether implementing L2 regularization with scaling improves model performance. To optimize the hyperparameter, you have to compare at least 5 different hyperparameter values.
5. Examine whether implementing L1 regularization with scaling improves model performance. To optimize the hyperparameter, you have to compare at least 5 different hyperparameter values.


In [1]:
import pandas as pd
import numpy as np

In [2]:
url = 'https://raw.githubusercontent.com/sesillim/ai/main/Housing.csv'
data = pd.read_csv(url)

In [3]:
data.head()

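|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAS | PTRATIO | B | LSTAT | MEDV |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|------|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |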


In [4]:
data.describe()

|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAS | PTRATIO | B | LSTAT | MEDV |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|------|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 3.613524 | 11.363636 | 11.136779 | 0.06917 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 | 22.532806 |
| std | 8.601545 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.10571 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 | 9.197104 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 187.0 | 12.6 | 0.32 | 1.73 | 5.0 |
| 25% | 0.082045 | 0.0 | 5.19 | 0.0 | 0.449 | 5.8855 | 45.025 | 2.100175 | 4.0 | 279.0 | 17.4 | 375.3775 | 6.95 | 17.025 |
| 50% | 0.25651 | 0.0 | 9.69 | 0.0 | 0.538 | 6.2085 | 77.5 | 3.20745 | 5.0 | 330.0 | 19.05 | 391.44 | 11.36 | 21.2 |
| 75% | 3.677083 | 12.5 | 18.1 | 0.0 | 0.624 | 6.6235 | 94.075 | 5.188425 | 24.0 | 666.0 | 20.2 | 396.225 | 16.955 | 25.0 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.78 | 100.0 | 12.1265 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 | 50.0 |


### 1. How many observations are in the data?

In [5]:
print("The number of observations is", data.shape[0])

The number of observations is 506


#### 506 observations

### 2. Build a linear regression model. Print the coefficients of the model and assess the fit.

In [6]:
X = data.drop('MEDV', axis='columns')
y = data['MEDV']

In [7]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [8]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()

In [9]:
lr.fit(X_train, y_train)

In [10]:
df = pd.DataFrame(lr.coef_, index=X.columns, columns=['Coefficient'])
df

|       | Coefficient |
|-------|-------------|
| CRIM  | -0.119443 |
| ZN    | 0.04478 |
| INDUS | 0.005485 |
| CHAS  | 2.340804 |
| NOX   | -16.123604 |
| RM    | 3.708709 |
| AGE   | -0.003121 |
| DIS   | -1.386397 |
| RAD   | 0.244178 |
| TAS   | -0.01099 |


In [11]:
y_pred = lr.predict(X_test)
actual_vs_predict = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
actual_vs_predict.head()

|     | Actual | Predicted |
|-----|--------|-----------|
| 329 | 22.6 | 24.889638 |
| 371 | 50.0 | 23.721411 |
| 219 | 23.0 | 29.364999 |
| 403 | 8.3 | 12.122386 |
| 78  | 21.2 | 21.443823 |


In [12]:
from sklearn import metrics
print('MAE: ', metrics.mean_absolute_error(y_test, y_pred))
print('MSE: ', metrics.mean_squared_error(y_test, y_pred))
print('RMSE: ', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

MAE:  3.8429092204444997
MSE:  33.44897999767638
RMSE:  5.783509315085122
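
The cell above reports MAE, MSE and RMSE for the baseline model, while the regularized models later in the notebook also report R-squared. A minimal sketch of a helper that computes all four numbers in one place is given here so the models can be compared on the same terms. The function name `regression_report` is hypothetical, and the sample input is only the five rows of the actual-vs-predicted table, not the full test set.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    # R^2 = 1 - SS_res / SS_tot: 1.0 is a perfect fit, 0.0 matches predicting the mean.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(residuals))),
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "R2": float(1.0 - ss_res / ss_tot),
    }

# Five rows from the actual-vs-predicted table, used only as sample input.
actual = [22.6, 50.0, 23.0, 8.3, 21.2]
predicted = [24.889638, 23.721411, 29.364999, 12.122386, 21.443823]
report = regression_report(actual, predicted)
print(report)
```

In the notebook itself the call would be `regression_report(y_test, y_pred)`, which should reproduce the three error values printed above and add the test R-squared.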


### 3. Examine whether the scaling improves the fit of the linear regression model. You can choose any scaling method for the examination.

In [23]:
from sklearn.preprocessing import StandardScaler
std_scaler = StandardScaler()

In [24]:
std_scaler.fit(X_train)
X_train_scaled = std_scaler.transform(X_train)
X_test_scaled = std_scaler.transform(X_test)

In [31]:
lr_std_scaled = LinearRegression()
lr_std_scaled.fit(X_train_scaled, y_train)

In [32]:
df_scale = pd.DataFrame(lr_std_scaled.coef_, X.columns, columns=['Coefficient'])
print("A linear regression with scaled data:")
df_scale

A linear regression with scaled data:


|       | Coefficient |
|-------|-------------|
| CRIM  | -0.97082 |
| ZN    | 1.057149 |
| INDUS | 0.038311 |
| CHAS  | 0.594506 |
| NOX   | -1.855148 |
| RM    | 2.573219 |
| AGE   | -0.087615 |
| DIS   | -2.880943 |
| RAD   | 2.112245 |
| TAS   | -1.875331 |


In [33]:
y_pred_scaled = lr_std_scaled.predict(X_test_scaled)

In [34]:
print("MAE after scaling: ", metrics.mean_absolute_error(y_test, y_pred_scaled))
print("MSE after scaling: ", metrics.mean_squared_error(y_test, y_pred_scaled))
print("RMSE after scaling: ", np.sqrt(metrics.mean_squared_error(y_test, y_pred_scaled)))

MAE after scaling:  3.8429092204444952
MSE after scaling:  33.44897999767652
RMSE after scaling:  5.783509315085134
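
The errors after scaling match the unscaled ones to many decimal places. This is expected for ordinary least squares: standardizing the features only rescales the coefficients (each scaled coefficient equals the raw coefficient times that feature's standard deviation) and leaves the predictions unchanged. The NOX coefficients above are consistent with this: -1.855148 / -16.123604 ≈ 0.115, close to the NOX standard deviation of 0.1159 shown by `describe()`. The sketch below checks the same property on synthetic data with NumPy only. The data and the helper `fit_ols` are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic features on very different scales (not the housing data).
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 0.01]) + np.array([0.0, 300.0, 0.5])
y = X @ np.array([2.0, -0.03, 40.0]) + 5.0 + rng.normal(scale=0.5, size=200)

def fit_ols(features, target):
    """Least-squares fit with an intercept; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta[0], beta[1:]

# Column mean and population standard deviation, the statistics StandardScaler uses.
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

b0_raw, coef_raw = fit_ols(X, y)
b0_scaled, coef_scaled = fit_ols(X_scaled, y)

pred_raw = b0_raw + X @ coef_raw
pred_scaled = b0_scaled + X_scaled @ coef_scaled

print(np.allclose(pred_raw, pred_scaled))        # do the predictions agree?
print(np.allclose(coef_scaled, coef_raw * std))  # do the coefficients differ by the std factor?
```

Scaling starts to matter once a penalty on the coefficient sizes is added, as in the Ridge and Lasso models, because the penalty treats all coefficients on the same scale.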


### 4. Examine whether implementing L2 regularization with scaling improves model performance. To optimize the hyperparameter, you have to compare at least 5 different hyperparameter values.

In [35]:
from sklearn.linear_model import Ridge


In [43]:
alpha_values = [0.01, 0.1, 1, 10, 100]
best_alpha = None
best_rmse = float('inf')

for alpha in alpha_values:
    # Fit Ridge (L2 penalty) on the standardized training features
    ridge_scale = Ridge(alpha=alpha).fit(X_train_scaled, y_train)
    df_ridge_scale = pd.DataFrame(ridge_scale.coef_, X.columns, columns=['Coefficient'])
    print(f"L2 Regularization (Ridge Regression) with lambda {alpha} and scaled data: ")
    print(df_ridge_scale)
    print("R-squared for a training set: ", ridge_scale.score(X_train_scaled, y_train))
    print("R-squared for a test set: ", ridge_scale.score(X_test_scaled, y_test))
    y_pred_ridge_scaled = ridge_scale.predict(X_test_scaled)
    # Compute the errors once and reuse them below
    mse = metrics.mean_squared_error(y_test, y_pred_ridge_scaled)
    rmse = np.sqrt(mse)
    print("MAE: ", metrics.mean_absolute_error(y_test, y_pred_ridge_scaled))
    print("MSE: ", mse)
    print("RMSE: ", rmse)
    print("================================================================")
    # Keep the alpha with the lowest test RMSE.
    # Note: alpha is chosen on the test set here, so best_rmse is an optimistic estimate.
    if rmse < best_rmse:
        best_rmse = rmse
        best_alpha = alpha

print(f'Best alpha for L2 regularization: {best_alpha}')

L2 Regularization (Ridge Regression) with lambda 0.01 and scaled data: 
         Coefficient
CRIM       -0.970732
ZN          1.056981
INDUS       0.038034
CHAS        0.594550
NOX        -1.854790
RM          2.573329
AGE        -0.087694
DIS        -2.880612
RAD         2.111458
TAS        -1.874610
PTRATIO    -2.292670
B           0.718181
LSTAT      -3.592289
R-squared for a training set:  0.7730135553534744
R-squared for a test set:  0.5892114815046758
MAE:  3.8429288870967704
MSE:  33.44986784562439
RMSE:  5.783586071428728
================================================================
L2 Regularization (Ridge Regression) with lambda 0.1 and scaled data: 
         Coefficient
CRIM       -0.969942
ZN          1.055479
INDUS       0.035549
CHAS        0.594942
NOX        -1.851578
RM          2.574308
AGE        -0.088395
DIS        -2.877636
RAD         2.104394
TAS        -1.868146
PTRATIO    -2.291793
B           0.718191
LSTAT      -3.590798
R-squared for a training set:  0.7730134004374536
R-squared for a test set:  0.5891134222735185
MAE
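
The loop above picks the best alpha by comparing RMSE on the test set, so the test set is used both to choose the hyperparameter and to report performance. One common alternative, sketched below, is to choose alpha by cross-validation on the training set and evaluate on the test set only once. The data here is synthetic (an assumption standing in for the housing data), and the 5-fold setting is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the housing features and target (not the real data).
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scaling lives inside the pipeline, so each CV fold is scaled using its own training part.
pipeline = make_pipeline(StandardScaler(), Ridge())
param_grid = {"ridge__alpha": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

best_alpha = search.best_params_["ridge__alpha"]
cv_rmse = -search.best_score_  # the scorer is negated, so flip the sign back
# The test set is touched once, after alpha is fixed.
test_rmse = float(np.sqrt(np.mean((y_test - search.predict(X_test)) ** 2)))
print(best_alpha, cv_rmse, test_rmse)
```

Swapping `Ridge()` for `Lasso()` and the parameter key for `lasso__alpha` would give the same procedure for L1 regularization.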

### 5. Examine whether implementing L1 regularization with scaling improves model performance. To optimize the hyperparameter, you have to compare at least 5 different hyperparameter values.

In [44]:
from sklearn.linear_model import Lasso

In [45]:
alpha_values = [0.01, 0.1, 1, 10, 100]
best_alpha_l1 = None
best_rmse_l1 = float('inf')

for alpha in alpha_values:
    # Fit Lasso (L1 penalty) on the standardized training features
    lasso_scale = Lasso(alpha=alpha).fit(X_train_scaled, y_train)
    df_lasso_scale = pd.DataFrame(lasso_scale.coef_, X.columns, columns=['Coefficient'])
    print(f"L1 Regularization (Lasso Regression) with lambda {alpha} and scaled data: ")
    print(df_lasso_scale)
    print("R-squared for a training set: ", lasso_scale.score(X_train_scaled, y_train))
    print("R-squared for a test set: ", lasso_scale.score(X_test_scaled, y_test))
    y_pred_lasso_scaled = lasso_scale.predict(X_test_scaled)
    # Compute the errors once and reuse them below
    mse = metrics.mean_squared_error(y_test, y_pred_lasso_scaled)
    rmse = np.sqrt(mse)
    print("MAE: ", metrics.mean_absolute_error(y_test, y_pred_lasso_scaled))
    print("MSE: ", mse)
    print("RMSE: ", rmse)
    print("================================================================")
    # Keep the alpha with the lowest test RMSE.
    # Note: alpha is chosen on the test set here, so best_rmse_l1 is an optimistic estimate.
    if rmse < best_rmse_l1:
        best_rmse_l1 = rmse
        best_alpha_l1 = alpha

print(f'Best alpha for L1 regularization: {best_alpha_l1}')

L1 Regularization (Lasso Regression) with lambda 0.01 and scaled data: 
         Coefficient
CRIM       -0.940302
ZN          1.021582
INDUS      -0.000000
CHAS        0.594840
NOX        -1.804075
RM          2.585398
AGE        -0.069486
DIS        -2.809365
RAD         1.956109
TAS        -1.738284
PTRATIO    -2.278841
B           0.705484
LSTAT      -3.596398
R-squared for a training set:  0.7729557113377097
R-squared for a test set:  0.5874763161420908
MAE:  3.843868756386693
MSE:  33.59115965261396
RMSE:  5.795788095903262
================================================================
L1 Regularization (Lasso Regression) with lambda 0.1 and scaled data: 
         Coefficient
CRIM       -0.663468
ZN          0.701524
INDUS      -0.130724
CHAS        0.588934
NOX        -1.358749
RM          2.722754
AGE        -0.000000
DIS        -2.140932
RAD         0.640853
TAS        -0.658779
PTRATIO    -2.172217
B           0.602666
LSTAT      -3.615800
R-squared for a training set:  0.7676692732920614
R-squared for a test set:  0.5663726208397636
MAE:
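
In the Lasso outputs above, some coefficients are exactly zero (INDUS at alpha = 0.01, AGE at alpha = 0.1), which is the feature-selection effect of the L1 penalty. A small sketch of how that sparsity could be tracked across the alpha grid follows. It uses synthetic standardized data as an assumption rather than the housing data, and the name `nonzero_counts` is illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic standardized features; only three of the five truly affect the target.
X = StandardScaler().fit_transform(rng.normal(size=(300, 5)))
y = X @ np.array([3.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=300)

nonzero_counts = {}
for alpha in [0.01, 0.1, 1, 10, 100]:
    model = Lasso(alpha=alpha).fit(X, y)
    # Count the coefficients that the L1 penalty has not shrunk to exactly zero.
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))

print(nonzero_counts)
```

For a large enough alpha every coefficient is driven to zero and the model predicts only the mean of the target, which is why very large alpha values tend to give poor fits.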

### Final Assessment

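On this data, neither regularization method improves the test-set fit. The unregularized linear regression has a test RMSE of 5.7835, and scaling leaves it unchanged. Ridge with alpha = 0.01 gives 5.7836 and Lasso with alpha = 0.01 gives 5.7958. In the outputs shown, the test R-squared also falls as the penalty grows (Ridge: 0.5892 at alpha = 0.01 to 0.5891 at alpha = 0.1; Lasso: 0.5875 to 0.5664), so stronger regularization makes the errors slightly higher rather than lower.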