# Boston Housing with Lasso Regression

**Our objective is to build a model with Lasso regression that predicts house prices from this data.**

The data contains the following columns:
* 'crim': per capita crime rate by town.
* 'zn': proportion of residential land zoned for lots over 25,000 sq.ft.
* 'indus': proportion of non-retail business acres per town.
* 'chas': Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
* 'nox': nitrogen oxides concentration (parts per 10 million).
* 'rm': average number of rooms per dwelling.
* 'age': proportion of owner-occupied units built prior to 1940.
* 'dis': weighted mean of distances to five Boston employment centres.
* 'rad': index of accessibility to radial highways.
* 'tax': full-value property-tax rate per $10,000.
* 'ptratio': pupil-teacher ratio by town.
* 'black': 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
* 'lstat': lower status of the population (percent).
* 'medv': median value of owner-occupied homes in $1000s.


**Let's start**

First we need to prepare our environment by importing some libraries.

In [2]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [3]:
# Import the dataset so we can take a look at the data
data = pd.read_csv("boston_train.csv")

**Here we can look at the data**

In [4]:
data.head()

Unnamed: 0,ID,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,black,lstat,medv
0,1,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,2,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,4,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
3,5,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2
4,7,0.08829,12.5,7.87,0,0.524,6.012,66.6,5.5605,5,311,15.2,395.6,12.43,22.9


In [5]:
data.info()
data.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 333 entries, 0 to 332
Data columns (total 15 columns):
ID         333 non-null int64
crim       333 non-null float64
zn         333 non-null float64
indus      333 non-null float64
chas       333 non-null int64
nox        333 non-null float64
rm         333 non-null float64
age        333 non-null float64
dis        333 non-null float64
rad        333 non-null int64
tax        333 non-null int64
ptratio    333 non-null float64
black      333 non-null float64
lstat      333 non-null float64
medv       333 non-null float64
dtypes: float64(11), int64(4)
memory usage: 39.1 KB


Unnamed: 0,ID,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,black,lstat,medv
count,333.0,333.0,333.0,333.0,333.0,333.0,333.0,333.0,333.0,333.0,333.0,333.0,333.0,333.0,333.0
mean,250.951952,3.360341,10.689189,11.293483,0.06006,0.557144,6.265619,68.226426,3.709934,9.633634,409.279279,18.448048,359.466096,12.515435,22.768769
std,147.859438,7.352272,22.674762,6.998123,0.237956,0.114955,0.703952,28.133344,1.981123,8.742174,170.841988,2.151821,86.584567,7.067781,9.173468
min,1.0,0.00632,0.0,0.74,0.0,0.385,3.561,6.0,1.1296,1.0,188.0,12.6,3.5,1.73,5.0
25%,123.0,0.07896,0.0,5.13,0.0,0.453,5.884,45.4,2.1224,4.0,279.0,17.4,376.73,7.18,17.4
50%,244.0,0.26169,0.0,9.9,0.0,0.538,6.202,76.7,3.0923,5.0,330.0,19.0,392.05,10.97,21.6
75%,377.0,3.67822,12.5,18.1,0.0,0.631,6.595,93.8,5.1167,24.0,666.0,20.2,396.24,16.42,25.0
max,506.0,73.5341,100.0,27.74,1.0,0.871,8.725,100.0,10.7103,24.0,711.0,21.2,396.9,37.97,50.0


**Now our goal is to think about the columns and discover which ones are relevant for building the model: including columns that have no bearing on our target, "medv", may make the model less effective.**

In [6]:
# The ID column is not relevant to our analysis.
data.drop('ID', axis = 1, inplace=True)

**Now let's take a look at how all the variables relate to each other.**

In [7]:
# Pairwise Pearson correlations between all columns
# (corr is a method, so it needs the call parentheses)
data.corr()
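
To read the relationship of each feature with the target directly, one option is to take the `medv` column of the correlation matrix and order it by absolute value. The sketch below uses a small synthetic frame (`toy`) as a stand-in for `data`, so the numbers it prints are illustrative only and say nothing about the Boston data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `data`; the values are synthetic, not from boston_train.csv
rng = np.random.default_rng(0)
rm = rng.normal(6.0, 0.7, size=200)
lstat = rng.normal(12.0, 7.0, size=200)
noise = rng.normal(0.0, 1.0, size=200)
toy = pd.DataFrame({
    "rm": rm,
    "lstat": lstat,
    "medv": 5.0 * rm - 0.5 * lstat + noise,
})

# corr() returns a square DataFrame of pairwise Pearson correlations
corr_matrix = toy.corr()

# Correlation of every feature with the target, strongest (by magnitude) first
target_corr = corr_matrix["medv"].drop("medv")
order = target_corr.abs().sort_values(ascending=False).index
target_corr = target_corr.reindex(order)
print(target_corr)
```

On the real data the same three lines would be applied to `data` instead of `toy`.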

# Training the Lasso Regression Model
**Define X and Y**

X: variables known as predictors, independent variables, or features.  
Y: variable known as the response or dependent variable.

In [8]:
X = data[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
       'ptratio', 'black', 'lstat']]
y = data['medv']

In [9]:
train_index = int(0.8 * len(X))
x_train, x_test = X[:train_index].values, X[train_index:].values
y_train, y_test = y[:train_index].values, y[train_index:].values
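
The split above takes the first 80% of the rows for training, in file order. If the rows are ordered in some meaningful way (the notebook does not say), a shuffled split is a common alternative. A minimal sketch follows; `shuffled_split` and the demo arrays are hypothetical and not part of the original notebook.

```python
import numpy as np

def shuffled_split(X, y, train_fraction=0.8, seed=0):
    """Hypothetical helper: shuffle the row indices, then cut at train_fraction."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(train_fraction * len(X))
    train_idx, test_idx = order[:cut], order[cut:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Demo arrays: row i of X_demo is [2*i, 2*i + 1] and y_demo[i] is i
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)

x_tr, x_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
print(x_tr.shape, x_te.shape)
```

Fixing the seed keeps the split reproducible between runs.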

In [10]:
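# np.mean / np.std without an axis argument return a single scalar over all
# entries, so every feature is shifted and scaled by the same global values
mean_X_train = np.mean(x_train)
std_X_train = np.std(x_train)
# The test rows are scaled with their own statistics, not the training ones
mean_X_test = np.mean(x_test)
std_X_test = np.std(x_test)
#Scaling the data without sklearn
x_train = (x_train - mean_X_train) / std_X_train
x_test = (x_test - mean_X_test) / std_X_test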
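
Because `np.mean` and `np.std` are called without `axis`, the cell above shifts and scales every feature by one global value. A per-feature alternative that reuses the training statistics for the test rows could look like the sketch below. `standardize` and the demo arrays are illustrative names, and switching to this scaling would change the results reported later in the notebook.

```python
import numpy as np

def standardize(train, test):
    """Scale each column with the mean and std of the training rows only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (train - mean) / std, (test - mean) / std

# Two synthetic features on very different scales
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=[0.5, 300.0], scale=[0.1, 150.0], size=(80, 2))
test_demo = rng.normal(loc=[0.5, 300.0], scale=[0.1, 150.0], size=(20, 2))

train_scaled, test_scaled = standardize(train_demo, test_demo)
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

After this step each training column has mean close to 0 and standard deviation close to 1, while the test columns are only approximately standardized because they borrow the training statistics.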

In [11]:
x_train[:5] , x_test[:5]

(array([[-4.95098762e-01, -3.61685657e-01, -4.78018259e-01,
         -4.95145621e-01, -4.91156651e-01, -4.46395662e-01,
         -1.17239732e-02, -4.64820551e-01, -4.87731179e-01,
          1.69952934e+00, -3.81704651e-01,  2.44764659e+00,
         -4.58221698e-01],
        [-4.94943133e-01, -4.95145621e-01, -4.42725513e-01,
         -4.95145621e-01, -4.91668247e-01, -4.47537486e-01,
          8.98538884e-02, -4.58317344e-01, -4.80316736e-01,
          1.29914945e+00, -3.63168545e-01,  2.44764659e+00,
         -4.27377617e-01],
        [-4.94905615e-01, -4.95145621e-01, -4.78982136e-01,
         -4.95145621e-01, -4.91749806e-01, -4.43259353e-01,
         -1.55564157e-01, -4.50197788e-01, -4.72902294e-01,
          1.15086060e+00, -3.56495547e-01,  2.43081580e+00,
         -4.73347160e-01],
        [-4.94633654e-01, -4.95145621e-01, -4.78982136e-01,
         -4.95145621e-01, -4.91749806e-01, -4.42154601e-01,
         -9.32828401e-02, -4.50197788e-01, -4.72902294e-01,
          1.1508606

In [12]:
# creating a class for Lasso Regression

class Lasso_Regression():

  #initiating the hyperparameters
  def __init__(self, learning_rate, no_of_iterations, lambda_parameter):

    self.learning_rate = learning_rate
    self.no_of_iterations = no_of_iterations
    self.lambda_parameter = lambda_parameter


  # fitting the dataset to the Lasso Regression model
  def fit(self, X, Y):

    # m --> number of Data points --> number of rows
    # n --> number of input features --> number of columns
    self.m, self.n = X.shape

    self.w = np.zeros(self.n)

    self.b = 0

    self.X = X

    self.Y = Y

    # implementing Gradient Descent algorithm for Optimization

    for i in range(self.no_of_iterations):
      self.update_weights()


  # function for updating the weight & bias value
  def update_weights(self):

    # linear equation of the model
    Y_prediction = self.predict(self.X)

    # gradients (dw, db)

    # gradient for weight
    dw = np.zeros(self.n)

    for i in range(self.n):

      if self.w[i]>0:

        dw[i] = (-(2*(self.X[:,i]).dot(self.Y - Y_prediction)) + self.lambda_parameter) / self.m 

      else :

        dw[i] = (-(2*(self.X[:,i]).dot(self.Y - Y_prediction)) - self.lambda_parameter) / self.m


    # gradient for bias
    db = - 2 * np.sum(self.Y - Y_prediction) / self.m


    # updating the weights & bias

    self.w = self.w - self.learning_rate*dw
    self.b = self.b - self.learning_rate*db

    


  # Predicting the Target variable
  def predict(self,X):

    return X.dot(self.w) + self.b
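
Reading the update rule back from the class above, the quantity being minimised appears to be the mean squared error plus an L1 penalty scaled by $1/m$. Written out under that reading, with $\eta$ standing for `learning_rate` and $\lambda$ for `lambda_parameter`:

$$
J(w, b) = \frac{1}{m}\left[\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{n} |w_j|\right],
\qquad \hat{y}_i = x_i^{\top} w + b
$$

$$
\frac{\partial J}{\partial w_j} = \frac{1}{m}\left[-2\sum_{i=1}^{m} x_{ij}\left(y_i - \hat{y}_i\right) + \lambda\, s_j\right],
\qquad
s_j = \begin{cases} +1 & \text{if } w_j > 0 \\ -1 & \text{otherwise} \end{cases}
$$

$$
\frac{\partial J}{\partial b} = -\frac{2}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right),
\qquad
w \leftarrow w - \eta\,\frac{\partial J}{\partial w},
\quad
b \leftarrow b - \eta\,\frac{\partial J}{\partial b}
$$

The absolute value is not differentiable at $w_j = 0$; the code picks $s_j = -1$ there, which is one valid subgradient, so weights generally hover near zero rather than landing on it exactly.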

In [13]:
model = Lasso_Regression( learning_rate = 0.01, no_of_iterations=1000,
                        lambda_parameter= 0.98)

In [14]:
model.fit(x_train, y_train)

In [15]:
y_pred = model.predict(x_test)

In [16]:
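# R2 as computed in this notebook: SStot is centred on the mean of the
# predictions (the textbook definition centres it on np.mean(y_test))
SSres = np.sum((y_test - y_pred)**2)
y_pred_mean = np.mean(y_pred)
SStot = np.sum((y_test - y_pred_mean)**2)
r2 = 1 - SSres/SStot
# Reported as a percentage
r2for = "{:.2f}".format(r2*100)
print("r2_score is " , r2for)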

r2_score is  41.42
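
The cell above centres `SStot` on the mean of the predictions. The more common definition of R², and the one used in the ridge cell at the end of this notebook, centres it on the mean of the true values. A small sketch with made-up numbers (`r2_standard` is a hypothetical helper):

```python
import numpy as np

def r2_standard(y_true, y_hat):
    """Coefficient of determination with SStot centred on the true mean."""
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up values: residuals are all +/-0.5, true mean is 6
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_hat_demo = np.array([2.5, 5.5, 6.5, 9.5])

# ss_res = 4 * 0.25 = 1, ss_tot = 9 + 1 + 1 + 9 = 20, so R2 is about 0.95
print(r2_standard(y_true_demo, y_hat_demo))
```

With this definition a model that always predicts the mean of the true values scores 0, and a perfect model scores 1.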


In [17]:
mae = np.mean(np.abs(y_test-y_pred))
maefor = "{:.2f}".format(mae)
print("Mean absolute  error " , maefor)

rmse = np.sqrt((np.sum((y_test-y_pred)**2))/len(y_test))
rmsefor = "{:.2f}".format(rmse)
print("Root mean squared error " , rmsefor)

mse = ((np.sum((y_test-y_pred)**2))/len(y_test))
msefor = "{:.2f}".format(mse)
print("Mean squared error " , msefor)

Mean absolute  error  2.70
Root mean squared error  3.49
Mean squared error  12.17


In [18]:
def Gridsearch():
    

# Model training    
    max_accuracy = 0
      
    # learning_rate choices    
    learning_rates = [ 0.1, 0.2, 0.3, 0.4, 0.5, 
                      0.01, 0.02, 0.03, 0.04, 0.05 ]
      
    # iterations choices    
    no_of_iterations = [ 100, 200, 300, 400, 500 ]
    
    # lambda parameters
    lambda_parameter = [0, 0.01, 0.98, 0.1, 0.5, 1, 10]
      
    # all combinations of learning_rate, iterations and lambda
      
    parameters = []    
    for i in learning_rates :        
        for j in no_of_iterations : 
            for a in lambda_parameter:
                parameters.append( ( i, j, a ) )
              
    #print("Available combinations : ",  parameters )
              
    # Linear scan over every combination, scoring each one on the
    # held-out split (x_test, y_test); there is no separate CV set
      
    for k in range( len( parameters ) ) :        
        model1 = Lasso_Regression( learning_rate = parameters[k][0], no_of_iterations = parameters[k][1], lambda_parameter = parameters[k][2])
      
        model1.fit( x_train, y_train )
        
        # Prediction on the held-out split
        Y_pred = model1.predict( x_test )
       
        # R2 of this combination, same formula as In [16]
        SSres = np.sum((y_test - Y_pred)**2)
        y_pred_mean = np.mean(Y_pred)
        SStot = np.sum((y_test - y_pred_mean)**2)
        r2 = 1 - SSres/SStot

        # The error metrics below read the global y_pred from In [15],
        # not Y_pred, so they come out the same for every combination
        mae = np.mean(np.abs(y_test-y_pred))
        rmsefor = "{:.2f}".format(np.sqrt((np.sum((y_test-y_pred)**2))/len(y_test)))
        msefor = "{:.2f}".format((np.sum((y_test-y_pred)**2))/len(y_test))

    # r2 holds the score of the last combination tried;
    # max_accuracy is never updated, so no maximum is tracked
    r2_score = r2 * 100
              
    print( "Maximum accuracy achieved by our model through grid searching : ", r2_score )
    print( "The least mean Absolute error", mae)
    print( "The least root mean squared error " , rmsefor)
    print( "The mean squared error " , msefor)

In [19]:
if __name__ == "__main__" :     
    Gridsearch()

Maximum accuracy achieved by our model through grid searching :  46.279817356638084
The least mean Absolute error 2.7022730417277074
The least root mean squared error  3.49
The mean squared error  12.17
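
As written, `Gridsearch` reports the score of the last combination it tries and reuses the global `y_pred` for the error metrics. A version that keeps the best-scoring combination might look like the following sketch. It runs on synthetic data, and `fit_lasso_gd`, `r2`, and `grid_search` are hypothetical helpers rather than the notebook's own code. Ideally the scoring split would be a validation set kept separate from the final test set.

```python
import itertools
import numpy as np

def fit_lasso_gd(X, y, learning_rate, n_iter, lam):
    """Subgradient descent on (1/m) * (squared error + lam * L1 norm of w)."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_iter):
        residual = y - (X @ w + b)
        sign = np.where(w > 0, 1.0, -1.0)  # same sign convention as Lasso_Regression
        dw = (-2.0 * (X.T @ residual) + lam * sign) / m
        db = -2.0 * residual.sum() / m
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b

def r2(y_true, y_hat):
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def grid_search(X_tr, y_tr, X_val, y_val, learning_rates, iterations, lambdas):
    best_score, best_params = -np.inf, None
    for lr, n_iter, lam in itertools.product(learning_rates, iterations, lambdas):
        w, b = fit_lasso_gd(X_tr, y_tr, lr, n_iter, lam)
        score = r2(y_val, X_val @ w + b)
        # Diverged runs give nan/inf scores and are skipped
        if np.isfinite(score) and score > best_score:
            best_score, best_params = score, (lr, n_iter, lam)
    return best_params, best_score

# Synthetic demo data (not the Boston set)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 3))
y_demo = X_demo @ np.array([2.0, 0.0, -1.0]) + 3.0 + rng.normal(scale=0.1, size=120)
X_tr, X_val = X_demo[:80], X_demo[80:]
y_tr, y_val = y_demo[:80], y_demo[80:]

best_params, best_score = grid_search(
    X_tr, y_tr, X_val, y_val,
    learning_rates=[0.01, 0.1],
    iterations=[100, 300],
    lambdas=[0.0, 1.0, 10.0],
)
print("best (learning_rate, iterations, lambda):", best_params)
print("best R2: {:.3f}".format(best_score))
```

Returning the winning parameters alongside the score makes it possible to refit a final model with them afterwards.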


In [25]:
import numpy as np

def lasso_regression(X, y, alpha):
    # Implementation of Lasso regression with a given regularization parameter alpha
    # ...
    # Placeholder: all-zero coefficients, so every alpha scores the same
    beta = np.zeros(X.shape[1])
    return beta

def mean_squared_error(y_true, y_pred):
    # Mean of the squared residuals
    return np.mean((y_true - y_pred) ** 2)

def random_search(X, y, alphas, n_iter):
    # Random search over alphas, in the spirit of scikit-learn's RandomizedSearchCV
    best_mse = np.inf
    best_alpha = None
    
    for i in range(n_iter):
        alpha = np.random.choice(alphas)
        beta = lasso_regression(X, y, alpha)
        y_pred = X.dot(beta)
        mse = mean_squared_error(y, y_pred)
        
        if mse < best_mse:
            best_mse = mse
            best_alpha = alpha
    
    return best_alpha

# Generate some sample data
np.random.seed(0)
n = 100
p = 10
X = np.random.randn(n, p)
y = np.random.randn(n)

# Set up the hyperparameters to search over and the number of iterations
alphas = np.logspace(-3, 3, num=7)
n_iter = 100

# Perform the random search
best_alpha = random_search(X, y, alphas, n_iter)
print(f"Best alpha: {best_alpha}")
# mse here is the global value left over from In [17], not a result of this search
print("mse",mse)


Best alpha: 0.001
mse 12.174046556501848
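
The `lasso_regression` function in the cell above is a stub. One common way to fill it in is cyclic coordinate descent with soft-thresholding; the sketch below is one such version and is not the notebook author's implementation. It assumes the objective $\frac{1}{2m}\lVert y - X\beta\rVert^2 + \alpha\lVert\beta\rVert_1$ with no intercept and no all-zero columns, so its `alpha` is not directly comparable to `lambda_parameter` in the gradient-descent class. The function names and demo data are hypothetical.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha, clipping at zero."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2m)) * ||y - X b||^2 + alpha * ||b||_1."""
    m, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / m  # per-column mean of squares
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with feature j's current contribution added back
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / m
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    return beta

# Synthetic data with two informative features out of five
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_beta = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y_demo = X_demo @ true_beta + rng.normal(scale=0.1, size=200)

beta_small = lasso_coordinate_descent(X_demo, y_demo, alpha=0.1)
beta_large = lasso_coordinate_descent(X_demo, y_demo, alpha=1000.0)
print("alpha=0.1   :", np.round(beta_small, 3))
print("alpha=1000.0:", np.round(beta_large, 3))
```

A very large `alpha` drives every coefficient to exactly zero, while a small one keeps the informative coefficients close to their true values with a slight shrinkage.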


In [24]:

# Define Ridge Regression function (closed form, no intercept term)
def ridge_regression_r2(alpha, x_train, x_test, y_train, y_test):
    # Fit Ridge Regression weights with given alpha: w = (X^T X + alpha*I)^-1 X^T y
    weights = np.linalg.inv(x_train.T.dot(x_train) + alpha*np.identity(x_train.shape[1])).dot(x_train.T).dot(y_train)
    
    # Predict on test set
    y_pred = x_test.dot(weights)
    
    # Calculate R2 score
    r2 = 1 - np.sum((y_test - y_pred)**2) / np.sum((y_test -np.mean(y_test))**2)
    
    return r2

# Define range of alpha values to search over
alphas = np.logspace(-3, 3, num=100)

# Define number of iterations for random search
n_iter = 10

# Initialize best hyperparameters and R2 score
best_alpha = None
best_r2 = -np.inf

# Perform random search
for i in range(n_iter):
    # Select random alpha value
    alpha = np.random.choice(alphas)
    
    # Compute R2 score for Ridge Regression with this alpha
    r2 = ridge_regression_r2(alpha, x_train, x_test, y_train, y_test)
    
    # Update best hyperparameters and R2 score if necessary
    if r2 > best_r2:
        best_alpha = alpha
        best_r2 = r2
        
# Print best hyperparameters and R2 score
print("Best alpha: {:.5f}".format(best_alpha))
print("Best R2 score: {:.5f}".format(best_r2*100))


Best alpha: 2.15443
Best R2 score: 45.42548
