# Linear Regression From Scratch
*Least Squares and Least Mean Squares*
***
**Ian Malone**

The goal of this project is to design a linear regressor that predicts the market value of a house from a set of features in a house sales database. I will be using a well-known data set, the
Boston Housing data set located at https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

This data set has 14 attributes (features), and I will regress to price (MEDV, the 14th attribute). Therefore, this target variable will not be used as an input. The data set is rather small (506 rows), and it will be divided 2/3 for training and 1/3 for testing.

The results will be summarized by presenting the training error and its variance across different initializations. The test set error and its variance across different runs will be reported in a table.

The Least Squares (LS) solution, also called statistical regression, and the Least Mean Squares (LMS) algorithm will be used to train the regressor.

When using LS, the effect of different levels of regularization will be shown. When training with LMS, the learning rate must be properly selected. The effect of the step size (or learning rate) will be shown by plotting the learning curve. The weight tracks for the LMS will also be shown. The accuracy of the LMS with the best initialization will be compared to the accuracy of the analytic LS solution.
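
For reference, the two training rules can be written compactly. The notation is an assumption chosen here, not taken from the notebook: $X$ is the design matrix with a leading column of ones, $\mathbf{y}$ the target vector, $\lambda \ge 0$ the regularization strength, and $\eta$ the step size.

$$
\begin{aligned}
\text{LS (ridge-regularized):}\quad & \mathbf{w}_{\mathrm{LS}} = \left(X^\top X + \lambda I\right)^{-1} X^\top \mathbf{y} \\
\text{LMS (sample } n\text{):}\quad & e_n = y_n - \mathbf{w}_n^\top \mathbf{x}_n, \qquad \mathbf{w}_{n+1} = \mathbf{w}_n + \eta\, e_n\, \mathbf{x}_n
\end{aligned}
$$

With $\lambda = 0$ the first line reduces to the ordinary normal equations.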

Other questions that will be explored include: What happens when the target variable is used as an input? This will be explained by visualizing the learned parameters of the model. Does a bias term need to be included in the model for this problem?

### Import Libraries and Prepare Data

#### Import Libraries

In [301]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#### Load Data

In [9]:
df = pd.read_csv("C:\\Users\\Ian\\Google Drive\\IanGMalone\\UF\\Classes\\Deep Learning\\HW\\HW1\\Boston_Housing.csv")

In [11]:
df.describe()

,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
count,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0
mean,3.613524,11.363636,11.136779,0.06917,0.554695,6.284634,68.574901,3.795043,9.549407,408.237154,18.455534,356.674032,12.653063,22.532806
std,8.601545,23.322453,6.860353,0.253994,0.115878,0.702617,28.148861,2.10571,8.707259,168.537116,2.164946,91.294864,7.141062,9.197104
min,0.00632,0.0,0.46,0.0,0.385,3.561,2.9,1.1296,1.0,187.0,12.6,0.32,1.73,5.0
25%,0.082045,0.0,5.19,0.0,0.449,5.8855,45.025,2.100175,4.0,279.0,17.4,375.3775,6.95,17.025
50%,0.25651,0.0,9.69,0.0,0.538,6.2085,77.5,3.20745,5.0,330.0,19.05,391.44,11.36,21.2
75%,3.677083,12.5,18.1,0.0,0.624,6.6235,94.075,5.188425,24.0,666.0,20.2,396.225,16.955,25.0
max,88.9762,100.0,27.74,1.0,0.871,8.78,100.0,12.1265,24.0,711.0,22.0,396.9,37.97,50.0


#### Prepare Data

In [20]:
# separate target variable

X = df.drop('MEDV', axis = 1)
y = df['MEDV']

In [362]:
# split data set for training and testing
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

### Implement Regression Algorithms

#### Least Squares

In [457]:
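def least_squares(data, target):
    """Solve the normal equations for a linear model with a bias term."""
    y = target
    # prepend a column of ones so that betas[0] is the intercept
    X = np.c_[np.ones(target.shape[0]), data.values]
    # solve (X^T X) betas = X^T y directly instead of forming the inverse
    betas = np.linalg.solve(X.T @ X, X.T @ y)
    return betas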

In [472]:
np.dot(weights, X_test.iloc[4,:]) + intercept

21.18581162209604

In [466]:
y_test.values

array([22.6, 50. , 23. ,  8.3, 21.2, 19.9, 20.6, 18.7, 16.1, 18.6,  8.8,
       17.2, 14.9, 10.5, 50. , 29. , 23. , 33.3, 29.4, 21. , 23.8, 19.1,
       20.4, 29.1, 19.3, 23.1, 19.6, 19.4, 38.7, 18.7, 14.6, 20. , 20.5,
       20.1, 23.6, 16.8,  5.6, 50. , 14.5, 13.3, 23.9, 20. , 19.8, 13.8,
       16.5, 21.6, 20.3, 17. , 11.8, 27.5, 15.6, 23.1, 24.3, 42.8, 15.6,
       21.7, 17.1, 17.2, 15. , 21.7, 18.6, 21. , 33.1, 31.5, 20.1, 29.8,
       15.2, 15. , 27.5, 22.6, 20. , 21.4, 23.5, 31.2, 23.7,  7.4, 48.3,
       24.4, 22.6, 18.3, 23.3, 17.1, 27.9, 44.8, 50. , 23. , 21.4, 10.2,
       23.3, 23.2, 18.9, 13.4, 21.9, 24.8, 11.9, 24.3, 13.8, 24.7, 14.1,
       18.7, 28.1, 19.8, 26.7, 21.7, 22. , 22.9, 10.4, 21.9, 20.6, 26.4,
       41.3, 17.2, 27.1, 20.4, 16.5, 24.4,  8.4, 23. ,  9.7, 50. , 30.5,
       12.3, 19.4, 21.2, 20.3, 18.8, 33.4, 18.5, 19.6, 33.2, 13.1,  7.5,
       13.6, 17.4,  8.4, 35.4, 24. , 13.4, 26.2,  7.2, 13.1, 24.5, 37.2,
       25. , 24.1, 16.6, 32.9, 36.2, 11. ,  7.2, 22

In [465]:
y_test

329    22.6
371    50.0
219    23.0
403     8.3
78     21.2
       ... 
281    35.4
231    31.7
64     33.0
327    22.2
322    20.4
Name: MEDV, Length: 167, dtype: float64

#### Least Mean Squares

In [5]:


# algorithm

# learning rate

# weight tracks
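
The cell above is still a placeholder. One possible way to fill it in is sketched below, assuming the standard sample-by-sample LMS (Widrow-Hoff) update. The function name `lms`, its parameters, and the synthetic demo data are all hypothetical and not part of the original notebook.

```python
import numpy as np

def lms(X, y, step_size=0.01, n_epochs=50, seed=0):
    """Train a linear model with the LMS rule, one sample at a time.

    Returns the final weights (bias first), the per-epoch training MSE
    (learning curve), and the weights recorded after each epoch (weight tracks).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.c_[np.ones(X.shape[0]), X]             # prepend a bias column
    w = rng.normal(scale=0.01, size=X.shape[1])   # small random initialization
    weight_track = [w.copy()]
    learning_curve = []
    for _ in range(n_epochs):
        for i in rng.permutation(X.shape[0]):     # visit samples in random order
            error = y[i] - X[i] @ w               # instantaneous prediction error
            w = w + step_size * error * X[i]      # LMS weight update
        weight_track.append(w.copy())
        learning_curve.append(np.mean((y - X @ w) ** 2))
    return w, np.array(learning_curve), np.array(weight_track)

# Synthetic check (hypothetical data): true bias 2.0, true weights [1.0, -2.0, 0.5]
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(200, 3))
y_demo = 2.0 + X_demo @ np.array([1.0, -2.0, 0.5]) + 0.01 * demo_rng.normal(size=200)
w_demo, curve_demo, track_demo = lms(X_demo, y_demo, step_size=0.01, n_epochs=50)
print(w_demo)
```

On the housing data, the raw features have very different scales (TAX is in the hundreds, NOX below 1), so a sketch like this would likely need standardized inputs or a much smaller step size to stay stable.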

### Train and Test Algorithms

#### Train

In [473]:
# least squares
betas = least_squares(X_train, y_train)
intercept = betas[0]
weights = betas[1:]

#### Test

In [130]:
np.dot([1,2,3],[1,2,3])

14
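
The test step itself is not yet written in this notebook. A minimal sketch of how the test error could be computed is given below, assuming mean squared error as the metric. The helper names `predict` and `mean_squared_error` and the toy numbers are hypothetical.

```python
import numpy as np

def predict(X, intercept, weights):
    """Linear prediction: intercept plus the dot product of features and weights."""
    return intercept + np.asarray(X, dtype=float) @ np.asarray(weights, dtype=float)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))

# Toy example (hypothetical numbers): predictions are 6 and 12, targets are 6 and 13
X_toy = np.array([[1.0, 2.0], [3.0, 4.0]])
y_toy = np.array([6.0, 13.0])
pred_toy = predict(X_toy, intercept=1.0, weights=[1.0, 2.0])
mse_toy = mean_squared_error(y_toy, pred_toy)
print(mse_toy)
```

With the variables defined in the training cell, the same helpers could be called as `mean_squared_error(y_test, predict(X_test, intercept, weights))`.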

### Visualize Results

#### Training Error and Variance Across Initializations

#### Testing Error and Variance Across Initializations

#### Effect of Different Levels of Regularization (LS)
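
This section has no code yet. One way to study regularization is sketched below, assuming an L2 (ridge) penalty added to the normal equations and leaving the bias unpenalized. The function `ridge_least_squares`, the parameter `lam`, and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_least_squares(X, y, lam=0.0):
    """Least squares with an L2 penalty of strength lam on the non-bias weights."""
    X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]  # prepend a bias column
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0                                     # leave the bias unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ np.asarray(y, dtype=float))

# Synthetic data (hypothetical): the weight norm should shrink as lam grows
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 1.0 + X_demo @ np.array([3.0, -1.0, 2.0]) + 0.1 * rng.normal(size=100)

norms = {}
for lam in (0.0, 1.0, 10.0, 100.0):
    betas = ridge_least_squares(X_demo, y_demo, lam)
    norms[lam] = float(np.linalg.norm(betas[1:]))
    print(f"lam = {lam}: weight norm = {norms[lam]:.4f}")
```

Setting `lam=0.0` recovers the unregularized solution, so the same function can serve as the baseline when comparing regularization levels.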

#### Effect of Different Learning Rates (LMS)

In [3]:
# learning curves
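
The learning curves could be produced by sweeping the step size and recording the training MSE after each epoch. The sketch below assumes plain sample-by-sample LMS with zero initialization. The function `lms_learning_curve`, the step sizes, and the synthetic data are hypothetical choices.

```python
import numpy as np

def lms_learning_curve(X, y, step_size, n_epochs=20):
    """Return the training MSE after each epoch of sample-by-sample LMS updates."""
    X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]  # prepend a bias column
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    curve = []
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            w += step_size * (y_i - x_i @ w) * x_i          # LMS update
        curve.append(np.mean((y - X @ w) ** 2))
    return np.array(curve)

# Synthetic data (hypothetical) with unit-variance features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 + X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

curves = {eta: lms_learning_curve(X_demo, y_demo, eta) for eta in (0.0005, 0.005, 0.05)}
for eta, curve in curves.items():
    print(f"step size {eta}: final training MSE = {curve[-1]:.4f}")
```

Each array in `curves` can be passed to `plt.plot` to draw one learning curve per step size. A step size that is too small converges slowly, while one that is too large can make the weights diverge.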

In [4]:
# weight tracks

#### Accuracy Comparison Between Best LS and Best LMS

#### What happens when the target variable is used as an input?

In [2]:
# visualize the learned model parameters
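
As a sketch of what to expect here, the example below appends the target as an extra input column on synthetic data (a hypothetical stand-in for the housing features) and fits by least squares. Because the target is then an exact linear function of one input, the fit should put a weight of about 1 on that column and about 0 everywhere else.

```python
import numpy as np

# Synthetic data (hypothetical): three features plus a noisy linear target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 1.0 + X_demo @ np.array([3.0, -1.0, 2.0]) + 0.5 * rng.normal(size=100)

# Design matrix: bias column, the three features, and the target itself as a last column
X_leaky = np.c_[np.ones(100), X_demo, y_demo]
betas_leaky, *_ = np.linalg.lstsq(X_leaky, y_demo, rcond=None)

print(np.round(betas_leaky, 6))  # bias and feature weights near 0, target weight near 1
```

A bar plot of `betas_leaky` would make the same point visually: the model ignores the real features and simply copies the target.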

#### Does a bias term need to be included in the model for this problem?
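
One way to answer this empirically is to fit the model with and without a column of ones and compare the errors. The sketch below does so on synthetic data with non-centered features and a non-zero offset. The data and the helper `fit_mse` are hypothetical.

```python
import numpy as np

def fit_mse(X, y, use_bias):
    """Fit by least squares, with or without a bias column, and return the training MSE."""
    A = np.c_[np.ones(X.shape[0]), X] if use_bias else X
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ w) ** 2))

# Synthetic data (hypothetical): features centered at 5, true offset of 20
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, size=(200, 2))
y_demo = 20.0 + X_demo @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=200)

mse_with_bias = fit_mse(X_demo, y_demo, use_bias=True)
mse_without_bias = fit_mse(X_demo, y_demo, use_bias=False)
print(mse_with_bias, mse_without_bias)
```

Since the model without a bias is a special case of the model with one, its training error can never be lower. For the housing data, where MEDV has a mean of about 22.5 and the inputs are not centered, a bias term is therefore likely to matter unless the data are centered first.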