# Linear Regression (Multiple Variables)

## Function
With multiple features, $w$ becomes a vector, written $\overrightarrow{w}$,
and the input $x$ is also a vector, written $\overrightarrow{x}$. The bias $b$ is still a single number, and we write the function as<br>
$f_{\overrightarrow{w},b}(\overrightarrow{x}) = \overrightarrow{w} \cdot \overrightarrow{x} + b$<br>
This is the vectorized form; written out without vectorization it is<br>
$f_{\overrightarrow{w},b}(\overrightarrow{x}) = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$<br>
The two forms agree because the dot product expands to the same sum:<br>
$\overrightarrow{w} \cdot \overrightarrow{x} = w_1x_1 + w_2x_2 + \dots + w_nx_n$
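To make the equivalence concrete, here is a minimal sketch that computes the same quantity twice: once as an explicit term-by-term sum and once with `np.dot`. The arrays `w_demo` and `x_demo` are made-up values used only for illustration; they are not part of the housing dataset.

```python
import numpy as np

# Hypothetical weights and features, chosen only for illustration
w_demo = np.array([0.5, -1.0, 2.0])
x_demo = np.array([4.0, 3.0, 1.5])

# Term-by-term sum: w_1*x_1 + w_2*x_2 + ... + w_n*x_n
loop_sum = 0.0
for j in range(w_demo.shape[0]):
    loop_sum += w_demo[j] * x_demo[j]

# Vectorized dot product of the same two vectors
dot_result = np.dot(w_demo, x_demo)

print(loop_sum, dot_result)
```

Both values should match; the vectorized call is simply shorter and, for large vectors, usually faster.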

Let's reuse the house price example, this time with multiple features: besides the size, we'll also use the number of bedrooms, the number of bathrooms, and the age of the house.

In [1]:
import numpy as np
import pandas as pd
import copy
import matplotlib.pyplot as plt

In [2]:
# Our dataset
x_train = np.array([[1.275, 4, 1, 12], [1.674, 5, 2, 6], [2.000, 6, 3, 1], [0.987, 2, 1, 34], [1.275, 4, 1, 4]], dtype='float64')
y_train = np.array([452.983, 673.983, 983.992, 122.111, 555.211], dtype='float64')
m, n = x_train.shape # m is the number of training examples and n is the number of features
w = np.random.random(n) # initial value for w
b = 100 # initial value for b

In [3]:
df = pd.DataFrame(x_train, columns=['Size (1k feet squared)', 'Number of bedrooms', 'Number of bathrooms', 'Age'])
df['Price(1k $)'] = pd.Series(y_train)
df

| | Size (1k feet squared) | Number of bedrooms | Number of bathrooms | Age | Price(1k \$) |
|---|---|---|---|---|---|
| 0 | 1.275 | 4.0 | 1.0 | 12.0 | 452.983 |
| 1 | 1.674 | 5.0 | 2.0 | 6.0 | 673.983 |
| 2 | 2.0 | 6.0 | 3.0 | 1.0 | 983.992 |
| 3 | 0.987 | 2.0 | 1.0 | 34.0 | 122.111 |
| 4 | 1.275 | 4.0 | 1.0 | 4.0 | 555.211 |


In [4]:
def predict(x, w, b):
    return np.dot(w, x) + b

Now let's try to predict the price of the first house in the dataset:

In [5]:
print(f'Value for a house with size=1.275 and numberOfBedrooms=4 and numberOfBathrooms=1 and age=12 is {predict(x_train[0], w, b):.3f}k$')
print(f'The actual price is {y_train[0]}k$')

Value for a house with size=1.275 and numberOfBedrooms=4 and numberOfBathrooms=1 and age=12 is 104.999k$
The actual price is 452.983k$


## Cost function
With vectorization, the cost function is defined as <br>
$J(\overrightarrow{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) - y^{(i)})^2$

In [6]:
def cost(x, y, w, b):
    err_sum = 0
    m = x.shape[0]
    for i in range(m):
        f_wb = np.dot(w, x[i]) + b
        err_sum += (f_wb - y[i]) ** 2
    err_sum = err_sum / (2 * m)
    return err_sum
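The loop above can also be expressed with matrix operations. The following is a sketch of one possible vectorized variant, not a replacement for the function above; the name `cost_vectorized` and the small `*_demo` arrays are assumptions introduced here for illustration.

```python
import numpy as np

def cost_vectorized(x, y, w, b):
    """Squared-error cost J(w, b) computed without an explicit Python loop."""
    m = x.shape[0]
    # x @ w gives one prediction per row of x; adding b and subtracting y
    # yields the error of every training example at once, shape (m,)
    errors = x @ w + b - y
    return np.sum(errors ** 2) / (2 * m)

# Tiny hypothetical dataset, used only to exercise the function
x_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
y_demo = np.array([5.0, 6.0])
w_demo = np.array([1.0, 1.0])
b_demo = 0.5

demo_cost = cost_vectorized(x_demo, y_demo, w_demo, b_demo)
print(demo_cost)
```

Called with the same arguments as `cost`, this variant should return the same value up to floating-point rounding.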

Since the initial prediction for the first house was far from its actual price, we expect the cost to be large at this point.


In [7]:
print(cost(x_train, y_train, w, b))

141801.20734060762


## Gradient Descent
Since we now have multiple features, we need the partial derivative of the cost with respect to each weight, so that we can find a good value for every $w_j$ from $w_1$ to $w_n$. The update rules are<br>
$w_j = w_j - \alpha\frac{\partial}{\partial w_j}J(\overrightarrow{w}, b)$ and $b = b - \alpha\frac{\partial}{\partial b}J(\overrightarrow{w}, b)$<br>
Expanding the derivatives, for each $w_j$ we have<br>
$w_j = w_j - \alpha\frac{1}{m}\sum_{i=1}^{m}(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) - y^{(i)})x_{j}^{(i)}$<br>
$b = b - \alpha\frac{1}{m}\sum_{i=1}^{m}(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)})-y^{(i)})$<br>
All the $w_j$ and $b$ are updated together, using gradients computed from the same current values.
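
For readers wondering where the factor $x_{j}^{(i)}$ comes from, the step from the cost function to the update rule can be sketched with the chain rule. This is the standard derivation, added here for illustration with the same symbols as above:<br>
$\frac{\partial}{\partial w_j}J(\overrightarrow{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}2(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) - y^{(i)})\frac{\partial}{\partial w_j}f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)})$<br>
Because $f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) = w_1x_1^{(i)} + \dots + w_nx_n^{(i)} + b$, its derivative with respect to $w_j$ is $x_{j}^{(i)}$ and its derivative with respect to $b$ is $1$. The factor $2$ cancels against $\frac{1}{2m}$, which gives<br>
$\frac{\partial}{\partial w_j}J(\overrightarrow{w}, b) = \frac{1}{m}\sum_{i=1}^{m}(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) - y^{(i)})x_{j}^{(i)}$<br>
$\frac{\partial}{\partial b}J(\overrightarrow{w}, b) = \frac{1}{m}\sum_{i=1}^{m}(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) - y^{(i)})$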

In [8]:
def gradient(x, y, w, b):
    m, n = x.shape
    w_t = np.zeros((n,))
    b_t = 0
    for i in range(m):
        err = np.dot(w, x[i]) + b - y[i]
        for j in range(n):
            w_t[j] += err * x[i, j]
        b_t += err
    w_t = w_t / m
    b_t = b_t / m
    return w_t, b_t
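
The nested loops above can also be written with matrix operations. The following is a sketch of one possible vectorized variant, assuming `x` has shape `(m, n)` and `y` has shape `(m,)`; the name `gradient_vectorized` and the small `*_demo` arrays are introduced here for illustration only.

```python
import numpy as np

def gradient_vectorized(x, y, w, b):
    """Gradients of the squared-error cost with respect to w and b."""
    m = x.shape[0]
    # Error of every training example at once, shape (m,)
    errors = x @ w + b - y
    # Column j of x dotted with the errors gives the sum over i of
    # err_i * x[i, j], which is what the inner loop accumulates
    dj_dw = x.T @ errors / m
    dj_db = np.sum(errors) / m
    return dj_dw, dj_db

# Tiny hypothetical dataset, used only to exercise the function
x_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
y_demo = np.array([5.0, 6.0])
w_demo = np.array([1.0, 1.0])
b_demo = 0.5

dj_dw_demo, dj_db_demo = gradient_vectorized(x_demo, y_demo, w_demo, b_demo)
print(dj_dw_demo, dj_db_demo)
```

Because it takes the same arguments and returns the same pair of values, a function like this could be passed to a gradient descent routine in place of the loop-based one.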

In [9]:
def gradient_descent(x, y, init_w, init_b, alpha, cost_function, gradient_function, iterations=1000):
    w = copy.deepcopy(init_w)
    b = init_b
    for i in range(1, iterations + 1):
        w_t, b_t = gradient_function(x, y, w, b)
        w = w - (alpha * w_t)
        b = b - (alpha * b_t)
        if i % 10 == 0:
            print(f'w={w}, b={b}, cost={cost_function(x, y, w, b)}')
    return w, b

In [10]:
w, b = gradient_descent(x_train, y_train, w, b, 0.00714, cost, gradient, 1000)

w=[ 25.47018622  78.30596587  32.27792222 -11.97597953], b=113.68374639315961, cost=15194.632409965114
w=[ 31.2389634   95.83163615  40.53147666 -13.67506037], b=116.65237463069684, cost=6082.647631459326
w=[ 32.62810574  99.63494169  43.30667626 -13.22204681], b=117.16878071517891, cost=4006.381563646822
w=[ 33.00810272 100.28685925  44.8099625  -12.44777356], b=117.12134021407768, cost=2875.3564531031293
w=[ 33.15027991 100.20544247  46.00369431 -11.73567648], b=116.94265966557614, cost=2160.964685617432
w=[ 33.23217633  99.94639029  47.10919847 -11.14575991], b=116.73266750463878, cost=1701.6392374656164
w=[ 33.2954713   99.63945354  48.17832141 -10.66978358], b=116.51509244491618, cost=1403.8940812642854
w=[ 33.35059391  99.31653689  49.22398081 -10.28838661], b=116.29607067172039, cost=1208.860128090129
w=[33.4006538  98.98673022 50.25001928 -9.98322277], b=116.07756641486792, cost=1079.1920761538006
w=[33.44694241 98.65353596 51.25805789 -9.73900864], b=115.86046775549669, cost=9

Now let's predict the price of the same house again, as before:

In [11]:
print(f'Value for a house with size=1.275 and numberOfBedrooms=4 and numberOfBathrooms=1 and age=12 is {predict(x_train[0], w, b):.3f}k$')
print(f'The actual price is {y_train[0]}k$')

Value for a house with size=1.275 and numberOfBedrooms=4 and numberOfBathrooms=1 and age=12 is 470.337k$
The actual price is 452.983k$


In [12]:
predicted_values = []
for i in range(m):
    predicted_values.append(predict(x_train[i], w, b))
df['Prediction (1k $)'] = pd.Series(predicted_values)
df

| | Size (1k feet squared) | Number of bedrooms | Number of bathrooms | Age | Price(1k \$) | Prediction (1k \$) |
|---|---|---|---|---|---|---|
| 0 | 1.275 | 4.0 | 1.0 | 12.0 | 452.983 | 470.336946 |
| 1 | 1.674 | 5.0 | 2.0 | 6.0 | 673.983 | 716.920429 |
| 2 | 2.0 | 6.0 | 3.0 | 1.0 | 983.992 | 952.355187 |
| 3 | 0.987 | 2.0 | 1.0 | 34.0 | 122.111 | 111.253271 |
| 4 | 1.275 | 4.0 | 1.0 | 4.0 | 555.211 | 538.930877 |


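The predictions are not very close to the actual prices, but the result is promising: for the first house, the prediction moved from about 105 before training to about 470 after training, against an actual price of about 453 (all in thousands of dollars).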
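
One way to put a number on how close the predictions are is the mean absolute error. The sketch below copies the actual and predicted prices from the table above; the choice of this metric and the variable names are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

# Actual and predicted prices (in 1k dollars), copied from the table above
actual = np.array([452.983, 673.983, 983.992, 122.111, 555.211])
predicted = np.array([470.336946, 716.920429, 952.355187, 111.253271, 538.930877])

# Absolute difference for each house, then the average over all houses
abs_errors = np.abs(predicted - actual)
mae = abs_errors.mean()

print(f'Mean absolute error: {mae:.3f} (1k dollars)')
```

Note that this measures the error on the training data only, so it says nothing about how well the model would do on houses it has not seen.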