## Multi Varaible Linear Regression

In this notebook I cover the concept of `multivariable linear regression`. The difference between `multi variable linear regression` and `simple linear regresssion` is about the number of *independent* variables. In crast with `linear regresssion` the `multi variable linear regression` has more than one *independent* vairable. 

The $x$ data is a matrix of features. There are are $n$ features and $m$ training examples Each row of the matrix represents one example. When you have $m$ training examples. 
- $m$ == number of rows
- $n$ == number of columns


$$\mathbf{X} = 
\begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\ 
 x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
 \cdots \\
 x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1} 
\end{pmatrix}
$$


The data includes the house specification and the price. The features are *size*, *number of beds*, *age of home*, and the target values are *price*.

In [14]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
data_dict ={
    'size' : [200, 1000, 2000],
    'number of beds': [1, 2, 4],
    'age of home' : [30, 40, 50],
    'price': [100, 230, 500]
}

In [7]:
df = pd.DataFrame(data_dict)

In [8]:
df

Unnamed: 0,size,number of beds,age of home,price
0,200,1,30,100
1,1000,2,40,230
2,2000,4,50,500


Since there are multiple features, there must be $n$ a vector of $w$ with $n$ elements. 

$$\mathbf{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\cdots\\
w_{n-1}
\end{pmatrix}
$$

but $b$ is a scalar value. 

### Model

The model has $n$ features thus a vector $w$. 

$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
where $\cdot$ is a vector `dot product`

In [9]:
def f_w_b(x, w, b):
    pred = np.dot(x, w) + b
    return pred

### cost function for multivariable linear regression
The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 

In [66]:
def cost_calc(x, y, w, b, f_w_b):
    m = x.shape[0]
    cost = 0
    for i in range(0, m):
        pred = f_w_b(x[i], w, b)
        #print(pred)
        cost += (pred-y[i])**2
    cost = (1/(2*m))*cost
    return cost

In [91]:
b_initial = 500
w_initial = np.array([ 0.2, 10, -0.1])

x = df.loc[:, df.columns != 'price'].to_numpy()
y = df.loc[:, df.columns == 'price'].to_numpy()

In [92]:
cost_calc(x, y, w_init, b_init, f_w_b)

array([204712.41751503])

### Gradient Descent for Multivariables

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

where, n is the number of features, parameters $w_j$,  $b$, are updated simultaneously and where  

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$
* m is the number of training examples in the data set

    
*  $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value

In [93]:
def gradient_calc(x, y, w, b):
    m, n = x.shape
    
    dj_dw = np.zeros((n,))
    dj_db = 0
    
    for i in range (0, m):
        diff = f_w_b(x[i], w, b) - y[i]
        for j in range(0, n):
            dj_dw[j] += diff*x[i, j]
        dj_db += diff
    
    dj_dw = dj_dw/m
    dj_db = dj_db/m
    
    return dj_dw, dj_db

In [94]:
tmp_dj_dw, tmp_dj_db = gradient_calc(x, y, w_init, b_init)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

dj_db at initial w,b: [634.51447013]
dj_dw at initial w,b: 
 [617993.21258604   1378.42265253  24740.57880531]


In [95]:
def gradient_descent_calc(x, y, w, b, cost_calc, gradient_calc, alpha, num_iterations):
    
    costs = []
    w_b_s = []
    
    for i in range(0, num_iterations):
        dj_dw, dj_db = gradient_calc(x, y, w, b)
        
        w = w - alpha*dj_dw
        b = b - alpha*dj_db
        
        if i < 100000:
            costs.append(cost_calc(x, y, w, b, f_w_b))
        
        if i%100 == 0:
            print(f'iteration:{i}, cost:{cost_calc(x, y, w, b, f_w_b)}')
            
    return w, b, costs

In [96]:
w_input = np.zeros_like(w_initial)
b_input = 0.
# some gradient descent settings
iterations = 1000
alpha = 6e-8
# run gradient descent 
w_final, b_final, costs = gradient_descent_calc(x, y, w_input, b_input, cost_calc, gradient_calc, alpha, iterations)
print(f"b,w found by gradient descent: {b_final},{w_final} ")
m,_ = x.shape
for i in range(m):
    print(f"prediction: {np.dot(x[i], w_final) + b_final}, target value: {y[i]}")

iteration:0, cost:[42249.96991039]
iteration:100, cost:[477.00554879]
iteration:200, cost:[476.36190689]
iteration:300, cost:[475.72027138]
iteration:400, cost:[475.08061194]
iteration:500, cost:[474.44292248]
iteration:600, cost:[473.80719695]
iteration:700, cost:[473.17342928]
iteration:800, cost:[472.54161346]
iteration:900, cost:[471.91174346]
b,w found by gradient descent: [0.00087473],[0.24724536 0.00110022 0.02669521] 
prediction: [50.25190329], target value: [100]
prediction: [248.31624349], target value: [230]
prediction: [495.83075586], target value: [500]
