# GRADIENT DESCENT FOR MULTIPLE LINEAR REGRESSION 

## 1.1 Goals
- Extend our regression model routines to support multiple features
    - Extend data structures to support multiple features
    - Rewrite prediction, cost and gradient routines to support multiple features
    - Utilize NumPy `np.dot` to vectorize their implementations for speed and simplicity

In [21]:
import copy, math

import numpy as np
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays




<a name="toc_15456_1.3"></a>
## 1.3 Notation
Here is a summary of some of the notation you will encounter, updated for multiple features.  

| General Notation | Description | Python (if applicable) |
|:-----------------|:------------|:-----------------------|
| $a$ | scalar, non-bold | |
| $\mathbf{a}$ | vector, bold | |
| $\mathbf{A}$ | matrix, bold capital | |
| **Regression** | | |
| $\mathbf{X}$ | training example matrix | `X_train` |
| $\mathbf{y}$ | training example targets | `y_train` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i^{th}$ training example | `X[i]`, `y[i]` |
| $m$ | number of training examples | `m` |
| $n$ | number of features in each example | `n` |
| $\mathbf{w}$ | parameter: weight | `w` |
| $b$ | parameter: bias | `b` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ | The result of the model evaluation at $\mathbf{x}^{(i)}$ parameterized by $\mathbf{w},b$: $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)}+b$ | `f_wb` |
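
To make the notation concrete, here is a minimal sketch of how the symbols map onto NumPy indexing. It uses the same small training array that this notebook defines in a later cell; the helper names `x_i` and `y_i` are only illustrative.

```python
import numpy as np

# Same toy data as the notebook's training set: 3 examples, 4 features each
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

m, n = X_train.shape   # m = number of examples, n = number of features
i = 1                  # pick the second example (indices start at 0)
x_i = X_train[i]       # x^(i): a vector of n feature values
y_i = y_train[i]       # y^(i): the scalar target for that example

print(f"m = {m}, n = {n}")
print(f"x^({i}) = {x_i}, y^({i}) = {y_i}")
```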


In [22]:
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# data is stored in numpy array/matrix 

print(f"X shape : {X_train.shape},X type :{type(X_train)}")
print(X_train)
print(f"y shape : {y_train.shape} y Type : {type(y_train)}" )

print(y_train)

X shape : (3, 4),X type :<class 'numpy.ndarray'>
[[2104    5    1   45]
 [1416    3    2   40]
 [ 852    2    1   35]]
y shape : (3,) y Type : <class 'numpy.ndarray'>
[460 232 178]


In [23]:
b_init = 785.1811367994083 
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])

print(f"winit shape : {w_init.shape}, b_init type {type(b_init)}")

winit shape : (4,), b_init type <class 'float'>


In [37]:
# Use a for loop to make a prediction with our multiple linear regression model


def predic_single_loop(x, w, b):
    """Predict a single example by looping over its features.

    x: (n,) example, w: (n,) weights, b: scalar bias.
    """
    n = x.shape[0]  # number of features in this example
    prediction_value = 0

    for i in range(n):
        prediction_value += x[i] * w[i]  # accumulate w_i * x_i
    total_predicted_value = prediction_value + b
    return total_predicted_value






In [38]:
# get a row from our training data 

x_vec = X_train[0,:]

# make a prediction for that row

f_wb = predic_single_loop(x_vec, w_init,b_init)
print(f"value of x vect : {x_vec}")
print(f"f_wb shape {f_wb.shape} , prediction : {f_wb}")

value of x vect : [2104    5    1   45]
f_wb shape () , prediction : 459.9999976194083


In [33]:
# Single prediction, vectorized with NumPy


def predict(x, w, b):
    """Make a single prediction using NumPy's dot product."""
    p = np.dot(x, w) + b
    return p

In [34]:
# get a row from our training data 
b_init = 785.1811367994083 
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])

x_train_first_row = X_train[0,:]
print(f"x_train_first_row shape { x_train_first_row.shape}, x_vec value : {x_train_first_row}")


# make a prediction 

f_wb = predict(x_train_first_row, w_init, b_init)
print(f"prediction  : {f_wb}")

x_train_first_row shape (4,), x_vec value : [2104    5    1   45]
prediction  : 459.9999976194083
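
The loop version and the `np.dot` version give the same result for one row. As a further sketch, the same idea extends to predicting every row at once with a matrix-vector product. The function name `predict_all` is an assumption introduced here for illustration; it is not part of the original notebook.

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
b_init = 785.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])


def predict_all(X, w, b):
    """Hypothetical helper: predictions for all m examples in one call.

    X: (m, n) matrix, w: (n,) weights, b: scalar bias. Returns shape (m,).
    """
    return X @ w + b  # row i of the result is w . x^(i) + b


predictions = predict_all(X_train, w_init, b_init)
print(f"predictions shape : {predictions.shape}")
print(f"predictions : {predictions}")
print(f"targets     : {y_train}")
```

Because `w_init` and `b_init` were chosen close to the optimum, each prediction should sit very near its target.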


<a name="toc_15456_4"></a>
# 4 Compute Cost With Multiple Variables
The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 


In contrast to previous labs, $\mathbf{w}$ and $\mathbf{x}^{(i)}$ are vectors rather than scalars, which is what allows the model to support multiple features.

In [64]:
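def compute_cost(x, y, w, b):
    """Compute the squared-error cost J(w, b) from equation (3)."""
    m = x.shape[0]  # number of training examples
    cost = 0.0

    for i in range(m):
        f_wb_i = np.dot(x[i], w) + b  # prediction for example i, equation (4)
        cost += (f_wb_i - y[i]) ** 2
    cost /= (2 * m)

    return cost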






In [65]:
# compute and display cost using our pre-chosen optimal parameters 

cost = compute_cost(X_train, y_train, w_init, b_init)

print(f'Cost at optimal w : {cost}')

Cost at optimal w : 1.5578904428966628e-12
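
The cost in equation (3) can also be computed without the explicit loop over examples. The sketch below is one possible vectorized form; the name `compute_cost_vectorized` is an assumption made for this example and does not appear in the original notebook.

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
b_init = 785.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])


def compute_cost_vectorized(X, y, w, b):
    """Hypothetical vectorized variant of the cost in equation (3)."""
    m = X.shape[0]
    err = X @ w + b - y              # errors for all m examples, shape (m,)
    return np.dot(err, err) / (2 * m)  # sum of squared errors over 2m


cost_opt = compute_cost_vectorized(X_train, y_train, w_init, b_init)
cost_zero = compute_cost_vectorized(X_train, y_train, np.zeros(4), 0.0)
print(f"Cost at near-optimal parameters : {cost_opt}")
print(f"Cost at all-zero parameters     : {cost_zero}")
```

Comparing the two values shows how far an untrained model (all zeros) is from the pre-chosen parameters.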


<a name="toc_15456_5.1"></a>
## 5.1 Compute Gradient with Multiple Variables
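Differentiating the cost in equation (3) with respect to each parameter gives:
$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}$$
$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}$$
An implementation that calculates equations (6) and (7) is below. There are many ways to implement this. In this version, there is an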
- outer loop over all m examples. 
    - $\frac{\partial J(\mathbf{w},b)}{\partial b}$ for the example can be computed directly and accumulated
    - in a second loop over all n features:
        - $\frac{\partial J(\mathbf{w},b)}{\partial w_j}$ is computed for each $w_j$.
   

In [None]:
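def compute_gradient(x, y, w, b):
    """Compute the gradient of the cost with respect to w and b.

    Returns dj_db (scalar) and dj_dw (shape (n,)), in that order.
    """
    m, n = x.shape  # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(x[i], w) + b) - y[i]  # prediction error for example i
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * x[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_db, dj_dw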
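
As one of the other possible implementations, the two loops can be replaced by matrix operations, and the resulting gradient can drive a simple gradient descent update. This is a rough sketch, not the notebook's own routine: the names `compute_gradient_vectorized`, `alpha` and `num_iters`, the all-zero starting point, and the learning rate value are all assumptions. The learning rate is kept very small because the features here are not scaled.

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])


def compute_cost(X, y, w, b):
    """Squared-error cost, vectorized over all examples."""
    err = X @ w + b - y
    return np.dot(err, err) / (2 * X.shape[0])


def compute_gradient_vectorized(X, y, w, b):
    """Hypothetical vectorized gradient; returns (dj_db, dj_dw)."""
    m = X.shape[0]
    err = X @ w + b - y        # errors for all examples, shape (m,)
    dj_dw = X.T @ err / m      # one partial derivative per feature, shape (n,)
    dj_db = err.sum() / m      # partial derivative with respect to b
    return dj_db, dj_dw


# Assumed settings for illustration only
w = np.zeros(X_train.shape[1])
b = 0.0
alpha = 5.0e-7     # small step size, since features are unscaled
num_iters = 1000

cost_before = compute_cost(X_train, y_train, w, b)
for _ in range(num_iters):
    dj_db, dj_dw = compute_gradient_vectorized(X_train, y_train, w, b)
    w = w - alpha * dj_dw      # update all weights simultaneously
    b = b - alpha * dj_db
cost_after = compute_cost(X_train, y_train, w, b)

print(f"cost before : {cost_before}")
print(f"cost after  : {cost_after}")
print(f"w : {w}, b : {b}")
```

The return order `(dj_db, dj_dw)` matches `compute_gradient` above, so the two functions should be interchangeable in the update loop.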



