# Coursera Lab about Multiple Variable Linear Regression
### Andrew Ng

In [1]:
# Let's get started
import numpy as np
import matplotlib.pyplot as plt
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays

Here is the dataset to work with:

| Size (sqft) | Number of bedrooms | Number of floors | Age of home | Price (1000s of dollars) |
| ----------------| ------------------- |----------------- |--------------|-------------- |  
| 2104            | 5                   | 1                | 45           | 460           |  
| 1416            | 3                   | 2                | 40           | 232           |  
| 852             | 2                   | 1                | 35           | 178           |  


In [2]:
x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460,232,178])
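
As a quick sanity check on the data layout, the sketch below prints the array shapes. It assumes the convention used throughout this lab: one row per training example and one column per feature. The names `m` and `n` are introduced here only for illustration.

```python
import numpy as np

# Same data as in the table above
x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# Rows are training examples, columns are features
m, n = x_train.shape
print(f"x_train shape: {x_train.shape} (m = {m} examples, n = {n} features)")
print(f"y_train shape: {y_train.shape}")
```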

Initialize `w` and `b` to the values below, which are close to the optimum. This is for educational purposes only.

In [3]:
b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])

-----------------------------------------

# A: Model Prediction

### A_1: Model Prediction: element by element

$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}$$

In [4]:
def predict_single_loop(x, w, b):
    """
    Single prediction using linear regression
    Args:
        x: example with multiple features
        w, b: model parameters
    Return:
        p: model prediction
    """
    
    n = x.shape[0]
    p = 0
    for i in range(n):
        p_i = w[i] * x[i]
        p = p + p_i
    p = p + b
    return p     

Let's test this function using the initialized model parameters (w, b).

In [5]:
# get a row from our training data
x_vec = x_train[0, :]

# make a prediction using the initial w, b
# (as mentioned before, they are initialized near the optimum values)
f_wb = predict_single_loop(x_vec, w_init, b_init)

#print statements:
print(f"x_vec value: {x_vec}")
print(f"prediction: {f_wb}")

x_vec value: [2104    5    1   45]
prediction: 459.9999976194083


### A_2: Model Prediction: vector

$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 

In [6]:
def predict(x, w, b):
    p = np.dot(x, w) + b
    return p

Let's test this function using the initialized model parameters (w, b).

In [7]:
# get a row from our training data
x_vec = x_train[0, :]

# make a prediction using the initial w, b
# (as mentioned before, they are initialized near the optimum values)
f_wb = predict(x_vec, w_init, b_init)

#print statements:
print(f"x_vec value: {x_vec}")
print(f"prediction: {f_wb}")

x_vec value: [2104    5    1   45]
prediction: 459.9999976194083
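
The two prediction functions above should agree, and equation (2) extends naturally to all examples at once. The sketch below is one way to check this, not part of the original lab: it compares the element-by-element loop with a single matrix-vector product. The names `loop_preds` and `vec_preds` are hypothetical.

```python
import numpy as np

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
b_init = 785.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])

def predict_single_loop(x, w, b):
    """Element-by-element prediction for one example, as in equation (1)."""
    p = 0
    for i in range(x.shape[0]):
        p = p + w[i] * x[i]
    return p + b

# One prediction per row, computed with the explicit loop
loop_preds = np.array([predict_single_loop(x_train[i], w_init, b_init)
                       for i in range(x_train.shape[0])])

# All predictions at once: (m, n) @ (n,) -> (m,)
vec_preds = x_train @ w_init + b_init

print(f"loop predictions:       {loop_preds}")
print(f"vectorized predictions: {vec_preds}")
print(f"match: {np.allclose(loop_preds, vec_preds)}")
```

Because `w_init` and `b_init` are close to the optimum, both sets of predictions should also land very close to `y_train`.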


--------------------------------------

# B: Computing Model Cost

The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 

In [8]:
def compute_cost(x, y, w, b):
    """
    Args:
        x: Data, m examples with n features
        y: target value
        w, b: model parameters
    """
    m = x.shape[0]
    cost = 0
    for i in range (m):
        f_wb_i = np.dot(x[i], w) + b
        cost = cost + (f_wb_i - y[i])**2
    cost = cost / (2 * m)
    return cost

In [9]:
# Compute and display cost using our pre-chosen optimal parameters. 
cost = compute_cost(x_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')

Cost at optimal w : 1.5578904045996674e-12
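
Equation (3) can also be evaluated without an explicit loop. The following is a minimal sketch, assuming the same data layout as above; `compute_cost_vectorized` is a hypothetical name, not a function from the lab. With all parameters set to zero the errors are simply $-y^{(i)}$, so the cost can be checked by hand: $(460^2 + 232^2 + 178^2) / (2 \cdot 3) = 49518$.

```python
import numpy as np

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
b_init = 785.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])

def compute_cost_vectorized(x, y, w, b):
    """Squared-error cost of equation (3), computed with array operations."""
    m = x.shape[0]
    errors = x @ w + b - y               # f_wb(x_i) - y_i for every example
    return np.dot(errors, errors) / (2 * m)

cost_zero = compute_cost_vectorized(x_train, y_train, np.zeros(4), 0.0)
cost_init = compute_cost_vectorized(x_train, y_train, w_init, b_init)
print(f"cost with all-zero parameters: {cost_zero}")
print(f"cost with w_init, b_init:      {cost_init}")
```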


------------------

# C: Gradient Descent With Multiple Variables

Gradient descent for multiple variables:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously, and

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$
* $m$ is the number of training examples in the data set
* $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value
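
As a brief sketch of where equations (6) and (7) come from: apply the chain rule to the cost in equation (3), using $\partial f_{\mathbf{w},b}(\mathbf{x}^{(i)}) / \partial w_j = x_j^{(i)}$ and $\partial f_{\mathbf{w},b}(\mathbf{x}^{(i)}) / \partial b = 1$, both of which follow from equation (4). The factor 2 from the square cancels the $\frac{1}{2}$ in the cost:

$$
\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} 2\,(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})\, x_{j}^{(i)} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})\, x_{j}^{(i)} \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} 2\,(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \cdot 1 = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})
\end{align*}
$$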


### C_1: Compute gradient

In [13]:
def compute_gradient(x, y, w, b):
    """
    Args:
        x: Data, m examples with n features
        y: target values
        w, b: model parameters
    Returns:
        dj_db: gradient of the cost with respect to b (scalar)
        dj_dw: gradient of the cost with respect to w, array of shape (n,)
    """
    
    m, n = x.shape # number of training data and number of features
    dj_dw = np.zeros((n,))
    dj_db = 0
    
    for i in range(m):                # iterate through examples
        temp_err = (np.dot(x[i], w) + b) - y[i]
        
        for j in range(n):            # iterate through features
            dj_dw[j] = dj_dw[j] + temp_err * x[i, j]            
        dj_db = dj_db + temp_err      # accumulated over all examples (outer loop)
        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m
    
    #print(m, n, dj_dw)    
    return dj_db, dj_dw

#compute_gradient(x_train, y_train, w_init, b_init)
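
Equations (6) and (7) can also be written with array operations, and any gradient implementation can be checked against a central finite difference of the cost. The sketch below does both. It is an illustration, not part of the original lab: `compute_cost_vectorized`, `compute_gradient_vectorized`, and `numerical_gradient` are hypothetical names, and the step size `eps` is an arbitrary choice.

```python
import numpy as np

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

def compute_cost_vectorized(x, y, w, b):
    """Squared-error cost of equation (3)."""
    errors = x @ w + b - y
    return np.dot(errors, errors) / (2 * x.shape[0])

def compute_gradient_vectorized(x, y, w, b):
    """Gradients of equations (6) and (7), returned in the order (dj_db, dj_dw)."""
    m = x.shape[0]
    errors = x @ w + b - y            # shape (m,)
    dj_dw = x.T @ errors / m          # shape (n,): sum over examples of error * x_j
    dj_db = errors.mean()             # scalar: average error
    return dj_db, dj_dw

def numerical_gradient(x, y, w, b, eps=1e-4):
    """Central finite-difference approximation of the same gradients."""
    dj_dw = np.zeros(w.shape[0])
    for j in range(w.shape[0]):
        step = np.zeros(w.shape[0])
        step[j] = eps
        dj_dw[j] = (compute_cost_vectorized(x, y, w + step, b)
                    - compute_cost_vectorized(x, y, w - step, b)) / (2 * eps)
    dj_db = (compute_cost_vectorized(x, y, w, b + eps)
             - compute_cost_vectorized(x, y, w, b - eps)) / (2 * eps)
    return dj_db, dj_dw

# Evaluate both at zero-initialized parameters
w0, b0 = np.zeros(4), 0.0
dj_db, dj_dw = compute_gradient_vectorized(x_train, y_train, w0, b0)
num_db, num_dw = numerical_gradient(x_train, y_train, w0, b0)

print(f"analytic  dj_db: {dj_db}, dj_dw: {dj_dw}")
print(f"numerical dj_db: {num_db}, dj_dw: {num_dw}")
```

The first component of `dj_dw` is several orders of magnitude larger than the others because the size feature has much larger values than the remaining features.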

### C_2: Compute gradient descent

In [19]:
def gradient_descent(x, y, w_init, b_init, cost_function, gradient_function, alpha, num_iters): 
    """
    Args:
        x                   : Data, m examples with n features
        y                   : target values
        w_init, b_init      : initial model parameters (the caller may pass zeros)
        cost_function       : function to compute cost
        gradient_function   : function to compute the gradient
        alpha               : Learning rate
        num_iters           : number of iterations to run gradient descent
    Returns:
        w                   : Updated values of parameters 
        b                   : Updated value of parameter 
    """
    # Work on copies so the caller's initial values are left untouched
    w = np.array(w_init, dtype=float)
    b = b_init
    
    for i in range(num_iters):
        # Calculate the gradient at the current parameters
        dj_db, dj_dw = gradient_function(x, y, w, b)
        
        # Update both parameters simultaneously using alpha and the gradient
        w = w - alpha * dj_dw               
        b = b - alpha * dj_db              
        
        # Report the cost every 100 iterations
        if i % 100 == 0:
            print(f"iteration: {i}, cost: {cost_function(x, y, w, b)}")
        
    # Return only after all iterations have run
    return w, b

Let's run it

In [25]:
initial_w = np.zeros((4,))
initial_b = 0

iterations = 1000
alpha = 5.0e-7

w, b = gradient_descent(x_train, y_train, initial_w, initial_b, compute_cost, compute_gradient, alpha, iterations)
print(f"w: {w}\nb:{b}")

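The output below was recorded after a single update step from the zero-initialized parameters (equivalent to `iterations = 1`):

w: [2.41e-01 5.59e-04 1.84e-04 6.03e-03]
b:0.000145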
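
As a cross-check on the update rule in equation (5), here is a compact, self-contained sketch of gradient descent that also records the cost after every update. It is an illustration under the same data and learning rate as above, not the lab's reference implementation; `run_gradient_descent` and `cost_history` are hypothetical names. Both gradients are computed from the same `errors` array before either parameter changes, which is what "updated simultaneously" means in practice.

```python
import numpy as np

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

def run_gradient_descent(x, y, alpha, num_iters):
    """Gradient descent from zero-initialized parameters, tracking the cost."""
    m, n = x.shape
    w = np.zeros(n)
    b = 0.0
    cost_history = []
    for _ in range(num_iters):
        errors = x @ w + b - y                  # computed once per iteration
        dj_dw = x.T @ errors / m                # equation (6)
        dj_db = errors.mean()                   # equation (7)
        w = w - alpha * dj_dw                   # simultaneous update of w ...
        b = b - alpha * dj_db                   # ... and b
        new_errors = x @ w + b - y
        cost_history.append(np.dot(new_errors, new_errors) / (2 * m))
    return w, b, cost_history

# A single step, then a full run with the settings used above
w_one, b_one, _ = run_gradient_descent(x_train, y_train, alpha=5.0e-7, num_iters=1)
w_final, b_final, cost_history = run_gradient_descent(x_train, y_train, alpha=5.0e-7, num_iters=1000)

print(f"after 1 iteration:     w = {w_one}, b = {b_one}")
print(f"after 1000 iterations: w = {w_final}, b = {b_final}")
print(f"cost after first update: {cost_history[0]}")
print(f"cost after last update:  {cost_history[-1]}")
```

Comparing the first and last entries of `cost_history` is a simple way to confirm that the chosen learning rate makes the cost go down rather than diverge.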


# This still needs work
# 1401-04-24