# Problem Statement

You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors and, age) shown in the table below.  Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!

| Size (sqft) | Number of Bedrooms  | Number of floors | Age of  Home | Price (1000s dollars)  |   
| ----------------| ------------------- |----------------- |--------------|-------------- |  
| 2104            | 5                   | 1                | 45           | 460           |  
| 1416            | 3                   | 2                | 40           | 232           |  
| 852             | 2                   | 1                | 35           | 178           |  

You will build a linear regression model using these values so you can then predict the price for other houses. For example, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old.  

Please run the following code cell to create your `x_train` and `y_train` variables.

In [1]:
import copy, math
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Features and Output set declaration
x_train=np.array([[2104,5,1,45],[1416,3,2,40],[852,2,1,35]])
y_train=np.array([460,232,178])

In [3]:
# initializing weights and bias
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_init = 785.1811367994083

### Prediction/Predicted Output, $$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b   $$ 

In [4]:
# prediction funtion that returns predicted output y = f w,b(x)
def predict(w,b,x):
    p = np.dot(w,x)+b
    return p

### Cost Function,  $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 $$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b   $$ 

In [5]:
# cost function.
def compute_cost(x,y,w,b):
    m=x.shape[0]
    
    cost = 0.0
     
    for i in range(m):
        f_wb_i = predict(w,b,x[i])
        cost +=(f_wb_i-y[i])**2
        
    cost = cost/(2*m)
    
    return cost
    

In [6]:
cost=compute_cost(x_train,y_train,w_init,b_init)
cost

1.5578904045996674e-12

## Gradient Descent

<img align="left" src="resources\images\gradient_descent.png"    style=" width:280px; padding: 10px;  " /> 

### Compute Gradient
- outer loop over all m examples. 
    - $\frac{\partial J(\mathbf{w},b)}{\partial b}$ for the example can be computed directly and accumulated
    - in a second loop over all n features:
        - $\frac{\partial J(\mathbf{w},b)}{\partial w_j}$ is computed for each $w_j$.

In [7]:
# function for calculating error - Predicted_Output - Given_Output
"""
Here w,b,x and y 1D numpy arrays
"""
def calculate_error(x,y,w,b):
    error=predict(w,b,x) - y
    return error

In [8]:
"""
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
"""


def calculate_gradient(x,y,w,b):
    m,n=x.shape    #(number of examples, number of features)
    dj_dw=np.zeros((n,)) # initially all the dj/dw are 0
    dj_db=0.
    
    """ 
    """
    for i in range(m):
        error=predict(w,b,x[i]) - y[i]  # predict = f w,b(x[i]) = wx[i] + b
    
        for j in range(n):
            dj_dw[i]=dj_dw[j]+error*x[i,j]
        
        dj_db=dj_db+error
        
    dj_dw=dj_dw/m
    dj_db=dj_db/m
    
    return dj_db,dj_dw

In [9]:
def  gradient_descent(x,y,w_in,b_in,alpha,num_iters):
    
    
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      x (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
    """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    j_history = []
    
    w=copy.deepcopy(w_in)
    b=b_in
    
    
    for i in range(num_iters):
        
        # Calculate the gradient and update the parameters
        dj_db,dj_dw = calculate_gradient(x, y, w, b)   ##None

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw               ##None
        b = b - alpha * dj_db               ##None
        
    
    return w,b,j_history

In [11]:
# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
print(initial_w)
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent 
w_final, b_final, j_hist = gradient_descent(x_train,y_train,initial_w,initial_b,alpha,iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = x_train.shape
for i in range(m):
    print(f"prediction: {np.dot(x_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

[0. 0. 0. 0.]
b,w found by gradient descent: -0.00,[ 0.21946227 -0.38661683  0.02068947  0.        ] 
prediction: 459.83, target value: 460
prediction: 309.64, target value: 232
prediction: 186.23, target value: 178
