# Additional Part
Using NumPy and vectors

<a name="toc_40291_2"></a>
# Problem Statement

Let's use the same two data points as before - a house with 1000 square feet sold for \\$300,000 and a house with 2000 square feet sold for \\$500,000.

| Size (1000 sqft)     | Price (1000s of dollars) |
| ----------------| ------------------------ |
| 1               | 300                      |
| 2               | 500                      |

# 1. Model Function

So far in this course, you have developed a linear model that predicts $f_{w,b}(x^{(i)})$:
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{1}$$
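As a small illustration of equation (1), the sketch below evaluates the model for both houses, first with an explicit loop and then with a single vectorized NumPy expression. The values `w = 200` and `b = 100` are an assumption made for this example only; they are not given in the text, although they happen to pass through both data points.

```python
import numpy as np

x = np.array([1.0, 2.0])   # sizes in 1000 sqft (from the table above)
w, b = 200.0, 100.0        # illustrative parameters, an assumption for this sketch

# Loop version: apply equation (1) to one example at a time
f_loop = np.zeros(len(x))
for i in range(len(x)):
    f_loop[i] = w * x[i] + b

# Vectorized version: NumPy broadcasts the scalars w and b over the whole array
f_vec = w * x + b

print(f_loop)  # [300. 500.]
print(f_vec)   # [300. 500.]
```

Both versions give the same predictions; the vectorized form is shorter and avoids the Python-level loop.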

In [None]:
import numpy as np
import matplotlib.pyplot as plt
x_data = np.array([1, 2])     # Size
y_data = np.array([300, 500]) # Price

In [None]:
print(f"x_data: {x_data}")
print(f"x_data.shape : {x_data.shape}")


In [None]:
def Linear_model(x, w, b):
    '''
    Input:
        x : NumpyArray, representing Size of the house
        w : Float, representing weight
        b : Float, Bias
    Output:
        y_hat : NumpyArray, prediction of Price, based on y = w * x + b
    '''
    # Vectorized form of equation (1): w and b are broadcast over every element of x
    y_hat = w * x + b
    return y_hat

# Set initial w and b values
w_init = 250
b_init = 25
y_hat = Linear_model(x_data, w_init, b_init)
print(f"The result of prediction of y: {y_hat}")


# 2. Cost Function
In linear regression, you use the training data to fit the parameters $w$ and $b$ by minimizing a measure of the error between the predictions $f_{w,b}(x^{(i)})$ and the actual values $y^{(i)}$. This measure is called the cost, $J(w,b)$. During training, the cost is computed over all $m$ training examples $(x^{(i)}, y^{(i)})$:
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2\tag{2}$$ 
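To make equation (2) concrete, here is a minimal sketch that works through the cost for the two houses, using the same trial values `w = 250` and `b = 25` that were passed to the model earlier. It computes the cost once with a vectorized expression and once with an explicit loop that mirrors the summation.

```python
import numpy as np

x = np.array([1.0, 2.0])       # sizes in 1000 sqft
y = np.array([300.0, 500.0])   # prices in 1000s of dollars
w, b = 250.0, 25.0             # trial parameters

m = len(x)
errors = (w * x + b) - y                   # predictions [275, 525] minus targets -> [-25, 25]
cost_vec = np.sum(errors ** 2) / (2 * m)   # (625 + 625) / 4

# Same quantity, accumulated one example at a time as in the summation
cost_loop = 0.0
for i in range(m):
    cost_loop += ((w * x[i] + b) - y[i]) ** 2
cost_loop /= 2 * m

print(cost_vec, cost_loop)  # 312.5 312.5
```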

In [None]:
# Cost Function
def Cost_function(x, y, w, b):
    '''
    Input:
        x : NumpyArray, representing Size of the house
        y : NumpyArray, representing Price of the house
        w : Float, representing weight
        b : Float, Bias
    Output:
        cost : Float, value of J(w,b) from equation (2)
    '''
    m = len(y)
    f_wb = Linear_model(x, w, b)

    # Sum of squared errors over all m examples, then divide by 2m
    cost = np.sum((f_wb - y) ** 2) / (2 * m)

    return cost

cost = Cost_function(x_data, y_data, w_init, b_init)
print(f"Cost at w = {w_init}, b = {b_init}: {cost}")

In [None]:
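# Plotting helper; you do not need to study the details
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib
from matplotlib import cm

def plot_wb(x, y):
    # Evaluate the cost on an integer grid: w = 0..298, b = 0..198
    c = 300
    d = 200
    n_points = (c - 1) * (d - 1)
    w = np.zeros(n_points)
    b = np.zeros(n_points)
    cost = np.zeros(n_points)
    for i in range(c - 1):
        for j in range(d - 1):
            k = i * (d - 1) + j  # unique flat index for grid point (i, j)
            w[k] = i
            b[k] = j
            cost[k] = Cost_function(x, y, i, j)
    fig = plt.figure(figsize=(10, 10))
    # fig.gca(projection='3d') no longer works in recent Matplotlib releases
    ax = fig.add_subplot(111, projection='3d')
    surf = ax.plot_trisurf(w, b, cost, cmap=cm.coolwarm)
    ax.set_xlabel('w')
    ax.set_ylabel('b')
    ax.set_zlabel('cost')
    fig.colorbar(surf)
    plt.show()

plot_wb(x_data, y_data)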

# 3. Gradient Descent
The gradient of the cost $J(w,b)$ with respect to the parameters $w$ and $b$ is:
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
  \frac{\partial J(w,b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}\\
\end{align}
$$
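One way to gain confidence in equations (4) and (5) is to compare them with a numerical approximation of the derivative. The sketch below does this with central differences. The helper names `cost` and `analytic_gradient` and the step size `eps` are hypothetical and exist only in this example.

```python
import numpy as np

def cost(x, y, w, b):
    # Equation (2): sum of squared errors divided by 2m
    return np.sum((w * x + b - y) ** 2) / (2 * len(x))

def analytic_gradient(x, y, w, b):
    # Equations (4) and (5)
    err = w * x + b - y
    return np.dot(err, x) / len(x), np.sum(err) / len(x)

x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])
w, b, eps = 250.0, 25.0, 1e-4   # eps is an assumed finite-difference step

# Central differences approximate each partial derivative numerically
num_dw = (cost(x, y, w + eps, b) - cost(x, y, w - eps, b)) / (2 * eps)
num_db = (cost(x, y, w, b + eps) - cost(x, y, w, b - eps)) / (2 * eps)

dj_dw, dj_db = analytic_gradient(x, y, w, b)
print(dj_dw, num_dw)   # analytic vs numerical derivative with respect to w
print(dj_db, num_db)   # analytic vs numerical derivative with respect to b
```

If the formulas are implemented correctly, each analytic value should agree with its numerical counterpart up to a small rounding error.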

In [None]:
# Gradient of the cost function
def gradient_function(x, y, w, b):
    '''
    Input:
        x : NumpyArray, representing Size of the house
        y : NumpyArray, representing Price of the house
        w : Float, representing weight
        b : Float, Bias
    Output:
        dj_dw : Float, partial derivative of the cost with respect to w
        dj_db : Float, partial derivative of the cost with respect to b
    '''
    m = len(x)
    f_wb = Linear_model(x, w, b)

    # Prediction error for every example
    err = f_wb - y
    dj_dw = np.dot(err, x) / m   # equation (4)
    dj_db = np.sum(err) / m      # equation (5)

    return dj_dw, dj_db

In lecture, *gradient descent* was described as:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\;  w &= w -  \alpha \frac{\partial J(w,b)}{\partial w} \tag{3}  \; \newline 
 b &= b -  \alpha \frac{\partial J(w,b)}{\partial b}  \newline \rbrace
\end{align*}$$
where the parameters $w$ and $b$ are updated simultaneously.

Here *simultaneously* means that you calculate the partial derivatives for all the parameters before updating any of them.
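The difference between a simultaneous and a sequential update can be seen in a single step. The following sketch takes one step from `w = 0`, `b = 0` in both ways; the helper `gradient` and the step size `alpha = 0.1` are assumptions made for this illustration.

```python
import numpy as np

def gradient(x, y, w, b):
    # Equations (4) and (5)
    err = w * x + b - y
    return np.dot(err, x) / len(x), np.sum(err) / len(x)

x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])
alpha = 0.1          # illustrative learning rate
w0, b0 = 0.0, 0.0

# Simultaneous update: both gradients are evaluated at the old (w0, b0)
dj_dw, dj_db = gradient(x, y, w0, b0)
w_sim = w0 - alpha * dj_dw
b_sim = b0 - alpha * dj_db

# Sequential update (not what equation (3) describes):
# the gradient for b is evaluated after w has already changed
dj_dw, _ = gradient(x, y, w0, b0)
w_seq = w0 - alpha * dj_dw
_, dj_db = gradient(x, y, w_seq, b0)
b_seq = b0 - alpha * dj_db

print(w_sim, b_sim)
print(w_seq, b_seq)
```

Both variants arrive at the same `w`, but `b` differs (about 40 for the simultaneous update and about 30.25 for the sequential one), because the sequential version computes the second gradient at a different point.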

In [None]:
def Gradient_Descent(x, y, w_init, b_init, lr, iteration):
    '''
    Input:
        x : NumpyArray, representing Size of the house
        y : NumpyArray, representing Price of the house
        w_init : Float, initial weight
        b_init : Float, initial bias
        lr : Float, learning rate (alpha in equation (3))
        iteration : Int, number of update steps
    Output:
        w : Float, final weight
        cost : Float, cost after the final update
        w_history : List, weight after each update
        cost_history : List, cost after each update
        b_history : List, bias after each update (the final bias is b_history[-1])
    '''
    # Histories are only used for plotting
    cost_history = []
    w_history = []
    b_history = []

    w = w_init
    b = b_init
    cost = Cost_function(x, y, w, b)  # defined even if iteration == 0

    for num in range(iteration):
        # Compute both gradients from the current (w, b) before changing either
        dj_dw, dj_db = gradient_function(x, y, w, b)
        w = w - lr * dj_dw
        b = b - lr * dj_db

        # Record progress for plotting
        cost = Cost_function(x, y, w, b)
        cost_history.append(cost)
        w_history.append(w)
        b_history.append(b)
        if num % 100 == 0:
            print(f"Iter:{num:4}, Cost : {cost:0.4e}")

    return w, cost, w_history, cost_history, b_history

# 4. Hyperparameter Tuning
lr : learning rate \
iteration : total number of training iterations \
You can adjust lr and iteration to see how they affect the final result.
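To get a feel for the learning rate, the sketch below runs a short, self-contained gradient descent with three different values of `lr`. The helper `run_gd`, the three learning rates, and the choice of 50 iterations are all assumptions made for this illustration.

```python
import numpy as np

def run_gd(x, y, lr, iterations):
    # Plain batch gradient descent on equation (2), starting from w = b = 0
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(iterations):
        err = w * x + b - y
        dj_dw = np.dot(err, x) / m
        dj_db = np.sum(err) / m
        # Tuple assignment updates w and b simultaneously
        w, b = w - lr * dj_dw, b - lr * dj_db
    final_cost = np.sum((w * x + b - y) ** 2) / (2 * m)
    return w, b, final_cost

x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])

for lr in (0.001, 0.1, 0.8):      # illustrative values, not from the text
    w, b, final_cost = run_gd(x, y, lr, 50)
    print(f"lr={lr}: w={w:.2f}, b={b:.2f}, cost={final_cost:.4e}")
```

With this particular data, the smallest value makes slow progress, the middle value reduces the cost much faster, and the largest value overshoots so that the cost grows instead of shrinking.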

In [None]:
# Hyperparameters
w_init = 0
b_init = 0
lr = 0.01
iteration  = 10000
w_new,cost_new,w_history,cost_history,b_history = Gradient_Descent(x_data,y_data,w_init,b_init,lr,iteration)


In [None]:
plt.plot(cost_history)
w_history[-1]
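Because the data set has only two points, a straight line can pass through both of them exactly, so the trained parameters can be compared against a direct least-squares fit. The sketch below uses `np.polyfit` as that reference; it is not part of the original notebook and is swapped in here purely as a sanity check. The names `w_ref` and `b_ref` are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0])       # sizes in 1000 sqft
y = np.array([300.0, 500.0])   # prices in 1000s of dollars

# Degree-1 least-squares fit; coefficients are returned highest degree first
w_ref, b_ref = np.polyfit(x, y, 1)
print(w_ref, b_ref)   # close to 200 and 100
```

Given enough iterations and a suitable learning rate, the last entries of `w_history` and `b_history` should approach these reference values.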