In [15]:
import numpy as np

### 2-Layer Neural Network

In [17]:
X = [[0.2, 0.3, 0.5]]

W = np.array(
    [[2, -1, 1, 5], [1, -1, 2, 3], [-2, 1, 2, -1], [1, 1, 1, 1]],
    # [[1,-1, 2, 3], [-1,-1,2,3],[-2,1,1,-2], [-1,-1,2,2]],
    # [[2, 1, -1, 1], [1, 1, 2, 3], [-2, 1, 2, 1], [1, 1, 1, 1]]
)

W

array([[ 2, -1,  1,  5],
       [ 1, -1,  2,  3],
       [-2,  1,  2, -1],
       [ 1,  1,  1,  1]])

In [5]:
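def softmax(z):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
    
def forward(W, b1, b2, x_mul, y):
    # hidden layer: uses the x_mul argument, not the global X
    # (b1, b2 and y are accepted but not used yet)
    z = np.dot(x_mul, W)
    a = sigmoid(z)
    
    return a

def output_layer(X, W):
    z = np.dot(X, W)
    a = softmax(z) # softmax activation function
    print('W:', W)
    return a

def gradient_descent(W, dW1, alpha=0.01):
    # one gradient descent step: returns the updated weights
    W_new = W - alpha * dW1
    
    return W_new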

def dW(y_pred, y_true, X, dW):
    
    return 


# Multi-Variable Gradient Descent

<a name="toc_15456_2"></a>
# 2 Problem Statement

You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors, and age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!

| Size (sqft) | Number of Bedrooms  | Number of Floors | Age of Home | Price (1000s of dollars)  |   
| ----------------| ------------------- |----------------- |--------------|-------------- |  
| 2104            | 5                   | 1                | 45           | 460           |  
| 1416            | 3                   | 2                | 40           | 232           |  
| 852             | 2                   | 1                | 35           | 178           |  

You will build a linear regression model using these values so you can then predict the price for other houses, for example a house with **1200 sqft, 3 bedrooms, 1 floor, 40 years old**.  

Please run the following code cell to create your `x_train` and `y_train` variables.

<a name="toc_15456_2.1"></a>
## 2.1 Matrix X containing our examples
As in the table above, the examples are stored in a NumPy matrix `x_train`. Each row of the matrix represents one example. With $m$ training examples ($m$ is three in our example) and $n$ features (four in our example), $\mathbf{X}$ is a matrix with dimensions ($m$, $n$) ($m$ rows, $n$ columns).


$$\mathbf{X} = 
\begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\ 
 x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
 \cdots \\
 x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1} 
\end{pmatrix}
$$
Notation:
- $\mathbf{x}^{(i)}$ is the vector containing example $i$: $\mathbf{x}^{(i)} = (x^{(i)}_0, x^{(i)}_1, \cdots, x^{(i)}_{n-1})$
- $x^{(i)}_j$ is element $j$ in example $i$. The superscript in parentheses indicates the example number, while the subscript indicates the element (feature).  

Display the input data.

In [20]:
x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

In [21]:
print(x_train.shape)
print(y_train.shape)

(3, 4)
(3,)


<a name="toc_15456_2.2"></a>
## 2.2 Parameter vector w, b

* $\mathbf{w}$ is a vector with $n$ elements.
  - Each element contains the parameter associated with one feature.
  - In our dataset, $n$ is 4.
  - Notionally, we draw this as a column vector.

$$\mathbf{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\cdots\\
w_{n-1}
\end{pmatrix}
$$
* $b$ is a scalar parameter.  

For demonstration, we will use the following values for $\mathbf{w}$ and $b$:

<a name="toc_15456_3"></a>
# 3 Model Prediction With Multiple Variables
The model's prediction with multiple variables is given by the linear model:

$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
where $\cdot$ denotes the vector dot product.

To demonstrate the dot product, we will implement prediction using (1) and (2).
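
The two forms can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the names `predict_single_loop` and `predict` are hypothetical, and `w_demo` and `b_demo` are made-up values chosen only so the arithmetic is easy to follow. The input is the first training example from the table above.

```python
import numpy as np

# First training example from the table: size, bedrooms, floors, age
x_vec = np.array([2104, 5, 1, 45])

# Hypothetical parameters, chosen only for illustration (not values from this lab)
w_demo = np.array([0.1, 2.0, -1.0, -0.5])
b_demo = 10.0

def predict_single_loop(x, w, b):
    """Equation (1): accumulate w_j * x_j one feature at a time, then add b."""
    p = 0.0
    for j in range(x.shape[0]):
        p += w[j] * x[j]
    return p + b

def predict(x, w, b):
    """Equation (2): the same sum written as a dot product."""
    return np.dot(w, x) + b

f_loop = predict_single_loop(x_vec, w_demo, b_demo)
f_dot = predict(x_vec, w_demo, b_demo)
print(f_loop, f_dot)
```

Both functions compute the same quantity, so the two printed values should agree up to floating-point rounding. The dot-product form is shorter and lets NumPy do the summation.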

<a name="toc_15456_4"></a>
# 4 Compute Cost With Multiple Variables
The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 

In contrast to previous labs, $\mathbf{w}$ and $\mathbf{x}^{(i)}$ are vectors rather than scalars, which supports multiple features.


In [None]:
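def fwb(w, x, b=0.0):
    # model prediction, equation (2): w . x + b
    # x may be one example (n,) or the full data matrix (m, n)
    return np.dot(x, w) + b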

In [24]:
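def compute_cost(x, y, w, b): # Jwb
    # equation (3): sum of squared prediction errors divided by 2m
    err = np.dot(x, w) + b - y
    return np.sum(err ** 2) / (2 * x.shape[0])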

<a name="toc_15456_5"></a>
# 5 Gradient Descent With Multiple Variables
Gradient descent for multiple variables:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously, and

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$
* $m$ is the number of training examples in the data set
* $x_{j}^{(i)}$ is the value of feature $j$ (column) in example $i$ (row)
* $y^{(i)}$ is the target value of example $i$
* $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction for example $i$
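
As a sketch of how equations (6) and (7) map onto code, the loop below accumulates each summand explicitly. The function name `compute_gradient_loop` is hypothetical, and the starting point $\mathbf{w} = 0$, $b = 0$ is an assumption made only so the result is easy to check by hand.

```python
import numpy as np

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

def compute_gradient_loop(X, y, w, b):
    """Return (dj_dw, dj_db) following equations (6) and (7) term by term."""
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        err = np.dot(X[i], w) + b - y[i]   # f_wb(x^(i)) - y^(i), a scalar
        for j in range(n):
            dj_dw[j] += err * X[i, j]      # summand of equation (6)
        dj_db += err                       # summand of equation (7)
    return dj_dw / m, dj_db / m

# With w = 0 and b = 0 every prediction is 0, so err is simply -y^(i)
dj_dw, dj_db = compute_gradient_loop(x_train, y_train, np.zeros(4), 0.0)
print(dj_dw, dj_db)
```

At this starting point the gradient for the size feature is several hundred times larger than the others, because size is measured in raw sqft. That is the scaling issue mentioned in the problem statement.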


In [None]:
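def dw(w, b, x, y):
    # equation (6): gradient of J with respect to each w_j, shape (n,)
    err = np.dot(x, w) + b - y
    return np.dot(x.T, err) / x.shape[0]

def db(w, b, x, y):
    # equation (7): gradient of J with respect to b, a scalar
    return np.sum(np.dot(x, w) + b - y) / x.shape[0]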



In [23]:
m, n = np.shape(x_train)
print(x_train)
# features are columns, so loop over the n columns (not the m rows)
for j in range(n):
    print(f'Get all examples of feature {j}')
    print(x_train[:, j])

[[2104    5    1   45]
 [1416    3    2   40]
 [ 852    2    1   35]]
Get all examples of feature 0
[2104 1416  852]
Get all examples of feature 1
[5 3 2]
Get all examples of feature 2
[1 2 1]
Get all examples of feature 3
[45 40 35]


**Implementation Explanation**

The partial derivatives of $J$ with respect to $w_j$ and $b$ are implemented by the `dw` and `db` functions above. In equations (6) and (7):

* $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}$ is the prediction error for a single example (a single value).
* $x^{(i)}_j$ is the value of feature $j$ (column) in example $i$ (row).

Note: take care with how you implement the dot product. Array and matrix dimensions matter in every vector operation, so always check whether you are multiplying by a row vector or a column vector.
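
A quick shape check can make this concrete. The sketch below is illustrative only: the variable names are hypothetical, and `w` is set to zeros as an arbitrary starting point. It shows which products give one value per example, which give one value per feature, and one common broadcasting pitfall.

```python
import numpy as np

X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])  # (m, n) = (3, 4)
y = np.array([460, 232, 178])                                         # (m,)
w = np.zeros(4)                                                       # (n,), arbitrary start

pred = X @ w          # (m, n) @ (n,) -> (m,): one prediction per example (row)
err = pred - y        # (m,) - (m,)   -> (m,)
grad = X.T @ err      # (n, m) @ (m,) -> (n,): one entry per feature (column)

# Pitfall: a column-shaped target silently broadcasts against a 1-D prediction
y_col = y.reshape(-1, 1)   # (m, 1)
bad = pred - y_col         # (m,) - (m, 1) -> (m, m), not (m,)

print(pred.shape, err.shape, grad.shape, bad.shape)
```

Printing shapes like this before trusting a result is a cheap way to catch a transposed matrix or an unintended broadcast.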

In [None]:
def gradient_descent(X, y, w_in, b_in, num_iters, alpha):
  """
      Performs batch gradient descent to learn w and b. Updates w and b by taking 
      num_iters gradient steps with learning rate alpha
      
      Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      num_iters (int)     : number of iterations to run gradient descent
      alpha (float)       : Learning rate
      
      Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      J_history (list) : cost J(w, b) recorded after each iteration
  """
  m = X.shape[0]
  w = np.array(w_in, dtype=float)  # copy so the caller's array is not modified
  b = float(b_in)
  J_history = []

  for _ in range(num_iters):
    err = X @ w + b - y        # f_wb(x^(i)) - y^(i) for every example, shape (m,)
    dJdw = X.T @ err / m       # equation (6), shape (n,)
    dJdb = np.sum(err) / m     # equation (7), scalar
    # simultaneous update: both gradients are computed before either parameter changes
    w = w - alpha * dJdw
    b = b - alpha * dJdb
    J_history.append(np.sum((X @ w + b - y) ** 2) / (2 * m))  # equation (3)

  return w, b, J_history

In [None]:
# initialize parameters: one weight per feature, all zeros
initial_w = np.zeros(x_train.shape[1])
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7

wj, bj, history = gradient_descent(x_train, y_train, initial_w, initial_b, num_iters=iterations, alpha=alpha)
print(f'w after gradient descent: {wj}')
print(f'b after gradient descent: {bj}')