# Week 2: Neural Network Basics

## Logistic Regression as NN

### NN programming
NN computations are organized in two passes: forward propagation (computing the output) and backward propagation (computing the derivatives).

* Logistic Regression: Binary classification algorithm

Cat example:
* $y = 0$: no cat
* $y = 1$: cat

So $y \in \{0, 1\}$.

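Input: a feature vector $x \in \mathbb{R}^n$ holding the RGB color intensity of each pixel in the picture. <br/>
$n = 3 \times$ (number of pixels), e.g. $n = 3 \cdot 64 \cdot 64 = 12288$ for a 64×64 image.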
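A minimal NumPy sketch of how an RGB picture becomes the input vector $x$, assuming a hypothetical 64×64 image stored as an array of shape (height, width, 3); the pixel values are random placeholders.

```python
import numpy as np

# Hypothetical 64x64 RGB picture: shape (height, width, 3 color channels)
height, width = 64, 64
image = np.random.default_rng(0).random((height, width, 3))

# Unroll every channel of every pixel into one column vector x of shape (n, 1)
x = image.reshape(-1, 1)
n = x.shape[0]

print(n)  # 3 * 64 * 64 = 12288
```
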

### Training example
Notation: $(x, y)$, with $x \in \mathbb{R}^n$ and $y \in \{0, 1\}$.

A training set is made of **m** training examples: <br/> $\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$

$$X = \begin{bmatrix}
  | & | & & | \\
  x_1 & x_2 & \cdots & x_m \\
  | & | & & |
\end{bmatrix} \in \mathbb{R}^{n \times m}$$



$$Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix} \in \mathbb{R}^{1 \times m}$$
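A sketch of stacking examples as columns, assuming small made-up feature vectors ($n = 3$, $m = 4$) so the shapes are easy to read.

```python
import numpy as np

# Hypothetical training set: m = 4 examples with n = 3 features each
examples = [np.array([0.1, 0.2, 0.3]),
            np.array([0.4, 0.5, 0.6]),
            np.array([0.7, 0.8, 0.9]),
            np.array([1.0, 1.1, 1.2])]
labels = [1, 0, 1, 0]

X = np.column_stack(examples)        # each example becomes a column -> shape (n, m)
Y = np.array(labels).reshape(1, -1)  # labels as a row vector        -> shape (1, m)

print(X.shape, Y.shape)  # (3, 4) (1, 4)
```

Column `X[:, i]` is example $x_{i+1}$ (NumPy indexes from 0).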

### Logistic regression
Given $x$, find $\hat{y} = P(y = 1 \mid x)$, the probability that $y = 1$.

$0 \le \hat{y} \le 1$

Parameters:<br/>
$w \in \mathbb{R}^n$, $b \in \mathbb{R}$

$$\hat{y} = \sigma(w^T x + b)$$

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
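A sketch of the model's output for one example. `sigmoid` and `predict` are illustrative helper names, and the values of `w`, `b` and `x` are arbitrary.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1 / (1 + np.exp(-z))

def predict(w, b, x):
    """y_hat = sigmoid(w^T x + b) for one example x of shape (n, 1)."""
    return sigmoid(np.dot(w.T, x) + b).item()

w = np.array([[0.5], [-1.0]])  # arbitrary weights, n = 2
b = 0.1
x = np.array([[2.0], [1.0]])

y_hat = predict(w, b, x)  # sigmoid(0.5*2 - 1.0*1 + 0.1) = sigmoid(0.1)
print(round(y_hat, 4))    # 0.525
```
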

### Cost function
The cost function is used to train the parameters $w$ and $b$: it measures how well the model does on the entire training set.

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}_i, y_i)$$

The loss function measures how well the model did on a single training example.

$$L(\hat{y}, y) = -\left(y\log\hat{y} + (1-y)\log(1-\hat{y})\right)$$
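A sketch of the loss and the cost in NumPy, using made-up predictions and labels. Since $\log 0$ is undefined, this assumes every $\hat{y}$ lies strictly between 0 and 1.

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for one example (y_hat strictly in (0, 1))."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(Y_hat, Y):
    """Average loss over the m examples; Y_hat and Y have shape (1, m)."""
    m = Y.shape[1]
    return np.sum(loss(Y_hat, Y)) / m

Y = np.array([[1, 0, 1]])            # made-up labels
Y_hat = np.array([[0.9, 0.2, 0.6]])  # made-up predictions

print(loss(0.9, 1))    # small: confident and correct
print(loss(0.1, 1))    # large: confident and wrong
print(cost(Y_hat, Y))
```
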


### Gradient descent
The main objective is to find the $w$ and $b$ that minimize $J(w, b)$. <br/>
$J(w, b)$ is a convex function, so it has no local minima other than the global one. $w$ and $b$ can be initialized with any value (usually zeros), and gradient descent with a small enough learning rate will head toward the global minimum.
![image.png](https://miro.medium.com/max/1400/1*j4-2rXBSDEYPTQgefHX5qw.png)
![image.png](https://hackernoon.com/hn-images/0*rBQI7uBhBKE8KT-X.png)



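Repeat until convergence: <br/>
$$w := w - \alpha\frac{\partial J(w, b)}{\partial w}$$

$$b := b - \alpha\frac{\partial J(w, b)}{\partial b}$$

In code these derivatives are written `dw` and `db`, so the updates read $w := w - \alpha\,dw$ and $b := b - \alpha\,db$.

$\alpha$ is the learning rate: it controls the size of the step taken at each iteration.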
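To see the update rule in isolation, here is a sketch on a made-up one-parameter convex function $J(w) = (w - 3)^2$, whose derivative is $2(w - 3)$ and whose minimum is at $w = 3$. The function, the starting point and $\alpha = 0.1$ are illustrative choices.

```python
def J(w):
    return (w - 3) ** 2

def dJ_dw(w):
    return 2 * (w - 3)

w = -5.0      # arbitrary starting point
alpha = 0.1   # learning rate
for _ in range(200):
    dw = dJ_dw(w)
    w = w - alpha * dw   # w := w - alpha * dw

print(w)  # close to 3
```

Each step multiplies the distance to the minimum by $(1 - 2\alpha)$, so for this function any $0 < \alpha < 1$ converges, while $\alpha > 1$ makes the iterates overshoot and diverge.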

### Computation graph
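A computation graph explains why the computation is organized as a forward pass (left to right, computing $J$) followed by a backward pass (right to left, computing the derivatives).

$$J(a, b, c) = 3(a + bc)$$

$u = bc$<br/>
$v = a + u$<br/>
$J = 3v$<br/>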

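With $a = 5$, $b = 3$, $c = 2$, the forward pass gives $u = 6$, $v = 11$, $J = 33$. The backward pass applies the chain rule from right to left:

$\frac{dJ}{dv} = 3$

$\frac{dv}{da} = 1$, so $\frac{dJ}{da} = \frac{dJ}{dv}\cdot\frac{dv}{da} = 3$ (written `da` in code)

$\frac{dJ}{du} = \frac{dJ}{dv}\cdot\frac{dv}{du} = 3 \cdot 1 = 3$

$\frac{dJ}{db} = \frac{dJ}{du}\cdot\frac{du}{db} = 3c = 6$

$\frac{dJ}{dc} = \frac{dJ}{du}\cdot\frac{du}{dc} = 3b = 9$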
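A sketch of the same graph in plain Python, assuming the example values $a = 5$, $b = 3$, $c = 2$ (consistent with $dJ/db = 6$ and $dJ/dc = 9$). The forward pass stores the intermediate values and the backward pass reuses them; a finite difference confirms one derivative. Names such as `dJ_dv` are illustrative.

```python
a, b, c = 5, 3, 2

# Forward pass (left to right)
u = b * c      # 6
v = a + u      # 11
J = 3 * v      # 33

# Backward pass (right to left), chain rule
dJ_dv = 3
dJ_da = dJ_dv * 1    # dv/da = 1
dJ_du = dJ_dv * 1    # dv/du = 1
dJ_db = dJ_du * c    # du/db = c
dJ_dc = dJ_du * b    # du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)  # 33 3 6 9

# Numerical check of dJ/db: nudge b slightly and watch J change
def J_fn(a, b, c):
    return 3 * (a + b * c)

eps = 1e-6
num_db = (J_fn(a, b + eps, c) - J_fn(a, b, c)) / eps
print(num_db)  # approximately 6
```
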

### Logistic regression gradient descent 

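$$\hat{y} = \sigma(w^T x + b)$$
$$L(\hat{y}, y) = -\left(y\log\hat{y} + (1-y)\log(1-\hat{y})\right)$$

For two features $x_1$ and $x_2$:

$$z = w_1 x_1 + w_2 x_2 + b \;\rightarrow\; \hat{y} = a = \sigma(z) \;\rightarrow\; L(a, y)$$

$da = \frac{dL}{da} = -\frac{y}{a} + \frac{1-y}{1-a}$

Since $\frac{da}{dz} = a(1-a)$:

$dz = \frac{dL}{da}\cdot\frac{da}{dz} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right)a(1-a) = a - y$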

$dw_1 = x_1*dz$

$dw_2 = x_2*dz$

$db = dz$
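A sketch of these single-example formulas with a finite-difference check that $dw_1$ really equals $\partial L/\partial w_1$. The feature values, label and parameters are made up.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w1, w2, b, x1, x2, y):
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Made-up example and parameters
x1, x2, y = 1.5, -0.5, 1
w1, w2, b = 0.2, 0.4, -0.1

# Forward pass
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# Backward pass, using the formulas above
dz = a - y
dw1 = x1 * dz
dw2 = x2 * dz
db = dz

# Central finite-difference estimate of dL/dw1
eps = 1e-6
num_dw1 = (loss(w1 + eps, w2, b, x1, x2, y)
           - loss(w1 - eps, w2, b, x1, x2, y)) / (2 * eps)
print(dw1, num_dw1)  # the two values should agree to several decimals
```
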


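### Gradient descent on m examples
$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}_i, y_i)$$

$$\frac{\partial J}{\partial w_1} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial L(a_i, y_i)}{\partial w_1}$$

i.e. the gradient of the cost is the average of the per-example gradients.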



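```python
import numpy as np

# Toy data, made up for illustration: m examples with two features (x1, x2)
rng = np.random.default_rng(0)
m = 100
x1 = rng.normal(size=m)
x2 = rng.normal(size=m)
y = (x1 + x2 > 0).astype(float)

w1, w2, b = 0.0, 0.0, 0.0   # parameters
alpha = 0.1                 # learning rate

# One gradient-descent step, looping over the m examples
J = 0.0
dw1, dw2, db = 0.0, 0.0, 0.0   # gradient accumulators
for i in range(m):
    z = w1 * x1[i] + w2 * x2[i] + b
    a = 1 / (1 + np.exp(-z))                               # sigmoid
    J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))  # loss of example i
    dz = a - y[i]
    dw1 += x1[i] * dz
    dw2 += x2[i] * dz
    db += dz

# Average over the m examples
J /= m
dw1 /= m
dw2 /= m
db /= m

# Update the parameters
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db
```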

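BUT explicit for loops should be avoided: they are slow, especially on big datasets. Use vectorization instead. Both the loop over the features (one `dw` per feature) and the loop over the $m$ examples can be vectorized away; only the outer loop over gradient-descent iterations remains.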

## Python and Vectorization
### Vectorization
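$z = w^T x + b$

with $w \in \mathbb{R}^n$, $x \in \mathbb{R}^n$ and $b \in \mathbb{R}$. A non-vectorized implementation computes $w^T x$ with an explicit loop over the $n$ features; the vectorized one is a single call, `np.dot(w, x) + b`. In the cell below, the separate `dw1`, `dw2` accumulators likewise become a single vector `dw`.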
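A sketch comparing an explicit loop over the $n$ features with `np.dot` for $z = w^T x + b$. The vector size and random values are arbitrary; timings depend on the machine, so none are quoted here, though the vectorized call is typically far faster.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
w = rng.random(n)
x = rng.random(n)
b = 0.5

# Non-vectorized: explicit loop over the n features
start = time.perf_counter()
z_loop = 0.0
for j in range(n):
    z_loop += w[j] * x[j]
z_loop += b
loop_time = time.perf_counter() - start

# Vectorized: a single call
start = time.perf_counter()
z_vec = np.dot(w, x) + b
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s, np.dot: {vec_time:.6f}s")
```

Both versions compute the same number up to floating-point rounding; only the speed differs.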

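```python
import numpy as np

# Toy data, made up for illustration: X has shape (n, m), Y has shape (1, m)
rng = np.random.default_rng(0)
n, m = 2, 100
X = rng.normal(size=(n, m))
Y = (X.sum(axis=0) > 0).astype(float).reshape(1, m)

w = np.zeros((n, 1))   # weights as a column vector
b = 0.0
alpha = 0.1

J = 0.0
dw = np.zeros((n, 1))  # one vector replaces dw1, dw2, ...
db = 0.0
for i in range(m):     # still one loop over the examples
    x_i = X[:, i:i + 1]                    # i-th example, shape (n, 1)
    z = np.dot(w.T, x_i).item() + b
    a = 1 / (1 + np.exp(-z))
    J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
    dz = a - Y[0, i]
    dw += x_i * dz                         # vectorized over the n features
    db += dz

J /= m
dw /= m
db /= m

w = w - alpha * dw
b = b - alpha * db
```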

### Vectorizing Logistic Regression
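$Z = [z_1, z_2, \dots, z_m] = w^T X + B$

$B = [b, b, \dots, b]$

$A = [a_1, a_2, \dots, a_m] = \sigma(Z)$

In NumPy, $Z$ for all $m$ examples is one line; the scalar `b` is broadcast to $B$ automatically:

```python
Z = np.dot(w.T, X) + b
```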
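A sketch checking that the vectorized forward pass matches a loop over the $m$ examples, with NumPy broadcasting the scalar `b` across the row. Shapes follow the notation above (`X` is (n, m), `w` is (n, 1)); the data are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
n, m = 3, 5
X = rng.normal(size=(n, m))
w = rng.normal(size=(n, 1))
b = 0.2

# Vectorized: one matrix product for all m examples, b broadcast to shape (1, m)
Z = np.dot(w.T, X) + b
A = sigmoid(Z)

# Reference: one example at a time
A_loop = np.zeros((1, m))
for i in range(m):
    z_i = np.dot(w[:, 0], X[:, i]) + b
    A_loop[0, i] = sigmoid(z_i)

print(Z.shape)                 # (1, 5)
print(np.allclose(A, A_loop))  # True
```
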

### Vectorizing logistic regression's gradient computation
$dZ = [dz_1, dz_2, \dots, dz_m] = A - Y$

$Y = [y_1, y_2, \dots, y_m]$, with $y_i \in \{0, 1\}$

$db = \frac{1}{m}\sum_{i=1}^{m} dz_i$

$dw = \frac{1}{m} X\, dZ^T$

```python
db = 1/m * np.sum(dZ)
dw = 1/m * np.dot(X, dZ.T)
```

$dw \in \mathbb{R}^{n \times 1}$ (the same shape as $w$)
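A sketch checking the vectorized gradient formulas against an explicit loop over the examples, on random placeholder data with the shapes used above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(2)
n, m = 3, 6
X = rng.normal(size=(n, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
w = rng.normal(size=(n, 1))
b = -0.3

# Vectorized gradients
A = sigmoid(np.dot(w.T, X) + b)
dZ = A - Y
dw = 1 / m * np.dot(X, dZ.T)   # shape (n, 1)
db = 1 / m * np.sum(dZ)

# Loop over the examples, for comparison
dw_loop = np.zeros((n, 1))
db_loop = 0.0
for i in range(m):
    dz_i = A[0, i] - Y[0, i]
    dw_loop += X[:, i:i + 1] * dz_i
    db_loop += dz_i
dw_loop /= m
db_loop /= m

print(dw.shape, np.allclose(dw, dw_loop), np.isclose(db, db_loop))
```
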

### Final result

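```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_step(w, b, X, Y, alpha):
    """One fully vectorized iteration. X: (n, m), Y: (1, m), w: (n, 1)."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b          # forward pass, shape (1, m)
    A = sigmoid(Z)
    dZ = A - Y                      # backward pass
    dw = 1 / m * np.dot(X, dZ.T)    # shape (n, 1)
    db = 1 / m * np.sum(dZ)
    w = w - alpha * dw              # parameter update
    b = b - alpha * db
    return w, b
```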
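Putting it together: a sketch of full training with an outer loop over gradient-descent iterations, the one loop vectorization does not remove. The synthetic dataset (label 1 when $x_1 + x_2$ plus some noise is positive), `alpha = 0.1` and `num_iterations = 1000` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, Y, alpha=0.1, num_iterations=1000):
    """Vectorized logistic regression; X: (n, m), Y: (1, m)."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    costs = []
    for _ in range(num_iterations):          # the only remaining loop
        A = sigmoid(np.dot(w.T, X) + b)      # forward pass
        J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
        costs.append(J)
        dZ = A - Y                           # backward pass
        dw = np.dot(X, dZ.T) / m
        db = np.sum(dZ) / m
        w = w - alpha * dw                   # parameter update
        b = b - alpha * db
    return w, b, costs

# Synthetic data (hypothetical): noisy labels around the line x1 + x2 = 0
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
noise = 0.5 * rng.normal(size=200)
Y = (X[0] + X[1] + noise > 0).astype(float).reshape(1, -1)

w, b, costs = train(X, Y)
predictions = (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)
accuracy = np.mean(predictions == Y)
print(f"cost: {costs[0]:.4f} -> {costs[-1]:.4f}, training accuracy: {accuracy:.2f}")
```

With all parameters starting at zero, every first prediction is 0.5, so the first recorded cost is $\log 2 \approx 0.693$; it then decreases at each iteration.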