# Binary Classification

## Notation

Process the training set without using an explicit for loop.

* Forward propagation
* Backpropagation

Unroll all the pixel values of the image into a single feature vector $x$.

x = [255, 251,  ..., 255, 134, ...] (length $n_x = 64 \times 64 \times 3 = 12288$)  
Y = [$y^{(1)}$, $y^{(2)}$, $y^{(3)}$, $y^{(4)}$, $y^{(5)}$, ...]
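The unrolling step can be sketched in NumPy as follows. This is a minimal illustration, assuming a 64x64 RGB image; the random pixel values and the names `image`, `x`, and `n_x` are placeholders, not part of the original notes.

```python
import numpy as np

# Hypothetical 64x64 RGB image with random pixel intensities in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3))

# Unroll all pixel values into one column vector of length n_x = 64*64*3.
x = image.reshape(-1, 1)
n_x = x.shape[0]

print(n_x)      # 12288
print(x.shape)  # (12288, 1)
```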

x $\Large \longrightarrow$ y


$m$ = number of training examples ($m = m_{train}$)

$m_{test}$ = number of test examples

$x \in \mathbb{R}^{n_x}$

$X \in \mathbb{R}^{n_x \times m}$ (training examples stacked as columns)

$Y \in \mathbb{R}^{1 \times m}$
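A short sketch of how these shapes arise when the examples are stacked as columns. The value `m = 5`, the random feature vectors, and the labels are arbitrary choices made only for illustration.

```python
import numpy as np

n_x, m = 12288, 5  # m = 5 is an arbitrary illustrative choice
rng = np.random.default_rng(0)

# Hypothetical training set: m column vectors of length n_x and m binary labels.
examples = [rng.random((n_x, 1)) for _ in range(m)]
labels = [1, 0, 0, 1, 1]

X = np.hstack(examples)             # each column is one training example
Y = np.array(labels).reshape(1, m)  # labels as a single row

print(X.shape)  # (12288, 5)
print(Y.shape)  # (1, 5)
```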

## Logistic Regression

$$\text{Given } x, \text{ want } \hat{y} = P(y=1 \mid x)$$

Parameters:

$w \in \mathbb{R}^{n_x}, b \in \mathbb{R}$

Output $\hat{y} = \sigma (w^Tx + b)$

$0 \leq \hat{y} \leq 1$

Sigmoid

$\sigma(z) = \frac{1}{1+e^{-z}}$

If $z$ is large and positive, $\sigma(z) \approx 1$  
If $z$ is large and negative, $\sigma(z) \approx 0$
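The sigmoid and its limiting behaviour can be checked numerically. This is a minimal sketch; the function name `sigmoid` is my own choice, and this straightforward form is not hardened against overflow for very large negative inputs.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z)); works on scalars and arrays."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```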

### Cost Function

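Want $\hat{y}^{(i)} \approx y^{(i)}$, where $\hat{y}^{(i)} = \sigma(z^{(i)})$ and

$z^{(i)} = w^Tx^{(i)} + b$

The superscript $(i)$ denotes the $i$-th training example.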


Loss (error) functions:

$L(\hat{y}, y) = \frac{1}{2} (\hat{y} - y)^2$ (squared error; it makes the optimization problem non-convex)

$L(\hat{y}, y) = -\left(y\log\hat{y}+ (1-y)\log(1-\hat{y})\right)$

Cost function

$J(w, b) = \frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)}\log\hat{y}^{(i)}+ (1-y^{(i)})\log(1-\hat{y}^{(i)})\right]$
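The loss and cost can be written directly from the formulas above. This is a sketch under stated assumptions: the function names and the three predictions in `Y_hat` are made up for illustration, and the predictions are assumed to lie strictly between 0 and 1 so the logarithms are finite.

```python
import numpy as np

def cross_entropy_loss(y_hat, y):
    """Per-example loss L(y_hat, y) = -(y log y_hat + (1 - y) log(1 - y_hat))."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(Y_hat, Y):
    """Cost J: the average of the per-example losses over the m examples."""
    return float(np.mean(cross_entropy_loss(Y_hat, Y)))

# Hypothetical labels and predictions for m = 3 examples.
Y = np.array([[1, 0, 1]])
Y_hat = np.array([[0.9, 0.2, 0.6]])

J = cost(Y_hat, Y)
print(J)
```

A confident correct prediction (0.9 for a positive label) contributes a small loss, while a less confident one (0.6) contributes more.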

## Gradient descent

repeat {
$$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$$

$$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$$
}

where $\alpha$ is the learning rate. In code, the derivatives are written as

$$dw = \frac{\partial J(w, b)}{\partial w}$$

$$db = \frac{\partial J(w, b)}{\partial b}$$
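To see the update rule in action, here is a sketch on a toy convex cost rather than the logistic regression cost. The function $J(w, b) = (w - 3)^2 + (b + 1)^2$, the learning rate, and the iteration count are all assumptions chosen so that the minimizer $(3, -1)$ is known in advance.

```python
# Toy cost (an assumption for illustration): J(w, b) = (w - 3)^2 + (b + 1)^2
def gradients(w, b):
    """Return (dw, db), the partial derivatives of the toy cost."""
    dw = 2.0 * (w - 3.0)
    db = 2.0 * (b + 1.0)
    return dw, db

alpha = 0.1      # learning rate
w, b = 0.0, 0.0  # initial parameters

for _ in range(200):
    dw, db = gradients(w, b)
    w = w - alpha * dw  # w := w - alpha * dJ/dw
    b = b - alpha * db  # b := b - alpha * dJ/db

print(w, b)  # approaches the minimizer (3, -1)
```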


# Computation graph

<img align='left' src='images/computing_derivatives.PNG' width='800'/>

# Gradient Descent

$z = w_1x_1 + w_2x_2 + b$

$\hat{y} = a = \sigma(z)$

$L(a, y) = -\left(y\log a + (1-y)\log(1-a)\right)$


$$da = \frac{dL(a, y)}{da} = -\frac{y}{a}+\frac{1-y}{1-a}$$

$$\frac{da}{dz} = \frac{d}{dz}\left(\frac{1}{1+e^{-z}}\right) = a(1-a)$$

$$dz = \frac{dL(a, y)}{dz} = \frac{dL}{da} \cdot \frac{da}{dz} = \left(-\frac{y}{a}+\frac{1-y}{1-a}\right) a(1-a) = a - y$$


$$\frac{\partial L}{\partial w_1} = "dw_1" = x_1 \cdot dz$$

$$\frac{\partial L}{\partial w_2} = "dw_2" = x_2 \cdot dz$$

$$\frac{\partial L}{\partial b} = "db" = dz$$


$w_1 = w_1 - \alpha dw_1$  
$w_2 = w_2 - \alpha dw_2$  
$b = b - \alpha db$
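One gradient step on a single example with two features can be traced by hand. In this sketch the example values, the zero initialization, and the learning rate are assumptions picked to keep the arithmetic easy to follow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single training example with two features and label y = 1.
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.0, 0.0, 0.0
alpha = 0.1

# Forward pass
z = w1 * x1 + w2 * x2 + b  # 0.0
a = sigmoid(z)             # 0.5

# Backward pass
dz = a - y     # -0.5
dw1 = x1 * dz  # -0.5
dw2 = x2 * dz  # -1.0
db = dz        # -0.5

# Parameter update
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db
```

Because the label is 1 and the prediction is only 0.5, all three gradients are negative and the update increases each parameter.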

## Gradient descent on m examples

$J(w, b) = \frac{1}{m}\sum_{i=1}^{m}L(a^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)}\log a^{(i)}+ (1-y^{(i)})\log(1-a^{(i)})\right]$


$a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(w^Tx^{(i)}+b)$


$$\frac{\partial}{\partial w_1}J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial w_1}L(a^{(i)}, y^{(i)})$$


$$dw_1^{(i)} = \frac{\partial}{\partial w_1}L(a^{(i)}, y^{(i)}) = x_1^{(i)}\left(a^{(i)} - y^{(i)}\right)$$


<img align='left' src='images/gradient_descent_m_examples.PNG' width='800'/>
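The averaging over $m$ examples can be sketched with an explicit loop. This is an illustrative, non-vectorized version; the four-example dataset and the zero initialization are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny made-up dataset: m = 4 examples (columns), n_x = 2 features (rows).
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 2.0]])
Y = np.array([0.0, 0.0, 1.0, 1.0])
m = X.shape[1]
w1, w2, b = 0.0, 0.0, 0.0

# Accumulate the cost and the gradients over all examples.
J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0
for i in range(m):
    z = w1 * X[0, i] + w2 * X[1, i] + b
    a = sigmoid(z)
    J += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))
    dz = a - Y[i]
    dw1 += X[0, i] * dz
    dw2 += X[1, i] * dz
    db += dz

# Divide by m to turn the sums into averages.
J /= m
dw1 /= m
dw2 /= m
db /= m
```

With more features, the three accumulators would need a second loop over the features. Vectorization removes both loops.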

# Vectorization

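Timing a dot product of two vectors with one million elements each, first vectorized and then with an explicit for loop:

    import time
    import numpy as np

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)

    # Vectorized version: a single call to np.dot
    tic = time.time()
    c = np.dot(a, b)
    toc = time.time()
    print("Vectorized version:" + str(1000*(toc-tic)) + " ms")

    # Non-vectorized version: explicit Python for loop
    c = 0
    tic = time.time()
    for i in range(1000000):
        c += a[i]*b[i]
    toc = time.time()
    print("For Loop:" + str(1000*(toc-tic)) + " ms")

Output:

    Vectorized version:1.9958019256591797 ms
    For Loop:687.1130466461182 ms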


## Vectorizing logistic regression
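With the examples stacked as columns of $X$, the forward pass for all $m$ examples reduces to $Z = w^TX + b$ and $A = \sigma(Z)$. The sketch below assumes a made-up dataset with $n_x = 2$ and $m = 4$, and zero-initialized parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up data: columns of X are training examples.
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 2.0]])  # shape (n_x, m) = (2, 4)
w = np.zeros((2, 1))                  # shape (n_x, 1)
b = 0.0

# One matrix product replaces the loop over examples;
# the scalar b is broadcast across all m columns.
Z = np.dot(w.T, X) + b  # shape (1, m)
A = sigmoid(Z)          # shape (1, m), predictions for every example

print(Z.shape)  # (1, 4)
```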

# Vectorizing Logistic Regression's Gradient Output
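The gradients can be vectorized in the same way: $dZ = A - Y$, $dw = \frac{1}{m} X\,dZ^T$, and $db = \frac{1}{m}\sum_i dz^{(i)}$. The following sketch of one gradient descent step uses a made-up dataset and learning rate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up data: columns of X are examples, Y holds the labels as a row.
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 2.0]])  # shape (n_x, m)
Y = np.array([[0.0, 0.0, 1.0, 1.0]])  # shape (1, m)
m = X.shape[1]
w = np.zeros((2, 1))
b = 0.0
alpha = 0.1

# Forward pass over all examples at once
A = sigmoid(np.dot(w.T, X) + b)  # shape (1, m)

# Backward pass without any explicit loop
dZ = A - Y                # shape (1, m)
dw = np.dot(X, dZ.T) / m  # shape (n_x, 1)
db = np.sum(dZ) / m       # scalar

# Gradient descent update
w = w - alpha * dw
b = b - alpha * db
```

Running several iterations of gradient descent would still need an outer loop over the iterations. Only the loops over examples and features are removed.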

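# Broadcasting in Python

Broadcasting lets NumPy apply element-wise operations to arrays of different but compatible shapes without explicit loops, which makes Python code run faster.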
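A few broadcasting cases, sketched on a small made-up matrix: a scalar, a row vector, and a column vector are each stretched to match the shape of `A`.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # shape (2, 3)

# Scalar: treated as if it were copied into every entry.
shifted = A + 100

# Row vector of shape (1, 3): copied down the rows.
row = np.array([[10.0, 20.0, 30.0]])
plus_row = A + row

# Column vector of shape (2, 1): copied across the columns.
col = np.array([[1.0], [2.0]])
times_col = A * col

# Typical use: express each entry as a percentage of its column sum.
col_sums = A.sum(axis=0)                    # shape (3,)
percent = 100 * A / col_sums.reshape(1, 3)  # (2, 3) divided by (1, 3)
```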

# A note on python/numpy vectors

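NumPy's element-wise product a * b requires broadcast-compatible shapes, so multiplying a (4, 3) array by a (3, 2) array raises an error:

    import numpy as np

    a = np.random.randn(4, 3)  # a.shape = (4, 3)
    b = np.random.randn(3, 2)  # b.shape = (3, 2)
    c = a * b                  # element-wise product, not a matrix product
    c.shape

Output:

    ValueError: operands could not be broadcast together with shapes (4,3) (3,2)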
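For contrast, the matrix product of the same shapes is well defined. The sketch below also shows a related pitfall with rank-1 arrays, whose transpose has no effect. Variable names and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matrix product: (4, 3) times (3, 2) gives (4, 2).
a = rng.standard_normal((4, 3))
b = rng.standard_normal((3, 2))
c = np.dot(a, b)
print(c.shape)  # (4, 2)

# A rank-1 array is neither a row nor a column vector.
v = rng.standard_normal(5)
print(v.shape, v.T.shape)  # (5,) (5,)
inner = np.dot(v, v.T)     # a scalar, not a 5x5 matrix

# An explicit column vector behaves as expected.
col = v.reshape(5, 1)
outer = np.dot(col, col.T)
print(outer.shape)  # (5, 5)
```

Giving vectors an explicit shape such as (5, 1) or (1, 5) avoids the ambiguity of rank-1 arrays.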