# Deep learning specialization

## Chapter 1. Neural Networks and Deep Learning

### Neural Network basics

#### Tips on numpy

1) Avoid rank-1 arrays: they are neither column vectors nor row vectors

In [3]:
import numpy as np

a = np.random.rand(5)  # rank-1 array: shape (5,), neither a row nor a column

print(a.shape)
print(a)

a = a.reshape(1, 5)  # explicit row vector: shape (1, 5)
print(a.shape)
print(a)

(5,)
[ 0.50396686  0.68641748  0.99727146  0.78588024  0.1727076 ]
(1, 5)
[[ 0.50396686  0.68641748  0.99727146  0.78588024  0.1727076 ]]


2) Use assertions to check array shapes

In [4]:
assert a.shape == (1, 5)  # fails fast if the shape is not what we expect
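
Both tips can be illustrated together. The sketch below is only an illustration (the names `rng`, `col`, `inner` and `outer` are made up for this example): it shows that transposing a rank-1 array changes nothing, so a product that looks like an outer product silently becomes an inner product, while an explicit column vector behaves as expected. The assertions document the shapes along the way.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility

# Rank-1 array: shape (5,), neither a row nor a column vector.
a = rng.random(5)
assert a.shape == (5,)
assert a.T.shape == (5,)  # transposing a rank-1 array does nothing

# Looks like an outer product, but is actually an inner product (a scalar).
inner = np.dot(a, a.T)
assert np.ndim(inner) == 0

# Explicit column vector: shape (5, 1).
col = a.reshape(5, 1)
assert col.T.shape == (1, 5)  # the transpose is a true row vector

# Now the same expression gives the 5x5 outer product.
outer = np.dot(col, col.T)
assert outer.shape == (5, 5)
```

Reshaping once, right after creating the array, and asserting the shape keeps this kind of silent mismatch from propagating.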

#### Explanation of Logistic Regression cost function

Our task is to model the probability $p(y|x)$, so we need a function, call it $\hat{y}$, that satisfies two requirements: when $y = 1$, $p(y|x) = \hat{y}$, and when $y = 0$, $p(y|x) = 1 - \hat{y}$. These two cases fully describe $p(y|x)$, so we can combine them into a single equation that reduces to $\hat{y}$ for $y = 1$ and to $1 - \hat{y}$ for $y = 0$:

\begin{align}
p(y|x) = \hat{y}^y (1 - \hat{y})^{1 - y}
\end{align}
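
As a quick sanity check, the single equation can be evaluated for both labels. This is a minimal sketch; the function name `bernoulli_prob` and the value `y_hat = 0.8` are chosen purely for illustration.

```python
def bernoulli_prob(y, y_hat):
    """p(y|x) = y_hat**y * (1 - y_hat)**(1 - y), for a label y in {0, 1}."""
    return y_hat ** y * (1 - y_hat) ** (1 - y)

y_hat = 0.8  # illustrative predicted probability that y = 1

p1 = bernoulli_prob(1, y_hat)  # y = 1: the second factor becomes 1, leaving y_hat
p0 = bernoulli_prob(0, y_hat)  # y = 0: the first factor becomes 1, leaving 1 - y_hat

print(p1, p0)
```

The two probabilities sum to 1, as they must for a binary label.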

Since $\log$ is a strictly increasing function, maximizing $\log{p(y|x)}$ gives the same result as maximizing $p(y|x)$. Taking the log of both sides of the equation above gives:

\begin{align}
\log{p(y|x)} = y\log{\hat{y}} + (1 - y)\log{(1 - \hat{y})}
\end{align}
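
The expanded form can be checked numerically against the log of the product form. The labels and predictions below are made-up values used only for this sketch, and `log_prob` is a hypothetical helper name.

```python
import numpy as np

def log_prob(y, y_hat):
    """Expanded form: y*log(y_hat) + (1 - y)*log(1 - y_hat)."""
    return y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)

# Illustrative labels and predicted probabilities (not from any dataset).
y = np.array([1, 0, 1, 0])
y_hat = np.array([0.9, 0.2, 0.6, 0.7])

direct = np.log(y_hat ** y * (1 - y_hat) ** (1 - y))  # log of the product form
expanded = log_prob(y, y_hat)                         # sum-of-logs form

assert np.allclose(direct, expanded)

# log is strictly increasing, so both forms rank the examples identically.
assert np.argmax(direct) == np.argmax(y_hat ** y * (1 - y_hat) ** (1 - y))
```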

In machine learning we usually minimize a loss, whereas here we want to maximize $p(y|x)$, so we define the loss as the negative log-probability: $L(y, \hat{y}) = -\log{p(y|x)} = -\left(y\log{\hat{y}} + (1 - y)\log{(1 - \hat{y})}\right)$. We can now define $\hat{y}$ as a linear function of $x$ passed through the sigmoid, which squashes the output into the interval $(0, 1)$, as a probability requires. Thus $\hat{y} = \sigma(\theta^Tx + b)$. The sigmoid also has a convenient property when calculating gradients, $\sigma'(z) = \sigma(z) (1 - \sigma(z))$, which is worth keeping in mind.
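
The pieces above can be put together in a short sketch. It assumes the loss is the negative of the log-probability from the earlier equation, and it checks the sigmoid derivative identity with a central finite difference. The helper names (`sigmoid`, `predict`, `loss`) and the parameter values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, b, x):
    """y_hat = sigmoid(theta^T x + b)."""
    return sigmoid(theta @ x + b)

def loss(y, y_hat):
    """Negative log-probability: -(y*log(y_hat) + (1 - y)*log(1 - y_hat))."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Check sigma'(z) = sigma(z) * (1 - sigma(z)) with a central difference.
z = np.linspace(-3, 3, 7)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
assert np.allclose(numeric, analytic, atol=1e-8)

# Illustrative parameters and input (made up for this example).
theta = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 4.0])
y_hat = predict(theta, b, x)  # a value strictly between 0 and 1

# A confident correct prediction costs less than a confident wrong one.
assert loss(1, 0.9) < loss(1, 0.1)
```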