# 3.2 Neural Network Representation

* Input layer
* Hidden layer
* Output layer

> The input layer is usually not counted toward the number of layers, so the network below is a 2-layer NN.

![image.png](attachment:image.png)

### 2 Layer NN

$X=a^{[0]}=\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix}$ $\longrightarrow$ Input layer

$a^{[1]}=\begin{bmatrix}a_1^{[1]} \\ \vdots \\ a_4^{[1]}\end{bmatrix}$ $\longrightarrow$ Hidden layer

$a^{[2]}=\hat{y}$ $\longrightarrow$ Output layer

The hidden layer has two parameters, $w^{[1]}, b^{[1]}$:

* $w^{[1]}$ has shape $(4,3)$: 4 hidden nodes, each taking 3 input features

* $b^{[1]}$ has shape $(4,1)$: one bias per hidden node

The output layer has two parameters, $w^{[2]}, b^{[2]}$:
* $w^{[2]}$ has shape $(1,4)$: 1 output node taking 4 inputs (the hidden activations)

* $b^{[2]}$ has shape $(1,1)$: one bias for the single output node
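
The shapes listed above can be checked with a short sketch. It assumes the 3-input, 4-hidden-node, 1-output network from the figure; the variable names, the `0.01` scaling, and the `tanh` hidden activation are illustrative assumptions, not something these notes specify.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility

n_x, n_h, n_y = 3, 4, 1  # input features, hidden nodes, output nodes

W1 = rng.standard_normal((n_h, n_x)) * 0.01  # shape (4, 3)
b1 = np.zeros((n_h, 1))                      # shape (4, 1)
W2 = rng.standard_normal((n_y, n_h)) * 0.01  # shape (1, 4)
b2 = np.zeros((n_y, 1))                      # shape (1, 1)

x = rng.standard_normal((n_x, 1))  # one example as a column vector, shape (3, 1)
a1 = np.tanh(W1 @ x + b1)          # hidden activations (tanh is an assumed choice)
z2 = W2 @ a1 + b2                  # pre-activation of the output node

param_shapes = [W1.shape, b1.shape, W2.shape, b2.shape]
print(param_shapes, a1.shape, z2.shape)
```

The matrix products only work because the inner dimensions line up: $(4,3)\cdot(3,1)$ gives $(4,1)$, and $(1,4)\cdot(4,1)$ gives $(1,1)$.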

In [3]:
import numpy as np

In [2]:
np.random.randn(5,1)

array([[ 0.30642787],
       [ 0.35995064],
       [ 0.17370543],
       [-0.35375639],
       [ 0.05706733]])

In [3]:
np.random.randn(1,5)

array([[ 0.65621786,  0.21875675, -1.52862104,  0.27477083,  0.37541728]])

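> Dimensions: both results are 2-D arrays. Shape (5, 1) is a column vector and shape (1, 5) is a row vector.

> A node can be viewed as a black box: inputs go in and an output comes out.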

In [16]:
a = [1,2,3,4]
a = np.array(a)

In [25]:
b=np.append([a],[a],axis=0)
print(b)
print(b.shape)

[[1 2 3 4]
 [1 2 3 4]]
(2, 4)


In [26]:
b=np.append([a],[a],axis=1)
print(b)
print(b.shape)

[[1 2 3 4 1 2 3 4]]
(1, 8)


In [51]:
b = [a],[a]
d = a,a,a

In [40]:
print(b)

([array([1, 2, 3, 4])], [array([1, 2, 3, 4])])


In [41]:
c = np.array(b)
print(c.shape)

(2, 1, 4)


In [42]:
c.reshape(2,4)

array([[1, 2, 3, 4],
       [1, 2, 3, 4]])

In [52]:
print(d)
d = np.array(d)

(array([1, 2, 3, 4]), array([1, 2, 3, 4]), array([1, 2, 3, 4]))


In [53]:
print(d.shape)

(3, 4)
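
The cells above build 2-D arrays from a 1-D array via `np.append` and nested tuples. As a hedged alternative sketch, `np.stack` and `np.concatenate` express the same intent more directly; the variable names here are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Stack copies of `a` as rows: a new leading axis is created.
rows = np.stack([a, a, a], axis=0)   # shape (3, 4)

# Stack copies of `a` as columns: the new axis is the second one.
cols = np.stack([a, a], axis=1)      # shape (4, 2)

# Join along the existing axis: the result stays 1-D.
flat = np.concatenate([a, a])        # shape (8,)

print(rows.shape, cols.shape, flat.shape)
```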


# 3.9 Gradient descent

### 2 Layer NN

Parameters: $w^{[1]}, b^{[1]},w^{[2]},b^{[2]}$

dims: $n_x=n^{[0]},n^{[1]},n^{[2]}=1$

$w^{[1]}: (n^{[1]},n^{[0]})$, 

$b^{[1]}: (n^{[1]},1),$

$w^{[2]}: (n^{[2]},n^{[1]})$

$b^{[2]}: (n^{[2]},1)$

cost function: $\mathcal{J}(w^{[1]}, b^{[1]},w^{[2]},b^{[2]})=\frac{1}{m}\sum\limits^{m}_{i=1}\mathcal{L}(\hat{y}^{(i)},y^{(i)})$, where $\hat{y}=a^{[2]}$

$\mathrm{d}w^{[1]}=\frac{\partial{\mathcal{J}}}{\partial{w^{[1]}}}$

$\mathrm{d}b^{[1]}=\frac{\partial{\mathcal{J}}}{\partial{b^{[1]}}}$

$w^{[1]}:=w^{[1]}-\alpha \mathrm{d}w^{[1]}$

$b^{[1]}:=b^{[1]}-\alpha \mathrm{d}b^{[1]}$
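
The update rule above applies to every parameter in the same way. A minimal sketch, assuming parameters and gradients are kept in dictionaries keyed `"W1"` / `"dW1"` and so on (a hypothetical naming convention chosen for this example):

```python
import numpy as np

def update_parameters(params, grads, alpha):
    """Return new parameters after one gradient descent step.

    Assumes grads holds a key "d" + name for every name in params.
    """
    return {name: value - alpha * grads["d" + name]
            for name, value in params.items()}

# Toy values, chosen only so the result is easy to verify by hand.
params = {"W1": np.ones((4, 3)), "b1": np.zeros((4, 1))}
grads = {"dW1": np.full((4, 3), 0.5), "db1": np.full((4, 1), 2.0)}

new_params = update_parameters(params, grads, alpha=0.1)
# W1: 1 - 0.1 * 0.5 = 0.95, b1: 0 - 0.1 * 2.0 = -0.2
```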

Each gradient descent step uses forward propagation to compute the predictions and back propagation to compute the gradients.

### Forward propagation

$Z^{[1]}=w^{[1]}X+b^{[1]}$;

$A^{[1]}=g^{[1]}(Z^{[1]})$;

$Z^{[2]}=w^{[2]}A^{[1]}+b^{[2]}$

$A^{[2]}=g^{[2]}(Z^{[2]})=\sigma{(Z^{[2]})}$;
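
The four forward equations translate almost line for line into NumPy. This is a sketch under assumptions: $g^{[1]}$ is taken to be `tanh` (the notes leave it unspecified), and the function and variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, W1, b1, W2, b2):
    """Vectorized forward pass over all m examples (columns of X)."""
    Z1 = W1 @ X + b1    # (n1, m); b1 of shape (n1, 1) broadcasts across columns
    A1 = np.tanh(Z1)    # assumed hidden activation g1
    Z2 = W2 @ A1 + b2   # (n2, m)
    A2 = sigmoid(Z2)    # output activation g2 = sigmoid
    return Z1, A1, Z2, A2

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))          # n0 = 3 features, m = 5 examples
W1 = rng.standard_normal((4, 3)) * 0.01
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.01
b2 = np.zeros((1, 1))

Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
print(A1.shape, A2.shape)
```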

### Backward propagation

$\mathrm{d}Z^{[2]}=A^{[2]}-Y; Y=[y^{(1)},y^{(2)},\ldots,y^{(m)}]$

$\mathrm{d}W^{[2]}=\frac{1}{m}\mathrm{d}Z^{[2]}A^{[1]T}$

$\mathrm{d}b^{[2]}=\frac{1}{m}np.sum(\mathrm{d}Z^{[2]},axis=1,keepdims=True)$

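> `np.sum`: `axis=0` sums down each column (collapsing the rows); `axis=1` sums across each row (collapsing the columns)<br>
> `keepdims=True` keeps the summed axis with length 1, e.g. shape (n, 1)<br>
> `keepdims=False` drops the summed axis, e.g. shape (n,)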
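
A small sketch of the `axis` and `keepdims` behavior described in the note, using a toy $2\times 3$ array (the values are arbitrary):

```python
import numpy as np

dZ = np.array([[1, 2, 3],
               [4, 5, 6]])

col_sums = np.sum(dZ, axis=0)                     # sum down columns, shape (3,)
row_sums = np.sum(dZ, axis=1)                     # sum across rows, shape (2,)
row_sums_2d = np.sum(dZ, axis=1, keepdims=True)   # same sums, shape (2, 1)

print(col_sums, row_sums, row_sums_2d.shape)
```

Keeping the shape `(n, 1)` matters for the bias gradients: it matches the shape of $b^{[l]}$, so the update does not rely on broadcasting a rank-1 array.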

$\mathrm{d}Z^{[1]}=W^{[2]T}\mathrm{d}Z^{[2]} * g^{[1]\prime}(Z^{[1]})$, where $*$ is the element-wise product

$\mathrm{d}W^{[1]}=\frac{1}{m}\mathrm{d}Z^{[1]}X^T$

$\mathrm{d}b^{[1]}=\frac{1}{m}np.sum(\mathrm{d}Z^{[1]},axis=1,keepdims=True)$
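
The six backward equations can be sketched in NumPy as follows. The assumptions are stated loudly: the hidden activation is `tanh` (so $g^{[1]\prime}(Z^{[1]}) = 1 - (A^{[1]})^2$), the loss is the logistic (cross-entropy) loss, which is what makes $\mathrm{d}Z^{[2]}=A^{[2]}-Y$ hold with a sigmoid output, and all function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    A1 = np.tanh(W1 @ X + b1)       # assumed hidden activation
    A2 = sigmoid(W2 @ A1 + b2)
    return A1, A2

def cost(A2, Y):
    """Average logistic loss over m examples (assumed loss function)."""
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def backward(X, Y, A1, A2, W2):
    m = X.shape[1]
    dZ2 = A2 - Y                                   # (n2, m)
    dW2 = dZ2 @ A1.T / m                           # (n2, n1)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m   # (n2, 1)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)             # element-wise product, (n1, m)
    dW1 = dZ1 @ X.T / m                            # (n1, n0)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m   # (n1, 1)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))                    # n0 = 3, m = 5
Y = (rng.random((1, 5)) > 0.5).astype(float)       # random binary labels
W1 = rng.standard_normal((4, 3)) * 0.5
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.5
b2 = np.zeros((1, 1))

A1, A2 = forward(X, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(X, Y, A1, A2, W2)
print(dW1.shape, db1.shape, dW2.shape, db2.shape)
```

Each gradient has the same shape as the parameter it belongs to, which is a quick sanity check before plugging the values into the update rule.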