<img src="images/L-Layer-Model.png">

**Initialization:**

```initialize_parameters_deep(layer_dims)```, returns ```parameters```

**Forward Propagation:**

```L_model_forward(X, parameters)```, returns ```AL, caches```
- ```linear_forward(A, W, b)```, returns ```Z, cache```
- *```sigmoid(Z)```*, *```relu(Z)```*, returns ```A, activation_cache```
- ```linear_activation_forward(A_prev, W, b, activation)```, returns ```A, cache```

**Cost Function:**

```compute_cost(AL, Y)```, returns ```cost```

**Backward Propagation:**

```L_model_backward(AL, Y, caches)```, returns ```grads```
- ```linear_backward(dZ, cache)```, returns ```dA_prev, dW, db```
- *```relu_backward(dA, activation_cache)```*, *```sigmoid_backward(dA, activation_cache)```*, returns ```dZ```
- ```linear_activation_backward(dA, cache, activation)```, returns ```dA_prev, dW, db```

**Update Parameters:**
- ```update_parameters(parameters, grads, learning_rate)```, returns ```parameters```

## Initialization

### 3.2 ```initialize_parameters_deep(layer_dims)```
Initializes the parameters for the network.

**Equations:**
$$W^{[1]}, b^{[1]}, ..., W^{[L]}, b^{[L]}$$

**Returns:**
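- ```parameters```, a dictionary with randomly initialized values for ```W1```, ```b1```, ..., ```WL```, ```bL```, one pair per layer. Because ```layer_dims``` also lists the input size, that is ```len(layer_dims) - 1``` layers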
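
A minimal sketch of this initialization, assuming NumPy. The ```seed``` argument, the ```0.01``` weight scale, and the zero-initialized biases are illustrative assumptions, not something these notes specify.

```python
import numpy as np

def initialize_parameters_deep(layer_dims, seed=0):
    """layer_dims[0] is the input size; layer_dims[l] is the size of layer l."""
    rng = np.random.default_rng(seed)  # seed is an assumption, for reproducibility
    parameters = {}
    for l in range(1, len(layer_dims)):
        # Small random weights break symmetry between units (0.01 scale is an assumption).
        parameters[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        # Biases start at zero here (an assumption).
        parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return parameters

# A 5-input network with hidden layers of 4 and 3 units and 1 output unit.
params = initialize_parameters_deep([5, 4, 3, 1])
```

```W{l}``` has shape ```(layer_dims[l], layer_dims[l-1])``` so that ```W @ A_prev``` is well defined for every layer.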

## Forward Propagation

### 4.1 ```linear_forward(A, W, b)```
Performs the linear step of forward propagation for a single layer.

**Equations:**
$$Z = WA_{prev} + b$$

**Returns:**
- ```Z```, the pre-activation value (the input to the activation function)
- ```cache```, which just stores the given ```A```, ```W```, and ```b``` values for later use
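
The linear step can be sketched in NumPy as follows. The small input arrays are made up for illustration, with one column per training example.

```python
import numpy as np

def linear_forward(A, W, b):
    # b has shape (n_l, 1) and broadcasts across the m example columns.
    Z = W @ A + b
    cache = (A, W, b)  # kept for the backward pass
    return Z, cache

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # 2 features, 2 examples
W = np.array([[1.0, -1.0]])     # 1 unit, 2 inputs
b = np.array([[0.5]])
Z, cache = linear_forward(A, W, b)
```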

### 4.2 ```sigmoid(Z)```, ```relu(Z)```
The sigmoid and ReLU activation functions, which are provided.

**Equations:**
$$A = g(Z)$$

**Returns:**
- ```A```, the non-linear activation value
- ```activation_cache```, which just stores ```Z``` for later use
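
The provided implementations are not reproduced in these notes. A minimal sketch of what they might look like, assuming NumPy and the return convention described above:

```python
import numpy as np

def sigmoid(Z):
    A = 1 / (1 + np.exp(-Z))
    return A, Z  # Z doubles as the activation cache

def relu(Z):
    A = np.maximum(0, Z)
    return A, Z  # Z doubles as the activation cache

A_sig, _ = sigmoid(np.array([[0.0]]))
A_relu, _ = relu(np.array([[-1.0, 2.0]]))
```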

### 4.2 ```linear_activation_forward(A_prev, W, b, activation)```
Calls ```linear_forward(A,W,b)``` and ```sigmoid(Z)``` or ```relu(Z)``` to perform a complete forward propagation step.

**Equations:**
$$A = g(WA_{prev} + b)$$

**Returns:**
- ```A```, the activation value for the current layer
- ```cache```, which stores ```A_prev```, ```W```, ```b```, and ```Z``` for use in backward propagation
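
A sketch of the combined step, with the linear and activation code inlined so the block runs on its own. The nested cache layout ```((A_prev, W, b), Z)``` is an assumption; any layout that keeps these four values would work.

```python
import numpy as np

def linear_activation_forward(A_prev, W, b, activation):
    Z = W @ A_prev + b                      # linear step
    if activation == "sigmoid":
        A = 1 / (1 + np.exp(-Z))
    elif activation == "relu":
        A = np.maximum(0, Z)
    else:
        raise ValueError(f"unknown activation: {activation}")
    cache = ((A_prev, W, b), Z)             # (linear cache, activation cache)
    return A, cache

A_prev = np.array([[1.0], [2.0]])
W = np.array([[1.0, -1.0]])
b = np.array([[0.0]])
A_relu, _ = linear_activation_forward(A_prev, W, b, "relu")
A_sig, _ = linear_activation_forward(A_prev, W, b, "sigmoid")
```

Here ```Z``` is ```-1```, so ReLU clips the activation to zero while sigmoid maps it to a value below ```0.5```.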

### 4.3 ```L_model_forward(X, parameters)```

Calls ```linear_activation_forward``` ```L-1``` times with ReLU activation to compute the hidden layers, then once more with sigmoid activation to compute the output $A^{[L]}$, completing one full pass of forward propagation.

**Returns:**
- ```AL```, the value of $A^{[L]}$, the final activation values of the network. These are the predicted probabilities for the current iteration.
- ```caches```, a list of the caches from each layer, in order from layer 1 to layer ```L```
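
The full forward pass can be sketched as below. The compact ```linear_activation_forward``` helper and the randomly generated parameters are stand-ins so the block runs on its own; the layer sizes are arbitrary.

```python
import numpy as np

def linear_activation_forward(A_prev, W, b, activation):
    Z = W @ A_prev + b
    A = 1 / (1 + np.exp(-Z)) if activation == "sigmoid" else np.maximum(0, Z)
    return A, ((A_prev, W, b), Z)

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2  # one W and one b per layer
    # Layers 1 .. L-1 use ReLU.
    for l in range(1, L):
        A, cache = linear_activation_forward(A, parameters[f"W{l}"], parameters[f"b{l}"], "relu")
        caches.append(cache)
    # Layer L uses sigmoid.
    AL, cache = linear_activation_forward(A, parameters[f"W{L}"], parameters[f"b{L}"], "sigmoid")
    caches.append(cache)
    return AL, caches

# Illustrative setup: 3 inputs, hidden layers of 4 and 2 units, 1 output, 5 examples.
rng = np.random.default_rng(0)
layer_dims = [3, 4, 2, 1]
parameters = {}
for l in range(1, len(layer_dims)):
    parameters[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.1
    parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
X = rng.standard_normal((3, 5))
AL, caches = L_model_forward(X, parameters)
```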

## Cost Calculation

### 5 ```compute_cost(AL, Y)```

Computes the cost function by averaging the binary cross-entropy loss function, $-[y\log(\hat{y})+(1-y)\log(1-\hat{y})]$, across all $m$ training examples.

**Equations:**

$$\mathcal{J} = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left(y^{(i)}\log\left(a^{[L] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right)\right)$$

**Returns:**
- ```cost```, the cost for the current iteration
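
A sketch of the cost computation in NumPy. It assumes ```AL``` and ```Y``` both have shape ```(1, m)``` and that no entry of ```AL``` is exactly 0 or 1; the example values are made up.

```python
import numpy as np

def compute_cost(AL, Y):
    m = Y.shape[1]
    # Element-wise cross-entropy, summed over examples and averaged.
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(cost)

AL = np.array([[0.8, 0.9, 0.4]])  # predicted probabilities
Y = np.array([[1, 1, 0]])         # true labels
cost = compute_cost(AL, Y)
```

Confident correct predictions contribute little to the cost, while confident wrong ones are penalized heavily because of the logarithm.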

## Backward Propagation

### 6.1 ```linear_backward(dZ, cache)```

Performs the linear portion of backward propagation.

**Equations:**

```dW```:$\large{\frac{\partial \mathcal{J} }{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T}}$
- - -
```db```:$\large{\frac{\partial \mathcal{J} }{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l](i)}}$
- - -
```dA_prev```:$\large{\frac{\partial \mathcal{L} }{\partial A^{[l-1]}} = W^{[l] T} dZ^{[l]}}$

**Returns:**
- ```dW```, ```db```, and ```dA_prev```, the gradients for the current layer
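
The three equations above translate almost line for line into NumPy. This is a sketch that assumes the cache is the ```(A_prev, W, b)``` tuple saved by ```linear_forward```; the input values are made up.

```python
import numpy as np

def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]                          # number of examples
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m   # keepdims preserves the (n_l, 1) shape
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

A_prev = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
W = np.array([[1.0, -1.0]])
b = np.array([[0.0]])
dZ = np.array([[1.0, 1.0]])
dA_prev, dW, db = linear_backward(dZ, (A_prev, W, b))
```

Each gradient has the same shape as the quantity it belongs to: ```dW``` matches ```W```, ```db``` matches ```b```, and ```dA_prev``` matches ```A_prev```.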

### 6.2 ```relu_backward(dA, activation_cache)```, ```sigmoid_backward(dA, activation_cache)```
Performs the backward propagation step for the ReLU and sigmoid activation functions. Provided.
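
The provided code is not shown in these notes. A sketch of what these two functions might compute, using $dZ = dA * g'(Z)$ with the ```Z``` stored in ```activation_cache```:

```python
import numpy as np

def relu_backward(dA, activation_cache):
    Z = activation_cache
    # ReLU's derivative is 1 where Z > 0 and 0 elsewhere.
    return np.where(Z > 0, dA, 0.0)

def sigmoid_backward(dA, activation_cache):
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))
    # Sigmoid's derivative is s * (1 - s).
    return dA * s * (1 - s)

dZ_relu = relu_backward(np.array([[2.0, 3.0]]), np.array([[-1.0, 1.0]]))
dZ_sig = sigmoid_backward(np.array([[4.0]]), np.array([[0.0]]))
```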

### 6.2 ```linear_activation_backward(dA, cache, activation)```
Calls ```linear_backward``` and either ```relu_backward``` or ```sigmoid_backward``` to perform a full backward propagation step for a single layer.

**Returns:**
- ```dW```, ```db```, and ```dA_prev```, the gradients for the current layer
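
A sketch of the combined backward step, with the activation and linear parts inlined so the block runs on its own. It assumes the cache layout ```((A_prev, W, b), Z)```, which is an illustrative choice.

```python
import numpy as np

def linear_activation_backward(dA, cache, activation):
    (A_prev, W, b), Z = cache
    # Activation part: dZ = dA * g'(Z)
    if activation == "relu":
        dZ = np.where(Z > 0, dA, 0.0)
    elif activation == "sigmoid":
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)
    else:
        raise ValueError(f"unknown activation: {activation}")
    # Linear part
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

A_prev = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
W = np.array([[1.0, -1.0]])
b = np.array([[3.0]])
Z = W @ A_prev + b                  # positive for both examples
dA = np.array([[2.0, 4.0]])
dA_prev, dW, db = linear_activation_backward(dA, ((A_prev, W, b), Z), "relu")
```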

### 6.3 ```L_model_backward(AL, Y, caches)```
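Calls ```linear_activation_backward``` ```L``` times, once for the sigmoid output layer and then ```L-1``` times for the ReLU hidden layers, working from layer ```L``` back to layer 1. This computes the gradients for the current iteration and completes one full backward propagation pass.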

**Returns:**
- ```grads```, a dictionary containing the gradients ```dA1```, ```dW1```, ```db1```, ..., ```dAL```, ```dWL```, ```dbL```
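
A sketch of the full backward pass on a two-layer example. Several things here are assumptions: the compact helper functions, the cache layout ```((A_prev, W, b), Z)```, and the reading of ```dA{l}``` as the gradient with respect to $A^{[l]}$ (so ```dAL``` is the starting gradient derived from the cross-entropy loss).

```python
import numpy as np

def forward_step(A_prev, W, b, activation):
    # Hypothetical stand-in for linear_activation_forward.
    Z = W @ A_prev + b
    A = 1 / (1 + np.exp(-Z)) if activation == "sigmoid" else np.maximum(0, Z)
    return A, ((A_prev, W, b), Z)

def linear_activation_backward(dA, cache, activation):
    (A_prev, W, b), Z = cache
    if activation == "sigmoid":
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)
    else:
        dZ = np.where(Z > 0, dA, 0.0)
    m = A_prev.shape[1]
    return W.T @ dZ, dZ @ A_prev.T / m, np.sum(dZ, axis=1, keepdims=True) / m

def L_model_backward(AL, Y, caches):
    L = len(caches)
    grads = {}
    # Derivative of the cross-entropy loss with respect to AL.
    grads[f"dA{L}"] = -(Y / AL - (1 - Y) / (1 - AL))
    for l in range(L, 0, -1):
        activation = "sigmoid" if l == L else "relu"
        dA_prev, dW, db = linear_activation_backward(grads[f"dA{l}"], caches[l - 1], activation)
        grads[f"dW{l}"], grads[f"db{l}"] = dW, db
        if l > 1:
            grads[f"dA{l - 1}"] = dA_prev  # the gradient w.r.t. X is not stored
    return grads

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4))
Y = np.array([[1, 0, 1, 0]])
W1, b1 = rng.standard_normal((2, 3)), np.zeros((2, 1))
W2, b2 = rng.standard_normal((1, 2)), np.zeros((1, 1))
A1, cache1 = forward_step(X, W1, b1, "relu")
AL, cache2 = forward_step(A1, W2, b2, "sigmoid")
grads = L_model_backward(AL, Y, [cache1, cache2])
```

For a sigmoid output with cross-entropy loss, the output-layer ```dZ``` simplifies to ```AL - Y```, which gives a quick sanity check on ```db2```.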

## Update Parameters

### 6.4 ```update_parameters(parameters, grads, learning_rate)```
**Equations:**
$$ W^{[l]} = W^{[l]} - \alpha dW^{[l]}$$
$$ b^{[l]} = b^{[l]} - \alpha db^{[l]}$$
**Returns:**
- ```parameters```, a dictionary containing the updated parameters after one full iteration, ```W1```, ```b1```, ..., ```WL```, ```bL```
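
A sketch of the gradient descent update. Returning a new dictionary instead of modifying the input in place is a design choice made here, not something these notes prescribe; the values are made up.

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2      # one W and one b per layer
    updated = dict(parameters)    # shallow copy so the caller's dict is untouched
    for l in range(1, L + 1):
        updated[f"W{l}"] = parameters[f"W{l}"] - learning_rate * grads[f"dW{l}"]
        updated[f"b{l}"] = parameters[f"b{l}"] - learning_rate * grads[f"db{l}"]
    return updated

parameters = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[10.0, 20.0]]), "db1": np.array([[1.0]])}
new_parameters = update_parameters(parameters, grads, learning_rate=0.1)
```

Extra entries in ```grads```, such as the ```dA``` values, are simply ignored by the update.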