**Feed Forward Network**

A feed forward network, also known as a feed forward neural network (FFNN), is a type of artificial neural network where the data flows only in one direction, from input layer to output layer, without any feedback loops or cycles. In other words, the data flows "forward" through the network, from the input layer to the hidden layers (if any) and finally to the output layer.

Here's a high-level overview of a feed forward network:

1. **Input Layer**: The input layer receives the input data, which is then propagated to the next layer.
2. **Hidden Layers**: One or more hidden layers transform the data. Each hidden unit computes a weighted sum of its inputs plus a bias and passes the result through a non-linear activation function such as sigmoid, ReLU, or tanh.
3. **Output Layer**: The final output layer generates the predicted output based on the processed data from the hidden layers.
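The input-to-hidden-to-output flow above can be sketched as a loop over a list of `(weights, bias)` pairs. This is a minimal sketch under assumptions that are not in the text: hidden layers use ReLU, the output layer is left linear, and the helper `forward` and the layer sizes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, layers):
    """Propagate a batch x through a list of (weights, bias) pairs.

    Hidden layers apply ReLU; the last layer is left linear (an assumption
    for this sketch; classifiers often apply sigmoid or softmax there).
    """
    activation = x
    for i, (weights, bias) in enumerate(layers):
        z = activation @ weights + bias  # weighted sum of inputs plus bias
        is_output_layer = i == len(layers) - 1
        activation = z if is_output_layer else relu(z)
    return activation

rng = np.random.default_rng(0)
# Hypothetical sizes: 3 inputs, two hidden layers (5 and 4 units), 2 outputs
sizes = [3, 5, 4, 2]
layers = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(8, 3))  # a batch of 8 examples
y = forward(x, layers)
print(y.shape)  # (8, 2)
```

Data only ever moves from one layer to the next inside the loop; nothing is fed back, which is what makes the network "feed forward".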

**Backward Propagation**

Backward propagation, usually shortened to backpropagation, is the standard algorithm used to train feed forward neural networks. It efficiently computes the gradients of the loss function with respect to every weight and bias in the network; a gradient-based optimizer then uses those gradients to adjust the parameters so that the loss decreases.

Here's a step-by-step overview of the backward propagation algorithm:

1. **Forward Pass**: The input data is propagated through the network, and the predicted output is generated.
2. **Error Calculation**: A loss function, such as mean squared error for regression or cross-entropy for classification, measures how far the predicted output is from the actual output (target).
3. **Backward Pass**: The error is propagated backwards through the network, from the output layer to the input layer, to compute the gradients of the loss function with respect to each layer's weights and biases.
4. **Weight Update**: The gradients are used to update the model's weights and biases, using an optimization algorithm such as stochastic gradient descent (SGD), Adam, or RMSProp.
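The four steps map directly onto a training loop. The sketch below applies them to the simplest possible model, a single linear neuron trained with mean squared error and plain gradient descent, on hypothetical toy data whose targets follow `y = 2*x1 - 3*x2`. It illustrates the steps only; it is not the network used in the example further down.

```python
import numpy as np

# Hypothetical toy data: targets follow y = 2*x1 - 3*x2 exactly
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, 1.0], [0.5, -1.5]])
y = X @ np.array([2.0, -3.0])

w = np.zeros(2)
b = 0.0
learning_rate = 0.05

def mse(pred, target):
    return np.mean((pred - target) ** 2)

losses = []
for step in range(200):
    # 1. Forward pass
    pred = X @ w + b
    # 2. Error calculation
    loss = mse(pred, y)
    losses.append(loss)
    # 3. Backward pass: gradients of the MSE loss
    grad_pred = 2 * (pred - y) / len(y)  # dL/dpred
    grad_w = X.T @ grad_pred             # dL/dw
    grad_b = grad_pred.sum()             # dL/db
    # 4. Weight update (plain gradient descent)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("first loss:", losses[0], "last loss:", losses[-1])
print("learned weights:", w.round(3), "bias:", round(b, 3))
```

Swapping step 4 for an optimizer such as Adam or RMSProp changes only how the gradients are turned into parameter updates; steps 1 to 3 stay the same.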

The backward pass is an application of the chain rule of calculus: the gradient of the loss with respect to a parameter is obtained by multiplying local derivatives layer by layer, working backwards from the loss, so the gradients already computed for a later layer are reused for the layer before it. The four steps are repeated iteratively, updating the weights and biases after each iteration, until the loss converges or another stopping criterion (such as a maximum number of epochs) is reached.
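To see the chain rule at work, the sketch below differentiates the loss of a single sigmoid neuron, `L = 0.5 * (y - t)**2` with `y = sigmoid(x . w + b)`, by multiplying the local derivatives dL/dy, dy/dz and dz/dw, and compares the result with a central finite-difference estimate. The neuron, its values and the helper names are hypothetical; a finite-difference comparison like this is a common way to check a backpropagation implementation.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, b, x, t):
    y = sigmoid(x @ w + b)
    return 0.5 * (y - t) ** 2

def chain_rule_grad(w, b, x, t):
    z = x @ w + b
    y = sigmoid(z)
    dL_dy = y - t        # from L = 0.5 * (y - t)^2
    dy_dz = y * (1 - y)  # derivative of the sigmoid
    dz_dw = x            # from z = x . w + b
    return dL_dy * dy_dz * dz_dw  # chain rule: dL/dw = dL/dy * dy/dz * dz/dw

def numerical_grad(w, b, x, t, eps=1e-6):
    # Central finite differences, one weight at a time
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step, b, x, t) - loss(w - step, b, x, t)) / (2 * eps)
    return grad

x = np.array([0.5, -1.2, 2.0])  # hypothetical single input example
w = np.array([0.1, 0.4, -0.3])
b = 0.2
t = 1.0

analytic = chain_rule_grad(w, b, x, t)
numeric = numerical_grad(w, b, x, t)
print("chain rule:  ", analytic)
print("finite diff.:", numeric)
print("match:", np.allclose(analytic, numeric, atol=1e-8))
```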

**Example Code**

Here's a simple example of a feed forward network with backward propagation using Python and the NumPy library:
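```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are reproducible

# Network dimensions: 2 inputs, 4 hidden units, 1 output.
# (2 hidden units can represent XOR in principle, but training often
# stalls in a local minimum, so a few extra units make it more reliable.)
n_inputs = 2
n_hidden = 4
n_outputs = 1
learning_rate = 0.5

# Initialize weights with random values centred on zero; biases start at zero
weights1 = rng.normal(0.0, 1.0, size=(n_inputs, n_hidden))
weights2 = rng.normal(0.0, 1.0, size=(n_hidden, n_outputs))
bias1 = np.zeros((1, n_hidden))
bias2 = np.zeros((1, n_outputs))

# Define the activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # Expects s = sigmoid(z), i.e. an activation, not the pre-activation z:
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)) = s * (1 - s)
    return s * (1 - s)

# Forward pass: input -> hidden -> output
def forward_pass(inputs):
    hidden_layer = sigmoid(inputs @ weights1 + bias1)
    output_layer = sigmoid(hidden_layer @ weights2 + bias2)
    return hidden_layer, output_layer

# Backward pass: for the loss L = 0.5 * sum((targets - output)^2), each d_*
# below is the NEGATIVE gradient -dL/d(parameter), so the update step adds it.
def backward_pass(inputs, targets, hidden_layer, output_layer):
    error = targets - output_layer
    d_output_layer = error * sigmoid_derivative(output_layer)
    d_hidden_layer = (d_output_layer @ weights2.T) * sigmoid_derivative(hidden_layer)
    d_weights2 = hidden_layer.T @ d_output_layer
    d_weights1 = inputs.T @ d_hidden_layer
    d_bias2 = np.sum(d_output_layer, axis=0, keepdims=True)
    d_bias1 = np.sum(d_hidden_layer, axis=0, keepdims=True)
    return d_weights1, d_weights2, d_bias1, d_bias2

# XOR training data (constant, so defined once outside the loop)
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([[0], [1], [1], [0]])

# Train the network
for epoch in range(10000):
    hidden_layer, output_layer = forward_pass(inputs)
    d_weights1, d_weights2, d_bias1, d_bias2 = backward_pass(inputs, targets, hidden_layer, output_layer)
    # Gradient-descent step (adding the negative gradient)
    weights1 += learning_rate * d_weights1
    weights2 += learning_rate * d_weights2
    bias1 += learning_rate * d_bias1
    bias2 += learning_rate * d_bias2
    if epoch % 1000 == 0:
        print("Epoch:", epoch, "Mean absolute error:", np.mean(np.abs(targets - output_layer)))

# Predictions of the trained network
_, predictions = forward_pass(inputs)
print("Predictions:", predictions.ravel().round(3))
```
This code defines a simple feed forward network with two inputs, four hidden units, and one output, and trains it with backpropagation and plain gradient descent to learn the XOR function. XOR is not linearly separable, so a network without a hidden layer cannot represent it. Two hidden units are enough in principle, but with so few units training often stalls in a local minimum, so this version uses four. The printed error typically shrinks as training proceeds; the exact values depend on the random initialization.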


**Activation Function**
======================

An activation function is a mathematical function applied to the weighted sum of a node's inputs (plus its bias) to produce that node's output. Its main purpose is to introduce non-linearity into the model, allowing it to learn and represent more complex relationships between the inputs and outputs.

**Why We Use Activation Functions**
----------------------------------

1. **Introduce Non-Linearity**: Activation functions introduce non-linearity into the model, allowing it to learn and represent more complex relationships between the inputs and outputs. Without them, any stack of layers collapses into a single linear (affine) transformation, so adding layers would not let the model learn anything a single layer could not.
2. **Increase Model Capacity**: With non-linear activations and enough hidden units, a network can approximate a very wide range of functions, which lets it fit more complex data distributions.
3. **Support Backpropagation**: Backpropagation passes gradients through every activation function, so the function must be differentiable (at least almost everywhere, as ReLU is) for the gradients of the loss with respect to the model's parameters to be computed.
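The collapse described in point 1 is easy to verify numerically. In this sketch (random, hypothetical weights) two layers with no activation between them produce exactly the same outputs as one layer with weights `W1 @ W2` and bias `b1 @ W2 + b2`; putting a ReLU between the layers breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 3))

# Two layers with no activation function between them
two_layer = (x @ W1 + b1) @ W2 + b2

# The same mapping written as ONE affine layer
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layer, one_layer))  # True

# With a ReLU between the layers, the network is no longer a single affine map
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, one_layer))
```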

**Three Common Activation Functions**
-------------------------------------

### 1. Sigmoid Activation Function

The sigmoid activation function maps the input to a value between 0 and 1.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```
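Backpropagation also needs each activation's derivative. This sketch computes the sigmoid's derivative from the pre-activation `x` (unlike `sigmoid_derivative` in the earlier example, which expects the sigmoid's output); the helper name `sigmoid_grad` is hypothetical. The derivative is at most 0.25, reached at `x = 0`, and becomes tiny for large `|x|`, which is why long chains of sigmoid layers can pass back very small gradients.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative with respect to the pre-activation x:
    # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(z))
print(sigmoid_grad(z))  # largest at x = 0, close to zero at x = -10 and 10
```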

### 2. ReLU (Rectified Linear Unit) Activation Function

The ReLU activation function maps every negative input to 0 and passes every positive input through unchanged, i.e. `ReLU(x) = max(0, x)`.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)
```
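ReLU's derivative is 1 for positive inputs and 0 for negative ones. It is undefined at exactly 0, so implementations have to pick a value there; this sketch (with the hypothetical helper `relu_grad`) assumes the common convention of using 0.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # 1 where x > 0, 0 elsewhere (including x == 0, by convention)
    return (x > 0).astype(float)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))
print(relu_grad(z))
```

Because the derivative is exactly 1 for every positive input, ReLU does not shrink the gradients that flow through active units, unlike sigmoid and tanh.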

### 3. Tanh (Hyperbolic Tangent) Activation Function

The tanh activation function maps the input to a value between -1 and 1.

```python
import numpy as np

def tanh(x):
    return np.tanh(x)
```
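tanh is closely related to sigmoid: it is a shifted and rescaled sigmoid, `tanh(x) = 2 * sigmoid(2x) - 1`, so its outputs are centred on zero rather than on 0.5. The sketch below checks that identity numerically and computes the derivative `1 - tanh(x)**2` used during backpropagation; `tanh_grad` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, which peaks at 1 when x = 0
    return 1 - np.tanh(x) ** 2

z = np.linspace(-3, 3, 7)
# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))  # True
print(tanh_grad(z))
```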

**Code Example**
---------------

```python
import numpy as np

# Example input
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Apply sigmoid activation function
output_sigmoid = 1 / (1 + np.exp(-x))
print("Sigmoid Output:", output_sigmoid)

# Apply ReLU activation function
output_relu = np.maximum(0, x)
print("ReLU Output:", output_relu)

# Apply tanh activation function
output_tanh = np.tanh(x)
print("Tanh Output:", output_tanh)
```

In this example, we apply the sigmoid, ReLU, and tanh activation functions to an input array `x` containing negative, zero, and positive values, which makes their differences visible: sigmoid squashes every value into the range (0, 1), ReLU sets the negative entries to 0 and passes the rest through unchanged, and tanh squashes values into (-1, 1) while preserving their sign. All three are non-linear, which is what lets a network built from them model non-linear relationships.
