```mermaid
graph LR
    A(Input)
    F(Output)
    
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F

    
    subgraph Hidden_Layer
        B((Weighted Sum))
        C((Sigmoid))
    end
    
    subgraph Output_Layer
        D((Weighted Sum))
        E((Sigmoid))
    end
    
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#f9f,stroke:#333,stroke-width:2px
    style C fill:#f9f,stroke:#333,stroke-width:2px
```

Let's consider a simple neural network with two layers, each one with one neuron. We'll use the sigmoid activation function for both neurons. Here's an example of backpropagation with this network:

1. **Initialize the Network Parameters**:
   Let's assume we have:
   - Input data: $x = 0.5$
   - Target output: $y_{\text{target}} = 1$
   - Random initial weight and bias for the hidden layer:
     - $w_1 = 0.4$
     - $b_1 = 0.5$
   - Random initial weight and bias for the output layer:
     - $w_2 = 0.8$
     - $b_2 = -0.2$

2. **Forward Pass**:
   - Compute the weighted sum and activation for the hidden layer:
   
      $z_1 = w_1 \cdot x + b_1 = 0.4 \cdot 0.5 + 0.5 = 0.7$
     
     $a_1 = \text{sigmoid}(z_1) = \frac{1}{1 + e^{-z_1}} = \frac{1}{1 + e^{-0.7}} \approx 0.668$

   - Compute the weighted sum and activation for the output layer:

     $z_2 = w_2 \cdot a_1 + b_2 = 0.8 \cdot 0.668 - 0.2 \approx 0.334$
     
     $a_2 = \text{sigmoid}(z_2) = \frac{1}{1 + e^{-z_2}} = \frac{1}{1 + e^{-0.334}} \approx 0.583$

3. **Compute Loss**:
   - Compute the loss using a simple squared error loss function:
     
     $L = \frac{1}{2}(y_{\text{target}} - a_2)^2 = \frac{1}{2}(1 - 0.583)^2 \approx 0.087$

4. **Backpropagation**:
   - Compute the gradient of the loss with respect to the output layer activation:

     $\frac{\partial L}{\partial a_2} = -(y_{\text{target}} - a_2) = -(1 - 0.583) = -0.417$

   - Compute the gradient of the output layer activation with respect to its weighted sum:

     $\frac{\partial a_2}{\partial z_2} = a_2 \cdot (1 - a_2) = 0.583 \cdot (1 - 0.583) \approx 0.243$
   
   - Compute the gradient of the loss with respect to the output layer weight and bias:

     $\frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial z_2} \cdot a_1 = (-0.417) \cdot 0.243 \cdot 0.668 \approx -0.068$
     
     $\frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial z_2} = (-0.417) \cdot 0.243 \approx -0.101$

   - Compute the gradient of the loss with respect to the hidden layer activation:
     
     $\frac{\partial L}{\partial a_1} = \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial z_2} \cdot w_2 = (-0.417) \cdot 0.243 \cdot 0.8 \approx -0.081$

   - Compute the gradient of the hidden layer activation with respect to its weighted sum:
     
     $\frac{\partial a_1}{\partial z_1} = a_1 \cdot (1 - a_1) = 0.668 \cdot (1 - 0.668) \approx 0.222$
     
   - Compute the gradient of the loss with respect to the hidden layer weight and bias:
     
     $\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} \cdot x = (-0.081) \cdot 0.222 \cdot 0.5 \approx -0.009$
     
     $\frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} = (-0.081) \cdot 0.222 \approx -0.018$

5. **Update Weights and Bias**:

   - Update the weights and biases using gradient descent (using $\alpha = 0.1$ as the learning rate):

     $w_1 \leftarrow w_1 - \alpha \cdot \frac{\partial L}{\partial w_1} = 0.4 - 0.1 \cdot (-0.009) \approx 0.401$
     
     $b_1 \leftarrow b_1 - \alpha \cdot \frac{\partial L}{\partial b_1} = 0.5 - 0.1 \cdot (-0.018) \approx 0.502$
     
     $w_2 \leftarrow w_2 - \alpha \cdot \frac{\partial L}{\partial w_2} = 0.8 - 0.1 \cdot (-0.068) \approx 0.807$
     
     $b_2 \leftarrow b_2 - \alpha \cdot \frac{\partial L}{\partial b_2} = -0.2 - 0.1 \cdot (-0.101) \approx -0.190$

6. **Repeat**:

   - Repeat steps 2-5 for a number of iterations or until convergence. 
   
This example demonstrates a single iteration of forward pass, backpropagation, and weight update. In practice, you would typically perform multiple iterations of this process to train the network on a larger dataset.
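
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the names `sigmoid` and `train_step` are made up for this sketch, and the code carries full floating-point precision, so printed values may differ in the third decimal place from figures worked out by hand with rounded intermediates.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used for both neurons in the walkthrough."""
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x, y_target, w1, b1, w2, b2, alpha=0.1):
    """One forward pass, backward pass, and gradient-descent update.

    Hypothetical helper for this example; returns the loss computed
    before the update and the updated (w1, b1, w2, b2).
    """
    # Forward pass (step 2)
    z1 = w1 * x + b1
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)

    # Squared error loss (step 3)
    loss = 0.5 * (y_target - a2) ** 2

    # Backward pass via the chain rule, output layer first (step 4)
    dL_da2 = -(y_target - a2)
    dL_dz2 = dL_da2 * a2 * (1.0 - a2)
    dL_dw2 = dL_dz2 * a1
    dL_db2 = dL_dz2
    dL_da1 = dL_dz2 * w2
    dL_dz1 = dL_da1 * a1 * (1.0 - a1)
    dL_dw1 = dL_dz1 * x
    dL_db1 = dL_dz1

    # Gradient-descent update (step 5)
    new_params = (
        w1 - alpha * dL_dw1,
        b1 - alpha * dL_db1,
        w2 - alpha * dL_dw2,
        b2 - alpha * dL_db2,
    )
    return loss, new_params


# Step 6: repeat, starting from the initial parameters of the example.
params = (0.4, 0.5, 0.8, -0.2)
for step in range(3):
    loss, params = train_step(0.5, 1.0, *params)
    print(f"step {step}: loss = {loss:.4f}, params = {[round(p, 3) for p in params]}")
```

Each call to `train_step` reports the loss from before the update, so the first printed line corresponds to the single iteration worked through by hand. With this small learning rate, the loss should shrink slightly on each subsequent step.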

# Example 2 - single neuron

```mermaid
graph TD
    A((Input Layer: x=2))
    B((Neuron))
    
    A --> B
    
    subgraph Neuron
        B
        B --> C((Weighted Sum))
        B --> D((Activation))
    end
    
    C --> D
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#f9f,stroke:#333,stroke-width:2px
```

Let's consider a simple neural network with a single neuron, one weight, and one bias. We'll perform backpropagation to update the weight and bias using the chain rule.

Here's a step-by-step example:

1. **Initialize the Network Parameters**:
   Let's assume we have:
   - Input data: $x = 2$
   - Target output: $y_{\text{target}} = 1$
   - Learning rate: $\alpha = 0.1$
   - Random initial weight: $w = 0.5$
   - Random initial bias: $b = 0.3$

2. **Forward Pass**:
   - Compute the weighted sum:

     $z = w \cdot x + b = 0.5 \cdot 2 + 0.3 = 1.3$

   - Compute the activation. This example uses a linear (identity) activation, so the output equals the weighted sum:

     $a = z = 1.3$

   - Compute the loss using a simple squared error loss function:

     $L = \frac{1}{2}(y_{\text{target}} - a)^2 = \frac{1}{2}(1 - 1.3)^2 = 0.045$

3. **Backpropagation**:
   - Compute the gradient of the loss with respect to the activation:

     $\frac{\partial L}{\partial a} = -(y_{\text{target}} - a) = -(1 - 1.3) = 0.3$

   - Compute the gradient of the activation with respect to the weighted sum:

     $\frac{\partial a}{\partial z} = 1 \quad \text{(since it's a linear function)}$

   - Compute the gradient of the loss with respect to the weighted sum:

     $\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} = 0.3 \cdot 1 = 0.3$

   - Compute the gradient of the weighted sum with respect to the weight and bias:

     $\frac{\partial z}{\partial w} = x = 2$

     $\frac{\partial z}{\partial b} = 1$

   - Compute the gradient of the loss with respect to the weight and bias using the chain rule:

     $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial w} = 0.3 \cdot 2 = 0.6$

     $\frac{\partial L}{\partial b} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial b} = 0.3 \cdot 1 = 0.3$

4. **Update Weights and Bias**:
   Update the weight and bias using gradient descent:

   $w \leftarrow w - \alpha \cdot \frac{\partial L}{\partial w} = 0.5 - 0.1 \cdot 0.6 = 0.44$

   $b \leftarrow b - \alpha \cdot \frac{\partial L}{\partial b} = 0.3 - 0.1 \cdot 0.3 = 0.27$

5. **Repeat**:
   Repeat steps 2-4 for a number of iterations or until convergence.

This example demonstrates a single iteration of forward pass, backpropagation, and weight update for a single neuron with one weight and one bias. In practice, you would typically perform multiple iterations (epochs) of this process to train the network on a larger dataset.
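
As a rough sketch, the same single-neuron update can be written in plain Python and repeated a few times to see the loss fall. The function name `linear_neuron_step` is hypothetical and chosen only for this illustration; the activation is taken to be the identity, as in the walkthrough.

```python
def linear_neuron_step(x, y_target, w, b, alpha=0.1):
    """One gradient-descent step for a single linear neuron a = w * x + b.

    Hypothetical helper for this example; returns the loss computed
    before the update, then the updated weight and bias.
    """
    # Forward pass
    z = w * x + b
    a = z  # identity activation, so da/dz = 1
    loss = 0.5 * (y_target - a) ** 2

    # Backward pass (chain rule)
    dL_da = -(y_target - a)
    dL_dz = dL_da * 1.0  # da/dz = 1
    dL_dw = dL_dz * x    # dz/dw = x
    dL_db = dL_dz * 1.0  # dz/db = 1

    # Gradient-descent update
    return loss, w - alpha * dL_dw, b - alpha * dL_db


# Start from the initial values of the example and repeat the step.
w, b = 0.5, 0.3
for step in range(5):
    loss, w, b = linear_neuron_step(2.0, 1.0, w, b)
    print(f"step {step}: loss = {loss:.5f}, w = {w:.4f}, b = {b:.4f}")
```

For these particular values the error $a - y_{\text{target}}$ is multiplied by $1 - \alpha(x^2 + 1) = 0.5$ on every step, so the loss drops to a quarter of its previous value each iteration. A different input or learning rate would change that factor.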

```mermaid
graph TD
    A((Input Layer: x=2))
    B((Neuron))
    
    A --> B
    
    subgraph Neuron
        B
        B --> C((Weighted Sum))
        B --> D((Activation))
        B --> E((dL/dz))
    end
    
    C --> E
    E --> D
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#f9f,stroke:#333,stroke-width:2px
```

```mermaid
graph TD
    A((Input Layer: x=2))
    B((Neuron))
    
    A --> B
    
    subgraph Neuron
        B
        B --> C((Weighted Sum))
        B --> D((Activation))
        B --> E((dL/dz))
        B --> F((dz/dw))
    end
    
    C --> F
    E --> F
    F --> D
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#f9f,stroke:#333,stroke-width:2px
```

```mermaid
graph TD
    A((Input Layer: x=2))
    B((Neuron))
    
    A --> B
    
    subgraph Neuron
        B
        B --> C((Weighted Sum))
        B --> D((Activation))
        B --> E((dL/dz))
        B --> F((dz/dw))
        B --> G((dz/db))
    end
    
    C --> F
    E --> F
    F --> D
    G --> D
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#f9f,stroke:#333,stroke-width:2px
```

```mermaid
graph TD
    A((Input Layer: x=2))
    B((Neuron))
    
    A --> B
    
    subgraph Neuron
        B
        B --> C((Weighted Sum))
        B --> D((Activation))
        B --> E((dL/dz))
        B --> F((dz/dw))
        B --> G((dz/db))
        B --> H((dw))
        B --> I((db))
    end
    
    C --> F
    E --> F
    F --> D
    G --> D
    H --> J((Updated Weight))
    I --> K((Updated Bias))
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#f9f,stroke:#333,stroke-width:2px
```