## Part 1: Conceptual Warm-Up (No Coding)

**Goal:** Ensure students understand the foundational concepts through explanation and derivation.

### Tasks

#### 1. Short Answer Questions

a) Derive the gradient update rule for the weight(s) for a single neuron with one input feature (i.e.,  
field, or attribute) and the sigmoid activation function:

$\sigma(x) = \frac{1}{1 + e^{-x}}$

b) Explain the role of the activation function in a neural network. Please guess without searching  
a textbook or the Internet.

c) What is overfitting, and how can it be mitigated? List a couple of techniques to reduce overfitting.

d) Can you guess what might be an underfitting problem? How can that be mitigated? List a couple of  
techniques to reduce underfitting.


**a.)**
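The gradient update rule for a single neuron is built from three components.

The weighted sum:
$z = wx+b$

The activation, which is also the prediction $\hat{y}$ of the forward pass:
$\hat{y} = a = \sigma(z) = \frac{1}{1 + e^{-z}}$

The loss:
$L = \frac{1}{2}(a-y)^2$

Applying the chain rule to the loss:

$\dfrac{\partial L}{\partial w} = \dfrac{\partial L}{\partial a} * \dfrac{\partial a}{\partial z} * \dfrac{\partial z}{\partial w} = (a-y) * a(1-a) * x$

Gradient update with learning rate $\eta$:

$w_{new} = w - \eta * \dfrac{\partial L}{\partial w} = w - \eta ((a-y) * a(1-a) * x)$

$b_{new} = b - \eta * \dfrac{\partial L}{\partial b} = b - \eta ((a-y) * a(1-a))$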
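
The update $w_{new} = w - \eta \, (a-y) \, a(1-a) \, x$ can be checked numerically. The following is a minimal Python sketch of one gradient step for a single sigmoid neuron with the loss $L = \frac{1}{2}(a-y)^2$. The function names, the starting values, and the learning rate are illustrative assumptions, not part of the assignment.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared-error loss L = 1/2 * (a - y)^2 for one sample."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2


def neuron_gradients(w, b, x, y):
    """Return (dL/dw, dL/db) from the chain rule."""
    a = sigmoid(w * x + b)
    delta = (a - y) * a * (1.0 - a)  # dL/da * da/dz
    return delta * x, delta          # dz/dw = x, dz/db = 1


def sgd_step(w, b, x, y, eta=0.1):
    """Apply one gradient-descent update to w and b."""
    dw, db = neuron_gradients(w, b, x, y)
    return w - eta * dw, b - eta * db


# Arbitrary example values (assumptions for illustration only).
w0, b0, x0, y0 = 0.5, -0.2, 1.5, 1.0
w1, b1 = sgd_step(w0, b0, x0, y0)
print(loss(w0, b0, x0, y0), "->", loss(w1, b1, x0, y0))
```

A single step with a small learning rate should lower the loss on that sample, which is what the printed pair of values lets you confirm.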

**b.)**
An activation function such as sigmoid or ReLU is applied to the weighted sum of a neuron's inputs to produce that neuron's output. It introduces non-linearity: without it, stacked layers would reduce to a single linear transformation of the inputs. Sigmoid, for example, squashes the weighted sum into a value between 0 and 1, which indicates how strongly the neuron is activated.
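
To illustrate why the activation function matters, the sketch below compares two stacked layers with and without a sigmoid in between. It assumes NumPy is available; the layer shapes and the random values are arbitrary choices for illustration. Without an activation, the two layers are exactly equivalent to one linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary weights for a 2 -> 2 -> 1 network (illustrative values only).
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=1)
x = rng.normal(size=2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Two layers with no activation in between.
linear_stack = W2 @ (W1 @ x + b1) + b2

# The same computation written as ONE linear layer.
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
collapsed = W_eq @ x + b_eq

# Two layers with a sigmoid after the first one.
nonlinear_stack = W2 @ sigmoid(W1 @ x + b1) + b2

print(np.allclose(linear_stack, collapsed))  # the linear stack collapses
print(linear_stack, nonlinear_stack)
```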

**c.)**
Overfitting occurs when a model fits its training dataset too closely: it performs incredibly well on that dataset but falls short on data outside it, performing worse than expected. Overfitting is largely caused by training for too long on the same dataset. Ways to mitigate it include limiting training time so the model does not overtrain on the same data, providing different datasets (for example, splitting one large dataset into pieces and training on each piece separately rather than all at once), and providing more data. Another option is to change the network structure, for example by reducing the number of weights or constraining their values.
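
Limiting training time is often done with early stopping. The sketch below is one hedged way to express the idea in Python: it scans a list of validation losses and returns the epoch whose weights would be kept. The function name, the `patience` parameter, and the loss values are hypothetical and chosen only for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the best epoch seen before training would stop.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, current in enumerate(val_losses):
        if current < best_loss:
            best_loss = current
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch


# Made-up validation losses: improvement stalls after epoch 2.
history = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8, 0.5]
print(early_stopping_epoch(history, patience=3))
```

With a larger `patience`, the same history would be scanned to the end, so the choice of `patience` trades training time against the risk of stopping too early.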

**d.)**
Underfitting would presumably be the opposite of overfitting: the model fails to fit the provided dataset at all. My assumption would be that similar remedies apply: provide more quality data, train for longer in case the model has not had enough time to train, and adjust the network structure if required. After further reading on the subject, my assumptions are roughly correct. One cause of underfitting can be a dataset that is too simple to form meaningful connections and relationships in the network. It can also be caused by poor feature engineering, by excessive regularization, or by omitting key data points from a dataset, all of which lead to poor training.

#### 2. Math Problem

a) For the following small binary classification dataset (4 samples each with two input features  
$(x_1, x_2)$ and one target feature $(y)$):  

Compute the forward passes and the gradients manually for a neural network with 1 hidden layer  
(with 2 neurons) and 1 output neuron.

1. Structure
2. Recognize Variables
3. Forward Pass
   1. Solve $z$
      1. $z = wx+b$
   2. Activate $z$ to obtain $a$
      1. $a = \sigma(z) = \frac{1}{1 + e^{-z}}$
   3. Solve all neurons within the layer
   4. When the layer is complete, pass its outputs to the next layer and repeat until the output layer is reached
4. Backpropagation (gradient descent update)
   1. Identify the loss and the weight being updated

      1. $MSE = \dfrac{1}{n} \sum_{i=1}^n (a_i-y_i)^2$

   2. Expand $\dfrac{\partial L}{\partial w}$ with the chain rule

      1. $L = \frac{1}{2}(a-y)^2$  
   
      2. $\dfrac{\partial L}{\partial w} = \dfrac{\partial L}{\partial a} * \dfrac{\partial a}{\partial z} * \dfrac{\partial z}{\partial w}$

      3. $\dfrac{\partial L}{\partial a}$ = $(a-y)$

      4. $\dfrac{\partial a}{\partial z}$ = $a(1-a)$

      5. $\dfrac{\partial z}{\partial w}$ = $x$

      6. $\dfrac{\partial L}{\partial w}$ = $(a-y)*a(1-a)*x$

   3. $w_{new} = w - \eta((a-y)*a(1-a)*x)$
   4. $b_{new} = b - \eta((a-y)*a(1-a))$
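
The forward pass and backpropagation steps outlined above can be sketched for one sample of a network with 2 hidden neurons and 1 output neuron. This is a minimal NumPy sketch, assuming a sigmoid activation on every neuron and the per-sample loss $L = \frac{1}{2}(a-y)^2$. The weight values are arbitrary placeholders, and the names `W1`, `b1`, `W2`, `b2` are assumptions introduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> 2 hidden neurons -> 1 output neuron."""
    z1 = W1 @ x + b1   # hidden weighted sums, shape (2,)
    a1 = sigmoid(z1)   # hidden activations
    z2 = W2 @ a1 + b2  # output weighted sum, shape (1,)
    a2 = sigmoid(z2)   # network prediction
    return z1, a1, z2, a2


def backward(x, y, W1, b1, W2, b2):
    """Gradients of L = 1/2 * (a2 - y)^2 via the chain rule."""
    _, a1, _, a2 = forward(x, W1, b1, W2, b2)
    delta2 = (a2 - y) * a2 * (1.0 - a2)           # dL/dz2
    dW2 = np.outer(delta2, a1)                    # dz2/dW2 = a1
    db2 = delta2
    delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)    # dL/dz1
    dW1 = np.outer(delta1, x)                     # dz1/dW1 = x
    db1 = delta1
    return dW1, db1, dW2, db2


# Placeholder parameters (assumptions for illustration only).
W1 = np.array([[0.5, -0.4], [0.3, 0.8]])
b1 = np.array([0.1, -0.1])
W2 = np.array([[0.7, -0.6]])
b2 = np.array([0.05])

# Second row of the dataset: (x1, x2) = (0, 1), y = 1.
x, y = np.array([0.0, 1.0]), 1.0

dW1, db1, dW2, db2 = backward(x, y, W1, b1, W2, b2)
eta = 0.1
W1_new, W2_new = W1 - eta * dW1, W2 - eta * dW2
print(dW1, dW2, sep="\n")
```

Row `j` of `W1` holds the weights of hidden neuron `j`, so each hidden neuron has its own weights and bias.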


$x_1$ | $x_2$ | $y$
------ | ------| -----
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

![image.png](attachment:image.png)

$z = w_1x_1 + w_2x_2+b$

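For the first sample, $(x_1, x_2) = (0, 0)$, each of the two hidden neurons has its own weights and bias:

$z_1 = w_{11}(0) + w_{12}(0)+b_1 = b_1$

$z_2 = w_{21}(0) + w_{22}(0)+b_2 = b_2$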

$a_1 = \sigma(z_1)$

$a_2 = \sigma(z_2)$

$w_{new} = w - \eta * \dfrac{\partial L}{\partial w}$

$w_{new} = w - \eta (\dfrac{\partial L}{\partial a} * \dfrac{\partial a}{\partial z} * \dfrac{\partial z}{\partial w})$

$w_{new} = w - \eta ((a-y) * a(1-a)*x)$


## Part 2: Build a Neural Network from Scratch (Coding)

**Goal:** Students implement a simple neural network without relying on deep learning libraries like TensorFlow or PyTorch.

### Tasks

#### 3. Implement a neural network with:

a) 1 input layer, 1 hidden layer (with 2 neurons), and 1 output layer.  

b) Use a non-linear activation function (e.g., sigmoid, ReLU, tanh, etc.) for each of the neurons.  

c) Do not forget to introduce the bias inputs for the two neural layers (i.e., hidden layer and output layer).  

d) Consider **mean squared error (MSE)** as the loss/error function.  

#### 4. Train it on the dataset:

- Small dataset of 4 samples (denoted by rows in *Fig. 1*).  
- Each sample has two input features $(x_1, x_2)$ and one target feature $(y)$.  

### Requirements

- Implement **forward propagation** and the **backpropagation algorithm** manually.  
  (Refer to the boilerplate code for placeholders where your implementations should go.)  
- Train the network for a fixed number of **epochs (10,000)** and plot the **loss over time**.  
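
One possible shape for the training loop is sketched below. It is not the boilerplate code referred to above. It assumes NumPy, sigmoid activations on both layers, full-batch gradient descent, and the dataset from Part 1; the function name `train`, the learning rate, the seed, and the weight initialization are all illustrative assumptions.

```python
import numpy as np

# The 4-sample dataset: two input features and one target per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, y, hidden=2, eta=0.5, epochs=10_000, seed=0):
    """Train a 1-hidden-layer network and return (params, loss history)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W1 = rng.normal(size=(X.shape[1], hidden))  # input -> hidden weights
    b1 = np.zeros((1, hidden))                  # hidden biases
    W2 = rng.normal(size=(hidden, 1))           # hidden -> output weights
    b2 = np.zeros((1, 1))                       # output bias
    losses = []
    for _ in range(epochs):
        # Forward propagation over the whole batch.
        a1 = sigmoid(X @ W1 + b1)
        a2 = sigmoid(a1 @ W2 + b2)
        losses.append(float(np.mean((a2 - y) ** 2)))  # MSE

        # Backpropagation: d(MSE)/d(a2) = 2 * (a2 - y) / n.
        delta2 = (2.0 / n) * (a2 - y) * a2 * (1.0 - a2)
        delta1 = (delta2 @ W2.T) * a1 * (1.0 - a1)

        # Gradient-descent updates for weights and biases.
        W2 -= eta * (a1.T @ delta2)
        b2 -= eta * delta2.sum(axis=0, keepdims=True)
        W1 -= eta * (X.T @ delta1)
        b1 -= eta * delta1.sum(axis=0, keepdims=True)
    return (W1, b1, W2, b2), losses


params, losses = train(X, y)
print(losses[0], losses[-1])
```

The returned `losses` list has one entry per epoch, so it can be passed directly to a plotting call such as `matplotlib.pyplot.plot(losses)` to show the loss over time. Whether the network reaches a low final loss depends on the initialization and learning rate.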

### Specific Guidelines

- Do **not** use any pre-built neural network libraries. Only use libraries for basic operations (e.g., NumPy).  
- Write **detailed comments** in your code to explain each step of your implementation.  

## Part 3: Experimentation and Analysis

**Goal:** Encourage critical thinking.

### Tasks

#### 5. Hyperparameter Tuning

- Experiment with different **learning rates** and **hidden layer sizes**.  
- Analyze how these changes impact the **convergence** and **performance** of your model.  

#### 6. Visualization

- Plot **decision boundaries** after training your model.  
- Provide insights into how the model separates the data.  
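
A decision boundary is usually drawn by evaluating the model on a dense grid of input points. The sketch below shows only that grid evaluation, assuming NumPy. The helper `decision_grid` and the stand-in model `toy_predict` are hypothetical; a trained network's prediction function would take the place of `toy_predict`.

```python
import numpy as np


def decision_grid(predict, lo=-0.5, hi=1.5, steps=50):
    """Evaluate `predict` on a square grid covering [lo, hi] x [lo, hi].

    `predict` must map an array of shape (m, 2) to m values in [0, 1].
    """
    axis = np.linspace(lo, hi, steps)
    xx, yy = np.meshgrid(axis, axis)
    points = np.column_stack([xx.ravel(), yy.ravel()])
    probs = np.asarray(predict(points)).reshape(xx.shape)
    return xx, yy, probs


def toy_predict(points):
    # Hypothetical stand-in for a trained network:
    # sigmoid of a fixed linear score, NOT a solution to the task.
    score = points[:, 0] + points[:, 1] - 1.0
    return 1.0 / (1.0 + np.exp(-score))


xx, yy, probs = decision_grid(toy_predict)
predicted_class = (probs >= 0.5).astype(int)  # threshold at 0.5
print(probs.shape, predicted_class.sum())
```

With matplotlib available, `plt.contourf(xx, yy, probs, levels=[0.0, 0.5, 1.0])` would shade the two predicted regions, and the four training samples can be overlaid with `plt.scatter`.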

## Part 4: Reflective Questions

**Goal:** Make students reflect on the learning process.

### Tasks

#### 7. Backpropagation Challenges
- What challenges did you face in implementing backpropagation?  
- How did you overcome them?  

#### 8. Debugging
- Explain the importance of **debugging** in neural network training.  
- Provide one strategy that helped you debug effectively.  
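
One widely used debugging strategy, offered here as an example rather than as the expected answer, is gradient checking: compare the gradients from backpropagation against finite-difference estimates of the same loss. The helper below is a hypothetical, minimal sketch in plain Python; `numerical_gradient` and the test function are names introduced for illustration.

```python
def numerical_gradient(f, params, eps=1e-6):
    """Estimate df/dp for each parameter with central differences."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2.0 * eps))
    return grads


def example_loss(p):
    # Simple function with known gradient: [2 * p[0], 3].
    return p[0] ** 2 + 3.0 * p[1]


estimate = numerical_gradient(example_loss, [1.5, -2.0])
analytic = [2 * 1.5, 3.0]
print(estimate, analytic)
```

If the analytic and estimated gradients disagree by much more than the finite-difference error, the backpropagation code most likely contains a mistake.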

#### 9. Activation Function Choice
- Discuss how the training process would change if you used a different activation function  
  (e.g., tanh instead of ReLU, or vice versa).
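
One concrete way to approach this question is to compare the derivatives of the candidate activations, since backpropagation multiplies the error signal by the activation's derivative at every layer. The sketch below uses plain Python; the function names are illustrative assumptions.

```python
import math


def sigmoid_prime(z):
    """Derivative of sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def tanh_prime(z):
    """Derivative of tanh: 1 - tanh(z)^2, at most 1."""
    return 1.0 - math.tanh(z) ** 2


def relu_prime(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


for z in (-4.0, 0.0, 4.0):
    print(z, sigmoid_prime(z), tanh_prime(z), relu_prime(z))
```

Sigmoid and tanh derivatives shrink toward 0 for large positive or negative inputs, while the ReLU derivative is exactly 0 for negative inputs and 1 for positive ones. That difference affects how quickly gradients shrink as they pass back through the layers.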
