## Please read the following DO's and DON'Ts carefully:
1. Please read each task carefully. Make sure you fill in your answer wherever you see `YOUR CODE HERE`.
2. Please `do not` copy any existing notebook cells into the current notebook. Instead, create a new cell by clicking the insert icon (+) on the top-right corner of each cell. Here is a reference to it:
![](cell-insert-before-after.png)
    - You can add any number of new cells to your Jupyter notebook.
    - Make sure your new cells are of type either `Code` or `Markdown`. Please `do not` choose type `Raw`.
3. Please `do not` delete any existing cell(s) from the notebook.
4. Please `do not` change the type of any existing cell(s).
5. Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel $\rightarrow$ Restart) and then **run all cells** (in the menubar, select Cell $\rightarrow$ Run All).
    - Alternatively, you can hit the `Validate` button from the `Nbgrader > Assignment List`.
6. Once done, please hit the blue `submit` button in the `Nbgrader > Assignment List`.
7. Please write your name and collaborators below between the double quotes:

In [None]:
NAME = ""
COLLABORATORS = ""

---

For this problem set, we'll be using the Jupyter notebook:

![](jupyter.png)

---
## Task A - Regression (10 points)

In this task you are given a simple regression dataset located at `dataset/q6_regression.csv`. It has 100 samples. Each sample is 5-dimensional (i.e., it has 5 input features, namely `x0`, `x1`, `x2`, `x3`, and `x4`). The target `y` is the dependent variable that you are going to predict from the input features. You are going to build the Artificial Neural Network described below to model it, using the `PyTorch` compute framework. If this is your first time doing something like this, no worries. I'll walk you through the process. In the next quiz, I'll quiz you on this experience. Does that sound fair?

Alrighty then, let's proceed.

I hope you are now well aware of what a regression task is:

> A regression task is a type of supervised learning problem where the goal is to predict a continuous numerical value (rather than a discrete category) based on the input features.

So, in the dataset, `y` is clearly such a continuous numerical value, and your artificial neural network should be able to predict it based on the 5 given input features: `x0`, `x1`, `x2`, `x3`, and `x4`.

Let's work on building a simple feedforward artificial neural network:
* **Input layer**: taking 5-dimensional numerical inputs.
* **Hidden layer 1**: 10 neurons, Sigmoid activated.
* **Hidden layer 2**: 3 neurons, ReLU activated.
* **Output layer**: 1 neuron, no activation used as it's a regression task. We don't want to squash it or scale it. We want the raw net output. Got it?

### Things to note:
* We will be importing the `torch.nn.functional` module to be able to call some utility functions, like sigmoid and other activation functions: `import torch.nn.functional as F`
- Now, you can call `F.sigmoid()` for sigmoid activation. More info: [sigmoid documentation](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.sigmoid.html)
- Likewise, you can call `F.relu()` for the ReLU activation. More info: [relu documentation](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.relu.html)

* We will be using the `torch.nn` module and its classes from PyTorch for building the layers of the artificial neural network: `import torch.nn as nn`
* For this task, we will be calling `nn.Linear()` to build each fully connected layer of the network.
- You may want to call `nn.Linear()` based on the packages imported below. More info: [Linear documentation](https://docs.pytorch.org/docs/stable/generated/torch.nn.Linear.html)
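
As a quick illustration of the two import styles above, here is a minimal sketch (the tensor `x` and the variable names are made up for this example) showing that the functional calls in `F` and the corresponding `nn` modules compute the same thing:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])  # made-up values for illustration

# Functional form: a plain function call, nothing to store.
relu_functional = F.relu(x)
sigmoid_functional = F.sigmoid(x)

# Module form: create the activation object, then call it like a function.
relu_module = nn.ReLU()(x)
sigmoid_module = nn.Sigmoid()(x)

print(relu_functional)     # negatives become 0
print(sigmoid_functional)  # every value lies strictly between 0 and 1
```

Either style should work for this task; which one you pick is mostly a matter of taste.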

### Enough said, let's get our hands dirty...
* We will first import the packages below

In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn #used to build neural network 
import torch.nn.functional as F #used for many utility functions needed in your network

#The following package is needed to read data samples from external files,
#   prepare them, and present them (in batches) to the PyTorch compute framework during training
from torch.utils.data import Dataset, DataLoader

#You guessed it right. The following is used to split the given dataset into 2 parts:
#  One is for training and the other for testing/evaluation. As I said in class, you must not train a model
#  with the whole given dataset. Set aside a split, which we call a test split, so that you can test out
#  your model once training has finished. It's like you are being prepared for the midterm with a set of training
#  problems. You don't expect me to train you with the exact same problems I put in the real midterm exam, right?
#  So, the problems/splits in your midterm will test how well you learned.
#  The artificial neural network should be trained in a similar fashion. Got it?
from torch.utils.data import random_split

#The following is a popular package to draw stuff. You'll see later.
import matplotlib.pyplot as plt

### `torch.utils.data.Dataset` class and `torch.utils.data.DataLoader` class

In PyTorch, the `torch.utils.data` module provides the **building blocks** for handling datasets and feeding them efficiently into models during training. The two most important classes here are **`Dataset`** and **`DataLoader`**.


#### 1. **`torch.utils.data.Dataset`**

* **Purpose:** Represents a dataset (the data itself + how to access it).
* **What it does:** Defines **how to get one sample** from your data and **how many samples** the dataset has.
* **You usually subclass it** and implement two key methods:

  * `__len__(self)` → returns the total number of samples.
  * `__getitem__(self, idx)` → retrieves one sample (features + label) by index.

#### A quick Example:

```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels
    
    def __len__(self):
        return len(self.data)   # number of samples
    
    def __getitem__(self, idx):
        x = self.data[idx]
        y = self.labels[idx]
        return x, y
```


#### 2. **`torch.utils.data.DataLoader`**

* **Purpose:** Wraps a `Dataset` and makes it easier to load data **in batches**, **shuffle it**, and even **load in parallel** using multiple worker processes.
* **Why useful:** Instead of manually writing loops over the dataset, you just iterate over a `DataLoader`.

Key parameters:

* `dataset`: the dataset object (must implement `__getitem__` and `__len__`).
* `batch_size`: number of samples per batch.
* `shuffle`: whether to shuffle data each epoch.
* `num_workers`: how many subprocesses to use for data loading (for speed).

#### Another quick Example:

```python
from torch.utils.data import DataLoader
import torch

# Assume dataset is an instance of MyDataset class above
dataset = MyDataset(torch.arange(10), torch.arange(10)*2)

# Wrap it in DataLoader
loader = DataLoader(dataset, batch_size=3, shuffle=True)

# Iterate over it like this:
for batch in loader:
    x, y = batch
    print(x, y)
```
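
To make the batching behaviour concrete, here is a small sketch that reuses the toy `MyDataset` from above. With 10 samples and `batch_size=3`, the last batch is smaller than the others. Shuffling is turned off here only so the result is reproducible, and the `loader_drop` name is made up for this example.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = MyDataset(torch.arange(10), torch.arange(10) * 2)
loader = DataLoader(dataset, batch_size=3, shuffle=False)

batch_sizes = [len(x) for x, y in loader]
print(len(loader))   # 4 batches in total
print(batch_sizes)   # [3, 3, 3, 1]

# drop_last=True discards the final, smaller batch
loader_drop = DataLoader(dataset, batch_size=3, drop_last=True)
print(len(loader_drop))  # 3
```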



## `Q6_Regression_Dataset` class, which is a subclass of the `Dataset` class
* The `Q6_Regression_Dataset` should be able to read from a csv file:
   - In the constructor (the `__init__` method), you can see we simply read the dataset from the CSV file. Here, you can pass any data/feature transformation function to work as a preprocessor. We are ignoring that in this task.
   - We implemented the `__len__()` member function that simply returns the number of samples in the dataset.
   - We implemented the `__getitem__()` member function that takes an integer index `idx` and returns the sample at that index. We are free to return the data sample in whatever form we want. For this task, we are returning the samples as a dictionary so that it's easier to interpret during training. Please note: the values are converted to `torch.tensor` type so that PyTorch can send them to the GPU easily.
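
To see what dictionary samples look like once a `DataLoader` batches them, here is a minimal sketch. `ToyDictDataset` is a hypothetical in-memory stand-in (random numbers, no CSV file), used only to show the shapes you should expect:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDictDataset(Dataset):
    """Hypothetical stand-in: random data with 5 features per sample."""
    def __init__(self, n_samples=6, n_features=5):
        self.x = torch.randn(n_samples, n_features)
        self.y = torch.randn(n_samples)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Same dictionary layout as described above
        return {'features': self.x[idx], 'labels': self.y[idx]}

loader = DataLoader(ToyDictDataset(), batch_size=4)
first_batch = next(iter(loader))

# The default batching keeps the dictionary keys and stacks the tensors.
print(first_batch['features'].shape)  # torch.Size([4, 5])
print(first_batch['labels'].shape)    # torch.Size([4])
```

Note that the batched labels are one-dimensional, with shape `[batch_size]`.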

In [None]:
class Q6_Regression_Dataset(Dataset):
    def __init__(self, csv_file, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.dataframe = pd.read_csv(csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        # Assuming the last column is the target and all preceding columns are input features.
        features = torch.tensor(self.dataframe.iloc[idx, :-1].values, dtype=torch.float32)
        # Use a float dtype for regression targets (long would be used for classification labels).
        label = torch.tensor(self.dataframe.iloc[idx, -1], dtype=torch.float32)

        sample = {'features': features, 'labels': label}

        if self.transform:
            sample = self.transform(sample)

        return sample

## `Q6_Regression_Net` -- let's define the Artificial Neural Network architecture
* Remember, we are to define this network:

```
* **Input layer**: taking 5 dim numerical inputs.
* **Hidden layer 1**: 10 neurons, Sigmoid activated.
* **Hidden layer 2**: 3 neurons, ReLU activated.
* **Output layer**: 1 neuron, no activation
```

* In PyTorch, neural networks are typically defined by subclassing the `nn.Module` class (assuming you imported it with `import torch.nn as nn` above).
  - You must define the `forward` function, which performs the forward propagation for a given input `x`.



In [None]:
# YOUR CODE HERE

Let’s carefully walk through the **layers** and **activations** defined in the constructor (`__init__`) of the `Q6_Regression_Net` class.

### 1. **Layers (`nn.Linear`)**

Each **linear layer** in PyTorch performs a transformation of the form:

$$
y = xW^T + b
$$

where $W$ are learnable weights and $b$ is a learnable bias.

* **`self.fc1 = nn.Linear(5, 10)`**

  * Input: 5-dimensional feature vector (since each data sample has 5 features).
  * Output: 10-dimensional vector (hidden layer with 10 neurons).

* **`self.fc2 = nn.Linear(10, 3)`**

  * Takes the 10 outputs from the previous layer.
  * Reduces them to 3 outputs (hidden layer with 3 neurons).

* **`self.fc3 = nn.Linear(3, 1)`**

  * Takes the 3 outputs from hidden layer 2.
  * Produces 1 output (final regression value).
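
As a sanity check on the formula $y = xW^T + b$, the sketch below builds one standalone `nn.Linear(5, 10)` layer, recomputes its output by hand, and counts the learnable parameters across the three layer sizes listed above. The variable names (`manual`, `layers`, `n_params`) are made up for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(5, 10)
x = torch.randn(4, 5)  # a made-up batch of 4 samples with 5 features each

# PyTorch stores W with shape [out_features, in_features], hence the transpose.
manual = x @ fc1.weight.T + fc1.bias
print(fc1.weight.shape, fc1.bias.shape)  # torch.Size([10, 5]) torch.Size([10])
print(torch.allclose(fc1(x), manual, atol=1e-6))

# Learnable parameters: (5*10 + 10) + (10*3 + 3) + (3*1 + 1)
layers = [nn.Linear(5, 10), nn.Linear(10, 3), nn.Linear(3, 1)]
n_params = sum(p.numel() for layer in layers for p in layer.parameters())
print(n_params)  # 97
```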


### 2. **Activations**

Activations introduce **non-linearity**, allowing the network to learn (very) complex functions.

* **`self.sigmoid = nn.Sigmoid()`**

  * Applied after `fc1`.
  * Squashes values into the range (0, 1).
  * Useful for modeling nonlinear relationships.

* **`self.relu = nn.ReLU()`**

  * Applied after `fc2`.
  * ReLU (Rectified Linear Unit) outputs:

    $$
    \text{ReLU}(x) = \max(0, x)
    $$
  * Keeps positive values, zeros out negatives.
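  * Helps with gradient flow: its gradient is 1 for positive inputs, which mitigates the vanishing-gradient problem that saturating activations such as sigmoid can cause.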

* **Final layer (`fc3`) has *no activation***

  * Since this is a **regression task**, we want a raw continuous output (could be negative or positive), so we leave it linear.
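
The gradient-flow point can be checked numerically. The sketch below (input values chosen only for illustration) uses autograd to compare the gradients of sigmoid and ReLU at a large negative input, at zero, and at a large positive input. It uses `torch.sigmoid` and `torch.relu`, which compute the same functions as the modules above.

```python
import torch

x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)

# Gradient of sigmoid at each input value
torch.sigmoid(x).sum().backward()
sigmoid_grad = x.grad.clone()

# Reset the accumulated gradient, then do the same for ReLU
x.grad.zero_()
torch.relu(x).sum().backward()
relu_grad = x.grad.clone()

print(sigmoid_grad)  # largest at 0 (0.25), nearly 0 at -10 and 10
print(relu_grad)     # 0 for the negative input, 1 for the positive input
```

Sigmoid saturates for inputs far from zero, so very little gradient passes through it there, while ReLU passes the gradient through unchanged for positive inputs.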

---

### 3. **Forward Pass Summary**

When a batch of inputs `x` (shape `[batch_size, 5]`) goes through the network:

1. `x → fc1 → sigmoid`
   Input (5) → Linear (5→10) → Nonlinear sigmoid → Output (10).
2. `x → fc2 → relu`
   Input (10) → Linear (10→3) → Nonlinear ReLU → Output (3).
3. `x → fc3`
   Input (3) → Linear (3→1) → Output (1).

Final shape per sample: **1 value** (predicted regression target).
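
The three steps above can be traced with standalone layers to confirm the shapes. This is only a shape check on random inputs with a made-up batch size of 8, not the `Q6_Regression_Net` class itself:

```python
import torch
import torch.nn as nn

fc1, fc2, fc3 = nn.Linear(5, 10), nn.Linear(10, 3), nn.Linear(3, 1)

x = torch.randn(8, 5)        # batch of 8 samples, 5 features each
h1 = torch.sigmoid(fc1(x))   # step 1: Linear (5→10) + sigmoid
h2 = torch.relu(fc2(h1))     # step 2: Linear (10→3) + ReLU
out = fc3(h2)                # step 3: Linear (3→1), no activation

print(h1.shape)   # torch.Size([8, 10])
print(h2.shape)   # torch.Size([8, 3])
print(out.shape)  # torch.Size([8, 1])
```

Notice that the output keeps a trailing dimension of size 1, so a batch of predictions has shape `[batch_size, 1]`.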

Here’s a simple diagram of your regression network:

![Q6_Regression_Net.png](Q6_Regression_Net.png)

* **Blue circles** → Input layer (5 features)
* **Green circles** → Hidden layer 1 (10 neurons, **Sigmoid**)
* **Orange circles** → Hidden layer 2 (3 neurons, **ReLU**)
* **Red circle** → Output layer (1 neuron, **Linear**)

This shows how data flows step by step from inputs → hidden layers → output.



## Now, let's train the network ~ learn the weights
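
One thing worth checking before computing any loss is that the predictions and the labels have the same shape. The sketch below, with made-up numbers, shows what `nn.MSELoss` does when it receives predictions of shape `[3, 1]` and labels of shape `[3]`. PyTorch broadcasts them against each other (and issues a warning), so every prediction is compared with every label.

```python
import warnings
import torch
import torch.nn as nn

criterion = nn.MSELoss()
outputs = torch.tensor([[1.0], [2.0], [3.0]])  # shape [3, 1], like the output of nn.Linear(3, 1)
labels = torch.tensor([1.0, 2.0, 3.0])         # shape [3], like a batch of scalar labels

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the size-mismatch warning for this demo
    broadcast_loss = criterion(outputs, labels)       # compares all 3 x 3 pairs

aligned_loss = criterion(outputs.squeeze(1), labels)  # compares matching pairs only

print(broadcast_loss.item())  # about 1.3333, even though the predictions are perfect
print(aligned_loss.item())    # 0.0
```

Dropping the trailing dimension with `squeeze` (or adding one to the labels with `unsqueeze`) keeps the comparison pairwise.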

In [None]:
#Let's instantiate our model
model = Q6_Regression_Net()



In [None]:
#Speaking of loss function. We have plenty to choose from
#Reference: https://docs.pytorch.org/docs/stable/generated/torch.nn.MSELoss.html
criterion = nn.MSELoss()



In [None]:
# Gradient descent optimizer, with learning rate,`lr`
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

In [None]:
# Get the dataset ready for training
full_dataset = Q6_Regression_Dataset(csv_file='dataset/q6_regression.csv')


# Define the lengths for the splits (e.g., 80 for training, 20 for testing)
train_len = 80
test_len = 20
split_lengths = [train_len, test_len]

# Perform the random split
train_dataset, test_dataset = random_split(full_dataset, split_lengths)


# Prepare two data loaders to fetch batches of samples from the two datasets
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=8, shuffle=True)

In [None]:
for b in train_dataloader:
    print(b['labels'])

In [None]:
# Now, the training iterations begin below
num_epochs = 100
for epoch in range(num_epochs):

    for a_batch in train_dataloader:
        #Remember? A batch is a dictionary of features & labels
        batch_features = a_batch['features']
        batch_labels = a_batch['labels']

        
        # Forward pass the batch
        outputs = model(batch_features)

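        #Calculate the loss (i.e., error) from the ground-truth labels and the current model outputs.
        #The model outputs have shape [batch_size, 1] while the labels have shape [batch_size],
        #so drop the trailing dimension to compare matching pairs.
        loss = criterion(outputs.squeeze(-1), batch_labels)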

        # Zero out previously computed gradients... just remember this step before computing new gradients below
        optimizer.zero_grad()

        # Compute gradients of the loss with respect to the weights
        loss.backward()

        # Adjust the weights based on the gradient update formula
        optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
    

## Now evaluate your trained model
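
A note on aggregating the test loss: `nn.MSELoss()` returns the mean over each batch, so adding up the per-batch values is not the same as the MSE over the whole test set, especially when the last batch is smaller. The sketch below uses made-up predictions and targets (20 samples, batch size 8) to compare the plain sum with a sample-weighted average. All names in it are for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

criterion = nn.MSELoss()
preds = torch.zeros(20)                           # made-up predictions
targets = torch.arange(20, dtype=torch.float32)   # made-up targets
loader = DataLoader(TensorDataset(preds, targets), batch_size=8)  # batches of 8, 8, 4

sum_of_batch_means = 0.0
squared_error_sum = 0.0
for p, t in loader:
    batch_mse = criterion(p, t).item()
    sum_of_batch_means += batch_mse
    squared_error_sum += batch_mse * len(t)   # undo the per-batch mean

dataset_mse = squared_error_sum / len(targets)

print(sum_of_batch_means)  # 462.5
print(dataset_mse)         # 123.5, the MSE over all 20 samples
```

Both numbers can be used to compare models evaluated the same way, but only the weighted version is the per-sample MSE of the test set.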

In [None]:
#Set the model into evaluation mode. This switches layers such as dropout and batch normalization
#to their inference behavior. The weights only change when the optimizer takes a step.
model.eval()

In [None]:
loss = 0 #running sum of the per-batch MSE values
with torch.no_grad():
    # Gradients are not needed during evaluation
    for a_batch in test_dataloader:
        batch_features = a_batch['features']
        batch_labels = a_batch['labels']

        # Forward pass the batch
        outputs = model(batch_features)

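        #Calculate & aggregate the loss (i.e., error) from the ground-truth labels and the current model outputs.
        #Drop the trailing dimension of the outputs so their shape matches the labels.
        loss += criterion(outputs.squeeze(-1), batch_labels).item()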

In [None]:
print(f'Test loss = {loss:.4f}')

Is it good enough?
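
One hedged way to approach that question is to compare your test MSE with a trivial baseline that always predicts the mean of the training targets. The sketch below uses synthetic targets as a stand-in (it does not read the real dataset), and the names are made up for this example:

```python
import torch

torch.manual_seed(0)
# Synthetic stand-in targets, NOT the real dataset
y_train = torch.randn(80) * 2.0 + 5.0
y_test = torch.randn(20) * 2.0 + 5.0

# Baseline: always predict the mean of the training targets
baseline_prediction = y_train.mean()
baseline_mse = torch.mean((y_test - baseline_prediction) ** 2).item()

print(f'Baseline test MSE = {baseline_mse:.4f}')
```

If the per-sample test MSE of your network is not clearly below this kind of baseline on the real data, the network has probably learned little beyond the average value of `y`.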