### 1. Load and inspect data

In [138]:
import pandas as pd

# load csv file using pandas
data_path = 'data.csv'
data = pd.read_csv(data_path)           # data is a dataframe/2d tabular representation

print(data)

    x_0  x_1  x_2  y
0  1.00    0    0  0
1  0.00    0    5  0
2  1.00    1    3  1
3  0.00    1    1  0
4  0.00    1    1  1
5  0.00    1    1  0
6  3.71    0    1  1
7  1.10    0    1  0
8  1.00    0    0  1
9  1.00    1    1  0


In [139]:
# check number of data points (rows) and number of features (columns except target 'y')

print(data.shape)  # returns tuple (rows, columns)

(10, 4)


- Number of rows: 10
- Number of columns: 4
<br><br>
- Number of features: 3 (every column except the target 'y')
- Number of data points (samples): 10, one per row
    - rows x features = 10 x 3 = 30 is the number of individual feature values, not the number of data points
<br><br>
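These counts can be read directly from the DataFrame. This is a minimal sketch. The DataFrame is rebuilt by hand from the printed output above so the snippet runs without `data.csv`.

```python
import pandas as pd

# rebuilt from the printed output above (stand-in for pd.read_csv('data.csv'))
data = pd.DataFrame({
    "x_0": [1.00, 0.00, 1.00, 0.00, 0.00, 0.00, 3.71, 1.10, 1.00, 1.00],
    "x_1": [0, 0, 1, 1, 1, 1, 0, 0, 0, 1],
    "x_2": [0, 5, 3, 1, 1, 1, 1, 1, 0, 1],
    "y":   [0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

n_samples, n_columns = data.shape                 # rows = samples, columns = features + target
n_features = data.drop(columns="y").shape[1]      # every column except the target
n_feature_values = n_samples * n_features         # individual cells in the feature matrix

print(f"samples: {n_samples}, features: {n_features}, feature values: {n_feature_values}")
```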

Features: measurable properties/attributes of each data point that we use to predict the target ('y')


### Range of features

- Knowing the range of each feature lets us apply proper normalization (e.g., min-max scaling or standardization) so that all features contribute proportionately during training
    - For example, suppose one feature's range is 10 times larger than another's. During loss minimization, the gradients of the weights attached to the larger-scaled feature will tend to be larger. For a linear unit z = w·x + b, dL/dw_i = (dL/dz) * x_i, so each weight's gradient scales with its input. The optimizer can then overemphasize that feature even if it is not especially influential for the prediction, which skews the weight updates and hurts the overall training process
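The gradient-scaling claim can be checked on a single linear neuron. This is a minimal sketch. The inputs are made-up values, and the 1 vs 100 scale gap is illustrative only.

```python
import torch

# one linear unit z = w . x (no bias) with squared-error loss against target 1.0
# feature 2 is 100x larger in scale than feature 1 (illustrative values, not from data.csv)
x = torch.tensor([1.0, 100.0])
w = torch.zeros(2, requires_grad=True)
target = torch.tensor(1.0)

z = torch.dot(w, x)
loss = (z - target) ** 2
loss.backward()

# dL/dw_i = 2 * (z - target) * x_i, so each weight's gradient is proportional to its input x_i
print(w.grad)
```

With w starting at zero, the gradients come out as -2 and -200. Plain gradient descent would therefore push the second weight 100 times harder, even though nothing says that feature matters more.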

    

In [140]:
# determine range of each feature

features_columns = [col for col in data if col != 'y']
feature_ranges = {}
for feature in features_columns:
    min_val = data[feature].min()
    max_val = data[feature].max()
    feature_ranges[feature] = float(max_val - min_val)

print("range of features: ")
for feature in feature_ranges.items():
    print(feature)

range of features: 
('x_0', 3.71)
('x_1', 1.0)
('x_2', 5.0)
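The same ranges can be computed without a loop and then used for normalization. This sketch shows min-max scaling and standardization with plain pandas. The DataFrame is rebuilt from the printed output at the top so the snippet runs on its own.

```python
import pandas as pd

# rebuilt from the printed output above (stand-in for the loaded `data`)
data = pd.DataFrame({
    "x_0": [1.00, 0.00, 1.00, 0.00, 0.00, 0.00, 3.71, 1.10, 1.00, 1.00],
    "x_1": [0, 0, 1, 1, 1, 1, 0, 0, 0, 1],
    "x_2": [0, 5, 3, 1, 1, 1, 1, 1, 0, 1],
    "y":   [0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
features_columns = [col for col in data.columns if col != "y"]
features = data[features_columns]

# vectorized range per feature (same numbers as the loop above)
feature_ranges = features.max() - features.min()

# min-max scaling: (x - min) / (max - min) maps each feature to [0, 1]
minmax_scaled = (features - features.min()) / feature_ranges

# standardization: (x - mean) / std gives each feature mean 0 and std 1
standardized = (features - features.mean()) / features.std()

print(feature_ranges)
print(minmax_scaled.min().tolist(), minmax_scaled.max().tolist())
```

In the real pipeline, compute the min/max (or mean/std) on the training split only and then apply them to the test split, so test-set statistics don't leak into training. scikit-learn's `MinMaxScaler` and `StandardScaler` follow this pattern: `fit` on train, then `transform` both splits. Note that `StandardScaler` uses the population standard deviation (ddof=0), while pandas' `.std()` defaults to the sample standard deviation (ddof=1).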


### Model and package selection

- Because the target column consists of 0s and 1s, this is likely a binary classification problem (predicting y from x features)
    - Use a feedforward neural network
- Use PyTorch to define, train, and evaluate the model
- Use scikit-learn to split the data into training and test sets

In [141]:
# -------------------------
# prepare the data for pytorch
# -------------------------

import torch
import numpy as np

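# separate features and target, converting them to numpy arrays with dtype float32
# (.values alone would give float64 features and int64 targets)
features_values = data[features_columns].to_numpy(dtype=np.float32)
target_values = data['y'].to_numpy(dtype=np.float32)

# convert the numpy arrays to pytorch tensors (float32, the default dtype of nn.Linear weights)
tensor_features = torch.from_numpy(features_values)
tensor_target = torch.from_numpy(target_values)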

### Split the data (80% train, 20% test)

In [142]:
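from sklearn.model_selection import train_test_split


# split the feature/target arrays from the previous cell into training and test sets
# (with 10 rows, test_size=0.2 gives 8 training samples and 2 test samples)
x_train, x_test, y_train, y_test = train_test_split(features_values, target_values, test_size=0.2, random_state=42)

# convert each split to a float32 tensor
tensor_x_train = torch.tensor(x_train, dtype=torch.float32)
tensor_x_test = torch.tensor(x_test, dtype=torch.float32)
tensor_y_train = torch.tensor(y_train, dtype=torch.float32)
tensor_y_test = torch.tensor(y_test, dtype=torch.float32)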
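The target has six 0s and four 1s. A random 2-sample test set can therefore easily contain only one class. One option, not used in the cell above, is scikit-learn's `stratify` argument. This sketch rebuilds the arrays from the printed data so it runs without `data.csv`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# features and labels rebuilt from the printed output above (stand-ins for data.csv)
features_values = np.array([
    [1.00, 0, 0], [0.00, 0, 5], [1.00, 1, 3], [0.00, 1, 1], [0.00, 1, 1],
    [0.00, 1, 1], [3.71, 0, 1], [1.10, 0, 1], [1.00, 0, 0], [1.00, 1, 1],
], dtype=np.float32)
target_values = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=np.float32)

# stratify=target_values keeps the 6:4 class ratio as close as possible in both splits,
# so the 2-sample test set gets one example of each class
x_train, x_test, y_train, y_test = train_test_split(
    features_values, target_values, test_size=0.2, random_state=42, stratify=target_values
)

print(len(x_train), len(x_test), sorted(y_test.tolist()))
```

Even when stratified, a 2-sample test set can only produce a test accuracy of 0%, 50%, or 100%. Any test metric here is a very noisy estimate.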

### Create pytorch datasets and dataloaders for the train and test sets

TODO:
- Why use datasets and dataloaders? Also understand exactly what this code is doing
- What is batch size? Why 2?
- Why set shuffle to True for train and False for test?

In [143]:
from torch.utils.data import TensorDataset, DataLoader

train_dataset = TensorDataset(tensor_x_train, tensor_y_train)
test_dataset = TensorDataset(tensor_x_test, tensor_y_test)

train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)    # reshuffle training samples every epoch
test_dataloader = DataLoader(test_dataset, batch_size=2, shuffle=False)     # fixed order for evaluation
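This sketch shows what `TensorDataset` and `DataLoader` do. It uses 8 made-up samples instead of the real data, matching the size of the 8-row training split.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)  # make the shuffled order reproducible

# 8 dummy samples with 3 features each, mirroring the 8-row training split
x = torch.arange(24, dtype=torch.float32).reshape(8, 3)
y = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1], dtype=torch.float32)

dataset = TensorDataset(x, y)          # dataset[i] returns the pair (x[i], y[i])
loader = DataLoader(dataset, batch_size=2, shuffle=True)

print(len(dataset), len(loader))       # 8 samples -> ceil(8 / 2) = 4 batches per epoch
for batch_x, batch_y in loader:
    # each batch stacks 2 samples: features (2, 3), labels (2,)
    print(batch_x.shape, batch_y.shape)
```

With `shuffle=True`, the order is reshuffled every time the loader is iterated, i.e. every epoch, so the model doesn't see the batches in the same fixed order each pass. The test loader is only used to compute metrics, so order doesn't affect the result, and `shuffle=False` keeps the evaluation reproducible.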

### Build the neural network

TODO:
- How do we determine the number of hidden layers and the number of neurons in each?
- Why ReLU?

In [144]:
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self, input_features=3):
        super().__init__()

        # number of input features = number of input neurons

        # Input layer (3 features)
        #   -> Hidden layer 1 (10 neurons)
        #   -> Hidden layer 2 (5 neurons)
        #   -> Output layer (1 neuron: predicted probability that y = 1)

        self.model = nn.Sequential(
            nn.Linear(input_features, 10),   # first hidden layer with 10 neurons
            nn.ReLU(),                       # activation function ReLU: max(0, x)
            nn.Linear(10, 5),                # second hidden layer with 5 neurons
            nn.ReLU(),
            nn.Linear(5, 1),                 # output layer
            nn.Sigmoid()                     # final activation for binary classification (probabilities between 0 and 1)
        )  
        
    def forward(self, x): 
        return self.model(x)
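You can sanity-check the architecture by counting its parameters and running a dummy batch through it. This sketch rebuilds the same layer stack with `nn.Sequential` so it runs on its own. The inputs are random, not the real data.

```python
import torch
import torch.nn as nn

# same layer stack as NeuralNetwork above, rebuilt here so the snippet runs on its own
model = nn.Sequential(
    nn.Linear(3, 10), nn.ReLU(),
    nn.Linear(10, 5), nn.ReLU(),
    nn.Linear(5, 1), nn.Sigmoid(),
)

# each nn.Linear(in, out) has in*out weights + out biases: (3*10+10) + (10*5+5) + (5*1+1) = 101
n_params = sum(p.numel() for p in model.parameters())

batch = torch.rand(4, 3)               # 4 random samples, 3 features each
probs = model(batch)                   # shape (4, 1), each value between 0 and 1 from the sigmoid

print(n_params, probs.shape)
```

With only 8 training samples, 101 trainable parameters already make overfitting a real risk. Keep this in mind for the layer/neuron TODO above.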


- Why BCELoss?
- Why the Adam optimizer?

In [145]:
import torch.optim as optim

# initialize the model, loss function, and optimizer

input_dim = len(features_columns)
model = NeuralNetwork(input_dim)
loss_criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
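Binary cross-entropy (BCE) is the standard loss for a single sigmoid output that predicts a 0/1 label. It penalizes confident wrong probabilities heavily. This sketch uses made-up probabilities to show two things: `nn.BCELoss` matches the formula, and `nn.BCEWithLogitsLoss` applied to the equivalent logits gives the same value.

```python
import torch
import torch.nn as nn

# illustrative predicted probabilities and true labels (not model outputs)
probs = torch.tensor([0.9, 0.2, 0.6])
labels = torch.tensor([1.0, 0.0, 0.0])

# binary cross-entropy: -mean( y*log(p) + (1-y)*log(1-p) )
manual_bce = -(labels * torch.log(probs) + (1 - labels) * torch.log(1 - probs)).mean()
torch_bce = nn.BCELoss()(probs, labels)

# BCEWithLogitsLoss takes raw scores (logits) and applies the sigmoid internally,
# which is more numerically stable than Sigmoid followed by BCELoss
logits = torch.logit(probs)            # inverse sigmoid: log(p / (1 - p))
logits_bce = nn.BCEWithLogitsLoss()(logits, labels)

print(manual_bce.item(), torch_bce.item(), logits_bce.item())
```

Switching to `BCEWithLogitsLoss` would mean dropping the final `nn.Sigmoid()` from the model. You would then apply `torch.sigmoid` only when probabilities are needed, e.g. at evaluation time.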

### Train the model and print the average training loss per epoch

- What is an epoch? How do we choose the number of epochs?
- What is a batch?
- What does .zero_grad() do?
- Determine what EACH line of code in this cell is doing
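This sketch addresses the epoch, batch, and `zero_grad()` questions above. It uses a single toy parameter `w` (hypothetical, not part of the model) and this notebook's setup: 8 training samples, batch size 2, 50 epochs.

```python
import math
import torch

# why optimizer.zero_grad() matters: .backward() ADDS to .grad instead of overwriting it
w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(3 * w).backward()                     # d(3w)/dw = 3
(3 * w).backward()                     # accumulates: grad is now 3 + 3 = 6
accumulated = w.grad.item()

optimizer.zero_grad()                  # clears the gradient (None in recent PyTorch versions, zeros in older ones)
cleared = w.grad is None or w.grad.item() == 0.0

# epochs vs. batches: one epoch is one full pass over the training set
n_train, batch_size, num_epochs = 8, 2, 50
batches_per_epoch = math.ceil(n_train / batch_size)   # 4 batches of 2 samples
total_updates = batches_per_epoch * num_epochs          # optimizer.step() runs 200 times

print(accumulated, cleared, batches_per_epoch, total_updates)
```

Without `zero_grad()`, each batch's update in `train_model` would use the sum of all earlier gradients instead of just the current batch's.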

In [156]:
def train_model(model, dataloader, loss_criterion, optimizer, num_epochs=50):
    model.train()   # set to training mode

    for epoch in range(num_epochs):
        total_loss = 0
        for batch_features, batch_labels in dataloader:
            optimizer.zero_grad()                           # reset gradients
            y_pred = model(batch_features)                  # forward pass using batch data
            loss = loss_criterion(y_pred.squeeze(1), batch_labels)  # compute loss; squeeze(1) turns (batch, 1) into (batch,) to match labels, even for a batch of 1
            loss.backward()                                 # backpropagation
            optimizer.step()                                # update weights
            total_loss += loss.item()
        
        
        avg_loss = total_loss / len(dataloader)
        print(f"Epoch {epoch+1}/{num_epochs}, Average Loss: {avg_loss:.4f}")


train_model(model, train_dataloader, loss_criterion, optimizer, num_epochs=50)

Epoch 1/50, Average Loss: 0.3010
Epoch 2/50, Average Loss: 0.2996
Epoch 3/50, Average Loss: 0.2967
Epoch 4/50, Average Loss: 0.2916
Epoch 5/50, Average Loss: 0.2895
Epoch 6/50, Average Loss: 0.2865
Epoch 7/50, Average Loss: 0.2847
Epoch 8/50, Average Loss: 0.2831
Epoch 9/50, Average Loss: 0.2809
Epoch 10/50, Average Loss: 0.2773
Epoch 11/50, Average Loss: 0.2769
Epoch 12/50, Average Loss: 0.2753
Epoch 13/50, Average Loss: 0.2750
Epoch 14/50, Average Loss: 0.2729
Epoch 15/50, Average Loss: 0.2707
Epoch 16/50, Average Loss: 0.2733
Epoch 17/50, Average Loss: 0.2690
Epoch 18/50, Average Loss: 0.2665
Epoch 19/50, Average Loss: 0.2657
Epoch 20/50, Average Loss: 0.2639
Epoch 21/50, Average Loss: 0.2648
Epoch 22/50, Average Loss: 0.2634
Epoch 23/50, Average Loss: 0.2610
Epoch 24/50, Average Loss: 0.2608
Epoch 25/50, Average Loss: 0.2616
Epoch 26/50, Average Loss: 0.2591
Epoch 27/50, Average Loss: 0.2609
Epoch 28/50, Average Loss: 0.2584
Epoch 29/50, Average Loss: 0.2571
Epoch 30/50, Average Lo
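
The training loop above only reports loss, and the test set is never used. Below is a sketch of an accuracy check. The function name `evaluate_accuracy` and the 0.5 threshold are assumptions. The hand-set linear model is only a stand-in so the snippet runs without the trained network.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def evaluate_accuracy(model, dataloader, threshold=0.5):
    """Fraction of samples whose thresholded predicted probability matches the label."""
    model.eval()                                   # evaluation mode (disables dropout, etc.)
    correct, total = 0, 0
    with torch.no_grad():                          # no gradients needed for evaluation
        for batch_features, batch_labels in dataloader:
            probs = model(batch_features).squeeze(1)        # (batch, 1) -> (batch,)
            preds = (probs >= threshold).float()            # probability -> 0/1 class
            correct += (preds == batch_labels).sum().item()
            total += batch_labels.numel()
    return correct / total

# hypothetical stand-in for the trained model: predicts 1 when x_0 > 0.5
stand_in = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
with torch.no_grad():
    stand_in[0].weight.copy_(torch.tensor([[1.0, 0.0, 0.0]]))
    stand_in[0].bias.fill_(-0.5)

# the 10 rows printed at the top of the notebook
x = torch.tensor([[1.00, 0, 0], [0.00, 0, 5], [1.00, 1, 3], [0.00, 1, 1], [0.00, 1, 1],
                  [0.00, 1, 1], [3.71, 0, 1], [1.10, 0, 1], [1.00, 0, 0], [1.00, 1, 1]])
y = torch.tensor([0, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=torch.float32)
loader = DataLoader(TensorDataset(x, y), batch_size=2, shuffle=False)

print(evaluate_accuracy(stand_in, loader))
```

In the notebook, this would be called as `evaluate_accuracy(model, test_dataloader)`, or on `train_dataloader` for training accuracy.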

- How can this be improved to increase accuracy? (add those changes as comments)
    - a larger model: more layers/neurons
    - dropout... (LEARN WHAT THIS IS)
    - 
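
Dropout randomly zeroes a fraction `p` of a layer's activations on each training step, so the network can't rely on any single neuron. It is a regularizer against overfitting, which is a real risk with 8 training samples. Below is a sketch of the model above with `nn.Dropout` added. The class name `NeuralNetworkWithDropout` and `p=0.2` are illustrative choices, not tuned values.

```python
import torch
import torch.nn as nn

class NeuralNetworkWithDropout(nn.Module):
    """Hypothetical variant of NeuralNetwork with dropout after each hidden layer."""
    def __init__(self, input_features=3, dropout_p=0.2):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(input_features, 10),
            nn.ReLU(),
            nn.Dropout(p=dropout_p),   # training only: zeroes each activation with prob. p, scales the rest by 1/(1-p)
            nn.Linear(10, 5),
            nn.ReLU(),
            nn.Dropout(p=dropout_p),
            nn.Linear(5, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.model(x)

torch.manual_seed(0)
model = NeuralNetworkWithDropout()
batch = torch.rand(4, 3)

model.train()                          # dropout active: repeated calls usually differ
train_out_a, train_out_b = model(batch), model(batch)

model.eval()                           # dropout disabled: output is deterministic
eval_out_a, eval_out_b = model(batch), model(batch)

print(torch.equal(eval_out_a, eval_out_b))   # True
```

This is why `train_model` calls `model.train()` and any evaluation code should call `model.eval()`: dropout should only be active while training.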

- If this were time-series data, how would you take that into account?
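
One concrete part of the answer: with time-ordered data, a random shuffled split lets the model train on the future and test on the past. This sketch assumes, hypothetically, that the rows are in chronological order. It uses placeholder arrays and shows a chronological hold-out plus scikit-learn's `TimeSeriesSplit`.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# hypothetical placeholder arrays: pretend the 10 rows are ordered in time (row 0 = earliest)
x = np.arange(10).reshape(10, 1)
y = np.arange(10)

# chronological hold-out: no shuffling, so the test set is the last 20% of the timeline
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)
print(y_train.tolist(), y_test.tolist())   # train: rows 0-7, test: rows 8-9

# rolling-origin cross-validation: each fold trains on the past and tests on the next block
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(x):
    print(train_idx.tolist(), test_idx.tolist())
```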