<a href="https://colab.research.google.com/github/VicentePina7210/DataMiningCleaningExercise/blob/main/copy_of_neuralnetexercise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Neural Network Implementation and Experimentation

Instructions:
1. This script loads a dataset using Pandas and implements a minimal neural network in PyTorch.
2. The default architecture is basic, but you will modify it to experiment with different choices.
3. Your goal is to observe the effects of architecture, activation functions, and optimization strategies on training.
4. Answer the experimental and deep-dive questions at the end based on your findings.

In [None]:
# Required Libraries
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [None]:
# ======================== 1. Load Data ========================

# Load any dataset of choice (ensure it's a regression dataset)
df = pd.read_csv("/content/sample_data/california_housing_train.csv")
df["median_house_value"] = df["median_house_value"] / 100_000
df.head(5)

In [None]:
# Assume last column is target, rest are features
x = df[["longitude", "latitude", "housing_median_age", "total_rooms", "total_bedrooms", "population", "households", "median_income"]].values
y = df["median_house_value"].values

# Convert to PyTorch tensors
x = torch.tensor(x, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).view(-1, 1)  # Ensure y is 2D

# Split data
x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.2, random_state=42)

In [None]:
# ======================== 2. Define Minimal Neural Network ========================
class CustomNN(nn.Module):
    def __init__(self, input_dim):
        super(CustomNN, self).__init__()
        self.hidden_0 = nn.Linear(input_dim, 150)
        self.hidden_1 = nn.Linear(150,150)
        self.hidden_2 = nn.Linear(150,150)
        self.output = nn.Linear(150, 1)

    def forward(self, x):
        x = torch.relu(self.hidden_0(x))
        x = torch.relu(self.hidden_1(x))
        x = torch.relu(self.hidden_2(x))
        x = self.output(x)
        return x

# Initialize Model
input_dim = x.shape[1]
model = CustomNN(input_dim)

# Loss Function & Optimizer
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


In [None]:
# ======================== 3. Training Loop ========================
epochs = 1500
train_losses = []
valid_losses = []

for epoch in range(epochs):
    # Forward Pass
    y_pred = model(x_train)  # call the module itself so PyTorch hooks run
    loss = loss_function(y_pred, y_train)

    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Store loss for plotting
    train_losses.append(loss.item())
    with torch.no_grad():
        y_pred_valid = model(x_valid)
        valid_loss = loss_function(y_pred_valid, y_valid)
        valid_losses.append(valid_loss.item())

    if epoch % 50 == 0:
        print(f"Epoch {epoch}, Train Loss: {loss.item():.4f}, Valid Loss: {valid_loss.item():.4f}")

# Plot Loss Curve
plt.plot(train_losses)
plt.plot(valid_losses)
plt.legend(["Training Loss", "Validation Loss"])
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Training and Validation Loss Curves")
plt.show()

In [None]:
# ======================== 4. Evaluation ========================
with torch.no_grad():
    y_pred_test = model(x_valid)
    test_loss = loss_function(y_pred_test, y_valid)
    print(f"Final Valid Loss: {test_loss.item():.4f}")

# Print some predicted vs actual prices
print("===== Example outputs =====")
for i in range(10):
    print(f"Predicted: ${100_000 * y_pred_test[i].item():.2f}, Actual: ${100_000 * y_valid[i].item():.2f}")

**Part 1: Experimentation**
1. Change the number of neurons in the hidden layer. What is the effect and what is the ideal number?

Up to a point, more neurons in the network made it perform better and lowered the loss, maybe because it is able to pick up on patterns better. The benefit does not continue indefinitely: at 1000 neurons it did not perform well, and around 100 seemed to work better.

2. Add a few extra layers in the neural network. How does this affect performance and what is a good number?
Adding more layers to the network made the model's plot look more accurate, but the overall loss was still higher than would be ideal. I found that 2 layers work nicely.


3. Play around with the learning rate, how does this affect learning. What is an ideal number?

Making the learning rate too high makes the model take large steps in either direction on the error graph, and the model then predicts the same thing for many inputs. The steps are probably too large for it to settle on a pattern accurately.
Making the learning rate too small, made
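
The effect of step size can be seen on a toy problem that is separate from the notebook's model. This is a minimal sketch, assuming plain gradient descent on f(w) = w**2; the function name `gradient_descent` and the three learning rates are made up for illustration.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # one update step
    return abs(w)

# Illustrative learning rates only: too high, moderate, very small.
for lr in (1.1, 0.3, 0.001):
    print(f"lr={lr}: |w| after 20 steps = {gradient_descent(lr):.6f}")
```

With the largest rate each step overshoots the minimum and |w| grows, with the moderate rate it shrinks quickly toward 0, and with the smallest rate it has barely moved after 20 steps.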

4. Play around with the number of epochs, how does this affect learning. What is an ideal number?

A larger number of epochs seems to help, because it gives the model more iterations to learn the data. However, I noticed that too many epochs can cause the model to overfit.
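
One possible way to choose the number of epochs is to look for where the validation loss is lowest. The sketch below assumes a list shaped like the notebook's `valid_losses`; the helper `best_epoch` and the numbers in `example_valid_losses` are hypothetical and are not results from this notebook.

```python
def best_epoch(valid_losses):
    """Return (epoch index, loss) where the validation loss is lowest."""
    epoch = min(range(len(valid_losses)), key=lambda i: valid_losses[i])
    return epoch, valid_losses[epoch]

# Made-up numbers for illustration only: the loss falls, then rises again.
example_valid_losses = [1.20, 0.80, 0.55, 0.47, 0.46, 0.49, 0.53]
epoch, loss = best_epoch(example_valid_losses)
print(f"Lowest validation loss {loss:.2f} at epoch {epoch}")
```

If the training loss keeps falling after that epoch while the validation loss rises, the extra epochs are probably fitting noise.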

***EXTRA CREDIT***
After some trial and error and logging the results, the configuration with the best-looking curve was 150 neurons, 2 layers, lr = 0.001, and 1500 epochs, which gave a validation loss in the high .4 range. Once the number of layers increased too much or the epochs went over 1500, the lowest validation loss I got was in the high .3 range.

To avoid overfitting, I left the code as is, with 250 neurons, 2 layers, 1500 epochs, and a 0.001 learning rate. This resulted in a nice curve and a model that generalized well without overfitting. Valid loss = 4.627
******************


5. Try using Tanh, Sigmoid, or LeakyReLU instead of ReLU. How does the activation function affect learning?
I think this question is worded wrong because we are already using Sigmoid

With ReLU, the learning curve was a lot steeper, and once it reached a point where not much more adjustment was needed, it stayed flat. The initial loss was also much higher, but ReLU trained much faster than Sigmoid and Tanh.

With Tanh the learning curve was much smoother, but the final validation loss was higher than with both Sigmoid and ReLU. It was also much slower, so it does not seem useful for this case. Maybe it would be useful for scenarios that require minor, fine-tuned adjustments.
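
To make this comparison easier to repeat, the activation can be passed in as an argument instead of being edited by hand. This is a sketch under assumptions: `ActivationNN` is a hypothetical variant of `CustomNN` with the same layer sizes, and the input width of 8 matches the eight feature columns used above.

```python
import torch
import torch.nn as nn

class ActivationNN(nn.Module):
    """Same layer layout as CustomNN, but the activation is a constructor argument."""
    def __init__(self, input_dim, activation=nn.ReLU, hidden=150):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), activation(),
            nn.Linear(hidden, hidden), activation(),
            nn.Linear(hidden, hidden), activation(),
            nn.Linear(hidden, 1),  # no activation on the regression output
        )

    def forward(self, x):
        return self.net(x)

# One model per activation; each would be trained with the same loop and settings.
candidates = {"ReLU": nn.ReLU, "Tanh": nn.Tanh, "Sigmoid": nn.Sigmoid, "LeakyReLU": nn.LeakyReLU}
models = {name: ActivationNN(8, act) for name, act in candidates.items()}
```

Training each entry of `models` with identical epochs and learning rate keeps the activation as the only thing that changes between runs.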




6. Try removing all activation functions. How does this affect learning? What is happening mathematically?
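Without the activation functions, there is no longer a threshold for neurons to be activated. Each layer then only computes a weighted sum plus a bias, and a stack of such layers is mathematically the same as one single weighted sum plus a bias. The model ends up being linear, so adding layers does not help. Overall the model is reduced to linear regression.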
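
The collapse to a linear model can be checked numerically. The sketch below uses two small `nn.Linear` layers with made-up sizes (8 to 16 to 1) and folds them into a single layer with weight `W2 @ W1` and bias `W2 @ b1 + b2`; the names `layer_1`, `layer_2`, and `merged` are only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer_1 = nn.Linear(8, 16)
layer_2 = nn.Linear(16, 1)

# Fold the two layers into one: W = W2 @ W1, b = W2 @ b1 + b2
merged = nn.Linear(8, 1)
with torch.no_grad():
    merged.weight.copy_(layer_2.weight @ layer_1.weight)
    merged.bias.copy_(layer_2.weight @ layer_1.bias + layer_2.bias)

x = torch.randn(5, 8)
with torch.no_grad():
    stacked_out = layer_2(layer_1(x))  # two layers, no activation in between
    merged_out = merged(x)             # one equivalent layer

same = torch.allclose(stacked_out, merged_out, atol=1e-5)
print(same)
```

Putting any non-linear activation between `layer_1` and `layer_2` breaks this equivalence, which is what lets extra layers add capacity.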


7. What is dropout in neural networks? Use dropout after the hidden layer. How does it affect performance?



8. How can modifying the scale of the input values help learning?
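
One common way to rescale inputs is standardization, which is also what the `StandardScaler` imported at the top of the notebook does. The sketch below does the same thing directly in PyTorch; the helper `standardize` and the toy values are hypothetical, and the statistics are assumed to come from the training split only so that no validation information leaks into training.

```python
import torch

def standardize(x_train, x_valid):
    """Scale each feature to zero mean and unit variance using training statistics only."""
    mean = x_train.mean(dim=0, keepdim=True)
    std = x_train.std(dim=0, unbiased=False, keepdim=True)
    std = torch.where(std == 0, torch.ones_like(std), std)  # guard against constant columns
    return (x_train - mean) / std, (x_valid - mean) / std

# Toy features on very different scales (made-up values, e.g. room counts vs. income).
toy_train = torch.tensor([[2000.0, 3.5], [5000.0, 8.0], [800.0, 2.1], [3200.0, 5.4]])
toy_valid = torch.tensor([[1500.0, 4.0]])
train_scaled, valid_scaled = standardize(toy_train, toy_valid)
print(train_scaled.mean(dim=0), train_scaled.std(dim=0, unbiased=False))
```

In the notebook this would be applied to `x_train` and `x_valid` after the split and before training.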



9. Plot the predicted values vs. actual values. Are there any patterns you notice? What are ways this can be addressed?



**Part 2: Deeper Questions**
1. Why do activation functions matter? Why wouldn't we use an activation function on the output layer for regression?

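Activation functions in the hidden layers add the non-linearity that lets the network learn more than a straight-line fit. Many of them also keep their outputs within a fixed range (Sigmoid between 0 and 1, Tanh between -1 and 1). For continuous values, the range is sometimes unpredictable or should not have much of a bound, for example the price of houses in the housing market. The price should not be forced into a bound between 0 and 1, so the output layer is left without an activation.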
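
A short sketch of the bounding problem, using made-up raw values rather than anything from the trained model. In this notebook the target is in units of $100,000, so a $250,000 house is 2.5, which a Sigmoid output could never produce.

```python
import torch

# Hypothetical values that a linear output layer could emit.
raw_outputs = torch.tensor([-3.0, 0.0, 2.5, 10.0])
# What the same values look like after a Sigmoid on the output layer.
squashed = torch.sigmoid(raw_outputs)

print("linear output max: ", raw_outputs.max().item())
print("sigmoid output max:", squashed.max().item())
```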

2. What’s the tradeoff between using more layers vs. a single-layer model?

The more layers there are in a model, the better its ability to learn complex patterns, which is why deep networks are used for tasks such as image analysis. The cost is more parameters, slower training, and a higher risk of overfitting.
Single-layer models are good for smaller datasets and for quicker training if resources are limited.
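
Part of the cost can be made concrete by counting parameters. The sketch below compares a one-hidden-layer network with a three-hidden-layer one that has the same sizes as `CustomNN` (8 inputs, 150 units per hidden layer); `count_parameters` is a hypothetical helper.

```python
import torch.nn as nn

def count_parameters(model):
    """Total number of trainable weights and biases."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

single_layer = nn.Sequential(nn.Linear(8, 150), nn.ReLU(), nn.Linear(150, 1))
three_layer = nn.Sequential(
    nn.Linear(8, 150), nn.ReLU(),
    nn.Linear(150, 150), nn.ReLU(),
    nn.Linear(150, 150), nn.ReLU(),
    nn.Linear(150, 1),
)

print(count_parameters(single_layer))  # 1501
print(count_parameters(three_layer))   # 46801
```

Each extra 150-to-150 layer adds 22,650 parameters, so depth raises both the training cost and the capacity to memorize the training set.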


3. How does the choice of optimizer affect convergence? Why do some optimizers work better than others?

The optimizer you choose can influence how fast and how well the model learns. Some optimizers adapt the step size for each parameter during training (Adam, the one we are using, is one of these), while others, such as plain SGD, apply the single learning rate you set to every parameter. The adaptive ones typically move towards a local minimum faster.
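
Swapping optimizers only changes one line of the training setup. The sketch below trains a tiny linear model on synthetic data (not the housing data) once with `optim.SGD` and once with `optim.Adam`; the helper `train_tiny`, the data, and the learning rate of 0.01 are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train_tiny(make_optimizer, steps=200):
    """Train a tiny linear model on synthetic data and return (first loss, last loss)."""
    torch.manual_seed(0)
    x = torch.randn(64, 3)
    y = x @ torch.tensor([[1.0], [-2.0], [0.5]])  # synthetic target
    model = nn.Linear(3, 1)
    optimizer = make_optimizer(model.parameters())
    loss_function = nn.MSELoss()
    losses = []
    for _ in range(steps):
        loss = loss_function(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]

sgd_first, sgd_last = train_tiny(lambda params: optim.SGD(params, lr=0.01))
adam_first, adam_last = train_tiny(lambda params: optim.Adam(params, lr=0.01))
print(f"SGD:  {sgd_first:.4f} -> {sgd_last:.4f}")
print(f"Adam: {adam_first:.4f} -> {adam_last:.4f}")
```

Which optimizer ends lower depends on the problem and the learning rate, so the same comparison would need to be rerun on the housing model before drawing conclusions.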


4. If you see overfitting, what changes can you make to the model?

To combat overfitting, the first thing to try in most cases would be adding more data, if available. In this case it's not an option, so for this model we can use dropout, which randomly sets some neurons to 0 during training. This can make the model learn more general patterns instead of overfitting.
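
A minimal sketch of how dropout could be added, assuming a single hidden layer and a drop probability of 0.2; `DropoutNN` and both of those values are hypothetical choices, not settings tested in this notebook. `nn.Dropout` is only active in training mode, so the model has to be switched with `model.train()` and `model.eval()`.

```python
import torch
import torch.nn as nn

class DropoutNN(nn.Module):
    def __init__(self, input_dim, hidden=150, p=0.2):
        super().__init__()
        self.hidden_0 = nn.Linear(input_dim, hidden)
        self.dropout = nn.Dropout(p)  # zeroes each activation with probability p while training
        self.output = nn.Linear(hidden, 1)

    def forward(self, x):
        x = torch.relu(self.hidden_0(x))
        x = self.dropout(x)
        return self.output(x)

model = DropoutNN(input_dim=8)
x = torch.randn(4, 8)

model.eval()  # dropout is disabled here, so predictions are deterministic
with torch.no_grad():
    first = model(x)
    second = model(x)
print(torch.equal(first, second))

model.train()  # switch back before the next training step
```

In the training loop above, that would mean calling `model.train()` before the forward pass on `x_train` and `model.eval()` before computing the validation loss.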


5. In a real-world scenario, how would you determine if a neural network is the right model for a regression problem compared to, say, a Random Forest?


