![image.png](https://i.imgur.com/a3uAqnb.png)

# Neural Network Regression for California Housing Prices - Homework Assignment

In this homework, you will implement a **Neural Network for regression** to predict the median value of California houses. This project will help you understand the fundamentals of neural networks applied to regression tasks.

## 📌 Project Overview
- **Task**: Predict median house values in California
- **Architecture**: Multi-layer Perceptron (MLP) for regression
- **Dataset**: California Housing Prices dataset (downloaded from Kaggle with `kagglehub`)
- **Goal**: Build an accurate regression model using PyTorch

## 📚 Learning Objectives
By completing this assignment, you will:
- Understand neural networks for regression problems
- Learn data preprocessing and feature engineering
- Implement a custom neural network architecture
- Practice training, validation, and evaluation
- Learn about regression metrics and model performance


## 1️⃣ Initial Setup and Library Installation

**Task**: Set up the environment and install necessary libraries.

In [None]:
from IPython.display import clear_output

In [None]:
# In case you run this notebook outside Colab (where the libraries are not pre-installed),
# uncomment the lines below.

# %pip install torch
# %pip install matplotlib
# %pip install scikit-learn
# %pip install pandas
# %pip install kagglehub

clear_output()


## 2️⃣ Import Libraries and Configuration

**Task**: Import all necessary libraries and set up configuration parameters.

**Requirements**:
- Import PyTorch and neural network modules
- Import data processing libraries (pandas, sklearn)
- Import visualization libraries
- Set random seeds for reproducibility
- Configure hyperparameters with reasonable values

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import kagglehub
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

In [None]:
torch.manual_seed(42)
np.random.seed(42)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

BATCH_SIZE = 64          # Batch size for training
LEARNING_RATE = 0.001    # Learning rate for optimizer
NUM_EPOCHS = 100         # Number of training epochs
HIDDEN_SIZE = 128        # Size of hidden layers
NUM_HIDDEN_LAYERS = 3    # Number of hidden layers
VALIDATION_SPLIT = 0.3   # Validation split ratio

## 3️⃣ Data Loading and Exploration

**Task**: Load the California housing dataset and explore its structure.

**Requirements**:
- Download and load the dataset
- Display basic information about the data
- Check for missing values
- Understand the features and target variable

In [None]:
# Download latest version
path = kagglehub.dataset_download("camnugent/california-housing-prices")

print("Path to dataset files:", path)

In [None]:
# TODO: List files in the dataset directory

In [None]:
# TODO: Load the dataset
california_data = None
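
The listing and loading steps can be sketched as follows. This is a minimal illustration that runs against a temporary directory, so it works without downloading anything. The file name `housing.csv`, the three columns, and the names `demo_dir` and `demo_data` are assumptions made for the example; check the actual listing of `path` before relying on any of them.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the downloaded dataset directory (hypothetical contents).
demo_dir = tempfile.mkdtemp()
pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"],
    "median_house_value": [452600.0, 358500.0, 341300.0],
}).to_csv(os.path.join(demo_dir, "housing.csv"), index=False)

# List the files in the directory, then load the first CSV that is found.
files = sorted(os.listdir(demo_dir))
print("Files:", files)
csv_files = [f for f in files if f.endswith(".csv")]
demo_data = pd.read_csv(os.path.join(demo_dir, csv_files[0]))

# Quick structural overview: shape, dtypes, and non-null counts.
print(demo_data.shape)
demo_data.info()
```

In the notebook itself, the same two calls (`os.listdir` and `pd.read_csv`) would be pointed at `path` instead of the temporary directory.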

In [None]:
california_data.head()

In [None]:
print("\nMissing values:")
print(california_data.isnull().sum())
print("\nBasic statistics:")
california_data.describe()


## 4️⃣ Data Preprocessing and Feature Engineering

**Task**: Clean and prepare the data for neural network training.

**Requirements**:
- Handle missing values if any
- Encode categorical variables
- Scale numerical features
- Split features and target
- Create train/validation/test splits

In [None]:
california_data = california_data.copy()

In [None]:
# TODO: Handle missing values (fill with median for numerical columns)

# TODO: Feature engineering - create new features - optional

# TODO: Separate features from target

# TODO: Encode the categorical ocean_proximity column

# TODO: Scale the features for better neural network performance

# TODO: Convert to numpy arrays

# TODO: Split the data into train, validation, and test sets

# TODO: Convert to PyTorch tensors
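
One possible shape for the preprocessing steps above is sketched below on a small synthetic frame. The data values are made up, and the 70/15/15 split and one-hot encoding via `pd.get_dummies` are assumptions chosen for illustration (`LabelEncoder` is another option). The sketch also deviates from the TODO order in one respect: it splits before scaling, so the scaler is fitted on the training rows only and no validation or test statistics leak into training.

```python
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny synthetic frame standing in for california_data (values are made up).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "total_rooms": rng.integers(100, 5000, n).astype(float),
    "total_bedrooms": rng.integers(20, 1000, n).astype(float),
    "median_income": rng.uniform(0.5, 15.0, n),
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY", "NEAR OCEAN"], n),
    "median_house_value": rng.uniform(50_000, 500_000, n),
})
df.loc[::25, "total_bedrooms"] = np.nan  # inject some missing values

# 1. Fill missing numeric values with the column median.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# 2. Separate the target and one-hot encode the categorical column.
y = df["median_house_value"].to_numpy(dtype=np.float32).reshape(-1, 1)
features = pd.get_dummies(df.drop(columns="median_house_value"),
                          columns=["ocean_proximity"], dtype=float)
X = features.to_numpy(dtype=np.float32)

# 3. Hold out 30% of the rows, then halve that part into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# 4. Fit the scaler on the training split only, then reuse it everywhere.
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(a) for a in (X_train, X_val, X_test))

# 5. Convert to float32 tensors (shown for the training split).
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)
print(X_train_t.shape, y_train_t.shape)
```

The target is left unscaled here. Because house values are in the hundreds of thousands, scaling or log-transforming the target is worth considering, as long as predictions are converted back before metrics are reported.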


## 5️⃣ Neural Network Architecture

**Task**: Design and implement a neural network for regression.

**Requirements**:
- Create a multi-layer perceptron (MLP)
- Use appropriate activation functions
- Ensure proper input/output dimensions
- Use suitable architecture for regression

In [None]:
class HousePricePredictor(nn.Module):
    def __init__(self, input_size, hidden_size, num_hidden_layers):
        super().__init__()
        # TODO: Input layer

        # TODO: Hidden layers

        # TODO: Output layer (single neuron for regression)

        # TODO: Create sequential model
        pass

    def forward(self, x):
        # TODO: Forward pass through the network
        pass

# TODO: Initialize the model

# TODO: Test the model with a sample input
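
As a rough illustration of the requirements above, the sketch below builds an MLP from a list of layers wrapped in `nn.Sequential`. The class name `MLPRegressor`, the choice of ReLU, and the input size of 13 are assumptions for the example, not the expected solution. Here `num_hidden_layers` counts every `hidden_size`-wide layer, including the first one that receives the input.

```python
import torch
import torch.nn as nn


class MLPRegressor(nn.Module):
    """Hypothetical MLP: hidden ReLU layers followed by one linear output unit."""

    def __init__(self, input_size, hidden_size, num_hidden_layers):
        super().__init__()
        # First hidden layer maps the input features to hidden_size units.
        layers = [nn.Linear(input_size, hidden_size), nn.ReLU()]
        # Remaining hidden layers keep the width constant.
        for _ in range(num_hidden_layers - 1):
            layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
        # Single output neuron with no activation, so any real value can be predicted.
        layers.append(nn.Linear(hidden_size, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


demo_model = MLPRegressor(input_size=13, hidden_size=128, num_hidden_layers=3)
sample = torch.randn(4, 13)       # batch of 4 rows with 13 features each
print(demo_model(sample).shape)   # torch.Size([4, 1])
```

Leaving the output layer without an activation matters for regression: a ReLU or sigmoid on the last layer would restrict the range of values the network can predict.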



## 6️⃣ Loss Function and Optimizer

**Task**: Set up appropriate loss function and optimizer for regression.

**Requirements**:
- Choose suitable loss function for regression
- Initialize optimizer with hyperparameters

In [None]:
# TODO: Define loss function for regression

# TODO: Define optimizer
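
A common pairing for regression is mean squared error with the Adam optimizer; the sketch below assumes that choice and uses a throwaway `toy_model` so it runs on its own. Other losses such as `nn.L1Loss` or `nn.HuberLoss` are also reasonable, particularly when outliers are a concern.

```python
import torch
import torch.nn as nn
import torch.optim as optim

toy_model = nn.Linear(3, 1)  # placeholder model, only here to supply parameters
criterion = nn.MSELoss()
optimizer = optim.Adam(toy_model.parameters(), lr=0.001)

# Hand-checkable example: errors are 1 and -2, so MSE = (1 + 4) / 2 = 2.5.
pred = torch.tensor([[2.0], [4.0]])
target = torch.tensor([[1.0], [6.0]])
print(criterion(pred, target).item())  # 2.5
```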


## 7️⃣ Training Loop with Validation

**Task**: Implement a comprehensive training loop with validation.

**Requirements**:
- Train the model for specified epochs
- Track training and validation losses
- Save the best model based on validation performance
- Display training progress

In [None]:
# TODO: Create DataLoader for batch processing

# TODO: Initialize tracking variables

# TODO: Training loop
for epoch in range(NUM_EPOCHS):
    # Training phase
    model.train()
    train_loss = 0.0

    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)

        # TODO: Forward pass

        # TODO: Backward pass and optimization


        train_loss += loss.item()

    # Validation phase
    model.eval()
    val_loss = 0.0

    with torch.no_grad():
        for batch_X, batch_y in val_loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            # TODO: Forward pass

            val_loss += loss.item()

    # Calculate average losses
    train_loss = train_loss / len(train_loader)
    val_loss = val_loss / len(val_loader)

    train_losses.append(train_loss)
    val_losses.append(val_loss)


    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')


    # TODO: Print progress
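
The loop structure above can be sketched end to end on synthetic data as follows. Everything specific here is an assumption for the demo: the toy target `y = 3*x0 - 2*x1 + noise`, the small network `net`, the 30 epochs, and the `history` dictionary. To avoid writing a file, the sketch keeps the best weights in memory with `copy.deepcopy` rather than calling `torch.save`.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (made up for the demo).
X = torch.randn(256, 2)
y = 3 * X[:, :1] - 2 * X[:, 1:2] + 0.1 * torch.randn(256, 1)
train_loader = DataLoader(TensorDataset(X[:192], y[:192]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X[192:], y[192:]), batch_size=32)

net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.01)

history = {"train": [], "val": []}
best_val, best_state = float("inf"), None

for epoch in range(30):
    # Training phase: one optimizer step per batch.
    net.train()
    running = 0.0
    for xb, yb in train_loader:
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = criterion(net(xb), yb)  # forward pass
        loss.backward()                # backward pass
        optimizer.step()               # parameter update
        running += loss.item()
    history["train"].append(running / len(train_loader))

    # Validation phase: no gradients, no parameter updates.
    net.eval()
    running = 0.0
    with torch.no_grad():
        for xb, yb in val_loader:
            running += criterion(net(xb), yb).item()
    history["val"].append(running / len(val_loader))

    # Keep a copy of the weights with the lowest validation loss so far.
    if history["val"][-1] < best_val:
        best_val = history["val"][-1]
        best_state = copy.deepcopy(net.state_dict())

    if (epoch + 1) % 10 == 0:
        print(f"epoch {epoch + 1:3d}  train {history['train'][-1]:.4f}  val {history['val'][-1]:.4f}")
```

Calling `optimizer.zero_grad()` before each backward pass is easy to forget; without it, gradients accumulate across batches and the updates become incorrect.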





## 8️⃣ Model Evaluation and Testing

**Task**: Evaluate the trained model on the test set and calculate regression metrics.

**Requirements**:
- Make predictions on test set
- Calculate multiple regression metrics
- Analyze model performance

In [None]:
# TODO: Set the model to eval mode

# TODO: Make predictions on test set

# TODO: Calculate regression metrics
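
For reference, the three metrics named in the evaluation criteria can be computed with the `sklearn.metrics` functions already imported in this notebook. The arrays below are made-up numbers chosen so the results are easy to verify by hand; in the assignment they would be the test targets and the model's predictions converted to NumPy.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values; the errors are 10, -10, 20, -20.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

mse = mean_squared_error(y_true, y_pred)   # (100 + 100 + 400 + 400) / 4 = 250
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # (10 + 10 + 20 + 20) / 4 = 15
r2 = r2_score(y_true, y_pred)              # 1 - 1000 / 50000 = 0.98

print(f"RMSE {rmse:.2f}  MAE {mae:.2f}  R2 {r2:.2f}")
# RMSE 15.81  MAE 15.00  R2 0.98
```

RMSE and MAE are expressed in the target's units (dollars here), while R² is unitless, which makes it easier to compare across differently scaled targets.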


## 9️⃣ Visualization and Analysis

**Task**: Create visualizations to analyze model performance and training progress.

**Requirements**:
- Plot training and validation loss curves
- Plot predicted versus actual values and analyze the residuals

In [None]:
# TODO: Plot training and validation loss curves

# TODO: Plot predictions vs actual values

# TODO: Show some sample predictions
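
A possible layout for these diagnostics is sketched below as three side-by-side panels: loss curves, predicted versus actual values, and a residual histogram. All plotted values are synthetic placeholders, and the variable names (`train_curve`, `val_curve`, `actual`, `predicted`) are assumptions; in the notebook they would be replaced by the recorded losses and the test-set predictions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up values purely for illustration.
epochs = np.arange(1, 21)
train_curve = 1.0 / epochs
val_curve = 1.0 / epochs + 0.05
actual = np.linspace(1.0, 5.0, 50)
predicted = actual + np.random.default_rng(0).normal(0, 0.3, 50)
residuals = actual - predicted

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Panel 1: training and validation loss per epoch.
axes[0].plot(epochs, train_curve, label="train")
axes[0].plot(epochs, val_curve, label="validation")
axes[0].set(xlabel="epoch", ylabel="loss", title="Loss curves")
axes[0].legend()

# Panel 2: predictions against ground truth, with the ideal y = x line.
axes[1].scatter(actual, predicted, s=12, alpha=0.7)
lims = [actual.min(), actual.max()]
axes[1].plot(lims, lims, linestyle="--", color="gray")
axes[1].set(xlabel="actual", ylabel="predicted", title="Predicted vs actual")

# Panel 3: distribution of residuals (actual minus predicted).
axes[2].hist(residuals, bins=15)
axes[2].set(xlabel="residual", ylabel="count", title="Residuals")

fig.tight_layout()
```

A residual histogram centered near zero with no strong skew suggests the model is not systematically over- or under-predicting.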


## 📝 Evaluation Criteria

Your homework will be evaluated based on:

1. **Implementation Correctness (50%)**
   - Proper neural network architecture
   - Correct data preprocessing and feature engineering (if any)
   - Working training loop with validation
   - Appropriate loss function and optimizer

2. **Model Performance (25%)**
   - Reasonable regression metrics (RMSE, MAE, R²)
   - Convergence during training
   - Generalization to test set

3. **Code Quality (25%)**
   - Clean, readable code with comments
   - Efficient implementation
   - Good coding practices
