# PyTorch Regression Tutorial: Predicting California Housing Prices

## Introduction

In this tutorial, we will walk through the process of using PyTorch, a popular deep learning framework, to build a regression model for predicting housing prices in California. The dataset contains eight numeric features, such as median income, median house age, and average number of rooms, for census block groups in California. Our goal is to predict the median house value of each block group, which the dataset expresses in units of $100,000.

## Step 1: Import Necessary Libraries
We start by importing the libraries used throughout this tutorial.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, TensorDataset

## Step 2: Load and Preprocess the Data
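The California housing dataset is a standard benchmark for regression. scikit-learn's fetch_california_housing downloads it on first use and returns the features together with the target, the median house value per district. We hold out 20% of the samples for testing and standardize the features using statistics computed on the training set only, so that no information from the test set leaks into preprocessing.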

In [None]:
# Load data
data = fetch_california_housing()
X, y = data.data, data.target
y = y.reshape(-1, 1)  # Shape (N, 1), so targets match the model's (N, 1) output

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features to zero mean and unit variance (fit on the training set only)
scaler_X = StandardScaler().fit(X_train)
X_train = scaler_X.transform(X_train)
X_test = scaler_X.transform(X_test)
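
To make the standardization step concrete, here is a minimal sketch of the same arithmetic done by hand with NumPy. The data is synthetic (three made-up columns on very different scales), not the housing features, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the features: 3 columns on very different scales.
features = rng.normal(loc=[3.0, 30.0, 1400.0], scale=[2.0, 12.0, 1100.0], size=(1000, 3))
train, test = features[:800], features[800:]

# Statistics come from the training split only, as with StandardScaler().fit(X_train).
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics, as transform() does

print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))
print(test_scaled.mean(axis=0).round(3), test_scaled.std(axis=0).round(3))
```

The training columns come out with mean 0 and standard deviation 1 up to floating-point error. The test columns are only close to that, which is expected: they are scaled with statistics they did not contribute to.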

## Step 3: Prepare Data Loaders
A DataLoader wraps a dataset and yields it in mini-batches. Here it serves batches of 64 samples and reshuffles the training set at the start of every epoch.

In [None]:
# Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)

# Create a DataLoader for batching and shuffling
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
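
The following sketch shows what such a loader yields. It uses random tensors in place of the housing data (200 samples is an arbitrary choice), and it also illustrates why the targets were reshaped to two dimensions earlier.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 200 samples, 8 features, targets of shape (200, 1).
X_demo = torch.randn(200, 8)
y_demo = torch.randn(200, 1)
demo_loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=64, shuffle=True)

batch_sizes = [inputs.shape[0] for inputs, _ in demo_loader]
print(len(demo_loader), batch_sizes)  # 4 batches: three of 64 and a final one of 8

inputs, labels = next(iter(demo_loader))
print(inputs.shape, labels.shape)     # torch.Size([64, 8]) torch.Size([64, 1])

# Why the targets are kept 2D: an (N, 1) prediction minus an (N,) target
# broadcasts to (N, N) instead of comparing element by element.
pred = torch.zeros(4, 1)
flat_target = torch.zeros(4)
print((pred - flat_target).shape)     # torch.Size([4, 4])
```

The last batch is smaller because 200 is not a multiple of 64. DataLoader keeps it by default; passing drop_last=True would discard it.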

## Step 4: Define the Neural Network Model
We will define a simple feed-forward neural network with three layers for our regression task.

In [None]:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # 8 input features -> 64 -> 32 -> 1 predicted value
        self.net = nn.Sequential(
            nn.Linear(8, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1)  # no activation: the output is an unbounded real number
        )

    def forward(self, x):
        return self.net(x)

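Here, we have defined a class MyModel that inherits from nn.Module. The model consists of three linear layers (8 to 64, 64 to 32, and 32 to 1 units) with ReLU activations in between. The final layer has no activation, so the network can output any real value, which is what a regression target requires.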
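
As a quick sanity check on the architecture, the sketch below rebuilds the same stack of layers as a plain nn.Sequential (so that it runs on its own) and counts the trainable parameters.

```python
import torch
import torch.nn as nn

# Same layer sizes as MyModel, rebuilt here so the snippet is self-contained.
net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Each Linear(in, out) holds in * out weights plus out biases.
per_layer = [
    sum(p.numel() for p in layer.parameters())
    for layer in net
    if isinstance(layer, nn.Linear)
]
total = sum(p.numel() for p in net.parameters())
print(per_layer, total)  # [576, 2080, 33] 2689

# A batch of 5 samples with 8 features each maps to 5 predictions.
out = net(torch.randn(5, 8))
print(out.shape)         # torch.Size([5, 1])
```

With fewer than 3,000 parameters and roughly 16,500 training samples, this is a small model for the amount of data available.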

## Step 5: Define Loss Function and Optimizer
For our regression task, we use the Mean Squared Error (MSE) as the loss function and Stochastic Gradient Descent (SGD) with a learning rate of 0.01 as the optimizer.

In [None]:
model = MyModel()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
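
To show what these two objects compute, the sketch below evaluates MSE by hand on three made-up predictions and then applies a single SGD update to a one-parameter toy loss. The numbers are illustrative and unrelated to the housing data.

```python
import torch
import torch.nn as nn

# Mean squared error by hand versus nn.MSELoss.
preds = torch.tensor([[2.5], [0.0], [2.0]])
targets = torch.tensor([[3.0], [-0.5], [2.0]])
manual_mse = ((preds - targets) ** 2).mean()  # (0.25 + 0.25 + 0) / 3
builtin_mse = nn.MSELoss()(preds, targets)
print(manual_mse.item(), builtin_mse.item())  # both approximately 0.1667

# One SGD step on a single weight w with loss (2w - 4)^2.
w = torch.tensor(1.0, requires_grad=True)
sgd = torch.optim.SGD([w], lr=0.01)
loss = (2 * w - 4) ** 2  # equals 4 at w = 1
loss.backward()          # d(loss)/dw = 4 * (2w - 4) = -8
sgd.step()               # w <- w - lr * grad = 1 - 0.01 * (-8)
print(w.item())          # approximately 1.08
```

The update moves w toward 2, the value that minimizes this toy loss. The training loop applies the same rule to every parameter of the network at once.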

## Step 6: Train the Model
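We train for 100 epochs. For each mini-batch we run a forward pass, compute the loss, clear the gradients left over from the previous step, backpropagate, and update the weights. Every 10 epochs we print the loss of the last mini-batch of that epoch, which is why the reported values fluctuate rather than decrease steadily.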

In [None]:
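num_epochs = 100
model.train()  # training mode (no effect on these layers, but a good habit)
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Report the loss of the epoch's last mini-batch every 10 epochs
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')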


Epoch [10/100], Loss: 0.6150
Epoch [20/100], Loss: 0.3312
Epoch [30/100], Loss: 0.1732
Epoch [40/100], Loss: 0.3570
Epoch [50/100], Loss: 0.3490
Epoch [60/100], Loss: 0.2939
Epoch [70/100], Loss: 0.3566
Epoch [80/100], Loss: 0.3910
Epoch [90/100], Loss: 0.1870
Epoch [100/100], Loss: 0.2011
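
The values printed above come from a single mini-batch per report, so they jump around (for example from 0.1732 at epoch 30 to 0.3570 at epoch 40). One common alternative is to report the average loss over the whole epoch. The sketch below shows that pattern on a synthetic linear problem. The helper train_one_epoch and all demo_* names are hypothetical and not part of the tutorial's code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic linear problem standing in for the housing data.
X_demo = torch.randn(256, 3)
y_demo = X_demo @ torch.tensor([[1.5], [-2.0], [0.5]]) + 0.1 * torch.randn(256, 1)
demo_loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=64, shuffle=True)

demo_model = nn.Linear(3, 1)
demo_criterion = nn.MSELoss()
demo_optimizer = torch.optim.SGD(demo_model.parameters(), lr=0.05)

def train_one_epoch(model, loader, criterion, optimizer):
    """Run one pass over the loader and return the sample-weighted mean loss."""
    model.train()
    running_loss, n_seen = 0.0, 0
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        # Weight each batch by its size so a smaller final batch is not over-counted.
        running_loss += loss.item() * inputs.size(0)
        n_seen += inputs.size(0)
    return running_loss / n_seen

history = [
    train_one_epoch(demo_model, demo_loader, demo_criterion, demo_optimizer)
    for _ in range(20)
]
print(f"first epoch: {history[0]:.4f}, last epoch: {history[-1]:.4f}")
```

Averaging over the epoch smooths out batch-to-batch noise, which makes it easier to see whether training is still improving.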


## Step 7: Test the Model
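After training, we evaluate the model on the held-out test set, which it never saw during training. model.eval() puts the model in evaluation mode, and torch.no_grad() turns off gradient tracking, which is not needed for inference.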

In [None]:
model.eval()
with torch.no_grad():
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    y_test_tensor = torch.tensor(y_test, dtype=torch.float32)
    test_outputs = model(X_test_tensor)
    test_loss = criterion(test_outputs, y_test_tensor)
    print(f'Test Loss: {test_loss.item():.4f}')


Test Loss: 0.2876
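
An MSE in squared target units is hard to interpret. Since the target is expressed in units of $100,000, taking the square root gives a typical error in the same units as the prices. A small worked conversion of the test loss reported above:

```python
import math

test_mse = 0.2876          # test loss reported above
unit_in_dollars = 100_000  # the target is expressed in units of $100,000

rmse = math.sqrt(test_mse)
print(f"RMSE: {rmse:.4f} target units, about ${rmse * unit_in_dollars:,.0f}")
```

The root mean squared error is about 0.54 target units, or roughly $54,000 per block group.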


## Step 8: Inspect Sample Predictions
An aggregate loss hides how the model behaves on individual data points, so it helps to look at a few predictions directly.
The code below prints the predicted and actual values for the first five samples in the test set, both in units of $100,000. Comparing them shows how close the model's predictions are to the actual values.

In [None]:
# Get predictions for a subset of the test data
sample_inputs = X_test_tensor[:5]
sample_labels = y_test[:5]
with torch.no_grad():  # inference only, so skip gradient tracking
    sample_outputs = model(sample_inputs)

for i in range(len(sample_inputs)):
    print(f"Predicted: {sample_outputs[i][0].item():.2f}, Actual: {sample_labels[i][0]:.2f}")


Predicted: 0.58, Actual: 0.48
Predicted: 1.14, Actual: 0.46
Predicted: 4.53, Actual: 5.00
Predicted: 2.57, Actual: 2.19
Predicted: 2.93, Actual: 2.78
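
To quantify the gaps in the output above, the short sketch below computes the absolute error for each of the five samples. The values are copied from that output, and the dollar figures assume the dataset's unit of $100,000.

```python
# Predicted and actual values copied from the output above (units of $100,000).
predicted = [0.58, 1.14, 4.53, 2.57, 2.93]
actual = [0.48, 0.46, 5.00, 2.19, 2.78]

abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
for p, a, err in zip(predicted, actual, abs_errors):
    print(f"Predicted: {p:.2f}, Actual: {a:.2f}, Abs. error: {err:.2f} (~${err * 100_000:,.0f})")

mean_abs_error = sum(abs_errors) / len(abs_errors)
print(f"Mean absolute error over these five samples: {mean_abs_error:.3f}")
```

The second sample stands out with an error of 0.68 (about $68,000), more than double its actual value of 0.46. Five samples are far too few to draw conclusions from, but they show the kind of miss that an average loss conceals.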


## Conclusion
In this tutorial, we walked through the process of building a regression model using PyTorch. We loaded the California housing dataset, preprocessed the data, defined a neural network model, trained it, evaluated it on held-out data, and inspected some of its predictions. This serves as a foundational example for anyone getting started with regression tasks using deep learning. There is room for improvement: you can experiment with different architectures, optimizers, and other hyperparameters to achieve better results. Looking at individual predictions gives a more granular view of the model's performance than the test loss alone and helps identify areas for further refinement.
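
As one possible starting point for such experiments, the sketch below builds a network with configurable hidden layers and pairs it with Adam instead of SGD. The helper build_mlp, the layer widths, and the learning rate are illustrative assumptions, not tuned recommendations.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def build_mlp(in_features, hidden_sizes, out_features=1):
    """Hypothetical helper: stack Linear + ReLU blocks, ending in a linear output layer."""
    layers, prev = [], in_features
    for width in hidden_sizes:
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, out_features))
    return nn.Sequential(*layers)

# A wider and deeper variant of the tutorial's 8 -> 64 -> 32 -> 1 network.
candidate = build_mlp(8, hidden_sizes=[128, 64, 32])
# Adam in place of SGD; the learning rate is an untuned starting point.
candidate_optimizer = optim.Adam(candidate.parameters(), lr=1e-3)

print(candidate)
print(candidate(torch.randn(4, 8)).shape)  # torch.Size([4, 1])
```

Any such variant can be dropped into the training loop from Step 6 in place of the model and optimizer, and compared against the baseline on the same test set.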