## Forecast California Housing Prices

Bronwyn Bowles-King

### Practical task 1: Long Short-Term Memory (LSTM) Neural Network

In [33]:
# Import libraries
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# Dataset
from sklearn.datasets import fetch_california_housing

In [34]:
# Set random seeds for NumPy and PyTorch data processing
np.random.seed(50)
torch.manual_seed(50)

<torch._C.Generator at 0x17ef6a16350>

In [35]:
# Load the dataset
data = fetch_california_housing()
features = pd.DataFrame(data.data, columns=data.feature_names)
targets = pd.Series(data.target, name='Price').values.astype(float).reshape(-1, 1)

# Combine features and target into one DataFrame to preview the data
df_full = features.copy()
df_full['Price'] = targets.flatten()

print(df_full.head())

print(type(data))

   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   

   Longitude  Price  
0    -122.23  4.526  
1    -122.22  3.585  
2    -122.24  3.521  
3    -122.25  3.413  
4    -122.25  3.422  
<class 'sklearn.utils._bunch.Bunch'>


**Questions**

1. What is a `Bunch`?
2. What is the default type for `data`?
3. What is the default type for `target`?

**Answers**

1. A `Bunch` is a container object that scikit-learn's dataset loaders return. It is a dictionary subclass whose keys can also be read as attributes, so `data['target']` and `data.target` return the same object.
2. Printing `type(data)` above shows that the loaded object is a `Bunch`. Its `data` attribute is a NumPy array by default. Calling the loader with `as_frame=True` returns a pandas DataFrame instead.
3. The `target` attribute is also a NumPy array by default, and becomes a pandas Series when `as_frame=True` is set (scikit-learn, 2025a, 2025b).
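
As a minimal sketch of the answers above, the snippet below builds a small `Bunch` by hand instead of downloading the dataset. The array contents and the `feature_names` values are made up for illustration. It shows that key access and attribute access return the same object, and that `len()` of a `Bunch` counts its keys rather than its samples.

```python
import numpy as np
from sklearn.utils import Bunch

# Toy stand-in for the object returned by a dataset loader (values are made up)
toy = Bunch(data=np.zeros((5, 3)),
            target=np.zeros(5),
            feature_names=["a", "b", "c"])

same_object = toy["data"] is toy.data   # key and attribute access are equivalent
print(same_object)
print(type(toy.data).__name__)          # ndarray
print(len(toy))                         # number of keys in the Bunch
print(len(toy.data))                    # number of samples (rows)
```

The last two lines matter later in the notebook: dividing a summed loss by `len()` of the `Bunch` itself is not the same as dividing by the number of samples.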

### 1.1 Data cleaning and preprocessing

In [36]:
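def create_sequences(data, targets, seq_length):
    """Build sliding windows of seq_length rows, each paired with the target of the next row."""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])    # rows i to i + seq_length - 1
        y.append(targets[i+seq_length])   # target of the row after the window
    return np.array(X), np.array(y)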

features = data.data         # shape (n_samples, n_features)
targets = data.target        # shape (n_samples,)

scaler_features = MinMaxScaler(feature_range=(0, 1))
features_normalized = scaler_features.fit_transform(features)

targets = targets.reshape(-1, 1)  # 2D shape for the scaler
scaler_targets = MinMaxScaler(feature_range=(0, 1))
targets_normalized = scaler_targets.fit_transform(targets).flatten()

seq_length = 20
X, y = create_sequences(features_normalized, targets_normalized, seq_length)

train_size = int(len(X) * 0.67)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# Convert to tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

# Create TensorDataset
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)

# Create DataLoader
loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
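
To make the windowing step concrete, here is a small sketch that applies the same `create_sequences` logic to a toy array (10 rows, 2 features, both invented for illustration). Each window holds `seq_length` consecutive rows and is paired with the target of the row that follows it, so `n` rows produce `n - seq_length` windows.

```python
import numpy as np

def create_sequences(data, targets, seq_length):
    """Same sliding-window logic as in the cell above."""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        y.append(targets[i + seq_length])
    return np.array(X), np.array(y)

# Toy data: 10 rows with 2 features each, and one target per row
toy_features = np.arange(20, dtype=float).reshape(10, 2)
toy_targets = np.arange(10, dtype=float)

X_toy, y_toy = create_sequences(toy_features, toy_targets, seq_length=3)

print(X_toy.shape)   # (7, 3, 2): windows, time steps, features
print(y_toy.shape)   # (7,)
print(y_toy[0])      # 3.0, the target of the row after the first window
```

The three-dimensional shape `(windows, time steps, features)` is what an `nn.LSTM` or `nn.GRU` layer with `batch_first=True` expects.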

### 1.2 Create and run a baseline LSTM model with L2 regularisation

In [37]:
class LSTMModel(nn.Module):
    """
    LSTMModel is a Long Short-Term Memory (LSTM) neural network for sequence regression tasks.

    Attributes:
        lstm (nn.LSTM): The LSTM layer(s) that process input sequences.
        fc (nn.Linear): Fully connected layer to produce output predictions from LSTM output.

    Arguments:
        input_size (int): Number of input features per time step.
        hidden_size (int): Number of features in the hidden state of the LSTM.
        output_size (int): Number of output features (e.g., 1 for regression).
        num_layers (int): Number of recurrent LSTM layers stacked.

    Forward method:
        Processes input tensor of shape (batch_size, sequence_length, input_size) through LSTM layers,
        applies a linear layer on the last time step's output to produce predictions of shape
        (batch_size, output_size).
        """


    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size=input_size,
                            hidden_size=hidden_size,
                            num_layers=num_layers,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)


    def forward(self, x):
        output, _ = self.lstm(x) # output[:, -1, :] takes the last output in the sequence
        out = self.fc(output[:, -1, :])
        return out
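
The shapes described in the docstring can be checked with a short sketch. The sizes below are arbitrary stand-ins (a batch of 4 windows, 20 time steps, 8 features) and the input is random noise, so only the shapes are meaningful.

```python
import torch
import torch.nn as nn

batch_size, seq_length, n_features, hidden_size = 4, 20, 8, 64

lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
               num_layers=2, batch_first=True)
fc = nn.Linear(hidden_size, 1)

x = torch.randn(batch_size, seq_length, n_features)   # random stand-in input
output, (h_n, c_n) = lstm(x)

last_step = output[:, -1, :]        # top layer's output at the final time step
prediction = fc(last_step)

print(output.shape)       # torch.Size([4, 20, 64])
print(last_step.shape)    # torch.Size([4, 64])
print(prediction.shape)   # torch.Size([4, 1])

# For a unidirectional LSTM, the last time step of `output` matches the
# final hidden state of the top layer
matches_hidden = torch.allclose(last_step, h_n[-1])
print(matches_hidden)
```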

In [38]:
# Set baseline hyperparameters
input_size = X.shape[2]  # Set as the same as the number of features
hidden_size = 64
output_size = 1
num_layers = 2
num_epochs = 100
learning_rate = 0.001
l2_regularization = 0.001

In [39]:
# Instantiate model
model = LSTMModel(input_size, hidden_size, output_size, num_layers)

# Loss function
criterion = nn.MSELoss()

# Optimiser with L2 regularisation
optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate,
                             weight_decay=l2_regularization)

In [40]:
for epoch in range(num_epochs):
    model.train()  # Training mode
    total_loss = 0

    for batch_X, batch_y in loader:
        optimizer.zero_grad()  # Reset gradients
        output = model(batch_X)  # Forward pass
        loss = criterion(output, batch_y)  # Compute loss
        loss.backward()  # Backpropagation
        optimizer.step()

        total_loss += loss.item() * batch_X.size(0)

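    # Average over the training samples. The losses recorded below were divided
    # by len(data), the number of keys in the Bunch, so they are inflated by a constant factor.
    avg_loss = total_loss / len(train_dataset)

    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch + 1}/{num_epochs} | Loss: {avg_loss:.4f}")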

Epoch 5/100 | Loss: 58.5172
Epoch 10/100 | Loss: 57.0067
Epoch 15/100 | Loss: 56.0682
Epoch 20/100 | Loss: 54.4676
Epoch 25/100 | Loss: 53.3761
Epoch 30/100 | Loss: 51.9395
Epoch 35/100 | Loss: 51.4691
Epoch 40/100 | Loss: 51.3245
Epoch 45/100 | Loss: 51.1690
Epoch 50/100 | Loss: 50.7187
Epoch 55/100 | Loss: 50.5893
Epoch 60/100 | Loss: 50.5240
Epoch 65/100 | Loss: 50.8703
Epoch 70/100 | Loss: 49.9330
Epoch 75/100 | Loss: 50.1615
Epoch 80/100 | Loss: 49.8897
Epoch 85/100 | Loss: 49.7979
Epoch 90/100 | Loss: 49.8755
Epoch 95/100 | Loss: 49.7951
Epoch 100/100 | Loss: 49.8809


In [41]:
# Convert test data to tensors
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

model.eval()  # Set model to evaluation mode
with torch.no_grad():  # No gradient computation here
    y_pred_tensor = model(X_test_tensor)
    y_pred = y_pred_tensor.numpy()
    y_true = y_test_tensor.numpy()

# Reshape to 2D for inverse_transform
y_pred_unscaled_1 = scaler_targets.inverse_transform(y_pred.reshape(-1, 1))
y_true_unscaled_1 = scaler_targets.inverse_transform(y_true.reshape(-1, 1))

# Calculate baseline model's MSE
mse_1 = mean_squared_error(y_true_unscaled_1, y_pred_unscaled_1)
print(f"Test set MSE for LSTM with L2 regularisation: {mse_1:.4f}")
print(f"Average error margin: ${np.sqrt(mse_1) * 100000:.0f}")

Test set MSE for LSTM with L2 regularisation: 0.6148
Average error margin: $78410


Although random seeds were set, the results can still vary between runs because some steps are non-deterministic (Geeks4Geeks, 2025a). The test set MSE of 0.6148 is measured on the target variable, the median house value, which the dataset stores in units of \$100 000. The root mean squared error (RMSE) is the square root of 0.6148, about 0.7841, so predictions are typically off by about \$78 400.

This is a large error, as the median house values in this dataset range from about \$15 000 to \$500 000. The training losses printed above are on a different scale from the test MSE (they are computed on the normalised targets and divided by a constant), so only their trend is informative. The loss falls from about 58.5 at epoch 5 to about 49.9 at epoch 100 and is almost flat after epoch 70, so the model learns slowly and then plateaus.
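
The conversion from MSE to an error in dollars can be written out as a short worked example. It assumes, as the evaluation cell above does, that one target unit is 100 000 dollars. The helper name `rmse_in_dollars` is hypothetical.

```python
import math

UNIT_IN_DOLLARS = 100_000   # assumption: one target unit is 100 000 dollars

def rmse_in_dollars(mse):
    """Convert an MSE in target units to a root mean squared error in dollars."""
    return math.sqrt(mse) * UNIT_IN_DOLLARS

rmse_units = math.sqrt(0.6148)
print(f"RMSE in target units: {rmse_units:.4f}")
print(f"RMSE in dollars: {rmse_in_dollars(0.6148):,.0f}")
```

The notebook prints a slightly different dollar figure because it uses the unrounded MSE rather than the four-decimal value 0.6148.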

### 1.3 Run LSTM model without L2 regularisation

The LSTM model is now trained without L2 regularisation (`weight_decay=0`) to check the effect of the penalty. Performance is expected to worsen.

In [42]:
# Optimiser without L2 regularisation (weight_decay=0)
optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate,
                             weight_decay=0)

for epoch in range(num_epochs):
    model.train()  # Training mode
    total_loss = 0

    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        output = model(batch_X)
        loss = criterion(output, batch_y)
        loss.backward()
        optimizer.step()

        total_loss += loss.item() * batch_X.size(0)

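    # Average over the training samples, not len(data) (the Bunch's key count,
    # which scaled the losses recorded below)
    avg_loss = total_loss / len(train_dataset)

    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch + 1}/{num_epochs} | Loss: {avg_loss:.4f}")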

Epoch 5/100 | Loss: 47.6477
Epoch 10/100 | Loss: 46.1644
Epoch 15/100 | Loss: 45.4000
Epoch 20/100 | Loss: 44.7677
Epoch 25/100 | Loss: 44.2369
Epoch 30/100 | Loss: 43.0339
Epoch 35/100 | Loss: 42.7067
Epoch 40/100 | Loss: 41.4925
Epoch 45/100 | Loss: 41.1778
Epoch 50/100 | Loss: 40.5651
Epoch 55/100 | Loss: 39.5890
Epoch 60/100 | Loss: 38.7754
Epoch 65/100 | Loss: 38.0005
Epoch 70/100 | Loss: 37.5138
Epoch 75/100 | Loss: 36.4070
Epoch 80/100 | Loss: 35.6938
Epoch 85/100 | Loss: 34.6950
Epoch 90/100 | Loss: 33.6920
Epoch 95/100 | Loss: 32.8615
Epoch 100/100 | Loss: 32.4240


In [43]:
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

model.eval()
with torch.no_grad():
    y_pred_tensor = model(X_test_tensor)
    y_pred = y_pred_tensor.numpy()
    y_true = y_test_tensor.numpy()

y_pred_unscaled_2 = scaler_targets.inverse_transform(y_pred.reshape(-1, 1))
y_true_unscaled_2 = scaler_targets.inverse_transform(y_true.reshape(-1, 1))

mse_2 = mean_squared_error(y_true_unscaled_2, y_pred_unscaled_2)
print(f"Test set MSE for LSTM without L2 regularisation: {mse_2:.4f}")
print(f"Average error margin: ${np.sqrt(mse_2) * 100000:.0f}")

Test set MSE for LSTM without L2 regularisation: 0.7466
Average error margin: $86405


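Without L2 regularisation, the test MSE rises to 0.7466, so the model's predictions are now off by about $86 400 on average, a wider error than in section 1.2. The training loss keeps falling over these epochs while the test error grows, which is the pattern expected from overfitting. Note that the cell above reuses the `model` object trained in section 1.2, so this result combines the effect of removing weight decay with that of 100 further epochs of training.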
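
Because the cell above reuses the existing `model` object, one way to isolate the effect of weight decay would be to start each run from a freshly initialised model with the same seed. The sketch below is only an illustration of that idea: `TinyLSTM` is a small stand-in class and `fresh_model_and_optimizer` is a hypothetical helper, neither of which is part of the notebook.

```python
import torch
import torch.nn as nn

class TinyLSTM(nn.Module):
    """Small stand-in for the notebook's LSTMModel (sizes chosen for speed)."""
    def __init__(self, input_size=8, hidden_size=16, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        output, _ = self.lstm(x)
        return self.fc(output[:, -1, :])

def fresh_model_and_optimizer(weight_decay, seed=50):
    """Hypothetical helper: reseed, build a new model, attach a new optimiser."""
    torch.manual_seed(seed)
    model = TinyLSTM()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                                 weight_decay=weight_decay)
    return model, optimizer

model_l2, opt_l2 = fresh_model_and_optimizer(weight_decay=0.001)
model_plain, opt_plain = fresh_model_and_optimizer(weight_decay=0.0)

# Both runs now start from identical weights, so only weight decay differs
same_start = all(torch.equal(p, q) for p, q in
                 zip(model_l2.parameters(), model_plain.parameters()))
print(same_start)
```

Training `model_l2` and `model_plain` for the same number of epochs would then give a like-for-like comparison. This has not been run on the housing data here, so it makes no claim about what the results would be.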

### 1.4 Re-run LSTM model with L2 regularisation and hyperparameter tuning

In one experiment, the hidden size (128) and the number of epochs (200) were doubled, and the learning rate was increased to 0.01. This LSTM model, which included L2 regularisation, performed much the same, with an MSE of 0.6799, an average error of about $82 500, similar to the previous two attempts.

The model was also trained for 500 epochs to give it more time to learn while keeping the other parameters the same as the baseline. However, this did not help, as the MSE is worse (0.7203). Thus, a high learning rate, larger hidden layer and increased learning epochs are not necessarily effective.

When the learning rate or the L2 value was increased to values around 0.01 and 0.1, the model's performance also deteriorated rapidly. The error was very large in these cases, so a faster learning rate and a regularisation value that is too high do not help.

I decided to test the model with different L2 values, holding all other hyperparameters at their baseline values. The results are below.

| L2 Regularisation | MSE    |
|-------------------|--------|
| 0.002             | 0.6822 |
| 0.001             | 0.6004 |
| 0.0005            | 0.5813 |
| 0.0001            | 0.5412 |
| 0.00001           | 0.6315 |


Decreasing the L2 value improves the MSE down to a value of 0.0001. Below that, at 0.00001, the MSE worsens again, probably because the penalty is then too weak to make any meaningful difference.
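
The comparison in the table can be reproduced with a few lines of Python. The MSE values are copied from the table, and the dollar figures assume that one target unit is 100 000 dollars. The variable names are illustrative.

```python
import math

# L2 value -> test MSE, copied from the table above
mse_by_l2 = {
    0.002: 0.6822,
    0.001: 0.6004,
    0.0005: 0.5813,
    0.0001: 0.5412,
    0.00001: 0.6315,
}

best_l2 = min(mse_by_l2, key=mse_by_l2.get)   # L2 value with the lowest MSE

for l2, mse in mse_by_l2.items():
    rmse_dollars = math.sqrt(mse) * 100_000   # assumes units of 100 000 dollars
    print(f"L2={l2:<8g} MSE={mse:.4f} RMSE in dollars={rmse_dollars:,.0f}")

print(f"Best L2 value: {best_l2}")
```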

After settling on an L2 value of 0.0001 and reducing the number of epochs to 70, the run shown below gives an MSE of 0.5520, the lowest of the three LSTM runs shown in this notebook. This is an error margin of about $74 300. Again, although random seeds were set, there is still randomness in the process, so the results can vary when the code is rerun.
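
One likely source of this variation is the shuffled `DataLoader`: its batch order depends on the state of the global random number generator when the cell runs, so re-running cells can change it. A possible way to make the batch order repeatable is to pass the loader its own seeded `torch.Generator`. The sketch below uses a toy dataset and a hypothetical helper, `first_batch`. It is an illustration rather than a guarantee of fully deterministic training.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

def first_batch(seed):
    """Hypothetical helper: return the first shuffled batch for a given seed."""
    generator = torch.Generator()
    generator.manual_seed(seed)
    dataset = TensorDataset(torch.arange(10).float().unsqueeze(1))  # toy data
    loader = DataLoader(dataset, batch_size=4, shuffle=True, generator=generator)
    return next(iter(loader))[0].flatten().tolist()

batch_a = first_batch(50)
batch_b = first_batch(50)

print(batch_a)
print(batch_a == batch_b)   # same seed, same shuffle order
```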

In [44]:
input_size = X.shape[2]
hidden_size = 64
output_size = 1
num_layers = 2
num_epochs = 70
learning_rate = 0.001
l2_regularization = 0.0001

# Optimiser with L2 regularisation
optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate,
                             weight_decay=l2_regularization)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        output = model(batch_X)
        loss = criterion(output, batch_y)
        loss.backward()
        optimizer.step()

        total_loss += loss.item() * batch_X.size(0)

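    # Average over the training samples, not len(data) (the Bunch's key count,
    # which scaled the losses recorded below)
    avg_loss = total_loss / len(train_dataset)
    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch + 1}/{num_epochs} | Loss: {avg_loss:.4f}")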

Epoch 5/70 | Loss: 41.8525
Epoch 10/70 | Loss: 42.8302
Epoch 15/70 | Loss: 43.3365
Epoch 20/70 | Loss: 43.4540
Epoch 25/70 | Loss: 43.6418
Epoch 30/70 | Loss: 43.8008
Epoch 35/70 | Loss: 43.7122
Epoch 40/70 | Loss: 44.3713
Epoch 45/70 | Loss: 44.2916
Epoch 50/70 | Loss: 44.0099
Epoch 55/70 | Loss: 43.8977
Epoch 60/70 | Loss: 44.1508
Epoch 65/70 | Loss: 43.6155
Epoch 70/70 | Loss: 43.8619


In [45]:
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

model.eval()
with torch.no_grad():
    y_pred_tensor = model(X_test_tensor)
    y_pred = y_pred_tensor.numpy()
    y_true = y_test_tensor.numpy()

y_pred_unscaled_3 = scaler_targets.inverse_transform(y_pred.reshape(-1, 1))
y_true_unscaled_3 = scaler_targets.inverse_transform(y_true.reshape(-1, 1))

mse_3 = mean_squared_error(y_true_unscaled_3, y_pred_unscaled_3)
print(f"Test set MSE: {mse_3:.4f}")
print(f"Average error margin: ${np.sqrt(mse_3) * 100000:.0f}")

Test set MSE: 0.5520
Average error margin: $74295


### Practical task 2: Gated Recurrent Unit (GRU) Neural Network

### 2.1 Run GRU model with L2 regularisation

In [46]:
class GRUModel(nn.Module):
    """
    GRUModel is a Gated Recurrent Unit (GRU) neural network for sequence regression.

    Attributes:
        gru (nn.GRU): The GRU layer(s) that process input sequences.
        fc (nn.Linear): Fully connected layer to produce output predictions from GRU output.

    Arguments:
        input_size (int): Number of input features per time step.
        hidden_size (int): Number of features in the hidden state of the GRU.
        output_size (int): Number of output features (e.g., 1 for regression).
        num_layers (int): Number of recurrent GRU layers stacked.

    Forward method:
        Processes input tensor of shape (batch_size, sequence_length, input_size) through GRU layers,
        then applies a linear layer on the last time step's output to produce predictions of shape
        (batch_size, output_size).
    """

    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super(GRUModel, self).__init__()
        self.gru = nn.GRU(input_size=input_size,
                          hidden_size=hidden_size,
                          num_layers=num_layers,
                          batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)


    def forward(self, x):
        output, _ = self.gru(x)
        out = self.fc(output[:, -1, :])
        return out


In [47]:
# Use the same baseline parameters so that the two models can be compared

input_size = X.shape[2]
hidden_size = 64
output_size = 1
num_layers = 2
num_epochs = 100
learning_rate = 0.001
l2_regularization = 0.001

model = GRUModel(input_size, hidden_size, output_size, num_layers)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate,
                             weight_decay=l2_regularization)

In [48]:
for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        output = model(batch_X)
        loss = criterion(output, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * batch_X.size(0)

    avg_loss = total_loss / len(train_dataset)
    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch + 1}/{num_epochs} | Loss: {avg_loss:.4f}")

Epoch 5/100 | Loss: 0.0267
Epoch 10/100 | Loss: 0.0263
Epoch 15/100 | Loss: 0.0259
Epoch 20/100 | Loss: 0.0259
Epoch 25/100 | Loss: 0.0251
Epoch 30/100 | Loss: 0.0240
Epoch 35/100 | Loss: 0.0239
Epoch 40/100 | Loss: 0.0234
Epoch 45/100 | Loss: 0.0231
Epoch 50/100 | Loss: 0.0230
Epoch 55/100 | Loss: 0.0227
Epoch 60/100 | Loss: 0.0229
Epoch 65/100 | Loss: 0.0226
Epoch 70/100 | Loss: 0.0226
Epoch 75/100 | Loss: 0.0226
Epoch 80/100 | Loss: 0.0225
Epoch 85/100 | Loss: 0.0223
Epoch 90/100 | Loss: 0.0223
Epoch 95/100 | Loss: 0.0223
Epoch 100/100 | Loss: 0.0221


In [49]:
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

model.eval()
with torch.no_grad():
    y_pred_tensor = model(X_test_tensor)
    y_pred = y_pred_tensor.numpy()
    y_true = y_test_tensor.numpy()

y_pred_unscaled_4 = scaler_targets.inverse_transform(y_pred.reshape(-1, 1))
y_true_unscaled_4 = scaler_targets.inverse_transform(y_true.reshape(-1, 1))

mse_4 = mean_squared_error(y_true_unscaled_4, y_pred_unscaled_4)
print(f"Test set MSE for GRU with L2 regularisation: {mse_4:.4f}")
print(f"Average error margin: ${np.sqrt(mse_4) * 100000:.0f}")

Test set MSE for GRU with L2 regularisation: 0.6513
Average error margin: $80703


Over the 100 epochs, the GRU model's training loss, which is computed on the normalised targets, falls only from about 0.027 to 0.022. The test set MSE is 0.6513, an average error of about $80 700, so this model is not yet learning very well.
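
The training loss and the test MSE are on different scales because the loss is computed on min-max scaled targets. Since scaling divides each target by the range (max minus min), an MSE on the original scale equals the scaled MSE multiplied by the range squared. The sketch below checks this identity on toy data and then applies it to the final training loss, assuming the target spans 0.14999 to 5.00001. That range is an assumption about the dataset and is not printed in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.uniform(0.15, 5.0, size=100)          # toy targets
y_pred = y_true + rng.normal(0.0, 0.5, size=100)   # toy predictions

y_min, y_max = y_true.min(), y_true.max()
span = y_max - y_min

def scale(values):
    """Min-max scale with the statistics of y_true, as MinMaxScaler would."""
    return (values - y_min) / span

mse_scaled = np.mean((scale(y_true) - scale(y_pred)) ** 2)
mse_original = np.mean((y_true - y_pred) ** 2)

identity_holds = np.isclose(mse_original, mse_scaled * span ** 2)
print(identity_holds)

# Apply the identity to the last GRU training loss (assumed target range)
assumed_span = 5.00001 - 0.14999
approx_train_mse = 0.0221 * assumed_span ** 2
print(f"{approx_train_mse:.2f}")
```

Under that assumption, the final training loss corresponds to an MSE of roughly 0.52 in the original units, somewhat below the test MSE of 0.6513.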

### 2.2 Run the GRU model without L2 regularisation

In [50]:
optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate,
                             weight_decay=0)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        output = model(batch_X)
        loss = criterion(output, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * batch_X.size(0)

    avg_loss = total_loss / len(train_dataset)
    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch + 1}/{num_epochs} | Loss: {avg_loss:.4f}")

X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

Epoch 5/100 | Loss: 0.0212
Epoch 10/100 | Loss: 0.0205
Epoch 15/100 | Loss: 0.0201
Epoch 20/100 | Loss: 0.0196
Epoch 25/100 | Loss: 0.0189
Epoch 30/100 | Loss: 0.0185
Epoch 35/100 | Loss: 0.0182
Epoch 40/100 | Loss: 0.0177
Epoch 45/100 | Loss: 0.0175
Epoch 50/100 | Loss: 0.0169
Epoch 55/100 | Loss: 0.0164
Epoch 60/100 | Loss: 0.0159
Epoch 65/100 | Loss: 0.0153
Epoch 70/100 | Loss: 0.0146
Epoch 75/100 | Loss: 0.0137
Epoch 80/100 | Loss: 0.0128
Epoch 85/100 | Loss: 0.0121
Epoch 90/100 | Loss: 0.0108
Epoch 95/100 | Loss: 0.0100
Epoch 100/100 | Loss: 0.0092


In [51]:
model.eval()
with torch.no_grad():
    y_pred_tensor = model(X_test_tensor)
    y_pred = y_pred_tensor.numpy()
    y_true = y_test_tensor.numpy()

y_pred_unscaled_5 = scaler_targets.inverse_transform(y_pred.reshape(-1, 1))
y_true_unscaled_5 = scaler_targets.inverse_transform(y_true.reshape(-1, 1))

mse_5 = mean_squared_error(y_true_unscaled_5, y_pred_unscaled_5)
print(f"Test set MSE for GRU without L2 regularisation: {mse_5:.4f}")
print(f"Average error margin: ${np.sqrt(mse_5) * 100000:.0f}")

Test set MSE for GRU without L2 regularisation: 0.7343
Average error margin: $85693


**a. Has performance gotten worse in both models?**

Yes. Both models perform worse on the test set without L2 regularisation: the MSE rises from 0.6148 to 0.7466 for the LSTM and from 0.6513 to 0.7343 for the GRU.

**b. What is the importance of regularisation for optimising the efficiency of models?**

Regularisation is important for optimising the efficiency of a model and how well it can generalise to unseen data. This applies to both LSTM and GRU neural networks. Regularisation adds a penalty to the model's loss function, such as the sum of the absolute values of the weights for L1 regularisation or the sum of the squared weights for L2. This discourages the model from fitting noise or patterns in the training data that are unhelpful for the task the model was designed for. The model is less likely to 'memorise' the training data, and more likely to learn the underlying trends that will allow it to generalise to new data (Lawton, 2024).

Regularisation, particularly L2, keeps parameters from growing too large by shrinking the weights. Smaller weights typically mean simpler models, which are less sensitive to small variations in the data, so they are less likely to overfit and are generally more robust and stable (Geeks4Geeks, 2025c).
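
The two penalties can be written out for a small weight vector. The weights and the penalty strength `lam` below are made-up values for illustration. In `torch.optim.Adam`, the `weight_decay` argument adds `weight_decay * w` to each weight's gradient, which corresponds to a penalty of `(weight_decay / 2)` times the sum of squared weights, so the factor of 2 in the sketch is a matter of convention.

```python
import numpy as np

weights = np.array([0.5, -1.5, 2.0])   # made-up weights
lam = 0.001                            # hypothetical penalty strength

l1_penalty = lam * np.sum(np.abs(weights))   # L1: sum of absolute values
l2_penalty = lam * np.sum(weights ** 2)      # L2: sum of squares

# The gradient of the L2 penalty is 2 * lam * w, so a gradient step on the
# penalty alone pulls every weight towards zero
l2_grad = 2 * lam * weights
step_size = 0.1
shrunk = weights - step_size * l2_grad

all_smaller = bool(np.all(np.abs(shrunk) < np.abs(weights)))
print(l1_penalty, l2_penalty)
print(all_smaller)
```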

### 2.3 Re-run GRU model with L2 regularisation and hyperparameter tuning

The GRU model responded to hyperparameter changes in much the same way as the LSTM model. A learning rate or L2 value above 0.001 again made the GRU model perform poorly, and more training epochs and hidden layers did not help either.

With 70 epochs and an L2 value of 0.0001, the MSE reached 0.5387, an error margin of about $73 400. Focusing on training the model further with examples where it performs most poorly may be the best next step at this stage.

In [52]:
input_size = X.shape[2]
hidden_size = 64
output_size = 1
num_layers = 2
num_epochs = 70
learning_rate = 0.001
l2_regularization = 0.0001

optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate,
                             weight_decay=l2_regularization)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        output = model(batch_X)
        loss = criterion(output, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * batch_X.size(0)

    avg_loss = total_loss / len(train_dataset)
    if (epoch + 1) % 5 == 0:
        print(f"Epoch {epoch + 1}/{num_epochs} | Loss: {avg_loss:.4f}")

Epoch 5/70 | Loss: 0.0179
Epoch 10/70 | Loss: 0.0191
Epoch 15/70 | Loss: 0.0194
Epoch 20/70 | Loss: 0.0196
Epoch 25/70 | Loss: 0.0195
Epoch 30/70 | Loss: 0.0194
Epoch 35/70 | Loss: 0.0196
Epoch 40/70 | Loss: 0.0195
Epoch 45/70 | Loss: 0.0195
Epoch 50/70 | Loss: 0.0195
Epoch 55/70 | Loss: 0.0194
Epoch 60/70 | Loss: 0.0196
Epoch 65/70 | Loss: 0.0195
Epoch 70/70 | Loss: 0.0193


In [53]:
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

model.eval()
with torch.no_grad():
    y_pred_tensor = model(X_test_tensor)
    y_pred = y_pred_tensor.numpy()
    y_true = y_test_tensor.numpy()

y_pred_unscaled_6 = scaler_targets.inverse_transform(y_pred.reshape(-1, 1))
y_true_unscaled_6 = scaler_targets.inverse_transform(y_true.reshape(-1, 1))

mse_6 = mean_squared_error(y_true_unscaled_6, y_pred_unscaled_6)
print(f"Test set MSE for GRU with L2 regularisation: {mse_6:.4f}")
print(f"Average error margin: ${np.sqrt(mse_6) * 100000:.0f}")

Test set MSE for GRU with L2 regularisation: 0.5387
Average error margin: $73397


**References**

CodeSignal. (2025). Data Handling: Preparing the California Housing Dataset. https://codesignal.com/learn/courses/building-and-applying-your-neural-network-library/lessons/data-handling-preparing-the-california-housing-dataset

Geeks4Geeks. (2025a). Difference between Deterministic and Non-deterministic Algorithms. https://www.geeksforgeeks.org/dsa/difference-between-deterministic-and-non-deterministic-algorithms

Geeks4Geeks. (2025b). Start learning PyTorch for Beginners. https://www.geeksforgeeks.org/python/start-learning-pytorch-for-beginners

Geeks4Geeks. (2025c). Regularization in Machine Learning. https://www.geeksforgeeks.org/machine-learning/regularization-in-machine-learning

Geeks4Geeks. (2025d). Why Do We Need to Call zero_grad() in PyTorch? https://www.geeksforgeeks.org/deep-learning/why-do-we-need-to-call-zerograd-in-pytorch

Lawton, G. (2024). Machine learning regularization explained with examples. https://www.techtarget.com/searchenterpriseai/feature/Machine-learning-regularization-explained-with-examples

noplaxochia. (2024). LSTM from scratch. Medium. https://medium.com/@wangdk93/lstm-from-scratch-c8b4baf06a8b

Pace, R. K., & Barry, R. (1997). Sparse spatial autoregressions. *Statistics & Probability Letters*, 33(3), 291-297.

scikit-learn. (2025a). Bunch. https://scikit-learn.org/stable/modules/generated/sklearn.utils.Bunch.html

scikit-learn. (2025b). fetch_california_housing. https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html
