Let's build some more complex Artificial Neural Network (ANN) examples using PyTorch. We'll increase the complexity by adding more layers, introducing dropout for regularization, and incorporating a validation loop into the training process.

#### A. More Complex ANN for Classification: FashionMNIST Dataset

The FashionMNIST dataset is a good step up from MNIST. It consists of 28x28 grayscale images of 10 different fashion categories (e.g., T-shirt, trouser, pullover, dress). It's more challenging than MNIST, requiring a slightly more capable network.

Complexity Additions:

- Deeper network (2 hidden layers).
- nn.Dropout layer for regularization to prevent overfitting.
- Validation loop within training to monitor performance on a validation set.
- Plotting of training and validation loss/accuracy.

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split
import matplotlib.pyplot as plt

# --- 1. Device Configuration ---
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# --- 2. Hyperparameters ---
input_size = 784    # FashionMNIST images are 28x28 pixels -> flattened to 784
hidden_size1 = 512  # Neurons in first hidden layer
hidden_size2 = 256  # Neurons in second hidden layer
num_classes = 10    # 10 fashion categories
num_epochs = 20     # Train for more epochs
batch_size = 128
learning_rate = 0.001
dropout_prob = 0.5  # Dropout probability

# --- 3. Load and Prepare FashionMNIST Dataset ---
print("Loading FashionMNIST dataset...")
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.2860,), (0.3530,))  # Mean and std for FashionMNIST (approximate)
])

# Download and load the full training data
full_train_dataset = torchvision.datasets.FashionMNIST(root='./data',
                                                       train=True,
                                                       transform=transform,
                                                       download=True)

# Download and load the test data
test_dataset = torchvision.datasets.FashionMNIST(root='./data',
                                                 train=False,
                                                 transform=transform,
                                                 download=True)

# Split full training data into training and validation sets
train_size = int(0.8 * len(full_train_dataset))  # 80% for training
val_size = len(full_train_dataset) - train_size    # 20% for validation
train_dataset, val_dataset = random_split(full_train_dataset, [train_size, val_size],
                                          generator=torch.Generator().manual_seed(42))  # Seeded for a reproducible split

# Create DataLoaders
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

print(f"Dataset loaded. Train samples: {len(train_dataset)}, Validation samples: {len(val_dataset)}, Test samples: {len(test_dataset)}")
fashion_mnist_classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                         'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# --- 4. Define the Deeper Neural Network Model with Dropout ---
class DeeperANN(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, num_classes, dropout_prob):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(dropout_prob)  # Dropout layer after first hidden layer
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout_prob)  # Dropout layer after second hidden layer
        self.fc3 = nn.Linear(hidden_size2, num_classes)  # Output layer

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.dropout1(out)  # Apply dropout
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.dropout2(out)  # Apply dropout
        out = self.fc3(out)
        # No softmax here: nn.CrossEntropyLoss expects raw logits
        return out

# --- 5. Instantiate the Model, Loss, and Optimizer ---
model = DeeperANN(input_size, hidden_size1, hidden_size2, num_classes, dropout_prob).to(device)
print("\nModel Architecture:")
print(model)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# --- 6. Training Loop with Validation ---
print("\nStarting Training...")
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []

for epoch in range(num_epochs):
    # Training phase
    model.train()  # Set model to training mode (enables dropout)
    running_train_loss = 0.0
    n_correct_train = 0
    n_samples_train = 0

    for images, labels in train_loader:
        images = images.reshape(-1, input_size).to(device)  # Flatten images
        labels = labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_train_loss += loss.item() * images.size(0)  # Multiply by batch size for weighted average
        predicted_train = outputs.argmax(dim=1)
        n_samples_train += labels.size(0)
        n_correct_train += (predicted_train == labels).sum().item()

    epoch_train_loss = running_train_loss / n_samples_train
    epoch_train_acc = 100.0 * n_correct_train / n_samples_train
    train_losses.append(epoch_train_loss)
    train_accuracies.append(epoch_train_acc)

    # Validation phase
    model.eval()  # Set model to evaluation mode (disables dropout)
    running_val_loss = 0.0
    n_correct_val = 0
    n_samples_val = 0
    with torch.no_grad():  # No need to compute gradients during validation
        for images_val, labels_val in val_loader:
            images_val = images_val.reshape(-1, input_size).to(device)
            labels_val = labels_val.to(device)

            outputs_val = model(images_val)
            loss_val = criterion(outputs_val, labels_val)
            running_val_loss += loss_val.item() * images_val.size(0)

            predicted_val = outputs_val.argmax(dim=1)
            n_samples_val += labels_val.size(0)
            n_correct_val += (predicted_val == labels_val).sum().item()

    epoch_val_loss = running_val_loss / n_samples_val
    epoch_val_acc = 100.0 * n_correct_val / n_samples_val
    val_losses.append(epoch_val_loss)
    val_accuracies.append(epoch_val_acc)

    print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss: {epoch_train_loss:.4f}, Train Acc: {epoch_train_acc:.2f}%, '
          f'Val Loss: {epoch_val_loss:.4f}, Val Acc: {epoch_val_acc:.2f}%')

print("Finished Training.")

# --- 7. Evaluation on Test Set ---
print("\nStarting Evaluation on Test Set...")
model.eval()
with torch.no_grad():
    n_correct_test = 0
    n_samples_test = 0
    for images_test, labels_test in test_loader:
        images_test = images_test.reshape(-1, input_size).to(device)
        labels_test = labels_test.to(device)
        outputs_test = model(images_test)
        predicted_test = outputs_test.argmax(dim=1)
        n_samples_test += labels_test.size(0)
        n_correct_test += (predicted_test == labels_test).sum().item()

    accuracy_test = 100.0 * n_correct_test / n_samples_test
    print(f'Accuracy of the network on the {len(test_dataset)} test images: {accuracy_test:.2f} %')

# --- 8. Plot Training and Validation Loss and Accuracy ---
epochs_range = range(1, num_epochs + 1)
plt.figure(figsize=(14, 5))

plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_losses, 'bo-', label='Training Loss')
plt.plot(epochs_range, val_losses, 'ro-', label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss (CrossEntropy)')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_accuracies, 'bo-', label='Training Accuracy')
plt.plot(epochs_range, val_accuracies, 'ro-', label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
```



# PyTorch "Complex" Classification ANN: Key Enhancements

This document details the key enhancements made to an Artificial Neural Network (ANN) for a more complex image classification task, focusing on improvements for better generalization and performance monitoring.

## 1. FashionMNIST Dataset
- **Challenge**: Uses the `FashionMNIST` dataset (`torchvision.datasets.FashionMNIST`).
- **Nature**: This dataset consists of 28x28 grayscale images from 10 categories of clothing, footwear, and accessories (e.g., T-shirt/top, trouser, pullover, dress, sneaker, bag). It was designed as a drop-in replacement for the MNIST handwritten-digit dataset, with the same image size and the same 60,000/10,000 train/test split. Because its classes are harder to separate, it serves as a more demanding benchmark.

## 2. Deeper Network Architecture (`DeeperANN` class)
- **Structure**: The neural network, `DeeperANN`, is designed with increased depth compared to a simpler MLP.
- **Layer Configuration**: It now features two hidden layers, each followed by a ReLU activation and a Dropout layer:
    - `Input -> Linear_1 (fc1) -> ReLU_1 (relu1) -> Dropout_1 (dropout1) -> Linear_2 (fc2) -> ReLU_2 (relu2) -> Dropout_2 (dropout2) -> Linear_3 (fc3) -> Output (Logits)`
- **Rationale**: Adding more layers (depth) can potentially allow the network to learn more complex hierarchical features from the input data.
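The same layer stack can also be written more compactly with `nn.Sequential`. The sketch below is an alternative formulation, not the author's class. It assumes the script's hyperparameters (784 -> 512 -> 256 -> 10, `p=0.5`), and the name `seq_model` is hypothetical. It swaps the manual `reshape(-1, 784)` for `nn.Flatten` and checks the output shape and parameter count.

```python
import torch
import torch.nn as nn

# Hypothetical nn.Sequential equivalent of DeeperANN (784 -> 512 -> 256 -> 10, p=0.5)
seq_model = nn.Sequential(
    nn.Flatten(),           # (N, 1, 28, 28) -> (N, 784), replaces images.reshape(-1, 784)
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 10),     # raw logits; nn.CrossEntropyLoss applies log-softmax internally
)

dummy_batch = torch.randn(8, 1, 28, 28)   # random stand-in for one FashionMNIST batch
logits = seq_model(dummy_batch)
print(logits.shape)   # torch.Size([8, 10])

# Weights + biases of the three Linear layers (ReLU/Dropout have no parameters)
n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)       # 535818
```

Almost all of the roughly 536k parameters sit in the first `Linear` layer (784 x 512 weights). That layer is usually the first place to shrink if the model overfits.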

## 3. Dropout (`nn.Dropout`)
- **Implementation**: Dropout layers (`nn.Dropout(p=dropout_prob)`) are strategically placed after the ReLU activation function in each hidden layer.
- **Mechanism**:
    - **During Training (`model.train()` mode)**: On every forward pass, dropout independently zeroes each activation coming from the preceding layer with probability `dropout_prob`. A different random subset of neurons is therefore dropped for each sample and each batch. PyTorch uses "inverted" dropout: the surviving activations are scaled up by `1/(1-dropout_prob)` during training, which keeps the expected value of each activation unchanged.
    - **Purpose**: This technique acts as a form of regularization. Because no neuron can rely on specific other neurons being present, the network is pushed to learn more robust, less co-adapted features. This typically reduces overfitting and improves the model's ability to generalize to unseen data.
    - **During Evaluation (`model.eval()` mode)**: Dropout is automatically disabled. All neurons in the layer are used (no units are zeroed out), and the scaling factor is not applied. This ensures deterministic output during inference.
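The train/eval behaviour described above can be observed directly on a standalone `nn.Dropout` layer. In this minimal sketch, the seed and the all-ones input are arbitrary choices made only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)          # arbitrary seed, only for repeatability
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                  # training mode: random zeroing + rescaling
y_train = drop(x)
# Dropped elements become 0.0; survivors are scaled by 1 / (1 - 0.5) = 2.0
print(torch.unique(y_train))  # tensor([0., 2.])
print(y_train.mean())         # close to 1.0: the expected activation is preserved

drop.eval()                   # evaluation mode: dropout is the identity
y_eval = drop(x)
print(torch.equal(y_eval, x))  # True
```

Because the rescaling happens at training time, nothing needs to change at inference time. This is why `model.eval()` alone is enough to switch dropout off.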

## 4. Introduction of a Validation Set
- **Data Splitting**: The original `FashionMNIST` training dataset is further divided into two subsets: a new, smaller training set and a validation set.
- **Method**: `torch.utils.data.random_split` performs this split. Here, 80% of the original 60,000 training images (48,000) are used for training and the remaining 20% (12,000) for validation. The separate 10,000-image test set is left untouched.
- **Purpose**:
    - The training set is used to update the model's weights (learn).
    - The validation set is used to tune hyperparameters (like learning rate, network architecture, regularization strength) and to monitor the model's generalization performance during training without "contaminating" the final test set.
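By default, `random_split` produces a different split on every run. Passing a seeded `torch.Generator` makes the split reproducible. In the minimal sketch below, a `TensorDataset` of 60,000 integers stands in for the FashionMNIST training set, and the helper `split` and the seed 42 are hypothetical.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in with the same length as the FashionMNIST training set
full_dataset = TensorDataset(torch.arange(60_000))

train_size = int(0.8 * len(full_dataset))   # 48000
val_size = len(full_dataset) - train_size   # 12000

def split(seed):
    """Hypothetical helper: split with a seeded generator."""
    gen = torch.Generator().manual_seed(seed)
    return random_split(full_dataset, [train_size, val_size], generator=gen)

train_a, val_a = split(42)
train_b, val_b = split(42)
print(len(train_a), len(val_a))              # 48000 12000
print(train_a.indices == train_b.indices)    # True: same seed, same split
```

Each returned `Subset` keeps its sample indices in `.indices`. This makes it easy to confirm that training and validation never share a sample.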

## 5. Training Loop with Integrated Validation
- **Epoch-wise Evaluation**: After each full epoch of training on the training data (where `model.train()` is active):
    1.  **Switch to Evaluation Mode**: The model is set to evaluation mode using `model.eval()`. This ensures that layers like Dropout behave correctly for inference (i.e., are turned off).
    2.  **No Gradient Calculation**: Performance on the validation set is computed within a `with torch.no_grad():` block. This disables gradient calculations, saving memory and computation time, as gradients are not needed for validation.
    3.  **Performance Metrics**: Key metrics such as loss and accuracy are calculated on the `val_loader` (the DataLoader for the validation set).
- **Overfitting Monitoring**: This regular validation step is crucial for detecting overfitting.
    - **Signs of Overfitting**:
        - Training loss continues to decrease while validation loss starts to increase or stagnates.
        - Training accuracy continues to improve while validation accuracy stagnates or starts to decrease.
    - **Action**: Observing these signs allows for early intervention, such as stopping training early (early stopping), adjusting regularization (like dropout rate or weight decay), or simplifying the model.
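Early stopping can be added to the training loop with a small helper. The sketch below makes two assumptions: "no improvement" means the validation loss has not dropped by more than `min_delta` for `patience` consecutive epochs, and the best weights are worth keeping. The `EarlyStopping` class is hypothetical, not a PyTorch API, and the loss values are made up for illustration.

```python
import copy


class EarlyStopping:
    """Hypothetical helper: signal a stop when the validation loss has not
    improved by more than `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss, epoch, model=None):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
            if model is not None:
                # Snapshot the weights so the best model can be restored later
                self.best_state = copy.deepcopy(model.state_dict())
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Illustrative validation losses (made up, not results from the script)
stopper = EarlyStopping(patience=3)
stop_epoch = None
for epoch, val_loss in enumerate([0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.44], start=1):
    if stopper.step(val_loss, epoch):
        stop_epoch = epoch
        break

print(stop_epoch, stopper.best_epoch)  # 6 3
```

In the training script, one would call `stopper.step(epoch_val_loss, epoch + 1, model)` at the end of each epoch and `break` when it returns `True`. After the loop, `model.load_state_dict(stopper.best_state)` restores the best weights. The later 0.44 in the example is never reached, which illustrates the trade-off a small `patience` makes.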

## 6. Enhanced Plotting
- **Comparative Visualization**: The plotting code draws the training and validation curves on shared axes against the epoch number, with loss and accuracy in two side-by-side subplots.
- **Benefit**: This side-by-side comparison makes it much easier to:
    - Observe the learning trends for both sets.
    - Visually identify the onset of overfitting by comparing the divergence or stagnation of validation curves relative to training curves.
    - Make informed decisions about the training process and model adjustments.
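A possible extension of the plotting step marks the epoch with the lowest validation loss on both subplots. This makes the point where the curves start to diverge easier to spot. The function `plot_history` is hypothetical, the Agg backend is assumed so that the code runs without a display, and the numbers in the usage line are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt


def plot_history(train_losses, val_losses, train_accs, val_accs):
    """Hypothetical helper: plot loss/accuracy curves and mark the epoch
    with the lowest validation loss. Returns (figure, best_epoch)."""
    epochs = range(1, len(train_losses) + 1)
    best_epoch = min(epochs, key=lambda e: val_losses[e - 1])

    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(14, 5))
    ax_loss.plot(epochs, train_losses, "bo-", label="Training Loss")
    ax_loss.plot(epochs, val_losses, "ro-", label="Validation Loss")
    ax_acc.plot(epochs, train_accs, "bo-", label="Training Accuracy")
    ax_acc.plot(epochs, val_accs, "ro-", label="Validation Accuracy")

    for ax, ylabel in [(ax_loss, "Loss (CrossEntropy)"), (ax_acc, "Accuracy (%)")]:
        ax.axvline(best_epoch, color="gray", linestyle="--",
                   label=f"Lowest val loss (epoch {best_epoch})")
        ax.set_xlabel("Epochs")
        ax.set_ylabel(ylabel)
        ax.legend()
        ax.grid(True)
    fig.tight_layout()
    return fig, best_epoch


# Invented values purely for illustration
fig, best = plot_history([0.9, 0.6, 0.5, 0.45], [0.8, 0.6, 0.55, 0.58],
                         [70, 80, 84, 86], [72, 79, 81, 80])
print(best)  # 3
```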

## Summary
These enhancements—using a more challenging dataset, deepening the network, incorporating dropout for regularization, splitting data into training/validation sets, and performing regular validation—represent common practices in developing more robust and well-generalized neural network models. The integrated validation and comparative plotting provide critical insights into the learning process, especially for managing the bias-variance tradeoff and preventing overfitting.

---
#### B. More Complex ANN for Regression: Diabetes Dataset

We'll use the Diabetes dataset from sklearn.datasets. It has 10 baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) for 442 diabetes patients, and the target is a quantitative measure of disease progression one year after baseline.
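The shapes are easy to confirm on load. According to the scikit-learn documentation, its copy of the features is already mean-centered and scaled so that each column's sum of squares is 1. The sketch below checks this, assuming a recent scikit-learn with the default loading options. The check also explains why the script still applies `StandardScaler`: the columns start with a standard deviation of about 1/sqrt(442), not 1.

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
print(X.shape, y.shape)                    # (442, 10) (442,)

# Columns are mean-centered ...
print(np.round(X.mean(axis=0), 6))
# ... and scaled so each column's sum of squares is 1 ...
print(np.round((X ** 2).sum(axis=0), 3))
# ... so each column's standard deviation is about 1/sqrt(442), roughly 0.048
print(X.std(axis=0).round(3))
```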

Complexity Additions:

- Deeper network (2 hidden layers; a third, `hidden_size3`, is left commented out as an option).
- nn.Dropout for regularization.
- Validation loop within training.
- Plotting of training and validation loss (MSE).

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import numpy as np

# --- 1. Device Configuration ---
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# --- 2. Hyperparameters ---
# input_size will be determined by the dataset
hidden_size1 = 64
hidden_size2 = 32
# hidden_size3 = 16  # Optional third hidden layer
output_size = 1     # Predicting a single continuous value
num_epochs = 100    # More epochs for potentially better convergence
batch_size = 32
learning_rate = 0.001
dropout_prob = 0.3  # Dropout probability

# --- 3. Load and Prepare Diabetes Dataset ---
print("Loading Diabetes dataset...")
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

input_size = X.shape[1]
print(f"Dataset loaded. Number of features: {input_size}")
print(f"Features shape: {X.shape}, Target shape: {y.shape}")

# Split data into training and testing sets (overall split)
X_train_val_raw, X_test_raw, y_train_val_raw, y_test_raw = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Further split the training+validation set into training and validation
X_train_raw, X_val_raw, y_train_raw, y_val_raw = train_test_split(
    X_train_val_raw, y_train_val_raw, test_size=0.2, random_state=42  # 0.2 of 0.8 = 0.16 of total
)

# Feature Scaling (fit on training data only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_raw)
X_val_scaled = scaler.transform(X_val_raw)    # Use transform on validation
X_test_scaled = scaler.transform(X_test_raw)  # Use transform on test

# Convert data to PyTorch Tensors (targets reshaped to (n, 1) to match model output)
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train_raw, dtype=torch.float32).unsqueeze(1)
X_val_tensor = torch.tensor(X_val_scaled, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val_raw, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test_raw, dtype=torch.float32).unsqueeze(1)

# Create TensorDatasets and DataLoaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

print(f"Data prepared. Train: {len(train_dataset)}, Validation: {len(val_dataset)}, Test: {len(test_dataset)}")

# --- 4. Define the Deeper Neural Network Model for Regression with Dropout ---
class DeeperANNRegression(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, output_size, dropout_prob):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(dropout_prob)
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout_prob)
        self.fc3 = nn.Linear(hidden_size2, output_size)
        # No output activation for regression

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.dropout1(out)
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.dropout2(out)
        out = self.fc3(out)
        return out

# --- 5. Instantiate the Model, Loss, and Optimizer ---
model = DeeperANNRegression(input_size, hidden_size1, hidden_size2, output_size, dropout_prob).to(device)
print("\nModel Architecture:")
print(model)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# --- 6. Training Loop with Validation ---
print("\nStarting Training...")
train_losses = []
val_losses = []

for epoch in range(num_epochs):
    model.train()  # Enable dropout
    running_train_loss = 0.0
    for features, targets in train_loader:
        features = features.to(device)
        targets = targets.to(device)
        outputs = model(features)
        loss = criterion(outputs, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_train_loss += loss.item() * features.size(0)  # Weight by batch size

    epoch_train_loss = running_train_loss / len(train_dataset)
    train_losses.append(epoch_train_loss)

    model.eval()  # Disable dropout
    running_val_loss = 0.0
    with torch.no_grad():
        for features_val, targets_val in val_loader:
            features_val = features_val.to(device)
            targets_val = targets_val.to(device)
            outputs_val = model(features_val)
            loss_val = criterion(outputs_val, targets_val)
            running_val_loss += loss_val.item() * features_val.size(0)

    epoch_val_loss = running_val_loss / len(val_dataset)
    val_losses.append(epoch_val_loss)

    print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss (MSE): {epoch_train_loss:.4f}, Val Loss (MSE): {epoch_val_loss:.4f}')

print("Finished Training.")

# --- 7. Evaluation on Test Set ---
print("\nStarting Evaluation on Test Set...")
model.eval()
all_predictions_test = []
all_targets_test = []
with torch.no_grad():
    test_loss_final = 0.0
    for features_test, targets_test in test_loader:
        features_test = features_test.to(device)
        targets_test = targets_test.to(device)
        outputs_test = model(features_test)
        loss_test = criterion(outputs_test, targets_test)
        test_loss_final += loss_test.item() * features_test.size(0)
        all_predictions_test.append(outputs_test.cpu().numpy())
        all_targets_test.append(targets_test.cpu().numpy())

avg_test_loss_final = test_loss_final / len(test_dataset)
all_predictions_test = np.concatenate(all_predictions_test, axis=0)
all_targets_test = np.concatenate(all_targets_test, axis=0)

mse_test_final = mean_squared_error(all_targets_test, all_predictions_test)
r2_test_final = r2_score(all_targets_test, all_predictions_test)

print(f'Average Test Loss (MSE) from loop: {avg_test_loss_final:.4f}')
print(f'Calculated Final Test MSE: {mse_test_final:.4f}')
print(f'Final Test R-squared (R2): {r2_test_final:.4f}')

# --- 8. Plot Training and Validation Loss ---
plt.figure(figsize=(10, 5))
plt.plot(range(1, num_epochs + 1), train_losses, 'bo-', label='Training Loss (MSE)')
plt.plot(range(1, num_epochs + 1), val_losses, 'ro-', label='Validation Loss (MSE)')
plt.title('Training and Validation Loss per Epoch (Diabetes Regression)')
plt.xlabel('Epochs')
plt.ylabel('Mean Squared Error Loss')
plt.legend()
plt.grid(True)
plt.show()

# --- 9. Plot Actual vs. Predicted Values (Test Set) ---
plt.figure(figsize=(10, 6))
plt.scatter(all_targets_test, all_predictions_test, alpha=0.5, edgecolors='k', label='Predictions')
min_val_diag = min(all_targets_test.min(), all_predictions_test.min())
max_val_diag = max(all_targets_test.max(), all_predictions_test.max())
plt.plot([min_val_diag, max_val_diag], [min_val_diag, max_val_diag], 'r--', lw=2, label='Perfect Prediction')
plt.xlabel("Actual Disease Progression")
plt.ylabel("Predicted Disease Progression")
plt.title("Actual vs. Predicted Values (Test Set - Diabetes Regression)")
plt.legend()
plt.grid(True)
plt.show()
```


# PyTorch "Complex" Regression ANN: Key Enhancements

This document details the key enhancements applied to an Artificial Neural Network (ANN) designed for a regression task, focusing on a more robust architecture and a comprehensive training/evaluation workflow.

## 1. Diabetes Dataset
- **Source**: Utilizes the standard `load_diabetes` dataset from `sklearn.datasets`.
- **Nature**: This is a regression dataset where the goal is to predict a quantitative measure of disease progression one year after baseline, based on ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements).
- **Preprocessing**: The input features are standardized with `StandardScaler`. The scaler is fitted on the training split only and then applied to the validation and test splits, so no statistics leak from held-out data. The target is left in its original units here; scaling or transforming it as well is a common option, depending on its distribution.
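Scaling the target is not done in the script above. If you try it, the model's outputs have to be mapped back to the original units before computing MSE or R². The sketch below assumes a second `StandardScaler` dedicated to the target. The names `y_scaler` and `fake_scaled_preds` are hypothetical, and a slice of the scaled training targets stands in for real model outputs.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Separate scalers for features and target, both fitted on training data only
x_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(y_train.reshape(-1, 1))  # StandardScaler expects 2-D input

X_train_s = x_scaler.transform(X_train)
y_train_s = y_scaler.transform(y_train.reshape(-1, 1))   # shape (n, 1), mean 0, std 1

# ... the network would be trained on (X_train_s, y_train_s) here ...

# Model outputs would then be in scaled units; map them back before evaluation
fake_scaled_preds = y_train_s[:5]                          # stand-in for model outputs
preds_original_units = y_scaler.inverse_transform(fake_scaled_preds)
print(np.allclose(preds_original_units.ravel(), y_train[:5]))  # True
```

With the target centered near zero and of unit scale, the output layer's bias and weights start much closer to their final values. This often lets Adam converge in fewer epochs than on the raw disease-progression values.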

## 2. Deeper Network Architecture with Dropout (`DeeperANNRegression` class)
- **Structure**: The `DeeperANNRegression` model is designed with increased depth and regularization capabilities.
- **Layer Configuration**: It features two hidden layers, each incorporating ReLU activation functions and Dropout layers:
    - `Input -> Linear_1 (fc1) -> ReLU_1 (relu1) -> Dropout_1 (dropout1) -> Linear_2 (fc2) -> ReLU_2 (relu2) -> Dropout_2 (dropout2) -> Linear_3 (fc3) -> Output (Single Continuous Value)`
- **Dropout (`nn.Dropout`)**:
    - **Implementation**: Dropout layers are added after each ReLU activation in the hidden layers.
    - **Purpose**: During training, dropout randomly deactivates a fraction of neurons. This regularization technique helps prevent overfitting by encouraging the network to learn more robust and less co-dependent features. During evaluation (`model.eval()`), dropout is automatically disabled.
- **Output Layer**: The final layer (`fc3`) has a single output neuron and no activation function. This is standard for regression, where the prediction is an unbounded continuous value.
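Because the model outputs shape `(N, 1)`, the script reshapes the targets with `.unsqueeze(1)`. Without that step, `nn.MSELoss` silently broadcasts the two tensors to an `(N, N)` grid. The minimal sketch below shows the effect using `torch.nn.functional.mse_loss` (which `nn.MSELoss` calls) on four hand-picked values where the predictions are perfect.

```python
import warnings
import torch
import torch.nn.functional as F

preds = torch.arange(4.0).unsqueeze(1)   # model output shape: (4, 1)
targets = torch.arange(4.0)              # raw target shape:   (4,)

# Correct: reshape the target to (4, 1) so values are compared element-wise
good = F.mse_loss(preds, targets.unsqueeze(1))

# Wrong: (4, 1) vs (4,) broadcasts to a (4, 4) grid of differences
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bad = F.mse_loss(preds, targets)

print(good.item())      # 0.0  (predictions are perfect)
print(bad.item())       # 2.5  (mean of (i - j)^2 over all 16 pairs)
print(len(caught) > 0)  # True: PyTorch warns about the size mismatch
```

The warning is easy to miss in a long training log. Matching the shapes explicitly is the safer habit.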

## 3. Train/Validation/Test Split
- **Data Partitioning**: The dataset is divided into three distinct subsets:
    1.  **Training Set**: Used to fit the model parameters (i.e., learn the weights and biases).
    2.  **Validation Set**: Used to monitor the model's performance on unseen data *during* the training process. This helps in:
        - Tuning hyperparameters (e.g., learning rate, number of layers/neurons, dropout rate).
        - Early stopping (deciding when to stop training to prevent overfitting if validation performance starts to degrade).
    3.  **Test Set**: Used for a final, unbiased evaluation of the trained model's performance after all training and hyperparameter tuning are complete. This provides an estimate of how the model will perform on new, real-world data.
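- **Method**: Here, `train_test_split` from `sklearn.model_selection` is called twice. The first call holds out 20% of the data as the test set, and the second holds out 20% of the remainder as the validation set. This gives roughly a 64/16/20 split of the 442 samples.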
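The resulting subset sizes can be checked without training anything. In the sketch below, zero arrays with the Diabetes shapes stand in for the real data, since only the sizes matter. scikit-learn rounds a fractional test size up, which produces the counts shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.zeros((442, 10))   # placeholder with the Diabetes feature shape
y = np.zeros(442)

# Stage 1: hold out 20% of all samples as the test set
X_tv, X_test, y_tv, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Stage 2: hold out 20% of the remaining samples as the validation set
X_train, X_val, y_train, y_val = train_test_split(X_tv, y_tv, test_size=0.2, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 282 71 89
```

With only 71 validation samples, the per-epoch validation MSE will be fairly noisy. Keep this in mind when reading the loss curves.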

## 4. Validation Loop Integrated with Training
- **Epoch-wise Evaluation**: Similar to the "complex" classification setup, a validation step is performed after each training epoch:
    1.  **Switch to Evaluation Mode**: The model is set to `model.eval()`.
    2.  **No Gradient Calculation**: Calculations on the validation set are performed within a `with torch.no_grad():` block.
    3.  **Performance Metric**: The Mean Squared Error (MSE) loss is calculated on the `val_loader` (DataLoader for the validation set).
- **Overfitting Monitoring**: Tracking validation MSE alongside training MSE allows for the detection of overfitting. If training MSE continues to decrease while validation MSE stagnates or increases, it indicates that the model is learning the training data too well (including its noise) and losing its ability to generalize.
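The validation logic is the same in both scripts, so it can be factored into one function. The sketch below assumes the loss uses the default `reduction='mean'`, and the name `evaluate` is hypothetical. It also shows why the scripts multiply each batch loss by the batch size: a plain mean of batch losses over-weights a smaller final batch.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device="cpu"):
    """Hypothetical helper: sample-weighted average loss over a DataLoader."""
    model.eval()                      # disable dropout
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():             # no autograd bookkeeping needed
        for features, targets in loader:
            features, targets = features.to(device), targets.to(device)
            loss = criterion(model(features), targets)   # mean over this batch
            total_loss += loss.item() * features.size(0)  # undo the per-batch mean
            n_samples += features.size(0)
    return total_loss / n_samples


# Tiny check: a y = 2x model where only the last prediction is off by 1
model = nn.Linear(1, 1)
with torch.no_grad():
    model.weight.fill_(2.0)
    model.bias.zero_()
X = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [7.0]])
loader = DataLoader(TensorDataset(X, y), batch_size=2, shuffle=False)

val_mse = evaluate(model, loader, nn.MSELoss())
print(val_mse)  # 0.333... (= 1/3); a plain mean of the two batch losses would give 0.5
```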

## 5. Enhanced Plotting
- **Loss Visualization**:
    - Both training MSE and validation MSE are plotted against the number of epochs. This comparative plot is critical for visualizing:
        - The learning progress on both datasets.
        - The point at which overfitting might begin (divergence of training and validation loss curves).
- **Prediction Quality (Test Set)**:
    - An "Actual vs. Predicted" scatter plot is generated using the test set.
    - **Purpose**: This provides a visual assessment of the regression model's performance. For a good model, the points should cluster tightly around the diagonal line (where actual values equal predicted values).
    - The R-squared ($R^2$) score, computed with `sklearn.metrics.r2_score`, is reported alongside this plot for a quantitative evaluation.
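$R^2$ compares the model's squared errors with those of a baseline that always predicts the mean target: $R^2 = 1 - SS_{res}/SS_{tot}$. A score of 1 is perfect, 0 matches the mean-predictor baseline, and negative values are worse than that baseline. The small hand-picked arrays below check the formula against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares: 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 29.1875
r2_manual = 1.0 - ss_res / ss_tot

print(round(r2_manual, 4))                               # 0.9486
print(np.isclose(r2_manual, r2_score(y_true, y_pred)))   # True
```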

## Summary
These enhancements—employing a standard regression dataset, building a deeper network with dropout for regularization, and implementing a rigorous train/validation/test split with integrated validation monitoring—lead to a more robust and reliable workflow for developing regression ANNs. This approach not only aims for better predictive performance by mitigating overfitting but also provides clearer insights into the model's learning behavior and generalization capabilities through comprehensive evaluation and visualization.