# Neural Network Capacity for Separating 2D Spot Clouds

This notebook demonstrates how a simple feed-forward neural network, built with PyTorch, can learn to classify points belonging to different clusters (spot clouds) in a 2D plane. We will visualize the data and the learned decision boundaries using Matplotlib.

We'll start with a simple case of 2 well-separated clouds and gradually increase the complexity by adding more clouds. We may need to adjust the network architecture (layers/neurons) as the task becomes harder.

## 1. Setup: Import Libraries and Define Helper Functions

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np

# Check for GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

In [None]:
# Helper function to plot data and decision boundaries
def plot_decision_boundary(model, X, y, title="Decision Boundary"):
    """
    Plots the data points and the decision boundary learned by the model.

    Args:
        model (nn.Module): The trained PyTorch model.
        X (torch.Tensor): Feature data (input).
        y (torch.Tensor): Target labels.
        title (str): Title for the plot.
    """
    model.eval() # Set model to evaluation mode

    # Convert tensors to numpy for plotting if they are on GPU
    X_np = X.cpu().numpy()
    y_np = y.cpu().numpy()

    # Define the grid range based on data
    x_min, x_max = X_np[:, 0].min() - 1, X_np[:, 0].max() + 1
    y_min, y_max = X_np[:, 1].min() - 1, X_np[:, 1].max() + 1
    h = 0.02 # step size in the mesh

    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # Predict class for each point in the mesh grid
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    # Build the grid on the same device as the data so model and inputs match
    grid_points_tensor = torch.tensor(grid_points, dtype=torch.float32, device=X.device)

    with torch.no_grad(): # No need to track gradients for inference
        logits = model(grid_points_tensor)
        predicted = logits.argmax(dim=1) # Index of the largest logit is the predicted class
        predicted_np = predicted.cpu().numpy()

    # Reshape predictions to match the grid shape
    Z = predicted_np.reshape(xx.shape)

    # Plot the contour and training examples
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
    scatter = plt.scatter(X_np[:, 0], X_np[:, 1], c=y_np, cmap=plt.cm.Spectral, edgecolors='k')

    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title(title)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xticks(())
    plt.yticks(())

    # Add legend if number of classes is reasonable
    num_classes = len(np.unique(y_np))
    if num_classes <= 10:
        handles, labels = scatter.legend_elements()
        legend_labels = [f'Cloud {i}' for i in range(num_classes)]
        plt.legend(handles, legend_labels, title="Classes")

    plt.show()

In [None]:
# Define a simple Neural Network Model
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)
        # No final activation, CrossEntropyLoss will handle it

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

In [None]:
# Training Function
def train_model(model, X_train, y_train, learning_rate=0.01, num_epochs=1000, print_loss_every=100):
    """Trains the provided model."""
    model.train() # Set model to training mode
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    print("\n--- Training Started ---")
    # Describe the model by class name, so the check does not raise a NameError
    # when DeeperNet has not been defined yet
    model_name = model.__class__.__name__
    if model_name == "SimpleNet":
        print("Model: SimpleNet")
        print(f"Input Size: {model.fc1.in_features}, Hidden Size: {model.fc1.out_features}, Output Size: {model.fc2.out_features}")
    elif model_name == "DeeperNet":
        print("Model: DeeperNet")
        print(f"Input Size: {model.fc1.in_features}, Hidden1: {model.fc1.out_features}, Hidden2: {model.fc2.out_features}, Output Size: {model.fc3.out_features}")
    else:
        print(f"Model: {model_name} (structure not explicitly printed)")
    print(f"Epochs: {num_epochs}, Learning Rate: {learning_rate}")

    for epoch in range(num_epochs):
        # Forward pass
        outputs = model(X_train)
        loss = criterion(outputs, y_train)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (epoch + 1) % print_loss_every == 0 or epoch == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

    print("--- Training Finished ---")
    # Calculate final accuracy on training data (simple metric for demo)
    with torch.no_grad():
        model.eval()
        outputs = model(X_train)
        _, predicted = torch.max(outputs.data, 1)
        total = y_train.size(0)
        correct = (predicted == y_train).sum().item()
        print(f'Training Accuracy: {(100 * correct / total):.2f} %')
    return model # Return trained model

## 2. Scenario 1: Two Well-Separated Clouds

We start with the simplest case: two distinct clouds generated using `make_blobs`, which draws each cloud from an isotropic (circular) Gaussian around its center. A simple network with one hidden layer should easily learn to separate them.

In [None]:
# --- Scenario 1: Data Generation ---
N_SAMPLES = 300
N_FEATURES = 2
N_CLASSES_1 = 2
CENTERS_1 = [(-3, -3), (3, 3)] # Well-separated centers
CLUSTER_STD_1 = 1.0 # Standard deviation of the clusters (controls spread)

X1, y1 = make_blobs(n_samples=N_SAMPLES,
                    n_features=N_FEATURES,
                    centers=CENTERS_1,
                    cluster_std=CLUSTER_STD_1,
                    random_state=42) # Use random_state for reproducibility

# Convert to PyTorch tensors and move to device
X1_tensor = torch.tensor(X1, dtype=torch.float32).to(device)
y1_tensor = torch.tensor(y1, dtype=torch.long).to(device) # CrossEntropyLoss expects Long type

print(f"Generated {X1.shape[0]} samples for {N_CLASSES_1} classes.")

In [None]:
# Plot the raw data for Scenario 1
plt.figure(figsize=(8, 6))
plt.scatter(X1[:, 0], X1[:, 1], c=y1, cmap=plt.cm.Spectral, edgecolors='k')
plt.title('Scenario 1: Raw Data (2 Clouds)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

Now, let's define and train a simple neural network for this task.

In [None]:
# --- Scenario 1: Model Definition, Training, and Visualization ---

# Model Parameters
INPUT_SIZE = N_FEATURES
HIDDEN_SIZE_1 = 8 # A small number of neurons should suffice
OUTPUT_SIZE_1 = N_CLASSES_1

# Instantiate the model and move to device
model1 = SimpleNet(INPUT_SIZE, HIDDEN_SIZE_1, OUTPUT_SIZE_1).to(device)

# Train the model
model1 = train_model(model1, X1_tensor, y1_tensor, num_epochs=500, learning_rate=0.02)

# Plot decision boundary
plot_decision_boundary(model1, X1_tensor, y1_tensor, title=f"Scenario 1: Decision Boundary (Hidden Size={HIDDEN_SIZE_1})")

As expected, the simple neural network with 8 hidden neurons easily separates the two well-defined clouds. Because the network is a composition of linear layers and ReLU activations, its decision boundary is piecewise linear; for clouds this far apart it typically looks close to a single straight line. The training accuracy should be high (likely 100%).
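
As a rough check on how easy this scenario is, the following minimal sketch (not part of the original notebook) fits a purely linear classifier, logistic regression trained by plain gradient descent in NumPy, to two clouds with the same centers and spread as Scenario 1. The clouds are drawn directly with NumPy rather than `make_blobs`, and the names `w`, `b`, `linear_accuracy`, the seed, the step size, and the iteration count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for repeatability only

# Two isotropic Gaussian clouds mirroring Scenario 1 (centers (-3, -3) and (3, 3), std 1.0)
n_per_class = 150
X = np.vstack([
    rng.normal(loc=(-3.0, -3.0), scale=1.0, size=(n_per_class, 2)),
    rng.normal(loc=(3.0, 3.0), scale=1.0, size=(n_per_class, 2)),
])
y = np.repeat([0, 1], n_per_class)

# Logistic regression: a single linear unit, no hidden layer
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
    w -= lr * (X.T @ (p - y)) / len(y)      # gradient of the mean cross-entropy w.r.t. w
    b -= lr * np.mean(p - y)                # gradient w.r.t. b

# The decision boundary is the straight line w . x + b = 0
linear_accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
print(f"Linear classifier training accuracy: {100 * linear_accuracy:.2f} %")
```

If the linear model already reaches (near) perfect accuracy, the hidden layer in `SimpleNet` is not strictly required for this scenario; it only becomes important as the layout of the clouds gets harder.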

## 3. Scenario 2: Three Clouds

Now, let's add a third cloud. The task becomes slightly more complex: the network must divide the plane into three regions, so the overall boundary can no longer be a single straight line.

In [None]:
# --- Scenario 2: Data Generation ---
N_CLASSES_2 = 3
CENTERS_2 = [(-4, 0), (4, 0), (0, 5)] # Three distinct centers
CLUSTER_STD_2 = 1.2

X2, y2 = make_blobs(n_samples=N_SAMPLES,
                    n_features=N_FEATURES,
                    centers=CENTERS_2,
                    cluster_std=CLUSTER_STD_2,
                    random_state=42)

# Convert to PyTorch tensors and move to device
X2_tensor = torch.tensor(X2, dtype=torch.float32).to(device)
y2_tensor = torch.tensor(y2, dtype=torch.long).to(device)

print(f"Generated {X2.shape[0]} samples for {N_CLASSES_2} classes.")

In [None]:
# Plot the raw data for Scenario 2
plt.figure(figsize=(8, 6))
plt.scatter(X2[:, 0], X2[:, 1], c=y2, cmap=plt.cm.Spectral, edgecolors='k')
plt.title('Scenario 2: Raw Data (3 Clouds)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

Let's train a model for this 3-class problem. We might use slightly more neurons.

In [None]:
# --- Scenario 2: Model Definition, Training, and Visualization ---

# Model Parameters
OUTPUT_SIZE_2 = N_CLASSES_2
# We might keep the hidden size the same or slightly increase it if needed
HIDDEN_SIZE_2 = 10 # Let's try slightly more neurons

# Instantiate the model and move to device
model2 = SimpleNet(INPUT_SIZE, HIDDEN_SIZE_2, OUTPUT_SIZE_2).to(device)

# Train the model
model2 = train_model(model2, X2_tensor, y2_tensor, num_epochs=1000, learning_rate=0.01)

# Plot decision boundary
plot_decision_boundary(model2, X2_tensor, y2_tensor, title=f"Scenario 2: Decision Boundary (Hidden Size={HIDDEN_SIZE_2})")

The network with 10 hidden neurons successfully learns the boundaries for the three clouds. Each hidden neuron contributes one linear separator, and the output layer combines them, so the resulting boundaries are piecewise linear: straight within each piece, but non-linear overall. Accuracy should still be quite high.
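
The piecewise-linear structure can be illustrated with a small NumPy sketch. It uses random weights as a stand-in for a trained `SimpleNet` with 10 hidden units and 3 outputs (an assumption made for illustration; the names `forward` and `local_affine_map` are hypothetical). For any input, the set of active ReLU units defines a region in which the whole network reduces to a single affine map.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed; the weights are random, not trained
W1, b1 = rng.normal(size=(10, 2)), rng.normal(size=10)  # hidden layer: 10 ReLU units
W2, b2 = rng.normal(size=(3, 10)), rng.normal(size=3)   # output layer: 3 class logits

def forward(x):
    """One-hidden-layer ReLU network, same shape as SimpleNet(2, 10, 3)."""
    hidden = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ hidden + b2

def local_affine_map(x):
    """Return (A, c, mask) such that forward(z) == A @ z + c for every z sharing x's mask."""
    mask = (W1 @ x + b1 > 0).astype(float)  # which hidden units are active at x
    A = W2 @ (W1 * mask[:, None])           # inactive units contribute nothing
    c = W2 @ (b1 * mask) + b2
    return A, c, mask

x0 = np.array([0.5, -1.0])
A, c, mask0 = local_affine_map(x0)

# A tiny step away from x0 normally stays inside the same linear piece
x1 = x0 + 1e-6 * np.array([1.0, 1.0])
_, _, mask1 = local_affine_map(x1)
same_piece = np.array_equal(mask0, mask1)
print("Same linear piece:", same_piece)
print("Affine map reproduces the network at x1:", np.allclose(forward(x1), A @ x1 + c))
```

The class boundaries are where two of these locally affine logits are equal, which is why the plotted regions are bounded by straight segments that bend only where a hidden unit switches on or off.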

## 4. Scenario 3: Four Clouds (Potentially Closer)

Let's increase the complexity further with four clouds. We can make them slightly closer or increase their standard deviation to make the separation task more challenging. This might require a network with more capacity (more neurons or layers).

In [None]:
# --- Scenario 3: Data Generation ---
N_CLASSES_3 = 4
CENTERS_3 = [(-3, -3), (3, 3), (-3, 3), (3, -3)] # Four corners
CLUSTER_STD_3 = 1.3 # Slightly larger spread, potential for overlap

X3, y3 = make_blobs(n_samples=N_SAMPLES,
                    n_features=N_FEATURES,
                    centers=CENTERS_3,
                    cluster_std=CLUSTER_STD_3,
                    random_state=42)

# Convert to PyTorch tensors and move to device
X3_tensor = torch.tensor(X3, dtype=torch.float32).to(device)
y3_tensor = torch.tensor(y3, dtype=torch.long).to(device)

print(f"Generated {X3.shape[0]} samples for {N_CLASSES_3} classes.")

In [None]:
# Plot the raw data for Scenario 3
plt.figure(figsize=(8, 6))
plt.scatter(X3[:, 0], X3[:, 1], c=y3, cmap=plt.cm.Spectral, edgecolors='k')
plt.title('Scenario 3: Raw Data (4 Clouds)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

### 4.1 Attempt with the Previous Architecture

Let's first see how the previous architecture (10 hidden neurons) handles this slightly more complex scenario.

In [None]:
# --- Scenario 3a: Training with Previous Architecture ---

# Model Parameters
OUTPUT_SIZE_3 = N_CLASSES_3
HIDDEN_SIZE_3a = 10 # Same as before

# Instantiate the model and move to device
model3a = SimpleNet(INPUT_SIZE, HIDDEN_SIZE_3a, OUTPUT_SIZE_3).to(device)

# Train the model
# May need more epochs or adjusted learning rate if convergence is slow/unstable
model3a = train_model(model3a, X3_tensor, y3_tensor, num_epochs=2000, learning_rate=0.01)

# Plot decision boundary
plot_decision_boundary(model3a, X3_tensor, y3_tensor, title=f"Scenario 3a: Decision Boundary (Hidden Size={HIDDEN_SIZE_3a})")

Depending on the exact overlap and the initialization, the model with 10 hidden neurons might struggle to perfectly separate all four clouds, especially where neighboring clouds overlap. We might see some misclassified points or slightly irregular boundaries.
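
How much overlap should we expect? For equally sized isotropic Gaussian clouds with the same spread, the best possible rule assigns each point to the nearest center, which for the four-corner layout means looking only at the sign of each coordinate. The sketch below (an addition, not part of the original notebook; the variable names are illustrative) computes the expected accuracy of that ideal rule from the normal CDF, using the centers at ±3 and the standard deviation of 1.3 from the data-generation cell.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

half_gap = 3.0  # distance from each center to the nearest axis (centers at +/-3)
sigma = 1.3     # CLUSTER_STD_3 in the notebook

# A point is classified correctly if both coordinates stay on their center's side of the axes
p_same_side = normal_cdf(half_gap / sigma)
best_expected_accuracy = p_same_side ** 2
expected_errors = 300 * (1.0 - best_expected_accuracy)  # N_SAMPLES = 300

print(f"Best expected accuracy: {100 * best_expected_accuracy:.2f} %")
print(f"Expected unavoidable errors among 300 points: {expected_errors:.1f}")
```

Under these assumptions the ideal rule is right about 98% of the time, so roughly six of the 300 points are expected to fall on the wrong side of any sensible boundary. The actual sample will differ somewhat, but a training accuracy of exactly 100% here would suggest that the model is bending its boundary around individual points.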

### 4.2 Attempt with Increased Capacity (More Neurons)

Let's try increasing the capacity of the network by adding more neurons to the hidden layer. This allows the network to potentially learn more complex boundary shapes.

In [None]:
# --- Scenario 3b: Increasing Hidden Neurons ---

# Model Parameters
HIDDEN_SIZE_3b = 20 # Increase hidden neurons

# Instantiate the model and move to device
model3b = SimpleNet(INPUT_SIZE, HIDDEN_SIZE_3b, OUTPUT_SIZE_3).to(device)

# Train the model
model3b = train_model(model3b, X3_tensor, y3_tensor, num_epochs=2000, learning_rate=0.01)

# Plot decision boundary
plot_decision_boundary(model3b, X3_tensor, y3_tensor, title=f"Scenario 3b: Decision Boundary (Hidden Size={HIDDEN_SIZE_3b})")

With more hidden neurons (e.g., 20), the network usually achieves better separation for the four-cloud scenario. The boundaries might look smoother or more appropriate for the data distribution.
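
One way to see why extra width helps is to count how many linear pieces the hidden layer can cut the plane into. The sketch below is an illustration under stated assumptions, not a measurement of the trained models: it uses random, untrained first-layer weights, a grid over roughly the area of the Scenario 3 clouds, and a hypothetical helper `count_linear_pieces`. It counts the distinct on/off patterns of the ReLU units for the first 10 units and then for all 20.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed; weights are random, not trained
W = rng.normal(size=(20, 2))    # first-layer weights for up to 20 hidden units
b = rng.normal(size=20)         # first-layer biases

# Dense grid covering roughly the region occupied by the four clouds
axis = np.linspace(-6.0, 6.0, 241)
xx, yy = np.meshgrid(axis, axis)
grid = np.column_stack([xx.ravel(), yy.ravel()])

def count_linear_pieces(num_units):
    """Count distinct ReLU on/off patterns produced on the grid by the first num_units units."""
    active = (grid @ W[:num_units].T + b[:num_units]) > 0
    return len(np.unique(active.astype(np.uint8), axis=0))

pieces_10 = count_linear_pieces(10)
pieces_20 = count_linear_pieces(20)
print(f"10 hidden units: {pieces_10} linear pieces on the grid")
print(f"20 hidden units: {pieces_20} linear pieces on the grid")
```

Each hidden unit contributes one line, and h lines can split the plane into at most 1 + h + h(h-1)/2 pieces, so 10 units allow at most 56 pieces and 20 units at most 211. More pieces give the output layer more freedom to shape the boundary, although training does not necessarily use all of them.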

### 4.3 (Optional) Attempt with a Deeper Network

If the separation is still challenging or for demonstration purposes, we could try adding another hidden layer. Deeper networks can sometimes capture hierarchical features, although for this relatively simple blob data, increasing width (neurons) is often sufficient.

In [None]:
# --- Scenario 3c: Adding a Hidden Layer ---

# Define a Deeper Network
class DeeperNet(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, num_classes):
        super(DeeperNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size2, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.fc3(out)
        return out

# Model Parameters
HIDDEN_SIZE1_3c = 16
HIDDEN_SIZE2_3c = 8

# Instantiate the model and move to device
model3c = DeeperNet(INPUT_SIZE, HIDDEN_SIZE1_3c, HIDDEN_SIZE2_3c, OUTPUT_SIZE_3).to(device)

# Train the model (might need more epochs for deeper nets)
model3c = train_model(model3c, X3_tensor, y3_tensor, num_epochs=2500, learning_rate=0.01)

# Plot decision boundary
plot_decision_boundary(model3c, X3_tensor, y3_tensor, title=f"Scenario 3c: Decision Boundary (Layers: {HIDDEN_SIZE1_3c}-{HIDDEN_SIZE2_3c})")

A deeper network can also learn the separation. Comparing the results visually (and via accuracy) from Scenarios 3a, 3b, and 3c helps illustrate the trade-offs between network width and depth for a given task complexity.
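
When comparing the three variants, it helps to keep their sizes in mind. The following sketch counts trainable parameters (weights plus biases) from the layer sizes used in Scenarios 3a, 3b, and 3c; the helper names `linear_params` and `mlp_params` are hypothetical and not part of the notebook.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def mlp_params(layer_sizes):
    """Total parameters of a fully connected network with the given layer sizes."""
    return sum(linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

# Layer sizes: 2 inputs, hidden layer(s), 4 output classes
architectures = {
    "3a: SimpleNet, 10 hidden": [2, 10, 4],
    "3b: SimpleNet, 20 hidden": [2, 20, 4],
    "3c: DeeperNet, 16-8 hidden": [2, 16, 8, 4],
}
param_counts = {name: mlp_params(sizes) for name, sizes in architectures.items()}
for name, count in param_counts.items():
    print(f"{name}: {count} trainable parameters")
# 3a: 74, 3b: 144, 3c: 220
```

The deeper model is also the largest of the three, so any difference between 3b and 3c reflects both depth and parameter count. For a trained PyTorch model, the same number can be obtained with `sum(p.numel() for p in model.parameters())`.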

## 5. Conclusion

This notebook demonstrated the ability of classical feed-forward neural networks to learn separation boundaries for clustered data in 2D.

- For **well-separated clusters** (Scenario 1), a very simple network suffices.
- As the **number of clusters increases** (Scenario 2), the network needs to learn more complex, non-linear boundaries, but a single hidden layer network often still performs well if the clusters remain reasonably distinct.
- When the **task becomes more complex** (Scenario 3, with more clusters or closer proximity/overlap), we might need to increase the network's capacity:
    - **Increasing Neurons (Width):** Adding more neurons in a hidden layer allows the network to learn more complex combinations of features (more intricate boundaries). This was shown effectively in Scenario 3b.
    - **Adding Layers (Depth):** Adding more hidden layers allows the network to learn hierarchical features, potentially modeling more complex relationships in the data, as demonstrated optionally in Scenario 3c.

The choice of architecture (number of layers and neurons) depends on the complexity of the data distribution. Overly complex models for simple data can overfit, although that cannot be observed here because every model is trained and evaluated on the same data. Models that are too simple may underfit complex data (as potentially seen in Scenario 3a). Finding the right balance usually involves experimentation and validation on held-out data.
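
A simple way to add validation is to hold out part of the data before training. The sketch below is one possible approach, not something the notebook itself does: `train_val_split`, its `val_fraction` and `seed` parameters, and the toy arrays are all assumptions made for illustration.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle the samples and split them into training and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))              # random order of sample indices
    n_val = int(round(val_fraction * len(X)))    # number of held-out samples
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Toy stand-in for the notebook's arrays: 300 points in 2D, 4 classes
X_demo = np.arange(600, dtype=np.float32).reshape(300, 2)
y_demo = np.arange(300) % 4

X_tr, y_tr, X_val, y_val = train_val_split(X_demo, y_demo)
print("Training samples:", X_tr.shape[0], "Validation samples:", X_val.shape[0])
```

With the notebook's data, the NumPy arrays returned by `make_blobs` (for example `X3` and `y3`) could be split this way before conversion to tensors. The model would then be trained on the training part only, and accuracy on the validation part would show whether extra capacity actually generalizes.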