# Neural Network Capacity for Separating Complex 2D Spot Clouds (Swirl/Vortex)

This notebook demonstrates how a feed-forward neural network handles a more complex classification task involving 2D spot clouds. We will generate four initial clusters and then apply a non-linear "swirl" transformation to distort them, making linear separation impossible.

We expect that simple network architectures might struggle, and increasing network capacity (neurons or layers) will be necessary to achieve good separation.

## 1. Setup: Import Libraries and Define Helper Functions

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np

# Check for GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

In [None]:
# Helper function to plot data and decision boundaries (same as before)
def plot_decision_boundary(model, X, y, title="Decision Boundary"):
    """
    Plots the data points and the decision boundary learned by the model.
    """
    model.eval()
    X_np = X.cpu().numpy()
    y_np = y.cpu().numpy()

    x_min, x_max = X_np[:, 0].min() - 1, X_np[:, 0].max() + 1
    y_min, y_max = X_np[:, 1].min() - 1, X_np[:, 1].max() + 1
    h = 0.05 # Adjust step size if needed for performance/resolution

    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    grid_points = np.c_[xx.ravel(), yy.ravel()]
    # Put the grid on the same device as the model's parameters
    grid_points_tensor = torch.tensor(grid_points, dtype=torch.float32).to(next(model.parameters()).device)

    with torch.no_grad():
        Z = model(grid_points_tensor)
        # Handle potential multi-class output formats
        if Z.shape[1] > 1:
            predicted = Z.argmax(dim=1)  # Multi-class logits (CrossEntropyLoss)
        else:
            # Not exercised in this multi-class notebook, but kept for binary models
            predicted = (torch.sigmoid(Z) > 0.5).long().squeeze(1)  # Binary logit (BCEWithLogitsLoss)

        predicted_np = predicted.cpu().numpy()

    Z = predicted_np.reshape(xx.shape)

    plt.figure(figsize=(10, 8)) # Slightly larger figure for complex boundaries
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
    scatter = plt.scatter(X_np[:, 0], X_np[:, 1], c=y_np, cmap=plt.cm.Spectral, edgecolors='k', s=20) # Smaller points

    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title(title)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xticks(())
    plt.yticks(())

    num_classes = len(np.unique(y_np))
    if num_classes <= 10:
        handles, _ = scatter.legend_elements()
        legend_labels = [f'Cloud {i}' for i in range(num_classes)]
        plt.legend(handles, legend_labels, title="Classes")

    plt.show()

In [None]:
# Simple Neural Network Model (same as before)
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

In [None]:
# Deeper Neural Network Model (same as before)
class DeeperNet(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, num_classes):
        super(DeeperNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size2, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        out = self.fc2(out)
        out = self.relu2(out)
        out = self.fc3(out)
        return out

In [None]:
# Training Function (same as before, returning the trained model)
def train_model(model, X_train, y_train, learning_rate=0.01, num_epochs=1000, print_loss_every=100):
    """Trains the provided model."""
    model.train()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    print(f"\n--- Training Started ---")
    # Get model details dynamically
    if isinstance(model, SimpleNet):
        print(f"Model: SimpleNet")
        print(f"Input Size: {model.fc1.in_features}, Hidden Size: {model.fc1.out_features}, Output Size: {model.fc2.out_features}")
    elif isinstance(model, DeeperNet):
        print(f"Model: DeeperNet")
        print(f"Input Size: {model.fc1.in_features}, Hidden1: {model.fc1.out_features}, Hidden2: {model.fc2.out_features}, Output Size: {model.fc3.out_features}")
    else:
        print(f"Model: {model.__class__.__name__} (structure not explicitly printed)")

    print(f"Epochs: {num_epochs}, Learning Rate: {learning_rate}")

    for epoch in range(num_epochs):
        outputs = model(X_train)
        loss = criterion(outputs, y_train)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Log the first epoch, every `print_loss_every` epochs, and the final epoch
        if (epoch + 1) % print_loss_every == 0 or epoch == 0 or epoch == num_epochs - 1:
            print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

    print("--- Training Finished ---")
    model.eval()
    with torch.no_grad():
        outputs = model(X_train)
        predicted = outputs.argmax(dim=1)
        total = y_train.size(0)
        correct = (predicted == y_train).sum().item()
        print(f'Final Training Accuracy: {(100 * correct / total):.2f} %')
    return model # Return trained model

## 2. Scenario 4: Four Swirled Clouds

We generate four initial blobs and then apply a swirl transformation.
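
The swirl used here rotates each point about the origin by an angle proportional to its distance from the origin, so radii are preserved and only angles change. As a rough illustration, the same rule (`strength * dist / 5.0`) can be written without a Python loop. The name `swirl_vectorized` and the `center` argument are hypothetical and not part of the notebook; this is a sketch, not a replacement for `swirl_data`.

```python
import numpy as np

def swirl_vectorized(X, strength=0.75, center=(0.0, 0.0)):
    """Rotate each row of X about `center` by an angle of strength * dist / 5.0 radians.

    Hypothetical vectorized counterpart of the notebook's swirl rule.
    """
    center = np.asarray(center, dtype=float)
    d = X - center
    dist = np.hypot(d[:, 0], d[:, 1])                  # distance from the center
    angle = np.arctan2(d[:, 1], d[:, 0])               # original polar angle
    new_angle = angle + strength * dist / 5.0          # rotation grows with distance
    return center + np.column_stack((dist * np.cos(new_angle),
                                     dist * np.sin(new_angle)))

# Small check on random points: the swirl changes angles but not radii
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2)) * 4
swirled = swirl_vectorized(pts, strength=5)
radius_before = np.linalg.norm(pts, axis=1)
radius_after = np.linalg.norm(swirled, axis=1)
print(np.allclose(radius_before, radius_after))
```

With `strength=5` the rotation works out to one radian per unit of distance, which is why points at slightly different radii within the same blob end up at noticeably different angles.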

In [None]:
# --- Scenario 4: Data Generation ---

def swirl_data(X, y, strength=0.75):
    """Applies a swirl distortion to the data points."""
    X_swirled = np.zeros_like(X)
    center_x, center_y = 0.0, 0.0 # Swirl around the origin

    for i in range(X.shape[0]):
        x, y_coord = X[i, 0], X[i, 1]
        dx = x - center_x
        dy = y_coord - center_y
        
        # Calculate distance and angle from the center
        dist = np.sqrt(dx*dx + dy*dy)
        angle = np.arctan2(dy, dx)
        
        # Apply swirl: rotation angle increases with distance
        swirl_angle = strength * dist / 5.0 # Adjust divisor to control swirl tightness
        
        new_angle = angle + swirl_angle
        
        # Convert back to Cartesian coordinates
        X_swirled[i, 0] = center_x + dist * np.cos(new_angle)
        X_swirled[i, 1] = center_y + dist * np.sin(new_angle)
        
    return X_swirled, y # Labels remain the same

# Parameters for initial blobs
N_SAMPLES = 500 # More samples might help define the boundaries better
N_FEATURES = 2
N_CLASSES_4 = 4
# Start with centers relatively far out; the swirl rotates points around the origin (their radius is unchanged)
CENTERS_4_initial = [(-4, -4), (4, 4), (-4, 4), (4, -4)]
CLUSTER_STD_4 = 0.7 # Keep initial clusters relatively tight

# Generate initial blobs
X_initial, y_initial = make_blobs(n_samples=N_SAMPLES,
                                  n_features=N_FEATURES,
                                  centers=CENTERS_4_initial,
                                  cluster_std=CLUSTER_STD_4,
                                  random_state=42)

# Apply the swirl transformation
SWIRL_STRENGTH = 5 # Controls how much rotation is applied; with the /5.0 divisor, 5 means 1 radian per unit of distance
X4, y4 = swirl_data(X_initial, y_initial, strength=SWIRL_STRENGTH)


# Convert to PyTorch tensors and move to device
X4_tensor = torch.tensor(X4, dtype=torch.float32).to(device)
y4_tensor = torch.tensor(y4, dtype=torch.long).to(device)

print(f"Generated {X4.shape[0]} swirled samples for {N_CLASSES_4} classes.")

In [None]:
# Plot the raw swirled data
plt.figure(figsize=(8, 6))
plt.scatter(X4[:, 0], X4[:, 1], c=y4, cmap=plt.cm.Spectral, edgecolors='k', s=15)
plt.title(f'Scenario 4: Raw Data ({N_CLASSES_4} Swirled Clouds, Strength={SWIRL_STRENGTH})')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.grid(True, linestyle='--', alpha=0.6)
plt.axhline(0, color='grey', lw=0.5)
plt.axvline(0, color='grey', lw=0.5)
plt.show()

The plot should show each cloud smeared into a curved arm that wraps around the origin and interleaves with its neighbours, so no single straight line (or small set of straight cuts) separates the classes cleanly.
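
One way to check the claim about linear separation is to fit a purely linear classifier before and after the swirl. The sketch below uses scikit-learn's `LogisticRegression`, which is an assumption on our part (the notebook itself does not use it), and recreates the data with the same settings as the generation cell above. The variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Recreate the initial blobs with the notebook's settings
X0, labels = make_blobs(n_samples=500, n_features=2,
                        centers=[(-4, -4), (4, 4), (-4, 4), (4, -4)],
                        cluster_std=0.7, random_state=42)

# Apply the same swirl rule (strength 5, divisor 5.0, centered on the origin)
dist = np.hypot(X0[:, 0], X0[:, 1])
angle = np.arctan2(X0[:, 1], X0[:, 0]) + 5 * dist / 5.0
X_sw = np.column_stack((dist * np.cos(angle), dist * np.sin(angle)))

# A linear model can only carve the plane into convex regions
acc_blobs = LogisticRegression(max_iter=1000).fit(X0, labels).score(X0, labels)
acc_swirl = LogisticRegression(max_iter=1000).fit(X_sw, labels).score(X_sw, labels)

print(f"Linear baseline, original blobs: {acc_blobs:.3f}")
print(f"Linear baseline, swirled data:   {acc_swirl:.3f}")
```

The gap between the two scores gives a rough reference point for judging how much the hidden layers in the following sections actually contribute.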

### 2.1 Attempt with a Shallow Network (Increased Neurons)

Let's see how the `SimpleNet` (one hidden layer) performs, but we'll give it significantly more neurons than before.

In [None]:
# --- Scenario 4a: Training Shallow but Wider Network ---

# Model Parameters
INPUT_SIZE = N_FEATURES
HIDDEN_SIZE_4a = 32 # Significantly more neurons than before (try 5 to compare against a narrow network)
OUTPUT_SIZE_4 = N_CLASSES_4

# Instantiate the model
model4a = SimpleNet(INPUT_SIZE, HIDDEN_SIZE_4a, OUTPUT_SIZE_4).to(device)

# Train the model - May need significantly more epochs and potentially smaller LR
# Increased epochs because the optimization landscape is more complex
model4a = train_model(model4a, X4_tensor, y4_tensor,
                     num_epochs=5000, # Increased epochs
                     learning_rate=0.005, # Possibly smaller LR
                     print_loss_every=500)

# Plot decision boundary
plot_decision_boundary(model4a, X4_tensor, y4_tensor,
                       title=f"Scenario 4a: Decision Boundary (SimpleNet, Hidden={HIDDEN_SIZE_4a})")

The shallow network, even with increased width, may struggle to capture the curved class boundaries accurately. With a single hidden ReLU layer its decision boundary is piecewise linear, so it can produce reasonable boundaries overall but will likely misclassify points where the swirled arms are tightest or overlap. Expect the training accuracy to be lower than in the previous scenarios.
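
To see where the errors concentrate rather than only how many there are, the misclassification rate can be binned by distance from the origin. The helper below, `error_rate_by_radius`, is hypothetical and not part of the notebook; it is demonstrated on random data with an untrained stand-in model so that the block runs on its own. In the notebook one would presumably call it with `model4a`, `X4_tensor`, and `y4_tensor` instead.

```python
import numpy as np
import torch
import torch.nn as nn

def error_rate_by_radius(model, X, y, n_bins=5):
    """Return (bin_edges, rates): the misclassification rate in each radial bin.

    Hypothetical helper: X is an (N, 2) float tensor, y an (N,) long tensor,
    and the model is assumed to return one logit per class.
    """
    model.eval()
    with torch.no_grad():
        predicted = model(X).argmax(dim=1)
    wrong = (predicted != y).cpu().numpy()
    radius = torch.linalg.norm(X, dim=1).cpu().numpy()

    edges = np.linspace(radius.min(), radius.max(), n_bins + 1)
    # digitize gives 1-based bin indices; clip so the maximum radius falls in the last bin
    bin_idx = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)
    rates = np.array([wrong[bin_idx == b].mean() if np.any(bin_idx == b) else np.nan
                      for b in range(n_bins)])
    return edges, rates

# Stand-in data and an untrained model, only to show the call pattern
torch.manual_seed(0)
demo_X = torch.randn(200, 2) * 3
demo_y = torch.randint(0, 4, (200,))
demo_model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 4))

edges, rates = error_rate_by_radius(demo_model, demo_X, demo_y)
for lo, hi, rate in zip(edges[:-1], edges[1:], rates):
    print(f"radius {lo:5.2f} to {hi:5.2f}: error rate {rate:.2f}")
```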

### 2.2 Attempt with a Deeper Network

A deeper network might be better suited to learning the hierarchical features needed to untangle the swirl. Let's try the `DeeperNet`.

In [None]:
# --- Scenario 4b: Training Deeper Network ---

# Model Parameters for Deeper Network
HIDDEN_SIZE1_4b = 24 # Reasonable number for first layer
HIDDEN_SIZE2_4b = 12 # Second hidden layer

# Instantiate the model
model4b = DeeperNet(INPUT_SIZE, HIDDEN_SIZE1_4b, HIDDEN_SIZE2_4b, OUTPUT_SIZE_4).to(device)

# Train the model - Again, likely needs many epochs
model4b = train_model(model4b, X4_tensor, y4_tensor,
                     num_epochs=6000, # Maybe even more epochs
                     learning_rate=0.005,
                     print_loss_every=500)

# Plot decision boundary
plot_decision_boundary(model4b, X4_tensor, y4_tensor,
                       title=f"Scenario 4b: Decision Boundary (DeeperNet, Layers: {HIDDEN_SIZE1_4b}-{HIDDEN_SIZE2_4b})")

## 3. Conclusion (Swirled Data)

This swirled dataset presents a significantly more challenging task for the neural network.

- **Shallow Network (Wider):** Increasing the number of neurons in a single hidden layer (`SimpleNet`) adds capacity, but the network may still struggle with the strongly curved boundaries of the swirl pattern. With ReLU activations its decision boundaries are piecewise linear, so they can look jagged or fail where the arms pass close to each other.
- **Deeper Network:** The `DeeperNet`, with two hidden layers, is generally better equipped for this kind of geometric distortion. Composing layers lets it learn intermediate representations that can help untangle the swirls before the final classification. For this type of data we would typically expect cleaner, more accurate decision boundaries than from the shallow network, and likely a higher training accuracy.

This illustrates that for datasets whose classes are separated by complex, non-linear boundaries, increasing network **depth** (adding layers) can be more effective than only increasing **width** (adding neurons to a single layer), although a combination of both is often best. Harder problems like this one also tend to need more training data (or augmentation) and longer training (more epochs) for the model to converge.
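
One caveat when reading the width-versus-depth comparison: the two models trained above do not have the same number of trainable parameters. The sketch below counts them using `nn.Sequential` stand-ins with the same layer sizes as `SimpleNet(2, 32, 4)` and `DeeperNet(2, 24, 12, 4)`. The helper `count_parameters` is hypothetical and not part of the notebook.

```python
import torch.nn as nn

def count_parameters(model):
    """Number of trainable parameters (weights and biases) in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-ins with the same layer sizes as the notebook's two models
shallow = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 4))
deeper = nn.Sequential(nn.Linear(2, 24), nn.ReLU(),
                       nn.Linear(24, 12), nn.ReLU(),
                       nn.Linear(12, 4))

n_shallow = count_parameters(shallow)  # (2*32 + 32) + (32*4 + 4) = 228
n_deeper = count_parameters(deeper)    # (2*24 + 24) + (24*12 + 12) + (12*4 + 4) = 424

print(f"Shallow (hidden=32):   {n_shallow} parameters")
print(f"Deeper  (hidden=24-12): {n_deeper} parameters")
```

Because the deeper model has nearly twice as many parameters, part of any accuracy gap may come from size rather than depth. For a budget-matched comparison, a `SimpleNet` with 60 hidden units would have 7 * 60 + 4 = 424 parameters, the same as the `DeeperNet` configuration used here.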