# **Build & Train a 2-Layer Neural Network on Tabular Data ~ PyTorch**

### **Outline**
1. Setup and reproducibility
2. Create a synthetic tabular dataset
3. Train/test split + `DataLoader`
4. Define a 2-layer network
5. Loss, optimizer, and helper functions
6. Train and track metrics
7. Evaluate the final model
8. Exercise + common pitfall + extension

In [1]:
# If torch is missing, install it in a notebook cell:
# %pip install torch

import random
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

# Reproducibility
SEED = 42
random.seed(SEED)
torch.manual_seed(SEED)

# CPU is enough for this lab
device = torch.device("cpu")
print("Using device:", device)

Using device: cpu
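
As a quick sanity check of what seeding buys us, the sketch below (not part of the original lab) re-seeds PyTorch's global generator and confirms that the same random numbers come back. The names `first_draw` and `second_draw` are illustrative only.

```python
import torch

# Re-seeding the global generator restarts the random stream,
# so the two draws below should be identical.
torch.manual_seed(42)
first_draw = torch.randn(3)

torch.manual_seed(42)
second_draw = torch.randn(3)

print(torch.equal(first_draw, second_draw))
```

This only covers repeated runs on the same machine and PyTorch build; results may still differ across versions or hardware.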


## Step 1 - Create Tabular Data
We will generate a small binary-classification dataset with 10 numeric features.
The target follows a nonlinear rule (an interaction term and a squared term), so a purely linear model cannot fully capture it and the hidden layer has a purpose.

In [2]:
num_samples = 1000
num_features = 10

X = torch.randn(num_samples, num_features)

# Nonlinear signal + noise
signal = (
    1.4 * X[:, 0]
    - 1.0 * X[:, 1]
    + 0.8 * X[:, 2] * X[:, 3]
    - 0.5 * X[:, 4] ** 2
)
noise = 0.25 * torch.randn(num_samples)
logits_true = signal + noise

probs_true = torch.sigmoid(logits_true)
y = (probs_true > 0.5).float().unsqueeze(1)

print("X shape:", X.shape)
print("y shape:", y.shape)
print("Positive class rate:", y.mean().item())

X shape: torch.Size([1000, 10])
y shape: torch.Size([1000, 1])
Positive class rate: 0.39500001072883606
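
Because the sigmoid is monotonic and maps 0 to 0.5, thresholding the probabilities at 0.5 is mathematically the same as checking the sign of the logits. The sketch below illustrates this on a few hand-picked values; `demo_logits` is a made-up tensor, not data from the lab.

```python
import torch

# Hand-picked logits on both sides of zero (illustrative values).
demo_logits = torch.tensor([-3.0, -0.5, 0.5, 3.0])

labels_from_probs = (torch.sigmoid(demo_logits) > 0.5).float()
labels_from_logits = (demo_logits > 0).float()

print(labels_from_probs)
print(labels_from_logits)
```

This also suggests why the positive class rate is below 50%: the `- 0.5 * X[:, 4] ** 2` term has an expected value of -0.5, which shifts the logits toward the negative side.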


## Step 2 - Split Data and Build DataLoaders
We will keep 80% for training and 20% for testing.
`DataLoader` gives us shuffled mini-batches for training.

In [3]:
train_ratio = 0.8
train_size = int(num_samples * train_ratio)

indices = torch.randperm(num_samples)
train_idx = indices[:train_size]
test_idx = indices[train_size:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

train_ds = TensorDataset(X_train, y_train)
test_ds = TensorDataset(X_test, y_test)

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=64, shuffle=False)

print("Train batches:", len(train_loader))
print("Test batches:", len(test_loader))

Train batches: 25
Test batches: 4
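
To see what a `DataLoader` actually yields, the following sketch pulls one mini-batch and checks the batch counts printed above. It uses stand-in tensors (`X_demo`, `y_demo`) with the same shapes as the training split, so it can run on its own.

```python
import math
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in tensors with the same shapes as the 800-row training split.
X_demo = torch.randn(800, 10)
y_demo = torch.zeros(800, 1)

demo_loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=True)

# Each iteration returns one (features, targets) pair.
xb, yb = next(iter(demo_loader))
print(xb.shape, yb.shape)

# With the default drop_last=False, the batch count is ceil(num_samples / batch_size).
print(len(demo_loader), math.ceil(800 / 32))  # training split: 800 samples, batch size 32
print(math.ceil(200 / 64))                    # test split: 200 samples, batch size 64
```

Note that the last test batch holds only 8 samples (200 = 3 × 64 + 8), which matters when averaging per-batch metrics.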


## Step 3 - Define a 2-Layer Neural Network
Architecture:
- Layer 1: `Linear(input_dim -> hidden_dim)`
- Activation: `ReLU`
- Layer 2: `Linear(hidden_dim -> 1)` (outputs logits)

In [4]:
class TwoLayerNN(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        return self.net(x)

model = TwoLayerNN(input_dim=num_features, hidden_dim=32).to(device)

num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(model)
print("Trainable parameters:", num_params)

TwoLayerNN(
  (net): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=1, bias=True)
  )
)
Trainable parameters: 385
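
The count of 385 can be checked by hand: each `Linear` layer has `in_features * out_features` weights plus `out_features` biases. A minimal sketch of that arithmetic, rebuilding the same layer sizes in a standalone `nn.Sequential` (the name `net` is illustrative):

```python
from torch import nn

input_dim, hidden_dim = 10, 32

layer1 = input_dim * hidden_dim + hidden_dim  # 320 weights + 32 biases = 352
layer2 = hidden_dim * 1 + 1                   # 32 weights + 1 bias = 33
expected = layer1 + layer2

net = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.ReLU(),  # ReLU has no parameters
    nn.Linear(hidden_dim, 1),
)
actual = sum(p.numel() for p in net.parameters())

print(expected, actual)
```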


## Step 4 - Loss, Optimizer, and Helper Functions
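For binary classification with logits, use `BCEWithLogitsLoss`.
It combines the sigmoid and the binary cross-entropy in a single operation and relies on the log-sum-exp trick, which makes it more numerically stable than applying `sigmoid` followed by `BCELoss`.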
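
The stability difference shows up with confident predictions. In the sketch below, a logit of 50 with target 0 should give a loss of about 50, but in float32 `sigmoid(50)` rounds to exactly 1.0, so the separate `sigmoid` + `BCELoss` path loses that information. The input values are chosen purely for illustration.

```python
import torch
from torch import nn

logit = torch.tensor([50.0])  # a very confident (and wrong) prediction
target = torch.tensor([0.0])

# Fused version: works directly on the logit.
stable = nn.BCEWithLogitsLoss()(logit, target)

# Two-step version: the sigmoid saturates before the log is taken.
naive = nn.BCELoss()(torch.sigmoid(logit), target)

print(f"BCEWithLogitsLoss: {stable.item():.2f}")
print(f"sigmoid + BCELoss: {naive.item():.2f}")
```

The fused loss stays close to the true value, while the two-step loss reports a different number because `BCELoss` has to clamp the log of zero.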

In [5]:
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

def batch_accuracy_from_logits(logits, targets):
    preds = (torch.sigmoid(logits) >= 0.5).float()
    return (preds == targets).float().mean().item()

def run_epoch(model, loader, loss_fn, optimizer=None):
    training = optimizer is not None
    if training:
        model.train()
    else:
        model.eval()

    total_loss = 0.0
    total_acc = 0.0
    total_items = 0

    with torch.set_grad_enabled(training):
        for xb, yb in loader:
            xb = xb.to(device)
            yb = yb.to(device)

            logits = model(xb)
            loss = loss_fn(logits, yb)

            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            bs = xb.size(0)
            total_loss += loss.item() * bs
            total_acc += batch_accuracy_from_logits(logits, yb) * bs
            total_items += bs

    return total_loss / total_items, total_acc / total_items
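
`run_epoch` multiplies each batch metric by the batch size before averaging because the last batch can be smaller than the rest. The sketch below uses hypothetical per-batch accuracies (not results from this lab) with the test loader's batch sizes to show how a plain mean of batch means would be skewed.

```python
# Test loader batch sizes: 200 samples with batch_size=64 -> 64, 64, 64, 8.
batch_sizes = [64, 64, 64, 8]
# Hypothetical per-batch accuracies, for illustration only.
batch_accs = [0.9, 0.9, 0.9, 0.5]

# Plain mean of batch means: the tiny last batch counts as much as a full one.
unweighted = sum(batch_accs) / len(batch_accs)

# Sample-weighted mean: what run_epoch computes.
weighted = sum(a * n for a, n in zip(batch_accs, batch_sizes)) / sum(batch_sizes)

print(f"unweighted={unweighted:.3f} weighted={weighted:.3f}")
```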

## Step 5 - Train the Model
We will train for a small number of epochs and print train/test metrics.

In [6]:
epochs = 20
history = []

for epoch in range(1, epochs + 1):
    train_loss, train_acc = run_epoch(model, train_loader, loss_fn, optimizer)
    test_loss, test_acc = run_epoch(model, test_loader, loss_fn, optimizer=None)

    history.append((train_loss, train_acc, test_loss, test_acc))

    if epoch == 1 or epoch % 5 == 0:
        print(
            f"Epoch {epoch:02d} | "
            f"train_loss={train_loss:.4f} train_acc={train_acc:.3f} | "
            f"test_loss={test_loss:.4f} test_acc={test_acc:.3f}"
        )

Epoch 01 | train_loss=0.6211 train_acc=0.625 | test_loss=0.4771 test_acc=0.860
Epoch 05 | train_loss=0.2043 train_acc=0.919 | test_loss=0.2157 test_acc=0.920
Epoch 10 | train_loss=0.1162 train_acc=0.960 | test_loss=0.1504 test_acc=0.950
Epoch 15 | train_loss=0.0808 train_acc=0.978 | test_loss=0.1893 test_acc=0.930
Epoch 20 | train_loss=0.0635 train_acc=0.980 | test_loss=0.1571 test_acc=0.945
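
In the log above, train loss keeps falling while test loss is not monotonic, which can be a mild sign of overfitting. As a small sketch, the snippet below finds the logged epoch with the lowest test loss; the `(epoch, test_loss)` pairs are copied from the printed output, and `logged` is an illustrative name.

```python
# (epoch, test_loss) pairs copied from the printed log.
logged = [(1, 0.4771), (5, 0.2157), (10, 0.1504), (15, 0.1893), (20, 0.1571)]

best_epoch, best_loss = min(logged, key=lambda pair: pair[1])
print(f"Lowest logged test loss: {best_loss:.4f} at epoch {best_epoch}")
```

In practice, picking an epoch this way is usually done on a separate validation split, since choosing on the test set makes the final test score optimistic.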


## Step 6 - Final Evaluation and Quick Interpretation
We will inspect test accuracy and a few predicted probabilities.

In [8]:
final_train_loss, final_train_acc = history[-1][0], history[-1][1]
final_test_loss, final_test_acc = history[-1][2], history[-1][3]

print(f"Final train accuracy: {final_train_acc:.3f}")
print(f"Final test accuracy:  {final_test_acc:.3f}")

model.eval()
with torch.no_grad():
    sample_logits = model(X_test[:10].to(device))
    sample_probs = torch.sigmoid(sample_logits).squeeze(1).cpu()

print("First 10 predicted probabilities:")
print(sample_probs)
print("First 10 true labels:")
print(y_test[:10].squeeze(1))

Final train accuracy: 0.980
Final test accuracy:  0.945
First 10 predicted probabilities:
tensor([2.3238e-05, 7.3397e-01, 6.5447e-01, 1.5740e-01, 5.6465e-08, 9.9960e-01,
        9.9734e-01, 7.7261e-09, 1.8203e-03, 8.1115e-01])
First 10 true labels:
tensor([0., 1., 1., 0., 0., 1., 1., 0., 0., 1.])
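
To turn these probabilities into decisions, threshold them at 0.5 and compare with the labels. The sketch below does this for the ten values printed above (copied by hand) and counts the four outcome types.

```python
import torch

# Values copied from the printed output above.
probs = torch.tensor([2.3238e-05, 7.3397e-01, 6.5447e-01, 1.5740e-01, 5.6465e-08,
                      9.9960e-01, 9.9734e-01, 7.7261e-09, 1.8203e-03, 8.1115e-01])
labels = torch.tensor([0., 1., 1., 0., 0., 1., 1., 0., 0., 1.])

preds = (probs >= 0.5).float()

tp = int(((preds == 1) & (labels == 1)).sum())  # true positives
tn = int(((preds == 0) & (labels == 0)).sum())  # true negatives
fp = int(((preds == 1) & (labels == 0)).sum())  # false positives
fn = int(((preds == 0) & (labels == 1)).sum())  # false negatives

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

All ten samples are classified correctly here, although a few (such as 0.654 and 0.734) are noticeably less confident than the rest.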


## Exercise
Try to improve the model by making one controlled change at a time.

Ideas:
1. Change hidden units from 32 to 64
2. Lower learning rate from `1e-2` to `1e-3`
3. Train for 40 epochs

Record your best test accuracy and explain what changed.

In [None]:
# TODO: Copy the training section and run one change at a time.
# Suggested template:
# - hidden_dim = ?
# - learning_rate = ?
# - epochs = ?
# - best_test_acc = ?
# - short explanation = "..."

## Common Pitfall
Applying `sigmoid` inside the model **and** using `BCEWithLogitsLoss`.

Why it is a problem:
- `BCEWithLogitsLoss` already applies sigmoid internally.
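- With a double sigmoid, the loss receives values confined to (0, 1) instead of unbounded logits, so the effective probability is stuck between 0.5 and about 0.73. The loss can never get close to zero, gradients shrink, and learning slows.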
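
A minimal sketch of the pitfall, using two made-up, very confident and correct predictions: with raw logits the loss is essentially zero, while passing sigmoid outputs into `BCEWithLogitsLoss` leaves a large loss that no amount of training can remove.

```python
import torch
from torch import nn

loss_fn = nn.BCEWithLogitsLoss()

raw_logits = torch.tensor([[-20.0], [20.0]])  # very confident model outputs
targets = torch.tensor([[0.0], [1.0]])        # both predictions are correct

# Correct usage: pass raw logits.
correct_loss = loss_fn(raw_logits, targets)

# Pitfall: the model already applied sigmoid, and the loss applies it again.
buggy_loss = loss_fn(torch.sigmoid(raw_logits), targets)

# Probabilities the loss effectively sees in the buggy case.
effective_probs = torch.sigmoid(torch.sigmoid(raw_logits))

print(f"correct loss: {correct_loss.item():.4f}")
print(f"buggy loss:   {buggy_loss.item():.4f}")
print(effective_probs.squeeze(1))
```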

## Optional Extension
Turn this into a multiclass task by:
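- Changing the final layer to output `num_classes` logits
- Using `CrossEntropyLoss` (targets become integer class indices of shape `(N,)` instead of floats of shape `(N, 1)`)
- Predicting the class with `argmax` over the logits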
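
The three changes above can be sketched as follows. This is a standalone illustration on random data, not a full training run; `num_classes = 3`, the batch of 8 samples, and the name `multiclass_model` are arbitrary choices made for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)
num_features, num_classes = 10, 3  # num_classes = 3 is an arbitrary choice

multiclass_model = nn.Sequential(
    nn.Linear(num_features, 32),
    nn.ReLU(),
    nn.Linear(32, num_classes),  # one logit per class
)

X_batch = torch.randn(8, num_features)
y_batch = torch.randint(0, num_classes, (8,))  # integer class indices, shape (N,)

logits = multiclass_model(X_batch)

# CrossEntropyLoss expects raw logits; do not apply softmax first.
loss = nn.CrossEntropyLoss()(logits, y_batch)

# The predicted class is the index of the largest logit in each row.
pred_classes = logits.argmax(dim=1)

print(logits.shape, pred_classes.shape)
print(f"loss={loss.item():.4f}")
```

The same caution as in the pitfall above applies: adding `softmax` inside the model while using `CrossEntropyLoss` would apply the normalization twice.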