# Simple Neural Network Template
The goal of this notebook is to avoid rewriting the same boilerplate every time we build a neural network for tabular data.


In [15]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import matplotlib.pyplot as plt
from torchinfo import summary


import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

# Prepare the Dataset

## Data Loading and Split

In [3]:
# Use a random df instead of df = pd.read_csv("") 
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=["f1","f2","f3","y"])

# lists of columns used as features and as target
feature_cols = ["f1","f2","f3"]
target_cols = ["y"]

df.head()


|   | f1 | f2 | f3 | y  |
|---|----|----|----|----|
| 0 | 30 | 29 | 68 | 89 |
| 1 | 5  | 51 | 14 | 95 |
| 2 | 6  | 1  | 9  | 65 |
| 3 | 30 | 58 | 52 | 58 |
| 4 | 72 | 58 | 72 | 72 |


In [4]:
# if needed, split the data into train/validation/test sets
features = df[feature_cols].values
targets = df[target_cols].values

# Split the dataset into training and temp sets (70% train, 30% temp)
features_train, features_temp, targets_train, targets_temp = train_test_split(
    features, targets, test_size=0.3, random_state=0
)

# Further split the temp set in half into validation and test sets (15% / 15% of the full data)
features_val, features_test, targets_val, targets_test = train_test_split(
    features_temp, targets_temp, test_size=0.5, random_state=0
)

# From here on we work with NumPy arrays of shape (n_samples, n_features)
print(f"Shape of the training feature set:\t {features_train.shape}")
print(f"Shape of the training target set:\t {targets_train.shape}")


Shape of the training feature set:	 (70, 3)
Shape of the training target set:	 (70, 1)


## Scaling

Scaling is important for NN training:
- NNs are trained with gradient-based optimization. If input features have very different ranges (e.g., one in [0, 1], another in [0, 1000]), the gradients with respect to the corresponding weights differ by orders of magnitude, so a single learning rate is too large for some weights and too small for others.
- Weights are usually initialized with small random values. If inputs vary a lot in scale, some neurons saturate (e.g., a sigmoid stuck at 0 or 1), killing gradients.
- A feature with large values can artificially look more important, skewing learning.
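A small numerical illustration of the first point, under assumed synthetic data: a bias-free linear model evaluated at w = 0, one feature in [0, 1] and one in [0, 1000], and a target that depends equally on both once their ranges are accounted for. The sketch compares the gradient of the MSE loss with respect to each weight before and after standardization (`weight_grad` is a hypothetical helper):

```python
import torch

torch.manual_seed(0)
n = 1000
x_small = torch.rand(n)              # feature in [0, 1]
x_large = torch.rand(n) * 1000       # feature in [0, 1000]
X = torch.stack([x_small, x_large], dim=1)
# Synthetic target: both features contribute equally after accounting for their ranges
y = x_small + x_large / 1000


def weight_grad(X, y):
    """Absolute gradient of the MSE loss w.r.t. the weights of a linear model at w = 0."""
    w = torch.zeros(X.shape[1], requires_grad=True)
    loss = ((X @ w - y) ** 2).mean()
    loss.backward()
    return w.grad.abs()


g_raw = weight_grad(X, y)
X_std = (X - X.mean(dim=0)) / X.std(dim=0)   # standardize each column
g_std = weight_grad(X_std, y)

ratio_raw = (g_raw[1] / g_raw[0]).item()
ratio_std = (g_std[1] / g_std[0]).item()
print(f"raw gradient ratio (large/small feature): {ratio_raw:.1f}")
print(f"standardized gradient ratio:              {ratio_std:.2f}")
```

With raw inputs the weight of the large-range feature receives a gradient roughly 1000 times larger; after standardization the two gradients are of similar size.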

Here are the most common scalers in scikit-learn:
| Method         | Formula                  | Best For                        | Sensitive to Outliers |
|----------------|--------------------------|---------------------------------|-----------------------|
| StandardScaler | (x - μ) / σ             | General NN input, Gaussian data |  Yes                |
| MinMaxScaler   | (x - min) / (max - min) | Images, bounded features        |  Yes                |
| RobustScaler   | (x - median) / IQR      | Skewed data, outliers present   |  No                 |
| MaxAbsScaler   | x / max(abs(x))         | Sparse data, [-1,1] scaling     |  Yes                |

As a rule of thumb:

- Images → MinMax (0–1)
- General tabular → StandardScaler
- Outlier-heavy data → RobustScaler
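To compare the scalers in the table on a concrete case, the sketch below applies each one to a single made-up feature containing one large outlier:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, RobustScaler, StandardScaler

# One feature (column vector) with five inliers and one large outlier (illustrative values)
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [100.0]])

scalers = {
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler(),
    "RobustScaler": RobustScaler(),
    "MaxAbsScaler": MaxAbsScaler(),
}

scaled = {}
for name, scaler in scalers.items():
    scaled[name] = scaler.fit_transform(x).ravel()
    print(f"{name:<15}", np.round(scaled[name], 2))
```

The single outlier squeezes the five inliers into roughly [0, 0.04] with `MinMaxScaler`, while `RobustScaler` (median 3.5, IQR 2.5) keeps them spread over about [-1, 0.6] and simply leaves the outlier far away.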

In [5]:
scaler = StandardScaler()
# fit only on the training data, so no statistics leak from the validation/test sets
scaler.fit(features_train) 

# Transform the training, validation, and test sets
features_train = scaler.transform(features_train)
features_val = scaler.transform(features_val)
features_test = scaler.transform(features_test)
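A quick sanity check of the "fit only on training data" rule, sketched on placeholder random data (names are illustrative): the scaler's statistics match the training split, which becomes exactly zero-mean and unit-variance, while the held-out split is only approximately standardized.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(100, 3)).astype(float)   # placeholder data
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)   # statistics come from the training split only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print("train mean:", X_train_s.mean(axis=0).round(3))
print("train std: ", X_train_s.std(axis=0).round(3))
print("test mean: ", X_test_s.mean(axis=0).round(3))
```

The small deviation of the test statistics from 0 and 1 is expected: it mirrors what the model will face on truly unseen data.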

## Dataset & DataLoader

In PyTorch, data handling is split into two complementary abstractions. A Dataset defines what the data is and how to access a single sample, while a DataLoader defines how to efficiently serve that data to a model in batches. This separation allows you to keep the logic of accessing data independent from the logic of batching, shuffling, and parallelizing.

**Dataset**: defines the data access pattern.
- Implements `__getitem__(index)` → returns a single (features, target) pair.
- Implements `__len__()` → reports the dataset size (for map-style datasets).
- Can be map-style (random access by index) or iterable-style (`IterableDataset`, streaming).

**DataLoader**: orchestrates data delivery.
- Wraps a Dataset and produces mini-batches.
- Handles shuffling, parallel sample loading (`num_workers`), and collation of samples into batched tensors.
- Provides an iterator over batches, enabling efficient training loops.
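A minimal sketch of both roles on random data (sizes chosen to mirror the 70-sample training split; the values themselves are meaningless):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# 70 random samples with 3 features and 1 target
dataset = TensorDataset(torch.randn(70, 3), torch.randn(70, 1))

print(len(dataset))          # __len__ -> 70
x0, y0 = dataset[0]          # __getitem__ -> one (features, target) pair
print(x0.shape, y0.shape)    # torch.Size([3]) torch.Size([1])

# The DataLoader groups shuffled samples into mini-batches
loader = DataLoader(dataset, batch_size=32, shuffle=True)
batch_sizes = [xb.shape[0] for xb, yb in loader]
print(batch_sizes)           # [32, 32, 6]

# drop_last=True discards the final incomplete batch
loader_drop = DataLoader(dataset, batch_size=32, shuffle=True, drop_last=True)
dropped_sizes = [xb.shape[0] for xb, yb in loader_drop]
print(dropped_sizes)         # [32, 32]
```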

For tabular data, there are two common ways to create the dataset:


* **Custom Dataset** ([docs](https://docs.pytorch.org/tutorials/beginner/basics/data_tutorial.html))
    - Subclass `torch.utils.data.Dataset` and implement `__len__` and `__getitem__`.
    - Flexible: supports preprocessing (scaling, encoding, imputation), lazy loading from disk, and any custom logic for accessing samples.
    - Required when the dataset does not fit in RAM.
* **`TensorDataset`**: a lightweight wrapper around `(X_tensor, y_tensor)` for data that is already preprocessed and fits in memory.
    - Minimal boilerplate, but no support for on-the-fly transforms or complex loading.
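For comparison with the `TensorDataset` cell below, here is a minimal sketch of a custom map-style Dataset over a DataFrame. The class name `TabularDataset`, its `transform` argument, and the placeholder `train_df` are hypothetical; the point is that preprocessing can run per sample inside `__getitem__`:

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class TabularDataset(Dataset):
    """Hypothetical map-style Dataset over a pandas DataFrame.

    `transform` (optional) is applied to each feature vector on the fly in __getitem__.
    """

    def __init__(self, df, feature_cols, target_cols, transform=None):
        # Keep the raw values as float32 NumPy arrays; nothing is transformed yet
        self.X = df[feature_cols].to_numpy(dtype=np.float32)
        self.y = df[target_cols].to_numpy(dtype=np.float32)
        self.transform = transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        x = self.X[idx]
        if self.transform is not None:
            x = self.transform(x)          # per-sample preprocessing, applied lazily
        return torch.from_numpy(x), torch.from_numpy(self.y[idx])


# Placeholder training DataFrame (random values, same layout as above)
rng = np.random.default_rng(0)
train_df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=["f1", "f2", "f3", "y"])
cols = ["f1", "f2", "f3"]

# Standardization statistics computed on the training data only
mean = train_df[cols].to_numpy(dtype=np.float32).mean(axis=0)
std = train_df[cols].to_numpy(dtype=np.float32).std(axis=0)

ds = TabularDataset(train_df, cols, ["y"], transform=lambda x: (x - mean) / std)
loader = DataLoader(ds, batch_size=32, shuffle=True)
xb, yb = next(iter(loader))
print(len(ds), xb.shape, yb.shape)
```

For data that does not fit in memory, the same structure applies, but `__init__` would store only file paths or offsets and `__getitem__` would read each sample from disk.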





In [6]:
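# Explicit float32 tensors (features are float64 after scaling, targets are integers)
train_dataset = TensorDataset(torch.tensor(features_train, dtype=torch.float32), torch.tensor(targets_train, dtype=torch.float32))
val_dataset = TensorDataset(torch.tensor(features_val, dtype=torch.float32), torch.tensor(targets_val, dtype=torch.float32))
test_dataset = TensorDataset(torch.tensor(features_test, dtype=torch.float32), torch.tensor(targets_test, dtype=torch.float32))


BATCH_SIZE = 32
# Create DataLoaders for training, validation, and testing sets
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)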


# Your Neural Network

## Define It

When creating a custom neural network by subclassing `nn.Module`, the class typically includes **required components** and **optional but common patterns**.

### Required Components
- **`__init__(self, ...)`**  
  - Purpose: define all layers and components of the network.  
  - Must call `super().__init__()` at the beginning.  
  - Typical layers: `nn.Linear`, `nn.Conv2d`, `nn.ReLU`, `nn.BatchNorm1d`, `nn.Dropout`.

- **`forward(self, x)`**  
  - Purpose: defines the computation of the network, i.e., how the input `x` is transformed into the output.  
  - Must return a tensor.  
  - **Do not call this directly**; PyTorch internally calls it when you execute `model(input)`.


**(Pay attention)**: for classification, the network's raw outputs (logits) are normally turned into probabilities at inference with a Sigmoid or Softmax. However, PyTorch's classification losses `BCEWithLogitsLoss` and `CrossEntropyLoss` expect raw logits and apply the corresponding (log-)sigmoid / log-softmax internally, which is more numerically stable and efficient. Therefore, the model used for training should not end with that activation; apply it separately at inference. For regression with unbounded targets, no output activation is usually needed.
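The equivalences behind this advice can be checked numerically. The sketch below uses random logits and labels (illustrative only) to compare `BCEWithLogitsLoss` with an explicit Sigmoid followed by `BCELoss`, and `CrossEntropyLoss` with `log_softmax` followed by `NLLLoss`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Binary case: raw scores (logits) from a model without a final Sigmoid
logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

loss_logits = nn.BCEWithLogitsLoss()(logits, targets)            # sigmoid applied internally
loss_probs = nn.BCELoss()(torch.sigmoid(logits), targets)        # explicit sigmoid, then BCE
print(loss_logits.item(), loss_probs.item())

# Multi-class case: CrossEntropyLoss = log_softmax + negative log-likelihood
class_logits = torch.randn(8, 4)
labels = torch.randint(0, 4, (8,))
loss_ce = nn.CrossEntropyLoss()(class_logits, labels)
loss_manual = nn.NLLLoss()(torch.log_softmax(class_logits, dim=1), labels)
print(loss_ce.item(), loss_manual.item())

# At inference, apply the activation yourself to obtain probabilities
probs = torch.softmax(class_logits, dim=1)
```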

> **Note:** Sometimes tutorials or projects define an `inference` or `predict` method. This is **not required** by PyTorch; it’s just a convenience wrapper that calls `forward` (often with `torch.no_grad()` for evaluation) to separate training vs prediction logic.
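A minimal sketch of such a wrapper, written as a standalone function named `predict` (hypothetical, not a PyTorch API) with an optional output activation. It switches to eval mode so dropout is disabled and BatchNorm uses its running statistics, then restores the previous mode:

```python
import torch
import torch.nn as nn


def predict(model: nn.Module, x: torch.Tensor, activation=None) -> torch.Tensor:
    """Hypothetical convenience wrapper: eval mode + no_grad, optional output activation."""
    was_training = model.training
    model.eval()                      # disables dropout; BatchNorm uses running statistics
    with torch.no_grad():             # no autograd graph: less memory, faster
        out = model(x)
        if activation is not None:
            out = activation(out)
    model.train(was_training)         # restore the previous mode
    return out


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 1))
x = torch.randn(5, 3)

p1 = predict(model, x, activation=torch.sigmoid)
p2 = predict(model, x, activation=torch.sigmoid)   # identical: dropout is off in eval mode
print(p1.squeeze())
```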




### Possible Improvements

- **Deeper / Wider Layers**: increase representational capacity.  
- **Regularization**: dropout and batch normalization for stability and generalization.  
- **Custom Activations**: ReLU, LeakyReLU, GELU, SiLU.  
- **Residual Connections**: optionally add transformed inputs to intermediate layers.  
- **Weight Initialization**: e.g., Kaiming for ReLU, Xavier for Sigmoid/Tanh.  
- **Output Layer**: adapt with Sigmoid/Softmax depending on task.  
- **Optimizer & Scheduler**: Adam, SGD+momentum, learning rate scheduling.  
- **Input Preprocessing**: standardization, normalization, categorical encoding for tabular data.
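Two of the items above, weight initialization and a learning-rate scheduler, can be sketched as follows. The toy model, the `init_weights` helper, and the `StepLR` settings are illustrative assumptions, not part of the template:

```python
import torch
import torch.nn as nn


def init_weights(m: nn.Module) -> None:
    """Kaiming (He) uniform init for Linear layers (suited to ReLU), zero biases."""
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)


model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
model.apply(init_weights)            # visits every submodule recursively

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the learning rate every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(20):
    # ... one epoch of training would go here ...
    optimizer.step()                 # placeholder: no gradients yet, so parameters do not change
    scheduler.step()                 # update the learning rate once per epoch
print(scheduler.get_last_lr())
```

For simplicity this applies Kaiming init to every `nn.Linear`, including the output layer; for Sigmoid/Tanh networks, `nn.init.xavier_uniform_` would be the analogous choice.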


In [9]:
class NeuralNetwork(nn.Module):
    def __init__(self, input_dim, drop_out_par):
        super().__init__()
        
        # Flatten input
        self.flatten = nn.Flatten()
        
        # Optional residual (projection shortcut): maps the input to 256 dims so it can be added to the feature extractor's output
        self.residual = nn.Linear(input_dim, 256)

        # Feature extraction
        self.features_extraction = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),                          # Can switch to LeakyReLU, GELU, SiLU
            nn.BatchNorm1d(64),                 # Optional batch normalization
            nn.Dropout(drop_out_par),           # Optional dropout for regularization

            nn.Linear(64, 128),
            nn.ReLU(),
            nn.BatchNorm1d(128),
            nn.Dropout(drop_out_par),

            nn.Linear(128, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(drop_out_par)
        )

        # Head (output)
        self.head = nn.Sequential(
            nn.Linear(256, 1) 
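            # (Pay Attention): outputs raw values/logits. Do not add nn.Sigmoid() or nn.Softmax(dim=1) here when training with BCEWithLogitsLoss/CrossEntropyLoss; apply them at inference. Replace 1 with num_classes for multi-class.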
         ) 
        
        # Weight initialization hint
        # nn.init.kaiming_uniform_(self.features_extraction[0].weight, nonlinearity='relu')

    def forward(self, x):
        x = self.flatten(x)
        
        # Residual connection: add the projected input to the output of the feature extractor (last hidden layer)
        res = self.residual(x)
        features = self.features_extraction(x)
        features = features + res  # simple residual addition

        # Output
        logits = self.head(features)
        return logits
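One practical consequence of the `nn.BatchNorm1d` layers in this model: in training mode, BatchNorm needs more than one sample per batch to compute batch statistics. The sketch below uses a toy block (not the model above) to show the error for a batch of size 1 and that eval mode, which relies on running statistics, accepts it. If the last training batch could end up with a single sample, `DataLoader(..., drop_last=True)` avoids the problem.

```python
import torch
import torch.nn as nn

block = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.BatchNorm1d(8))
x_single = torch.randn(1, 3)   # a "batch" containing one sample

block.train()
try:
    block(x_single)
    failed_in_train = False
except ValueError as err:      # batch statistics cannot be computed from one value per channel
    failed_in_train = True
    print("train mode:", err)

block.eval()                   # eval mode uses running statistics, so one sample is fine
out = block(x_single)
print("eval mode output shape:", out.shape)
```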


In [18]:
# remember that the feature sets have shape (n_samples, n_features), so each sample is a vector of n_features elements
input_dim = features_train.shape[1]
drop_out_par = 0.1

model = NeuralNetwork(input_dim, drop_out_par)
summary(model, input_size=(1, input_dim))


Layer (type:depth-idx)                   Output Shape              Param #
NeuralNetwork                            [1, 1]                    --
├─Flatten: 1-1                           [1, 3]                    --
├─Linear: 1-2                            [1, 256]                  1,024
├─Sequential: 1-3                        [1, 256]                  --
│    └─Linear: 2-1                       [1, 64]                   256
│    └─ReLU: 2-2                         [1, 64]                   --
│    └─BatchNorm1d: 2-3                  [1, 64]                   128
│    └─Dropout: 2-4                      [1, 64]                   --
│    └─Linear: 2-5                       [1, 128]                  8,320
│    └─ReLU: 2-6                         [1, 128]                  --
│    └─BatchNorm1d: 2-7                  [1, 128]                  256
│    └─Dropout: 2-8                      [1, 128]                  --
│    └─Linear: 2-9                       [1, 256]                  33,024
│ 

## Train It

In [None]:
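# Regression target in this template -> mean squared error; Adam is a common default optimizer
model = model.to(device)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

In [None]:
size = len(train_loader.dataset)
model.train()
# Iterate over mini-batches from the DataLoader, not over single samples of the Dataset
for batch, (X, y) in enumerate(train_loader):
    X, y = X.to(device), y.to(device)

    # Compute prediction and loss
    pred = model(X)
    loss = loss_fn(pred, y)

    # Backpropagation
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    if batch % 100 == 0:
        loss, current = loss.item(), batch * BATCH_SIZE + len(X)
        print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")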
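The cell above runs a single pass over the training data. A minimal sketch of a full loop, with several epochs plus a validation pass in eval mode, could look like this. The helper names `train_one_epoch` and `evaluate`, the synthetic data, the small model, the learning rate, and the epoch count are all illustrative assumptions; `evaluate` could be reused on a test loader once training is done.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, loss_fn, optimizer, device):
    """One pass over the training loader; returns the mean per-sample loss."""
    model.train()
    total = 0.0
    for X, y in loader:
        X, y = X.to(device), y.to(device)
        loss = loss_fn(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item() * len(X)
    return total / len(loader.dataset)


def evaluate(model, loader, loss_fn, device):
    """Mean per-sample loss in eval mode, without building autograd graphs."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for X, y in loader:
            X, y = X.to(device), y.to(device)
            total += loss_fn(model(X), y).item() * len(X)
    return total / len(loader.dataset)


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic regression data standing in for the scaled features/targets above
X = torch.randn(200, 3)
y = X @ torch.tensor([[1.5], [-2.0], [0.5]]) + 0.1 * torch.randn(200, 1)
train_loader = DataLoader(TensorDataset(X[:160], y[:160]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X[160:], y[160:]), batch_size=32)

model = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

history = []
for epoch in range(30):
    train_loss = train_one_epoch(model, train_loader, loss_fn, optimizer, device)
    val_loss = evaluate(model, val_loader, loss_fn, device)
    history.append((train_loss, val_loss))
    if epoch % 10 == 0:
        print(f"epoch {epoch:>3d}  train {train_loss:.4f}  val {val_loss:.4f}")
```

Weighting each batch loss by `len(X)` keeps the reported epoch loss a true per-sample average even when the last batch is smaller.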