In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(42)

<torch._C.Generator at 0x115725270>

<span style="font-size: 15px;">Neural network layers are the fundamental building blocks that transform input data into meaningful representations. Each layer type is designed for specific data structures and tasks. The **shape of the input and output tensors** plays a crucial role in understanding how these layers operate.

In what follows, we investigate the most common layer types in PyTorch:

1. **nn.Linear**: Fully connected layers for general-purpose transformations
2. **nn.Conv1d**: 1D convolutions for sequential/temporal data
3. **nn.Conv2d**: 2D convolutions for image data
4. **nn.Conv3d**: 3D convolutions for volumetric/video data

Throughout this notebook:
- $B$ denotes the batch size
- $C_{\text{in}}$ and $C_{\text{out}}$ denote input and output channels/features
- Spatial dimensions are denoted by $L$ (length), $H$ (height), $W$ (width), $D$ (depth)
</span>

**Overview**

| Layer | Input Shape | Output Shape | Primary Use Cases | Key Parameters |
|-------|-------------|--------------|-------------------|----------------|
| **nn.Linear** | $(*, H_{\text{in}})$ | $(*, H_{\text{out}})$ | MLPs, classification heads, dense connections, Transformer projections | `in_features`, `out_features`, `bias` |
| **nn.Conv1d** | $(B, C_{\text{in}}, L)$ | $(B, C_{\text{out}}, L_{\text{out}})$ | Time series, audio, text (1D sequences) | `in_channels`, `out_channels`, `kernel_size`, `stride`, `padding` |
| **nn.Conv2d** | $(B, C_{\text{in}}, H, W)$ | $(B, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$ | Images, feature maps, 2D spatial data | `in_channels`, `out_channels`, `kernel_size`, `stride`, `padding`, `dilation`, `groups` |
| **nn.Conv3d** | $(B, C_{\text{in}}, D, H, W)$ | $(B, C_{\text{out}}, D_{\text{out}}, H_{\text{out}}, W_{\text{out}})$ | Videos, medical imaging (CT/MRI), voxelized 3D data | `in_channels`, `out_channels`, `kernel_size`, `stride`, `padding`, `dilation`, `groups` |

Detailed explanations of each layer, including mathematical formulations and implementation examples, follow below.
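
As a quick sanity check of the shapes in the overview table, the sketch below pushes random tensors through one instance of each layer. All sizes (batch size, channel counts, kernel sizes) are arbitrary values chosen for illustration. For the convolutions, each spatial size follows the rule given in the PyTorch documentation, $L_{\text{out}} = \lfloor (L + 2\,\text{padding} - \text{dilation}\,(\text{kernel\_size} - 1) - 1)/\text{stride} \rfloor + 1$.

```python
import torch
import torch.nn as nn

B = 4  # batch size (arbitrary value for illustration)

# nn.Linear: (*, H_in) -> (*, H_out), acts on the last dimension only
linear = nn.Linear(in_features=16, out_features=8)
y_linear = linear(torch.randn(B, 16))

# nn.Conv1d: (B, C_in, L) -> (B, C_out, L_out)
conv1d = nn.Conv1d(in_channels=3, out_channels=6, kernel_size=3)
y_conv1d = conv1d(torch.randn(B, 3, 50))

# nn.Conv2d: (B, C_in, H, W) -> (B, C_out, H_out, W_out)
conv2d = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, padding=1)
y_conv2d = conv2d(torch.randn(B, 3, 32, 32))

# nn.Conv3d: (B, C_in, D, H, W) -> (B, C_out, D_out, H_out, W_out)
conv3d = nn.Conv3d(in_channels=1, out_channels=2, kernel_size=3, stride=2)
y_conv3d = conv3d(torch.randn(B, 1, 9, 17, 17))

print(f"Linear: {tuple(y_linear.shape)}")  # (4, 8)
print(f"Conv1d: {tuple(y_conv1d.shape)}")  # (4, 6, 48): 50 - 3 + 1 = 48
print(f"Conv2d: {tuple(y_conv2d.shape)}")  # (4, 6, 32, 32): padding=1 keeps 32
print(f"Conv3d: {tuple(y_conv3d.shape)}")  # (4, 2, 4, 8, 8): stride=2 roughly halves
```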

## nn.Linear

<span style="font-size: 15px;">

### Where is it used?

The `nn.Linear` layer (also known as a fully connected or dense layer) applies a linear transformation to the incoming data. It is the most fundamental building block in neural networks and is used in:

- **Multi-Layer Perceptrons (MLPs)**: The backbone of simple feedforward networks
- **Classification heads**: Final layers that map features to class logits
- **Transformer architectures**: Q, K, V projections and feed-forward networks
- **Autoencoders**: Encoding and decoding dense representations
- **Regression tasks**: Mapping features to continuous outputs

### Input and Output Shapes

- **Input**: $(*, H_{\text{in}})$ where $*$ means any number of dimensions and $H_{\text{in}}$ is the number of input features
- **Output**: $(*, H_{\text{out}})$ where all dimensions except the last remain unchanged

**Important**: The linear transformation is applied to the **last dimension** only.
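
To illustrate the point about the last dimension, the following sketch feeds inputs with zero to three leading dimensions through the same layer. The specific sizes are assumptions chosen for illustration; only the last dimension has to equal `in_features`.

```python
import torch
import torch.nn as nn

linear = nn.Linear(in_features=20, out_features=30)

# Inputs with zero, one, two, and three leading dimensions
shapes = [(20,), (128, 20), (8, 16, 20), (2, 4, 16, 20)]

out_shapes = []
for shape in shapes:
    y = linear(torch.randn(*shape))
    out_shapes.append(tuple(y.shape))
    # Only the last dimension changes: 20 -> 30
    print(f"{shape} -> {tuple(y.shape)}")
```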

### Default Arguments

```python
nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
```

- `in_features` (int): Size of each input sample (required)
- `out_features` (int): Size of each output sample (required)
- `bias` (bool): If `True`, adds a learnable bias. Default: `True`
- `device`, `dtype`: Optional device and data type of the layer's parameters. Default: `None`

### Mathematical Formulation

The linear layer applies the following transformation:

$$
{y_{\rm out}}^{i,j,\cdots, k} = \sum_{z=1}^{H_{\text{in}}} \omega_{k z} \, {x_{\rm in}}^{i,j,\cdots, z} + b^{k}
$$


where the indices "$i,j,\cdots$" run over an arbitrary number of leading dimensions, i.e., $*$, and the repeated index $z$ is summed over. This makes the layer **batch-agnostic**. In this expression:
- $x_{\rm in}$ is the input tensor of shape $(*, H_{\text{in}})$
- $\omega$ is the weight matrix of shape $(H_{\text{out}}, H_{\text{in}})$
- $b$ is the bias vector of shape $(H_{\text{out}},)$
- $y_{\rm out}$ is the output tensor of shape $(*, H_{\text{out}})$
- **Notice** that every position along the leading dimensions "$i,j,\cdots$" is transformed with exactly the same weights and biases.
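
The formula above can be checked numerically. The sketch below is a minimal illustration, assuming a 3D input of shape `(batch, sequence, H_in)` with arbitrary sizes: it reproduces the layer output with an ellipsis `einsum` and confirms that a single position, processed on its own, gives the same result as the corresponding slice of the batched output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

linear = nn.Linear(in_features=20, out_features=30)
x = torch.randn(8, 16, 20)  # (batch, sequence, H_in); sizes are illustrative

y = linear(x)

# "..." stands for the leading dimensions i, j, ...; z is summed over
y_einsum = torch.einsum("...z,kz->...k", x, linear.weight) + linear.bias

# One position processed alone uses the same weights and bias
y_single = linear(x[3, 5])

matches_einsum = torch.allclose(y, y_einsum, atol=1e-5)
matches_single = torch.allclose(y[3, 5], y_single, atol=1e-5)

print(f"einsum reproduces the layer: {matches_einsum}")
print(f"single position matches batched output: {matches_single}")
```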




### Weight Initialization

By default, both the weights and the bias are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k = \frac{1}{H_{\text{in}}}$.
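
A quick way to see this range in practice is to inspect a freshly created layer. The sketch below is only a sanity check under the default initialization (it would not hold after training or custom initialization); the layer sizes and the small tolerance `eps` are assumptions added for illustration.

```python
import math

import torch
import torch.nn as nn

in_features, out_features = 20, 30
linear = nn.Linear(in_features, out_features)

bound = math.sqrt(1.0 / in_features)  # sqrt(k) with k = 1 / H_in
eps = 1e-6  # small tolerance for float32 rounding (illustrative choice)

# Every freshly initialized parameter should lie within [-sqrt(k), sqrt(k)]
weight_in_range = bool((linear.weight.abs() <= bound + eps).all())
bias_in_range = bool((linear.bias.abs() <= bound + eps).all())

print(f"sqrt(k) = {bound:.4f}")
print(f"max |weight| = {linear.weight.abs().max().item():.4f}")
print(f"weights within bound: {weight_in_range}")
print(f"bias within bound: {bias_in_range}")
```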

</span>

In [8]:
# Basic nn.Linear usage
# Create a linear layer that maps 20 input features to 30 output features
linear = nn.Linear(20, 30)

# Check the weight and bias shapes
print(f"Weight shape: {linear.weight.shape}")  # (out_features, in_features)
print(f"Bias shape: {linear.bias.shape}")      # (out_features,)

Weight shape: torch.Size([30, 20])
Bias shape: torch.Size([30])


In [25]:
# Example 1: Simple 2D input (batch_size, in_features)
batch_size = 128
in_features = 20
out_features = 30

linear = nn.Linear(in_features, out_features)
models_weights = linear.weight
models_biases = linear.bias
x = torch.randn(batch_size, in_features)
output = linear(x)

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")

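# Let's now reproduce the output manually:

# The bias of shape (out_features,) broadcasts over the batch dimension
output_manual = torch.einsum('bi,oi->bo', x, models_weights) + models_biases
# Use allclose instead of exact equality: floating-point results can differ by rounding
print(f'Outputs match: {torch.allclose(output_manual, output)}')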

Input shape: torch.Size([128, 20])
Output shape: torch.Size([128, 30])
Outputs match: True


In [23]:
# Example 3: Without bias
linear_no_bias = nn.Linear(20, 30, bias=False)
print(f"Has bias: {linear_no_bias.bias is not None}")

# Verify the transformation manually
x = torch.randn(5, 20)
output_layer = linear_no_bias(x)
output_manual = x @ linear_no_bias.weight.T  # y = x @ W^T

print(f"Outputs match: {torch.allclose(output_layer, output_manual)}")

Has bias: False
Outputs match: True
