# Module

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Mitchell-Mirano/sorix/blob/qa/docs/learn/layers/06-Module.ipynb)
[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-black?logo=github)](https://github.com/Mitchell-Mirano/sorix/blob/qa/docs/learn/layers/06-Module.ipynb)
[![Open in Docs](https://img.shields.io/badge/Open%20in-Docs-blue?logo=readthedocs)](https://mitchell-mirano.github.io/sorix/latest/learn/layers/06-Module)

In Sorix, the `Module` class is the fundamental building block for all neural network components. Whether you are building a simple activation function, a complex layer, or an entire deep neural network, you will almost always inherit from `Module`.

Its design is intentionally similar to PyTorch's `nn.Module`, so it should feel familiar to PyTorch users while remaining simple enough to extend manually.

## Key Features of `Module`

1. **Automatic Parameter Tracking**: Any `Tensor` attribute that has `requires_grad=True` is automatically collected by the `.parameters()` method.
2. **Sub-module Registration**: If you assign another `Module` as an attribute of your class, Sorix will recursively find its parameters as well.
3. **Device Management**: The `.to(device)` method moves all parameters and sub-modules to CPU or GPU (via CuPy).
4. **Training/Evaluation Modes**: The `.train()` and `.eval()` methods toggle the behavior of layers like `Dropout` and `BatchNorm1d` across the whole model, including nested sub-modules.
5. **State Management**: `.state_dict()` and `.load_state_dict()` allow for easy serialization of your model's weights.
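
To make the first, second, and fourth points concrete, here is a minimal pure-Python sketch of how attribute-based parameter tracking and mode propagation could work. `ToyTensor`, `ToyModule`, `ToyLinear`, and `ToyNet` are hypothetical names invented for this illustration; this is not Sorix's actual implementation, only one plausible way to get the same behavior.

```python
import numpy as np

class ToyTensor:
    """Hypothetical stand-in for a tensor: data plus a requires_grad flag."""
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=float)
        self.requires_grad = requires_grad

class ToyModule:
    """Hypothetical stand-in for Module; not Sorix's actual code."""
    def __init__(self):
        self.training = True

    def _children(self):
        # Yield sub-modules stored directly or inside lists, tuples, or dicts.
        for value in vars(self).values():
            if isinstance(value, dict):
                items = list(value.values())
            elif isinstance(value, (list, tuple)):
                items = list(value)
            else:
                items = [value]
            for item in items:
                if isinstance(item, ToyModule):
                    yield item

    def parameters(self):
        # Own trainable tensors first, then recurse into sub-modules.
        params = [v for v in vars(self).values()
                  if isinstance(v, ToyTensor) and v.requires_grad]
        for child in self._children():
            params.extend(child.parameters())
        return params

    def train(self, mode=True):
        # Set the flag here and on every nested sub-module.
        self.training = mode
        for child in self._children():
            child.train(mode)
        return self

    def eval(self):
        return self.train(False)

class ToyLinear(ToyModule):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.W = ToyTensor(np.zeros((n_in, n_out)), requires_grad=True)
        self.b = ToyTensor(np.zeros(n_out), requires_grad=True)

class ToyNet(ToyModule):
    def __init__(self):
        super().__init__()
        self.stem = ToyLinear(5, 16)
        self.blocks = [ToyLinear(16, 16), ToyLinear(16, 16)]
        self.scale = ToyTensor(1.0, requires_grad=False)  # not trainable, so skipped

net = ToyNet()
print(len(net.parameters()))   # 6: W and b for each of the three linear layers
net.eval()
print(net.blocks[0].training)  # False: the mode reached the nested list
```

The sketch scans instance attributes on every call, which keeps it short; a real framework might instead register parameters when attributes are assigned.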

## 1. Creating a Custom Layer with Parameters

A "layer" in Sorix is just a `Module` that performs a specific operation. While we have built-in layers like `Linear`, you can easily create your own. 

Let's implement a **Parametric ReLU (PReLU)**, which is like a standard ReLU but with a learned slope $\alpha$ for negative values:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases}$$

In [1]:
import numpy as np
from sorix import tensor
from sorix.nn import Module

class PReLU(Module):
    def __init__(self, size=1, initial_alpha=0.25):
        super().__init__()
        # alpha is a learned parameter
        self.alpha = tensor(np.full(size, initial_alpha), requires_grad=True)
        
    def forward(self, x):
        # x > 0 returns a boolean mask (converted to float in operation)
        # We use Sorix operations to stay within the autograd graph
        pos = (x > 0) * x
        neg = (x <= 0) * (self.alpha * x)
        return pos + neg

prelu = PReLU(size=1)
x = tensor([-2.0, 1.0, -0.5])
y = prelu(x)

print(f"Input: {x.numpy()}")
print(f"Output with initial alpha=0.25: {y.numpy()}")

Input: [-2.   1.  -0.5]
Output with initial alpha=0.25: [-0.5    1.    -0.125]


### Verifying Autograd in Custom Layers

To verify that our layer is indeed learning, we can perform a single optimization step. The target below is 0 for every negative input, so the optimizer should shrink `alpha` toward zero.

In [2]:
from sorix.optim import SGD
from sorix.nn import MSELoss

optimizer = SGD(prelu.parameters(), lr=0.1)
target = tensor([0.0, 1.0, 0.0]) # We want negative inputs to result in 0
criterion = MSELoss()

print(f"Alpha before update: {prelu.alpha.item():.4f}")

# One training step
y = prelu(x)
loss = criterion(y, target)
loss.backward()
optimizer.step()
optimizer.zero_grad()

print(f"Alpha after update: {prelu.alpha.item():.4f}")
print(f"New output: {prelu(x).numpy()}")

Alpha before update: 0.2500
Alpha after update: 0.1792
New output: [-0.35833333  1.         -0.08958333]
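
The update above can be reproduced by hand with NumPy. The sketch below assumes that `MSELoss` averages the squared error over all elements and that `SGD` applies the plain rule `alpha -= lr * grad`; both assumptions are consistent with the printed values, but neither is taken from Sorix's source.

```python
import numpy as np

x = np.array([-2.0, 1.0, -0.5])
target = np.array([0.0, 1.0, 0.0])
alpha, lr = 0.25, 0.1

def prelu(x, alpha):
    # x for positive inputs, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

y = prelu(x, alpha)

# Chain rule: dL/dalpha = mean(2 * (y - target) * dy/dalpha),
# where dy/dalpha equals x for x <= 0 and 0 for x > 0.
dy_dalpha = np.where(x > 0, 0.0, x)
grad = np.mean(2.0 * (y - target) * dy_dalpha)
new_alpha = alpha - lr * grad

print(f"grad: {grad:.4f}")            # grad: 0.7083
print(f"new alpha: {new_alpha:.4f}")  # new alpha: 0.1792
print(prelu(x, new_alpha))
```

The hand-computed `alpha` matches the `0.1792` printed by the notebook, which suggests the gradient flows through the custom `forward` as intended.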


## 2. Advanced Composition: Residual Blocks

Modern deep learning architectures (like ResNets) rely on **skip connections**, which add a block's input back to its output. In Sorix, you can build complex reusable blocks by nesting other modules.

In [3]:
from sorix.nn import Linear, ReLU, BatchNorm1d

class ResidualBlock(Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1 = Linear(dim, dim)
        self.bn1 = BatchNorm1d(dim)
        self.relu = ReLU()
        self.fc2 = Linear(dim, dim)
        self.bn2 = BatchNorm1d(dim)
        
    def forward(self, x):
        residual = x
        
        out = self.fc1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.bn2(out)
        
        return self.relu(out + residual)

block = ResidualBlock(10)
print(f"Number of parameters in block: {len(block.parameters())}")

Number of parameters in block: 8
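
The count of 8 refers to parameter tensors, not individual weights. A quick tally, assuming each `Linear` holds a weight and a bias and each `BatchNorm1d` holds a learnable scale and shift (the names `gamma` and `beta` and all shapes below are illustrative assumptions, not read from Sorix):

```python
from math import prod

dim = 10
# Assumed layout: names and shapes are illustrative only.
shapes = {
    "fc1.W": (dim, dim), "fc1.b": (dim,),
    "bn1.gamma": (dim,), "bn1.beta": (dim,),
    "fc2.W": (dim, dim), "fc2.b": (dim,),
    "bn2.gamma": (dim,), "bn2.beta": (dim,),
}

num_tensors = len(shapes)
num_scalars = sum(prod(shape) for shape in shapes.values())
print(num_tensors, num_scalars)  # 8 260
```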


## 3. Training a Complete Architecture

Let's build a ResNet-style MLP and train it on a simple synthetic regression task to check that the entire stack (nested modules, residual connections, custom layers, and optimizers) works together. A steadily decreasing loss is the signal we are looking for.

In [4]:
class ResNetMLP(Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_blocks=2):
        super().__init__()
        self.stem = Linear(input_dim, hidden_dim)
        self.blocks = [ResidualBlock(hidden_dim) for _ in range(num_blocks)]
        self.prelu = PReLU(size=hidden_dim)
        self.head = Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        x = self.stem(x)
        for block in self.blocks:
            x = block(x)
        x = self.prelu(x)
        return self.head(x)

model = ResNetMLP(input_dim=5, hidden_dim=16, output_dim=1)
optimizer = SGD(model.parameters(), lr=0.01)
criterion = MSELoss()

# Create synthetic data: y = sum(x) 
X_train = tensor(np.random.randn(100, 5))
y_train = tensor(np.sum(X_train.numpy(), axis=1, keepdims=True))

print("Training ResNetMLP...")
for epoch in range(101):
    model.train()
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if epoch % 20 == 0:
        print(f"Epoch {epoch:3d} | Loss: {loss.item():.6f}")

Training ResNetMLP...
Epoch   0 | Loss: 4.135838
Epoch  20 | Loss: 0.418998
Epoch  40 | Loss: 0.248450
Epoch  60 | Loss: 0.168582
Epoch  80 | Loss: 0.120915
Epoch 100 | Loss: 0.091177


## 4. Parameter and State Management

One of the most useful features of `Module` is the `.parameters()` method. It automatically crawls the object's attributes (including lists, dictionaries, and sub-modules) to find every tensor that needs to be optimized.

The `state_dict()` returns a dictionary mapping parameter names to their current values, perfect for saving weights.

In [5]:
from sorix import save, load

# Get state dict
sd = model.state_dict()
print("State dict keys sample:", list(sd.keys())[:5])

# Save and Load
save(sd, "resnet_model.sor")
loaded_weights = load("resnet_model.sor")

new_model = ResNetMLP(input_dim=5, hidden_dim=16, output_dim=1)
new_model.load_state_dict(loaded_weights)

print("\nWeights persistence verified!")

State dict keys sample: ['stem.W', 'stem.b', 'prelu.alpha', 'head.W', 'head.b']

Weights persistence verified!
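
A stricter round-trip check compares every saved array with its restored copy. The sketch below illustrates the idea with plain NumPy and an in-memory buffer instead of `sorix.save` and `sorix.load`; the state dict is hypothetical, with key names borrowed from the printed sample and shapes assumed.

```python
import io
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical state dict: keys follow the printed sample, shapes are assumed.
state = {
    "stem.W": rng.standard_normal((5, 16)),
    "stem.b": np.zeros(16),
    "prelu.alpha": np.full(16, 0.25),
}

# Serialize to an in-memory buffer and read it back.
buffer = io.BytesIO()
np.savez(buffer, **state)
buffer.seek(0)
with np.load(buffer) as archive:
    restored = {key: archive[key] for key in archive.files}

same_keys = set(restored) == set(state)
same_values = all(np.array_equal(state[k], restored[k]) for k in state)
print(same_keys and same_values)  # True
```

The same comparison could be applied to `sd` and `new_model.state_dict()` once their values are converted to NumPy arrays.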


## 5. Device and Mode Management

Since our model contains `BatchNorm1d`, which behaves differently during training and evaluation, you must call `eval()` before inference and `train()` before resuming training.

In [6]:
import sorix

# Switch to evaluation mode (essential for BatchNorm/Dropout)
model.eval()
print(f"In training mode? {model.training}")

# Move to GPU if available
if sorix.cuda.is_available():
    model.to('cuda')
    print("Entire model and its nested blocks moved to GPU memory.")

In training mode? False


✅ GPU basic operation passed
✅ GPU available: NVIDIA GeForce RTX 4070 Laptop GPU
CUDA runtime version: 13000
CuPy version: 14.0.1
Entire model and its nested blocks moved to GPU memory.
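
To see why the mode matters for batch normalization, here is a minimal NumPy sketch of the standard technique. `ToyBatchNorm1d` is a hypothetical class without learnable scale and shift; it is not Sorix's `BatchNorm1d`, and the `momentum` and `eps` defaults are assumptions.

```python
import numpy as np

class ToyBatchNorm1d:
    """Hypothetical minimal batch norm; not Sorix's implementation."""
    def __init__(self, dim, momentum=0.1, eps=1e-5):
        self.running_mean = np.zeros(dim)
        self.running_var = np.ones(dim)
        self.momentum, self.eps = momentum, eps
        self.training = True

    def __call__(self, x):
        if self.training:
            # Training: normalize with the statistics of the current batch
            # and update the running estimates for later use.
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation: use the stored running statistics only.
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
bn = ToyBatchNorm1d(3)
batch = rng.normal(loc=5.0, scale=2.0, size=(64, 3))

train_out = bn(batch)       # normalized with this batch's own statistics
bn.training = False
eval_single = bn(batch[:1])  # one row on its own
eval_in_batch = bn(batch)[:1]  # the same row inside a full batch

print(np.allclose(train_out.mean(axis=0), 0.0))  # True
print(np.allclose(eval_single, eval_in_batch))   # True
```

In evaluation mode the output for a row no longer depends on the other rows in the batch, which is the property inference needs.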


## Conclusion

By subclassing `Module`, you gain all the power of Sorix's ecosystem with minimal code. You can implement complex research architectures with skip connections and custom primitives, and Sorix will handle the gradients, optimization, and hardware acceleration for you.