### ME2


#### Import necessary packages

This notebook demonstrates custom deep learning inference using NumPy for portability and clarity.

- **NumPy** is the core array library we use to implement convolution, pooling, activation, and linear layers from scratch.
- We still rely on PyTorch / torchvision for data loading and to access pretrained AlexNet weights.
- Supporting packages (einops for concise tensor algebra, tqdm for progress reporting) round out the tooling.

The imports below are grouped to highlight standard libraries, third-party utilities, and NumPy stride tricks used for patch extraction.


In [1]:
# Standard library imports
from typing import Tuple

# Third-party imports
import torch
from torch import nn
from torch.utils.data import DataLoader
from einops import einsum, rearrange
from tqdm.auto import tqdm
from torchvision.datasets import ImageNet
from torchvision.models import AlexNet_Weights

# NumPy imports
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view



In [2]:
# Profiling utilities: decorator to time forward methods
import time
from functools import wraps
from collections import defaultdict

PROFILE_EVENTS = []  # list of dicts: {"name": str, "elapsed": float}


def profile_forward(name: str | None = None):
    """Decorator to measure execution time of forward methods.

    Args:
        name: Optional override name. If None attempts to derive from self.__class__.__name__.
    """
    def outer(fn):
        @wraps(fn)
        def inner(self, *args, **kwargs):
            label = name or self.__class__.__name__
            start = time.perf_counter()
            out = fn(self, *args, **kwargs)
            end = time.perf_counter()
            PROFILE_EVENTS.append({"name": label, "elapsed": end - start})
            print(f'Name: {label}, Time: {(end - start)*1000:.3f} ms')
            return out
        return inner
    return outer


def profile_summary(sort: str = "total", top_n: int | None = None):
    """Print aggregated profiling results.

    Args:
        sort: One of {"total", "mean", "calls", "max"} to control ordering.
        top_n: If provided, limit output rows.
    """
    if not PROFILE_EVENTS:
        print("No profiling data collected.")
        return
    agg = defaultdict(lambda: {"total": 0.0, "calls": 0, "max": 0.0})
    for ev in PROFILE_EVENTS:
        a = agg[ev["name"]]
        a["total"] += ev["elapsed"]
        a["calls"] += 1
        if ev["elapsed"] > a["max"]:
            a["max"] = ev["elapsed"]
    rows = []
    for name, stats in agg.items():
        mean = stats["total"] / stats["calls"]
        rows.append({"name": name, **stats, "mean": mean})
    rows.sort(key=lambda r: r[sort], reverse=True)
    if top_n is not None:
        rows = rows[:top_n]
    header = f"{'Layer':<22} {'Calls':>5} {'Total(ms)':>12} {'Mean(ms)':>12} {'Max(ms)':>12}"
    print(header)
    print('-' * len(header))
    for r in rows:
        print(f"{r['name']:<22} {r['calls']:>5} {r['total']*1000:>12.3f} {r['mean']*1000:>12.3f} {r['max']*1000:>12.3f}")
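    # Note: a profiled wrapper such as "AlexNet.forward" also contains the time of
    # the layers it calls, so this sum counts that time twice.
    total_time = sum(r['total'] for r in rows)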
    print('-' * len(header))
    print(
        f"Total profiled time: {total_time*1000:.2f} ms across {len(PROFILE_EVENTS)} calls")

#### Load the weights and biases of AlexNet

In this section, we extract the pretrained weights and biases from torchvision's AlexNet model.

- The weights and biases are loaded using the default configuration from `AlexNet_Weights`.
- These parameters are stored in a dictionary and printed to verify the available keys.
- Our custom layers load these pretrained values at construction time, so the NumPy model uses the same parameters as torchvision's reference AlexNet.

This step is essential for initializing all custom layers with correct parameters before running inference on the validation set.


In [3]:
weights_and_biases = AlexNet_Weights.DEFAULT.get_state_dict()
print(weights_and_biases.keys())

odict_keys(['features.0.weight', 'features.0.bias', 'features.3.weight', 'features.3.bias', 'features.6.weight', 'features.6.bias', 'features.8.weight', 'features.8.bias', 'features.10.weight', 'features.10.bias', 'classifier.1.weight', 'classifier.1.bias', 'classifier.4.weight', 'classifier.4.bias', 'classifier.6.weight', 'classifier.6.bias'])


#### Load the data

In this section, we load the ImageNet validation dataset and prepare it for inference with our custom NumPy-based AlexNet implementation.

- We use torchvision's `ImageNet` class to access the validation split, applying the standard AlexNet preprocessing transforms.
- The DataLoader batches images; our custom `default_collate` converts tensors directly to NumPy arrays.
- Keeping everything in NumPy simplifies the educational focus (no device transfers or GPU specifics).

This setup enables end-to-end inference using only NumPy for tensor operations alongside pretrained weights.


In [4]:
def default_collate(batch):
    """
    Collate function converting incoming PyTorch tensors to NumPy arrays.

    We convert tensors early so all subsequent custom layers operate purely on
    NumPy arrays (no device transfers needed).
    """
    imgs, labels = zip(*batch)  # imgs: tuple[torch.Tensor], labels: tuple[int]
    imgs = [np.asarray(img.numpy(), dtype=np.float32) for img in imgs]
    imgs = np.stack(imgs, axis=0)                 # [B,3,224,224]
    labels = np.asarray(labels, dtype=np.int64)    # [B]
    return imgs, labels


# implement using ImageNet
imagenet_val = ImageNet(
    root="data/ImageNet1k",
    split="val",
    transform=AlexNet_Weights.IMAGENET1K_V1.transforms()
)

# Each batch is an (images, labels) pair of NumPy arrays (see default_collate).
# Single-process alternative, kept for reference:
# val_dataloader = DataLoader(
#     imagenet_val,
#     batch_size=512,
#     shuffle=False,
#     num_workers=0,
#     collate_fn=default_collate,
# )
val_dataloader = DataLoader(
    imagenet_val,
    batch_size=128,
    num_workers=12,
    prefetch_factor=4,
    persistent_workers=True,
    pin_memory=False,
    drop_last=False,
    collate_fn=default_collate
)

#### Define the custom Conv2d

We implement a minimal convolution by extracting sliding k×k patches with `sliding_window_view` and contracting them with the weight tensor using `einsum`. This mirrors PyTorch's Conv2d behavior for stride and padding in a clear, NumPy-only form.
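
The patch-and-contract idea can be sketched on a small random input. This is only an illustrative sketch: it uses `np.einsum` in place of the einops call, and the helper names `conv2d_patches` and `conv2d_naive` are hypothetical, not part of the notebook. The output spatial size follows `(H + 2P - K) // S + 1`, and a naive loop serves as a reference check.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_patches(x, weight, bias, stride=1, padding=0):
    """Hypothetical helper: cross-correlation via sliding windows + einsum."""
    k = weight.shape[-1]
    x_pad = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    # windows: (b, c, H_pad - k + 1, W_pad - k + 1, k, k)
    windows = sliding_window_view(x_pad, (k, k), axis=(-2, -1))
    # Keep every `stride`-th window along both spatial axes.
    windows = windows[:, :, ::stride, ::stride]
    out = np.einsum("bchwij,ocij->bohw", windows, weight)
    return out + bias[None, :, None, None]


def conv2d_naive(x, weight, bias, stride=1, padding=0):
    """Reference loop implementation, used only to check the sketch."""
    b, c, h, w = x.shape
    o, _, k, _ = weight.shape
    x_pad = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    out = np.zeros((b, o, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x_pad[:, :, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract channel and kernel axes: (b, c, k, k) x (o, c, k, k) -> (b, o)
            out[:, :, i, j] = np.tensordot(patch, weight, axes=([1, 2, 3], [1, 2, 3]))
    return out + bias[None, :, None, None]


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 9, 9))
weight = rng.standard_normal((4, 3, 3, 3))
bias = rng.standard_normal(4)

fast = conv2d_patches(x, weight, bias, stride=2, padding=1)
slow = conv2d_naive(x, weight, bias, stride=2, padding=1)
print(fast.shape)                 # (2, 4, 5, 5)
print(np.allclose(fast, slow))    # True
```

Slicing the window view with `::stride` discards windows after they are described but before any data is copied, since `sliding_window_view` returns a strided view rather than materialized patches.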


In [5]:
class PatchMixin:
    """Mixin for extracting patches from input arrays with a given kernel size and stride.
    Used for convolution and pooling operations on NumPy arrays."""

    def __init__(self, kernel_size: int, stride: int) -> None:
        """Initialize patch extraction parameters.
        Args:
            kernel_size (int): Size of the square kernel.
            stride (int): Stride for patch extraction.
        """
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride

    def _patch_with_stride(self, x_pad: np.ndarray) -> np.ndarray:
        """Extract k x k patches from the input array with the given stride.
        Args:
            x_pad (np.ndarray): Input array of shape (b, c, h, w).
        Returns:
            np.ndarray: Array of shape (b, c, h_out, w_out, k, k) containing the extracted patches,
                where h_out = (h - k) // stride + 1 and w_out = (w - k) // stride + 1.
        """
        windows = sliding_window_view(  # type: ignore
            x_pad,
            window_shape=(self.kernel_size, self.kernel_size),
            axis=(-2, -1)  # type: ignore
        )
        return windows[:, :, ::self.stride, ::self.stride, :, :]


class WeightsAndBiasMixin:
    """Mixin for loading pretrained weights and biases from a state dict as NumPy arrays."""

    def __init__(self, *args, **kwargs) -> None:
        """Initialize the mixin (calls super)."""
        super().__init__(*args, **kwargs)

    def init_weights_and_bias(self, weight_loc: str, bias_loc: str) -> Tuple[np.ndarray, np.ndarray]:
        """Load weights and biases from the state dict.
        Args:
            weight_loc (str): Key for weights in the state dict.
            bias_loc (str): Key for biases in the state dict.
        Returns:
            Tuple[np.ndarray, np.ndarray]: NumPy arrays for weights and biases.
        """
        weight = weights_and_biases[weight_loc].detach().cpu().numpy()
        bias = weights_and_biases[bias_loc].detach().cpu().numpy()
        return weight, bias


class CustomConv2d(WeightsAndBiasMixin, PatchMixin, nn.Module):
    """
    Custom 2D Convolution layer using NumPy and einsum.
    Performs convolution on NumPy arrays.
    Limited shape flexibility for demonstration/inference.
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int,
        stride: int = 1,
        padding: int = 0,
        weight_loc: str = '',
        bias_loc: str = '',
    ) -> None:
        """Initialize the convolution layer with parameters and pretrained weights/biases.
        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            kernel_size (int): Size of the convolution kernel.
            stride (int, optional): Stride for convolution. Defaults to 1.
            padding (int, optional): Padding for input. Defaults to 0.
            weight_loc (str, optional): Key for weights in state dict.
            bias_loc (str, optional): Key for biases in state dict.
        """
        super().__init__(kernel_size, stride)
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding

        self.weight, self.bias = self.init_weights_and_bias(
            weight_loc, bias_loc)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        """No-op for pretrained weights. Provided for API compatibility."""
        pass

    def _apply_padding(self, x: np.ndarray) -> np.ndarray:
        """Apply zero padding to the input array if required.
        Args:
            x (np.ndarray): Input array of shape (b, c, h, w).
        Returns:
            np.ndarray: Padded input array.
        """
        if self.padding == 0:
            return x
        return np.pad(
            x,
            pad_width=((0, 0), (0, 0), (self.padding, self.padding),
                       (self.padding, self.padding)),
            mode='constant',
            constant_values=0,
        )

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Perform the forward pass of the convolution layer.
        Args:
            x (np.ndarray): Input array of shape (b, c, h, w).
        Returns:
            np.ndarray: Output array after convolution and bias addition.
        """
        x_pad = self._apply_padding(x)
        patched_windows = self._patch_with_stride(x_pad)
        # Window axes are (b, c, h_out, w_out, kh, kw); contract over c, kh, kw.
        pre_activation = einsum(
            patched_windows, self.weight, 'b c h w kh kw, o c kh kw -> b o h w')
        return pre_activation + self.bias[None, :, None, None]  # type: ignore

#### Custom ReLU

A straightforward NumPy implementation of ReLU using `np.maximum`.


In [6]:
class CustomReLU(nn.Module):
    """
    Custom ReLU activation layer using NumPy.

    Applies the Rectified Linear Unit (ReLU) function: f(x)=max(0,x) element-wise.
    """

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Apply ReLU to the input.
        Args:
            x (np.ndarray): Input tensor of any shape.
        Returns:
            np.ndarray: Same shape as input with negatives zeroed.
        """
        return np.maximum(x, 0.0)

#### Custom MaxPool2d

Implemented via the same patch extraction utility as convolution, followed by a reduction (`np.max`) over the spatial kernel dimensions.
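
As a small illustration (a sketch on a toy 5×5 input, not the notebook's class), the same windowing followed by a max over the last two axes gives a 3×3, stride-2 pooling with no padding:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy input: one image, one channel, values 0..24 laid out row by row.
x = np.arange(25, dtype=np.float32).reshape(1, 1, 5, 5)
kernel_size, stride = 3, 2

windows = sliding_window_view(x, (kernel_size, kernel_size), axis=(-2, -1))
windows = windows[:, :, ::stride, ::stride]   # shape (1, 1, 2, 2, 3, 3)
pooled = windows.max(axis=(-2, -1))           # shape (1, 1, 2, 2)
print(pooled[0, 0])
# [[12. 14.]
#  [22. 24.]]
```

Each output value is the bottom-right element of its window here because the toy input increases along both axes.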


In [7]:
class CustomMaxPool2d(PatchMixin, nn.Module):
    """
    Custom Max Pooling layer using NumPy.

    Extracts spatial patches then takes the maximum over each patch.
    """

    def __init__(self, kernel_size: int, stride: int) -> None:
        super().__init__(kernel_size, stride)

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Apply max pooling.
        Args:
            x (np.ndarray): Input of shape (batch, channels, height, width).
        Returns:
            np.ndarray: Pooled output.
        """
        patched_windows = self._patch_with_stride(x)
        return np.max(patched_windows, axis=(-2, -1))

#### Custom Adaptive AvgPool2d

Reduces arbitrary spatial dimensions to a fixed target by averaging over partitioned regions computed with integer floor/ceil boundaries.
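
The floor/ceil boundaries can be sketched in isolation. The helper name `adaptive_bounds` is hypothetical, and the sketch uses integer arithmetic instead of `np.floor`/`np.ceil`. Regions may overlap when the input size is not a multiple of the output size, and they collapse to single elements when the two sizes match.

```python
def adaptive_bounds(in_size: int, out_size: int) -> list[tuple[int, int]]:
    """Hypothetical helper: [start, end) index range for each output cell."""
    bounds = []
    for i in range(out_size):
        start = (i * in_size) // out_size            # floor(i * in / out)
        end = -((-(i + 1) * in_size) // out_size)    # ceil((i + 1) * in / out)
        bounds.append((start, end))
    return bounds


print(adaptive_bounds(7, 3))   # [(0, 3), (2, 5), (4, 7)]
print(adaptive_bounds(6, 6))   # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
```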


In [8]:
class CustomAdaptiveAvgPool2d(nn.Module):
    """
    Adaptive Average Pooling layer implemented with NumPy.

    Reduces spatial dimensions to a target (H_out, W_out) by averaging over
    variable-sized input regions computed via integer partitioning.
    """

    def __init__(self, output_size: int | tuple[int, int]) -> None:
        super().__init__()
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            self.output_size = output_size

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Apply adaptive average pooling.
        Args:
            x (np.ndarray): Input of shape (batch, channels, height, width).
        Returns:
            np.ndarray: Output of shape (batch, channels, H_out, W_out).
        """
        b, c, h, w = x.shape
        out_h, out_w = self.output_size
        out = np.zeros((b, c, out_h, out_w), dtype=x.dtype)
        for i in range(out_h):
            h_start = int(np.floor(i * h / out_h))
            h_end = int(np.ceil((i + 1) * h / out_h))
            for j in range(out_w):
                w_start = int(np.floor(j * w / out_w))
                w_end = int(np.ceil((j + 1) * w / out_w))
                region = x[:, :, h_start:h_end, w_start:w_end]
                out[:, :, i, j] = region.mean(axis=(-2, -1))
        return out

#### Custom Linear Module

Uses `einsum` to express the matrix multiply `x W^T` cleanly, then adds the bias. Operates purely on NumPy arrays.
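
A quick check of the contraction pattern, sketched with `np.einsum` rather than the einops wrapper and with arbitrary toy shapes: with weights stored in PyTorch's `(out_features, in_features)` layout, contracting over `i` is the same as multiplying by the transposed weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # (batch, in_features)
weight = rng.standard_normal((3, 8))   # (out_features, in_features)
bias = rng.standard_normal(3)          # (out_features,)

y_einsum = np.einsum("bi,oi->bo", x, weight) + bias
y_matmul = x @ weight.T + bias

print(y_einsum.shape)                    # (4, 3)
print(np.allclose(y_einsum, y_matmul))   # True
```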


In [9]:
class EinopsLinear(WeightsAndBiasMixin, nn.Module):
    """
    Custom Linear (fully connected) layer using NumPy and einops.

    Performs y = x W^T + b via einsum for clarity.
    """

    def __init__(self, in_features: int, out_features: int, weight_loc: str, bias_loc: str):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight, self.bias = self.init_weights_and_bias(
            weight_loc, bias_loc)

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Forward pass.
        Args:
            x (np.ndarray): Shape (batch, in_features).
        Returns:
            np.ndarray: Shape (batch, out_features).
        """
        y = einsum(x, self.weight, "b i, o i -> b o")
        if self.bias is not None:
            y = y + self.bias
        return y

#### AlexNet Class Implementation

Assembles the feature extractor, adaptive pooling, and classifier blocks using the custom NumPy-based layers defined above.
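
To see why the first linear layer expects `256 * 6 * 6` inputs, the spatial size can be traced through the feature extractor with the usual formula `(size + 2 * padding - kernel) // stride + 1`. This is a sketch assuming a 224×224 input; the layer labels (`conv1`, `pool1`, ...) are only illustrative names, and the kernel, stride, and padding values are taken from the model definition in this notebook.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a conv/pool layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


# (label, kernel, stride, padding) for each spatial layer in `features`.
layers = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

size = 224
trace = []
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    trace.append((name, size))
    print(f"{name}: {size}x{size}")

flattened = 256 * size * size
print(flattened)   # 9216
```

Under that input-size assumption the feature map is already 6×6 when it reaches the adaptive average pool, so that layer leaves the values unchanged.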


In [10]:
class AlexNet(nn.Module):
    """
    Custom AlexNet implementation using NumPy for educational inference.

    Replicates the original AlexNet architecture with all major layers
    (convolution, pooling, linear, activation) implemented from scratch
    operating on NumPy arrays. Pretrained weights are loaded from the
    torchvision reference model.
    """

    def __init__(self, num_classes: int = 1000) -> None:
        super().__init__()
        self.features = nn.Sequential(
            CustomConv2d(3, 64, kernel_size=11, stride=4, padding=2,
                         weight_loc='features.0.weight', bias_loc='features.0.bias'),
            CustomReLU(),
            CustomMaxPool2d(kernel_size=3, stride=2),
            CustomConv2d(64, 192, kernel_size=5, padding=2,
                         weight_loc='features.3.weight', bias_loc='features.3.bias'),
            CustomReLU(),
            CustomMaxPool2d(kernel_size=3, stride=2),
            CustomConv2d(192, 384, kernel_size=3, padding=1,
                         weight_loc='features.6.weight', bias_loc='features.6.bias'),
            CustomReLU(),
            CustomConv2d(384, 256, kernel_size=3, padding=1,
                         weight_loc='features.8.weight', bias_loc='features.8.bias'),
            CustomReLU(),
            CustomConv2d(256, 256, kernel_size=3, padding=1,
                         weight_loc='features.10.weight', bias_loc='features.10.bias'),
            CustomReLU(),
            CustomMaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = CustomAdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            EinopsLinear(256 * 6 * 6, 4096, weight_loc='classifier.1.weight',
                         bias_loc='classifier.1.bias'),
            CustomReLU(),
            EinopsLinear(4096, 4096, weight_loc='classifier.4.weight',
                         bias_loc='classifier.4.bias'),
            CustomReLU(),
            EinopsLinear(4096, num_classes, weight_loc='classifier.6.weight',
                         bias_loc='classifier.6.bias'),
        )

    @profile_forward("AlexNet.forward")
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Forward pass.
        Args:
            x (np.ndarray): (batch, 3, 224, 224)
        Returns:
            np.ndarray: (batch, num_classes) class scores.
        """
        x = self.features(x)
        x = self.avgpool(x)
        b = x.shape[0]
        x = x.reshape(b, -1)  # flatten
        x = self.classifier(x)
        return x

#### Inference

Run a forward pass over the validation set, accumulating accuracy using NumPy operations.
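
The accumulation itself can be sketched on made-up logits (the arrays below are hypothetical stand-ins for model outputs and labels, not real results): take the argmax per row, count matches, and divide at the end.

```python
import numpy as np

# Hypothetical logits and labels for two small batches of three classes.
logits_batches = [
    np.array([[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]]),
    np.array([[0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]),
]
label_batches = [np.array([1, 0]), np.array([2, 1])]

correct = 0
total = 0
for logits, labels in zip(logits_batches, label_batches):
    predicted = np.argmax(logits, axis=1)           # top-1 class per sample
    correct += int((predicted == labels).sum())
    total += int(labels.shape[0])

accuracy = correct / total if total else 0.0
print(f"{accuracy:.4f}")   # 0.7500
```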


#### Profiling Usage

All forward methods are wrapped with a lightweight timing decorator storing per-call events in `PROFILE_EVENTS`.

After running inference you can view an aggregate table:

```python
profile_summary()                 # default sort by total time
profile_summary(sort="mean")      # sort by mean per call
profile_summary(sort="max")       # highlight worst single call
profile_summary(top_n=5)          # show only top 5 layers
```

Columns:

- Total(ms): cumulative time across calls
- Mean(ms): average per invocation
- Max(ms): slowest single invocation
- Calls: number of times that layer's forward executed

To reset profiling data between experiments:

```python
PROFILE_EVENTS.clear()
```
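
Because `AlexNet.forward` is itself profiled, its elapsed time already includes every layer it calls. When comparing layers, one option is to aggregate while skipping that wrapper entry. The sketch below uses made-up events in the same format as `PROFILE_EVENTS`; the helper name `layer_totals` is hypothetical.

```python
from collections import defaultdict

# Hypothetical events in the same format as PROFILE_EVENTS (elapsed in seconds).
events = [
    {"name": "CustomConv2d", "elapsed": 4.0},
    {"name": "CustomReLU", "elapsed": 0.01},
    {"name": "CustomConv2d", "elapsed": 27.0},
    {"name": "AlexNet.forward", "elapsed": 31.5},
]


def layer_totals(events, exclude=("AlexNet.forward",)):
    """Sum elapsed seconds per layer name, skipping wrapper entries."""
    totals = defaultdict(float)
    for ev in events:
        if ev["name"] not in exclude:
            totals[ev["name"]] += ev["elapsed"]
    return dict(totals)


totals = layer_totals(events)
print(totals)
print(sum(totals.values()))
```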


In [11]:
model = AlexNet()
model.eval()

total = 0
correct = 0

# Estimate total batches for progress bar length
try:
    total_batches = len(val_dataloader)
except TypeError:
    total_batches = None

for images, labels in tqdm(val_dataloader, total=total_batches, desc="Evaluating", leave=True):
    # images, labels already NumPy from collate
    outputs = model.forward(images)  # (b, num_classes) np.ndarray
    predicted = np.argmax(outputs, axis=1)
    total += int(labels.shape[0])
    correct += int((predicted == labels).sum())
    running_acc = correct / total if total else 0.0
    tqdm.write(f"Running Acc: {running_acc:.4f}")

accuracy = correct / total if total else 0.0
print(f"Validation Accuracy: {accuracy:.4f}")

Evaluating:   0%|          | 0/391 [00:00<?, ?it/s]

Name: CustomConv2d, Time: 4006.158 ms
Name: CustomReLU, Time: 10.769 ms
Name: CustomMaxPool2d, Time: 428.962 ms
Name: CustomConv2d, Time: 27425.579 ms
Name: CustomReLU, Time: 7.916 ms
Name: CustomMaxPool2d, Time: 305.894 ms
Name: CustomConv2d, Time: 25686.951 ms
Name: CustomReLU, Time: 2.128 ms
Name: CustomConv2d, Time: 33992.972 ms
Name: CustomReLU, Time: 1.467 ms
Name: CustomConv2d, Time: 22696.935 ms
Name: CustomReLU, Time: 1.425 ms
Name: CustomMaxPool2d, Time: 92.475 ms
Name: CustomAdaptiveAvgPool2d, Time: 2.481 ms
Name: EinopsLinear, Ti

Evaluating:   0%|          | 1/391 [02:00<13:04:17, 120.66s/it]

Name: EinopsLinear, Time: 222.093 ms
Name: CustomReLU, Time: 0.152 ms
Name: EinopsLinear, Time: 49.427 ms
Name: AlexNet.forward, Time: 115738.870 ms
Running Acc: 0.7422
Name: CustomConv2d, Time: 3832.251 ms
Name: CustomReLU, Time: 10.684 ms
Name: CustomMaxPool2d, Time: 429.246 ms
Name: CustomConv2d, Time: 27857.744 ms
Name: CustomReLU, Time: 7.822 ms
Name: CustomMaxPool2d, Time: 305.207 ms
Name: CustomConv2d, Time: 26001.330 ms
Name: CustomReLU, Time: 2.234 ms


Evaluating:   0%|          | 1/391 [03:33<23:10:36, 213.94s/it]



KeyboardInterrupt: 