### ME2


#### Import necessary packages

This notebook demonstrates custom deep learning inference using NumPy for portability and clarity.

- **NumPy** is the core array library we use to implement convolution, pooling, activation, and linear layers from scratch.
- We still rely on PyTorch / torchvision for data loading and to access pretrained AlexNet weights.
- Supporting packages (einops for concise tensor algebra, tqdm for progress reporting) round out the tooling.

The imports below are grouped to highlight standard libraries, third-party utilities, and NumPy stride tricks used for patch extraction.


In [None]:
# Standard library imports
from typing import Tuple

# Third-party imports
from torch import nn
from torch.utils.data import DataLoader
from einops import einsum, rearrange
from tqdm.auto import tqdm
from torchvision.datasets import ImageNet
from torchvision.models import AlexNet_Weights

# NumPy imports
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


In [2]:
# Profiling utilities: decorator to time forward methods
import time
from functools import wraps
from collections import defaultdict

PROFILE_EVENTS = []  # list of dicts: {"name": str, "elapsed": float}


def profile_forward(name: str | None = None):
    """Decorator to measure execution time of forward methods.

    Args:
        name: Optional override name. If None attempts to derive from self.__class__.__name__.
    """
    def outer(fn):
        @wraps(fn)
        def inner(self, *args, **kwargs):
            label = name or self.__class__.__name__
            start = time.perf_counter()
            out = fn(self, *args, **kwargs)
            end = time.perf_counter()
            PROFILE_EVENTS.append({"name": label, "elapsed": end - start})
            return out
        return inner
    return outer


def profile_summary(sort: str = "total", top_n: int | None = None):
    """Print aggregated profiling results.

    Args:
        sort: One of {"total", "mean", "calls", "max"} to control ordering.
        top_n: If provided, limit output rows.
    """
    if not PROFILE_EVENTS:
        print("No profiling data collected.")
        return
    agg = defaultdict(lambda: {"total": 0.0, "calls": 0, "max": 0.0})
    for ev in PROFILE_EVENTS:
        a = agg[ev["name"]]
        a["total"] += ev["elapsed"]
        a["calls"] += 1
        if ev["elapsed"] > a["max"]:
            a["max"] = ev["elapsed"]
    rows = []
    for name, stats in agg.items():
        mean = stats["total"] / stats["calls"]
        rows.append({"name": name, **stats, "mean": mean})
    rows.sort(key=lambda r: r[sort], reverse=True)
    if top_n is not None:
        rows = rows[:top_n]
    header = f"{'Layer':<22} {'Calls':>5} {'Total(ms)':>12} {'Mean(ms)':>12} {'Max(ms)':>12}"
    print(header)
    print('-' * len(header))
    for r in rows:
        print(f"{r['name']:<22} {r['calls']:>5} {r['total']*1000:>12.3f} {r['mean']*1000:>12.3f} {r['max']*1000:>12.3f}")
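    # Totals cover only the rows shown. Nested wrappers (e.g. "AlexNet.forward")
    # overlap with the layers they call, so this sum can double-count time.
    total_time = sum(r['total'] for r in rows)
    total_calls = sum(r['calls'] for r in rows)
    print('-' * len(header))
    print(
        f"Total profiled time: {total_time*1000:.2f} ms across {total_calls} calls")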

#### Load the weights and biases of AlexNet

In this section, we extract the pretrained weights and biases from torchvision's AlexNet model.

- The weights and biases are loaded with `AlexNet_Weights.DEFAULT.get_state_dict()`.
- These parameters are stored in a dictionary, and its keys are printed to verify which layers are available.
- Our custom layers read their parameters from this dictionary, so inference should reproduce the behavior of torchvision's pretrained AlexNet.

This step is essential for initializing all custom layers with correct parameters before running inference on the validation set.
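
As a sanity check on what the state dict should contain, the sketch below derives the expected parameter shapes from the layer sizes used later in this notebook, assuming convolution weights are stored as `(C_out, C_in, k, k)` and linear weights as `(out_features, in_features)`. The names `conv_specs`, `linear_specs`, and `expected_shapes` are illustrative and not part of torchvision.

```python
import math

# Layer sizes copied from the AlexNet definition in this notebook.
conv_specs = {            # key prefix -> (C_in, C_out, k)
    "features.0": (3, 64, 11),
    "features.3": (64, 192, 5),
    "features.6": (192, 384, 3),
    "features.8": (384, 256, 3),
    "features.10": (256, 256, 3),
}
linear_specs = {          # key prefix -> (in_features, out_features)
    "classifier.1": (256 * 6 * 6, 4096),
    "classifier.4": (4096, 4096),
    "classifier.6": (4096, 1000),
}

expected_shapes = {}
for prefix, (c_in, c_out, k) in conv_specs.items():
    expected_shapes[f"{prefix}.weight"] = (c_out, c_in, k, k)
    expected_shapes[f"{prefix}.bias"] = (c_out,)
for prefix, (n_in, n_out) in linear_specs.items():
    expected_shapes[f"{prefix}.weight"] = (n_out, n_in)
    expected_shapes[f"{prefix}.bias"] = (n_out,)

# Total number of scalar parameters implied by these shapes.
total_params = sum(math.prod(shape) for shape in expected_shapes.values())
print(len(expected_shapes), total_params)
```

With the real state dict loaded, comparing `tuple(t.shape)` for each key against `expected_shapes` would be one way to catch a mismatched key before running inference.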


In [3]:
weights_and_biases = AlexNet_Weights.DEFAULT.get_state_dict()
print(weights_and_biases.keys())

odict_keys(['features.0.weight', 'features.0.bias', 'features.3.weight', 'features.3.bias', 'features.6.weight', 'features.6.bias', 'features.8.weight', 'features.8.bias', 'features.10.weight', 'features.10.bias', 'classifier.1.weight', 'classifier.1.bias', 'classifier.4.weight', 'classifier.4.bias', 'classifier.6.weight', 'classifier.6.bias'])


#### Load the data

In this section, we load the ImageNet validation dataset and prepare it for inference with our custom NumPy-based AlexNet implementation.

- We use torchvision's `ImageNet` class to access the validation split, applying the standard AlexNet preprocessing transforms.
- The DataLoader batches images; our custom `default_collate` converts tensors directly to NumPy arrays.
- Keeping everything in NumPy simplifies the educational focus (no device transfers or GPU specifics).

This setup enables end-to-end inference using only NumPy for tensor operations alongside pretrained weights.
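
To make the collate step concrete, here is a minimal sketch of what happens to one batch. It uses random NumPy arrays as stand-ins for the transformed image tensors, so `fake_batch` and its labels are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for what the dataset yields: four (image, label) pairs.
fake_batch = [
    (rng.standard_normal((3, 224, 224)).astype(np.float32), label)
    for label in (7, 42, 0, 999)
]

imgs, labels = zip(*fake_batch)                # unzip into two tuples
imgs = np.stack(imgs, axis=0)                  # (4, 3, 224, 224)
labels = np.asarray(labels, dtype=np.int64)    # (4,)
print(imgs.shape, imgs.dtype, labels)
```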


In [4]:
def default_collate(batch):
    """
    Collate function converting incoming PyTorch tensors to NumPy arrays.

    We convert tensors early so all subsequent custom layers operate purely on
    NumPy arrays (no device transfers needed).
    """
    imgs, labels = zip(*batch)  # imgs: tuple[torch.Tensor], labels: tuple[int]
    imgs = [np.asarray(img.numpy(), dtype=np.float32) for img in imgs]
    imgs = np.stack(imgs, axis=0)                 # [B,3,224,224]
    labels = np.asarray(labels, dtype=np.int64)    # [B]
    return imgs, labels


# ImageNet validation split with the standard AlexNet preprocessing transforms
imagenet_val = ImageNet(
    root="data/ImageNet1k",
    split="val",
    transform=AlexNet_Weights.IMAGENET1K_V1.transforms()
)

val_dataloader = DataLoader(
    imagenet_val,
    batch_size=256,
    num_workers=12,
    prefetch_factor=4,
    persistent_workers=True,
    pin_memory=False,
    drop_last=False,
    collate_fn=default_collate
)

#### Define the custom Conv2d

We implement convolution with an explicit im2col + GEMM strategy:

1. (Optional) pad the input.
2. Use `sliding_window_view` to obtain a strided view of all k×k receptive fields.
3. Reshape (B, C, out_h, out_w, k, k) -> (B*out_h*out_w, C*k*k).
4. Reshape weights (C_out, C, k, k) -> (C_out, C*k*k).
5. Perform a single matrix multiply and reshape back.

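Compared to a high-rank `einsum` over the patch tensor, this form turns the whole convolution into one large 2D matrix product, which usually maps better onto optimized BLAS routines. The gain tends to be largest for layers with large kernels, such as the first convolution. The trade-off is memory: the reshape in step 3 copies every patch, so the patch matrix holds roughly k²/stride² times as many elements as the input.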
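
The steps above can be sketched on a small input and checked against a loop-based reference. This is a standalone illustration rather than the class used below; the function names `conv2d_im2col` and `conv2d_naive` are chosen here for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_im2col(x, w, stride=1, padding=0):
    """x: (B, C, H, W), w: (C_out, C, k, k) -> (B, C_out, out_h, out_w)."""
    B, C = x.shape[:2]
    c_out, k = w.shape[0], w.shape[-1]
    if padding:
        x = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    # View of all k x k windows, then keep every `stride`-th window.
    win = sliding_window_view(x, (k, k), axis=(-2, -1))[:, :, ::stride, ::stride]
    out_h, out_w = win.shape[2:4]
    # Put channels next to the kernel axes and flatten into the patch matrix.
    col = win.transpose(0, 2, 3, 1, 4, 5).reshape(B * out_h * out_w, C * k * k)
    out = col @ w.reshape(c_out, C * k * k).T          # single matrix multiply
    return out.reshape(B, out_h, out_w, c_out).transpose(0, 3, 1, 2)


def conv2d_naive(x, w, stride=1, padding=0):
    """Reference implementation looping over output positions."""
    if padding:
        x = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    B, _, H, W = x.shape
    c_out, k = w.shape[0], w.shape[-1]
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((B, c_out, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = x[:, :, r:r + k, c:c + k]          # (B, C, k, k)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 9, 9))
w = rng.standard_normal((4, 3, 3, 3))
y_fast = conv2d_im2col(x, w, stride=2, padding=1)
y_ref = conv2d_naive(x, w, stride=2, padding=1)
print(y_fast.shape, np.allclose(y_fast, y_ref))
```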


In [5]:
class PatchMixin:
    """Mixin for extracting patches from input arrays with a given kernel size and stride.
    Used for convolution and pooling operations on NumPy arrays."""

    def __init__(self, kernel_size: int, stride: int) -> None:
        """Initialize patch extraction parameters.
        Args:
            kernel_size (int): Size of the square kernel.
            stride (int): Stride for patch extraction.
        """
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride

    def _patch_with_stride(self, x_pad: np.ndarray) -> np.ndarray:
        """Extract k x k patches from the input array with the given stride.
        Args:
            x_pad (np.ndarray): Input array of shape (b, c, h, w), already padded if needed.
        Returns:
            np.ndarray: Read-only view of shape (b, c, out_h, out_w, k, k), where
                out_h = (h - k) // stride + 1 and out_w = (w - k) // stride + 1.
        """
        windows = sliding_window_view(  # type: ignore
            x_pad,
            window_shape=(self.kernel_size, self.kernel_size),
            axis=(-2, -1)  # type: ignore
        )
        return windows[:, :, ::self.stride, ::self.stride, :, :]


class WeightsAndBiasMixin:
    """Mixin for loading pretrained weights and biases from a state dict as NumPy arrays."""

    def __init__(self, *args, **kwargs) -> None:
        """Initialize the mixin (calls super)."""
        super().__init__(*args, **kwargs)

    def init_weights_and_bias(self, weight_loc: str, bias_loc: str) -> Tuple[np.ndarray, np.ndarray]:
        """Load weights and biases from the state dict.
        Args:
            weight_loc (str): Key for weights in the state dict.
            bias_loc (str): Key for biases in the state dict.
        Returns:
            Tuple[np.ndarray, np.ndarray]: NumPy arrays for weights and biases.
        """
        weight = weights_and_biases[weight_loc].detach().cpu().numpy()
        bias = weights_and_biases[bias_loc].detach().cpu().numpy()
        return weight, bias


class CustomConv2d(WeightsAndBiasMixin, PatchMixin, nn.Module):
    """
    Custom 2D Convolution layer using NumPy and an im2col + GEMM strategy.

    Steps:
      1. (Optional) Pad input.
      2. Extract sliding windows (im2col) with stride via `sliding_window_view`.
      3. Reshape patches to a 2D matrix (B*out_h*out_w, C*k*k).
      4. Reshape weights to (out_channels, C*k*k) and perform a matrix multiply.
      5. Reshape back to (B, out_channels, out_h, out_w) and add bias.

    This avoids a high-rank einsum over strided memory and leverages optimized BLAS.
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int,
        stride: int = 1,
        padding: int = 0,
        weight_loc: str = '',
        bias_loc: str = '',
    ) -> None:
        """Initialize the convolution layer with parameters and pretrained weights/biases.
        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            kernel_size (int): Size of the convolution kernel.
            stride (int, optional): Stride for convolution. Defaults to 1.
            padding (int, optional): Padding for input. Defaults to 0.
            weight_loc (str, optional): Key for weights in state dict.
            bias_loc (str, optional): Key for biases in state dict.
        """
        super().__init__(kernel_size, stride)
        # kernel_size and stride are already stored by PatchMixin.__init__.
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.padding = padding

        self.weight, self.bias = self.init_weights_and_bias(
            weight_loc, bias_loc)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        """No-op for pretrained weights. Provided for API compatibility."""
        pass

    def _apply_padding(self, x: np.ndarray) -> np.ndarray:
        """Apply zero padding to the input array if required.
        Args:
            x (np.ndarray): Input array of shape (b, c, h, w).
        Returns:
            np.ndarray: Padded input array.
        """
        if self.padding == 0:
            return x
        return np.pad(
            x,
            pad_width=((0, 0), (0, 0), (self.padding, self.padding),
                       (self.padding, self.padding)),
            mode='constant',
            constant_values=0,
        )

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Perform the forward pass using im2col + matrix multiply.
        Args:
            x (np.ndarray): Input array of shape (B, C_in, H, W).
        Returns:
            np.ndarray: Output array of shape (B, C_out, H_out, W_out).
        """
        B, C, H, W = x.shape
        k = self.kernel_size
        x_pad = self._apply_padding(x)
        H_p, W_p = x_pad.shape[2:]
        out_h = (H_p - k) // self.stride + 1
        out_w = (W_p - k) // self.stride + 1

        windows = self._patch_with_stride(x_pad)

        # Reshape to (B*out_h*out_w, C*k*k)
        col = windows.reshape(B, C, out_h, out_w, k * k)
        col = col.transpose(0, 2, 3, 1, 4).reshape(
            B * out_h * out_w, C * k * k)

        # Weight reshape: (C_out, C*k*k)
        w_mat = self.weight.reshape(self.out_channels, C * k * k)

        # GEMM: (N, C*k*k) @ (C*k*k, C_out) -> (N, C_out)
        out_mat = col @ w_mat.T

        # Reshape back: (B, out_h, out_w, C_out) -> (B, C_out, out_h, out_w)
        out = out_mat.reshape(
            B, out_h, out_w, self.out_channels).transpose(0, 3, 1, 2)

        if self.bias is not None:
            out += self.bias[None, :, None, None]
        return out

#### Custom ReLU

A straightforward NumPy implementation of ReLU using `np.maximum`.


In [6]:
class CustomReLU(nn.Module):
    """
    Custom ReLU activation layer using NumPy.

    Applies the Rectified Linear Unit (ReLU) function: f(x)=max(0,x) element-wise.
    """

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Apply ReLU to the input.
        Args:
            x (np.ndarray): Input array of any shape.
        Returns:
            np.ndarray: Same shape as input with negatives zeroed.
        """
        return np.maximum(x, 0.0)

#### Custom MaxPool2d

Implemented via the same patch extraction utility as convolution, followed by a reduction (`np.max`) over the spatial kernel dimensions.
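
A minimal sketch of that reduction on a 5×5 input, using the same kernel size and stride as the pooling layers in this notebook (3 and 2), so neighbouring windows overlap by one row or column. The variable names are chosen for this example only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(25, dtype=np.float32).reshape(1, 1, 5, 5)
k, stride = 3, 2

windows = sliding_window_view(x, (k, k), axis=(-2, -1))   # (1, 1, 3, 3, 3, 3)
windows = windows[:, :, ::stride, ::stride]                # (1, 1, 2, 2, 3, 3)
pooled = windows.max(axis=(-2, -1))                        # (1, 1, 2, 2)
print(pooled[0, 0])
# [[12. 14.]
#  [22. 24.]]
```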


In [7]:
class CustomMaxPool2d(PatchMixin, nn.Module):
    """
    Custom Max Pooling layer using NumPy.

    Extracts spatial patches then takes the maximum over each patch.
    """

    def __init__(self, kernel_size: int, stride: int) -> None:
        super().__init__(kernel_size, stride)

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
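        """Apply max pooling (no padding).
        Args:
            x (np.ndarray): Input of shape (batch, channels, height, width).
        Returns:
            np.ndarray: Output of shape (batch, channels, out_h, out_w), where
                out_h = (height - k) // stride + 1 and likewise for out_w.
        """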
        patched_windows = self._patch_with_stride(x)
        return np.max(patched_windows, axis=(-2, -1))

#### Custom Adaptive AvgPool2d

Reduces arbitrary spatial dimensions to a fixed target by averaging over partitioned regions computed with integer floor/ceil boundaries.
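
The region boundaries can be illustrated in one dimension. The sketch below uses pure integer arithmetic in place of `np.floor` / `np.ceil`, which should give the same boundaries for non-negative sizes; `adaptive_bins` is a hypothetical helper name.

```python
def adaptive_bins(in_size: int, out_size: int) -> list[tuple[int, int]]:
    """Half-open [start, end) index ranges, one per output position."""
    bins = []
    for i in range(out_size):
        start = (i * in_size) // out_size               # floor(i * in / out)
        end = -((-(i + 1) * in_size) // out_size)       # ceil((i + 1) * in / out)
        bins.append((start, end))
    return bins


print(adaptive_bins(13, 6))
# [(0, 3), (2, 5), (4, 7), (6, 9), (8, 11), (10, 13)]
print(adaptive_bins(6, 6))
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
```

When the input size is not a multiple of the output size, neighbouring regions overlap. When the input is already 6×6, each region holds a single element and the layer acts as the identity.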


In [8]:
class CustomAdaptiveAvgPool2d(nn.Module):
    """
    Adaptive Average Pooling layer implemented with NumPy.

    Reduces spatial dimensions to a target (H_out, W_out) by averaging over
    variable-sized input regions computed via integer partitioning.
    """

    def __init__(self, output_size: int | tuple[int, int]) -> None:
        super().__init__()
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            self.output_size = output_size

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Apply adaptive average pooling.
        Args:
            x (np.ndarray): Input of shape (batch, channels, height, width).
        Returns:
            np.ndarray: Output of shape (batch, channels, H_out, W_out).
        """
        b, c, h, w = x.shape
        out_h, out_w = self.output_size
        out = np.zeros((b, c, out_h, out_w), dtype=x.dtype)
        for i in range(out_h):
            h_start = int(np.floor(i * h / out_h))
            h_end = int(np.ceil((i + 1) * h / out_h))
            for j in range(out_w):
                w_start = int(np.floor(j * w / out_w))
                w_end = int(np.ceil((j + 1) * w / out_w))
                region = x[:, :, h_start:h_end, w_start:w_end]
                out[:, :, i, j] = region.mean(axis=(-2, -1))
        return out

#### Custom Linear Module

Uses `einsum` to express the matrix multiply `x W^T` cleanly, then adds the bias. Operates purely on NumPy arrays.
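
As a quick check of the index pattern, the sketch below compares the einsum form with a plain matrix multiply. It uses `np.einsum` as a stand-in for `einops.einsum` so that it depends only on NumPy; the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5)).astype(np.float32)   # (batch, in_features)
W = rng.standard_normal((3, 5)).astype(np.float32)   # (out_features, in_features)
b = rng.standard_normal(3).astype(np.float32)        # (out_features,)

# Contract over the shared in_features axis "i".
y_einsum = np.einsum("bi,oi->bo", x, W) + b
y_matmul = x @ W.T + b
print(y_einsum.shape, np.allclose(y_einsum, y_matmul, atol=1e-5))
```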


In [9]:
class EinopsLinear(WeightsAndBiasMixin, nn.Module):
    """
    Custom Linear (fully connected) layer using NumPy and einops.

    Performs y = x W^T + b via einsum for clarity.
    """

    def __init__(self, in_features: int, out_features: int, weight_loc: str, bias_loc: str):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight, self.bias = self.init_weights_and_bias(
            weight_loc, bias_loc)

    @profile_forward()
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Forward pass.
        Args:
            x (np.ndarray): Shape (batch, in_features).
        Returns:
            np.ndarray: Shape (batch, out_features).
        """
        y = einsum(x, self.weight, "b i, o i -> b o")
        if self.bias is not None:
            y = y + self.bias
        return y

#### AlexNet Class Implementation

Assembles the feature extractor, adaptive pooling, and classifier blocks using the custom NumPy-based layers defined above.
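
To see why the first linear layer expects `256 * 6 * 6` inputs, the spatial size can be traced through the feature extractor with the usual output-size formula `(n + 2 * padding - k) // stride + 1`. This is a small sketch; the layer labels (`conv1`, `pool1`, ...) are names chosen here, not attributes of the model.

```python
def out_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - k) // stride + 1


layers = [  # (label, kernel, stride, padding), matching the definition below
    ("conv1", 11, 4, 2), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

n = 224
trace = []
for label, k, s, p in layers:
    n = out_size(n, k, s, p)
    trace.append((label, n))

print(trace)
# [('conv1', 55), ('pool1', 27), ('conv2', 27), ('pool2', 13),
#  ('conv3', 13), ('conv4', 13), ('conv5', 13), ('pool3', 6)]
print(256 * n * n)  # 9216
```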


In [10]:
class AlexNet(nn.Module):
    """
    Custom AlexNet implementation using NumPy for educational inference.

    Replicates the original AlexNet architecture with all major layers
    (convolution, pooling, linear, activation) implemented from scratch
    operating on NumPy arrays. Pretrained weights are loaded from the
    torchvision reference model.
    """

    def __init__(self, num_classes: int = 1000) -> None:
        super().__init__()
        self.features = nn.Sequential(
            CustomConv2d(3, 64, kernel_size=11, stride=4, padding=2,
                         weight_loc='features.0.weight', bias_loc='features.0.bias'),
            CustomReLU(),
            CustomMaxPool2d(kernel_size=3, stride=2),
            CustomConv2d(64, 192, kernel_size=5, padding=2,
                         weight_loc='features.3.weight', bias_loc='features.3.bias'),
            CustomReLU(),
            CustomMaxPool2d(kernel_size=3, stride=2),
            CustomConv2d(192, 384, kernel_size=3, padding=1,
                         weight_loc='features.6.weight', bias_loc='features.6.bias'),
            CustomReLU(),
            CustomConv2d(384, 256, kernel_size=3, padding=1,
                         weight_loc='features.8.weight', bias_loc='features.8.bias'),
            CustomReLU(),
            CustomConv2d(256, 256, kernel_size=3, padding=1,
                         weight_loc='features.10.weight', bias_loc='features.10.bias'),
            CustomReLU(),
            CustomMaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = CustomAdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            EinopsLinear(256 * 6 * 6, 4096, weight_loc='classifier.1.weight',
                         bias_loc='classifier.1.bias'),
            CustomReLU(),
            EinopsLinear(4096, 4096, weight_loc='classifier.4.weight',
                         bias_loc='classifier.4.bias'),
            CustomReLU(),
            EinopsLinear(4096, num_classes, weight_loc='classifier.6.weight',
                         bias_loc='classifier.6.bias'),
        )

    @profile_forward("AlexNet.forward")
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Forward pass.
        Args:
            x (np.ndarray): (batch, 3, 224, 224)
        Returns:
            np.ndarray: (batch, num_classes) class scores.
        """
        x = self.features(x)
        x = self.avgpool(x)
        b = x.shape[0]
        x = x.reshape(b, -1)  # flatten
        x = self.classifier(x)
        return x

#### Inference

Run a forward pass over the validation set, accumulating accuracy using NumPy operations.
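
The accumulation itself is simple. A minimal sketch with two made-up batches of scores and labels (the numbers are invented for illustration and are not results from the model):

```python
import numpy as np

# Each entry is (class scores of shape (batch, classes), integer labels).
batches = [
    (np.array([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]]), np.array([1, 2])),
    (np.array([[0.0, 0.1, 3.0]]), np.array([2])),
]

correct = 0
total = 0
for scores, labels in batches:
    predicted = np.argmax(scores, axis=1)          # top-1 class per sample
    correct += int((predicted == labels).sum())
    total += int(labels.shape[0])

accuracy = correct / total if total else 0.0
print(f"{correct}/{total} correct, accuracy {accuracy:.4f}")
```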


#### Profiling Usage

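All forward methods are wrapped with a lightweight timing decorator that stores per-call events in `PROFILE_EVENTS`. The `AlexNet.forward` entry measures the whole network, so its time already includes every layer it calls; read it as an end-to-end figure rather than adding it to the per-layer rows.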

After running inference you can view an aggregate table:

```python
profile_summary()                 # default sort by total time
profile_summary(sort="mean")      # sort by mean per call
profile_summary(sort="max")       # highlight worst single call
profile_summary(top_n=5)          # show only top 5 layers
```

Columns:

- Total(ms): cumulative time across calls
- Mean(ms): average per invocation
- Max(ms): slowest single invocation
- Calls: number of times that layer's forward executed

To reset profiling data between experiments:

```python
PROFILE_EVENTS.clear()
```
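
The following toy sketch shows how nested timing wrappers overlap. It mirrors the decorator pattern used in this notebook but is fully standalone: `timed`, `EVENTS`, `Inner`, and `Outer` are hypothetical names used only here.

```python
import time
from collections import defaultdict
from functools import wraps

EVENTS = []  # stand-in for PROFILE_EVENTS


def timed(fn):
    """Record the wall-clock time of each call under the class name."""
    @wraps(fn)
    def inner(self, *args, **kwargs):
        start = time.perf_counter()
        out = fn(self, *args, **kwargs)
        elapsed = time.perf_counter() - start
        EVENTS.append({"name": type(self).__name__, "elapsed": elapsed})
        return out
    return inner


class Inner:
    @timed
    def forward(self, x):
        return x + 1


class Outer:
    def __init__(self):
        self.inner = Inner()

    @timed
    def forward(self, x):
        # Two nested, individually timed calls.
        return self.inner.forward(self.inner.forward(x))


result = Outer().forward(0)

totals = defaultdict(float)
calls = defaultdict(int)
for ev in EVENTS:
    totals[ev["name"]] += ev["elapsed"]
    calls[ev["name"]] += 1

# The outer time spans both inner calls, so it is at least their sum.
print(dict(calls), totals["Outer"] >= totals["Inner"])
```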


In [11]:
model = AlexNet()
model.eval()

total = 0
correct = 0

# Estimate total batches for progress bar length
try:
    total_batches = len(val_dataloader)
except TypeError:
    total_batches = None

for images, labels in tqdm(val_dataloader, total=total_batches, desc="Evaluating", leave=True):
    # images, labels already NumPy from collate
    outputs = model.forward(images)  # (b, num_classes) np.ndarray
    predicted = np.argmax(outputs, axis=1)
    total += int(labels.shape[0])
    correct += int((predicted == labels).sum())
    running_acc = correct / total if total else 0.0
    tqdm.write(f"Running Acc: {running_acc:.4f}")

accuracy = correct / total if total else 0.0
print(f"Validation Accuracy: {accuracy:.4f}")

Evaluating:   1%|          | 1/196 [00:15<50:17, 15.48s/it]

Running Acc: 0.6836


Evaluating:   1%|          | 2/196 [00:20<29:14,  9.04s/it]

Running Acc: 0.7109


Evaluating:   2%|▏         | 3/196 [00:24<22:10,  6.89s/it]

Running Acc: 0.7396


Evaluating:   2%|▏         | 4/196 [00:28<18:28,  5.77s/it]

Running Acc: 0.7500


Evaluating:   3%|▎         | 5/196 [00:32<16:23,  5.15s/it]

Running Acc: 0.7523


Evaluating:   3%|▎         | 6/196 [00:36<15:13,  4.81s/it]

Running Acc: 0.7428


Evaluating:   4%|▎         | 7/196 [00:40<14:20,  4.55s/it]

Running Acc: 0.7093


Evaluating:   4%|▍         | 8/196 [00:44<13:46,  4.40s/it]

Running Acc: 0.6816


Evaluating:   5%|▍         | 9/196 [00:48<13:28,  4.33s/it]

Running Acc: 0.6693


Evaluating:   5%|▌         | 10/196 [00:52<13:10,  4.25s/it]

Running Acc: 0.6559


Evaluating:   6%|▌         | 11/196 [00:57<12:55,  4.19s/it]

Running Acc: 0.6456


Evaluating:   6%|▌         | 12/196 [01:01<12:44,  4.16s/it]

Running Acc: 0.6283


Evaluating:   7%|▋         | 13/196 [01:06<13:28,  4.42s/it]

Running Acc: 0.6145


Evaluating:   7%|▋         | 14/196 [01:10<13:23,  4.41s/it]

Running Acc: 0.6119


Evaluating:   8%|▊         | 15/196 [01:14<13:06,  4.34s/it]

Running Acc: 0.6122


Evaluating:   8%|▊         | 16/196 [01:18<12:45,  4.25s/it]

Running Acc: 0.6155


Evaluating:   9%|▊         | 17/196 [01:22<12:27,  4.18s/it]

Running Acc: 0.6234


Evaluating:   9%|▉         | 18/196 [01:26<12:19,  4.15s/it]

Running Acc: 0.6369


Evaluating:  10%|▉         | 19/196 [01:30<12:05,  4.10s/it]

Running Acc: 0.6468


Evaluating:  10%|█         | 20/196 [01:34<11:58,  4.08s/it]

Running Acc: 0.6520


Evaluating:  11%|█         | 21/196 [01:39<12:03,  4.13s/it]

Running Acc: 0.6529


Evaluating:  11%|█         | 22/196 [01:43<12:03,  4.16s/it]

Running Acc: 0.6529


Evaluating:  12%|█▏        | 23/196 [01:47<11:59,  4.16s/it]

Running Acc: 0.6510


Evaluating:  12%|█▏        | 24/196 [01:51<11:49,  4.12s/it]

Running Acc: 0.6481


Evaluating:  13%|█▎        | 25/196 [01:56<12:46,  4.48s/it]

Running Acc: 0.6430


Evaluating:  13%|█▎        | 26/196 [02:01<12:51,  4.54s/it]

Running Acc: 0.6487


Evaluating:  14%|█▍        | 27/196 [02:05<12:36,  4.48s/it]

Running Acc: 0.6534


Evaluating:  14%|█▍        | 28/196 [02:10<12:21,  4.42s/it]

Running Acc: 0.6575


Evaluating:  15%|█▍        | 29/196 [02:14<12:07,  4.35s/it]

Running Acc: 0.6630


Evaluating:  15%|█▌        | 30/196 [02:18<12:01,  4.35s/it]

Running Acc: 0.6629


Evaluating:  16%|█▌        | 31/196 [02:22<11:38,  4.24s/it]

Running Acc: 0.6618


Evaluating:  16%|█▋        | 32/196 [02:26<11:23,  4.17s/it]

Running Acc: 0.6591


Evaluating:  17%|█▋        | 33/196 [02:30<11:12,  4.13s/it]

Running Acc: 0.6525


Evaluating:  17%|█▋        | 34/196 [02:34<11:04,  4.10s/it]

Running Acc: 0.6495


Evaluating:  18%|█▊        | 35/196 [02:38<10:58,  4.09s/it]

Running Acc: 0.6482


Evaluating:  18%|█▊        | 36/196 [02:42<10:48,  4.05s/it]

Running Acc: 0.6452


Evaluating:  19%|█▉        | 37/196 [02:47<11:22,  4.29s/it]

Running Acc: 0.6426


Evaluating:  19%|█▉        | 38/196 [02:51<11:12,  4.26s/it]

Running Acc: 0.6405


Evaluating:  20%|█▉        | 39/196 [02:55<10:55,  4.18s/it]

Running Acc: 0.6402


Evaluating:  20%|██        | 40/196 [02:59<10:41,  4.11s/it]

Running Acc: 0.6388


Evaluating:  21%|██        | 41/196 [03:03<10:32,  4.08s/it]

Running Acc: 0.6373


Evaluating:  21%|██▏       | 42/196 [03:07<10:31,  4.10s/it]

Running Acc: 0.6356


Evaluating:  22%|██▏       | 43/196 [03:12<10:34,  4.15s/it]

Running Acc: 0.6370


Evaluating:  22%|██▏       | 44/196 [03:16<10:38,  4.20s/it]

Running Acc: 0.6355


Evaluating:  23%|██▎       | 45/196 [03:20<10:36,  4.21s/it]

Running Acc: 0.6346


Evaluating:  23%|██▎       | 46/196 [03:24<10:27,  4.19s/it]

Running Acc: 0.6332


Evaluating:  24%|██▍       | 47/196 [03:28<10:23,  4.18s/it]

Running Acc: 0.6324


Evaluating:  24%|██▍       | 48/196 [03:33<10:21,  4.20s/it]

Running Acc: 0.6315


Evaluating:  25%|██▌       | 49/196 [03:38<10:58,  4.48s/it]

Running Acc: 0.6290


Evaluating:  26%|██▌       | 50/196 [03:42<10:48,  4.44s/it]

Running Acc: 0.6312


Evaluating:  26%|██▌       | 51/196 [03:46<10:35,  4.38s/it]

Running Acc: 0.6329


Evaluating:  27%|██▋       | 52/196 [03:51<10:27,  4.36s/it]

Running Acc: 0.6323


Evaluating:  27%|██▋       | 53/196 [03:55<10:19,  4.33s/it]

Running Acc: 0.6309


Evaluating:  28%|██▊       | 54/196 [03:59<10:17,  4.35s/it]

Running Acc: 0.6314


Evaluating:  28%|██▊       | 55/196 [04:04<10:07,  4.31s/it]

Running Acc: 0.6321


Evaluating:  29%|██▊       | 56/196 [04:08<09:57,  4.27s/it]

Running Acc: 0.6309


Evaluating:  29%|██▉       | 57/196 [04:12<09:51,  4.26s/it]

Running Acc: 0.6321


Evaluating:  30%|██▉       | 58/196 [04:16<09:45,  4.25s/it]

Running Acc: 0.6342


Evaluating:  30%|███       | 59/196 [04:21<09:42,  4.25s/it]

Running Acc: 0.6356


Evaluating:  31%|███       | 60/196 [04:25<09:37,  4.25s/it]

Running Acc: 0.6348


Evaluating:  31%|███       | 61/196 [04:30<10:07,  4.50s/it]

Running Acc: 0.6334


Evaluating:  32%|███▏      | 62/196 [04:34<09:59,  4.47s/it]

Running Acc: 0.6323


Evaluating:  32%|███▏      | 63/196 [04:38<09:43,  4.39s/it]

Running Acc: 0.6342


Evaluating:  33%|███▎      | 64/196 [04:43<09:31,  4.33s/it]

Running Acc: 0.6370


Evaluating:  33%|███▎      | 65/196 [04:47<09:22,  4.30s/it]

Running Acc: 0.6373


Evaluating:  34%|███▎      | 66/196 [04:51<09:17,  4.29s/it]

Running Acc: 0.6387


Evaluating:  34%|███▍      | 67/196 [04:55<09:11,  4.28s/it]

Running Acc: 0.6393


Evaluating:  35%|███▍      | 68/196 [05:00<09:08,  4.28s/it]

Running Acc: 0.6396


Evaluating:  35%|███▌      | 69/196 [05:04<09:03,  4.28s/it]

Running Acc: 0.6399


Evaluating:  36%|███▌      | 70/196 [05:08<08:56,  4.25s/it]

Running Acc: 0.6385


Evaluating:  36%|███▌      | 71/196 [05:12<08:51,  4.25s/it]

Running Acc: 0.6380


Evaluating:  37%|███▋      | 72/196 [05:17<08:45,  4.24s/it]

Running Acc: 0.6390


Evaluating:  37%|███▋      | 73/196 [05:22<09:14,  4.51s/it]

Running Acc: 0.6384


Evaluating:  38%|███▊      | 74/196 [05:26<09:05,  4.47s/it]

Running Acc: 0.6388


Evaluating:  38%|███▊      | 75/196 [05:30<08:51,  4.39s/it]

Running Acc: 0.6373


Evaluating:  39%|███▉      | 76/196 [05:35<08:40,  4.33s/it]

Running Acc: 0.6383


Evaluating:  39%|███▉      | 77/196 [05:39<08:32,  4.31s/it]

Running Acc: 0.6385


Evaluating:  40%|███▉      | 78/196 [05:43<08:27,  4.30s/it]

Running Acc: 0.6382


Evaluating:  40%|████      | 79/196 [05:47<08:22,  4.29s/it]

Running Acc: 0.6373


Evaluating:  41%|████      | 80/196 [05:51<08:13,  4.25s/it]

Running Acc: 0.6372


Evaluating:  41%|████▏     | 81/196 [05:56<08:08,  4.25s/it]

Running Acc: 0.6344


Evaluating:  42%|████▏     | 82/196 [06:00<07:59,  4.21s/it]

Running Acc: 0.6326


Evaluating:  42%|████▏     | 83/196 [06:04<07:50,  4.16s/it]

Running Acc: 0.6308


Evaluating:  43%|████▎     | 84/196 [06:08<07:46,  4.16s/it]

Running Acc: 0.6305


Evaluating:  43%|████▎     | 85/196 [06:13<08:13,  4.44s/it]

Running Acc: 0.6291


Evaluating:  44%|████▍     | 86/196 [06:18<08:07,  4.43s/it]

Running Acc: 0.6269


Evaluating:  44%|████▍     | 87/196 [06:22<07:56,  4.38s/it]

Running Acc: 0.6261


Evaluating:  45%|████▍     | 88/196 [06:26<07:46,  4.32s/it]

Running Acc: 0.6245


Evaluating:  45%|████▌     | 89/196 [06:30<07:33,  4.24s/it]

Running Acc: 0.6230


Evaluating:  46%|████▌     | 90/196 [06:34<07:27,  4.22s/it]

Running Acc: 0.6212


Evaluating:  46%|████▋     | 91/196 [06:38<07:17,  4.17s/it]

Running Acc: 0.6182


Evaluating:  47%|████▋     | 92/196 [06:42<07:13,  4.16s/it]

Running Acc: 0.6174


Evaluating:  47%|████▋     | 93/196 [06:47<07:06,  4.15s/it]

Running Acc: 0.6178


Evaluating:  48%|████▊     | 94/196 [06:51<07:01,  4.13s/it]

Running Acc: 0.6157


Evaluating:  48%|████▊     | 95/196 [06:55<06:55,  4.11s/it]

Running Acc: 0.6141


Evaluating:  49%|████▉     | 96/196 [06:59<06:50,  4.11s/it]

Running Acc: 0.6128


Evaluating:  49%|████▉     | 97/196 [07:04<07:14,  4.39s/it]

Running Acc: 0.6110


Evaluating:  50%|█████     | 98/196 [07:08<07:11,  4.41s/it]

Running Acc: 0.6097


Evaluating:  51%|█████     | 99/196 [07:13<07:04,  4.37s/it]

Running Acc: 0.6069


Evaluating:  51%|█████     | 100/196 [07:17<06:56,  4.33s/it]

Running Acc: 0.6058


Evaluating:  52%|█████▏    | 101/196 [07:21<06:47,  4.29s/it]

Running Acc: 0.6035


Evaluating:  52%|█████▏    | 102/196 [07:25<06:41,  4.27s/it]

Running Acc: 0.6032


Evaluating:  53%|█████▎    | 103/196 [07:29<06:32,  4.22s/it]

Running Acc: 0.6016


Evaluating:  53%|█████▎    | 104/196 [07:34<06:27,  4.21s/it]

Running Acc: 0.6006


Evaluating:  54%|█████▎    | 105/196 [07:38<06:23,  4.21s/it]

Running Acc: 0.6006


Evaluating:  54%|█████▍    | 106/196 [07:42<06:17,  4.19s/it]

Running Acc: 0.5994


Evaluating:  55%|█████▍    | 107/196 [07:46<06:11,  4.17s/it]

Running Acc: 0.5986


Evaluating:  55%|█████▌    | 108/196 [07:50<06:04,  4.14s/it]

Running Acc: 0.5987


Evaluating:  56%|█████▌    | 109/196 [07:55<06:20,  4.37s/it]

Running Acc: 0.5985


Evaluating:  56%|█████▌    | 110/196 [07:59<06:13,  4.34s/it]

Running Acc: 0.5980


Evaluating:  57%|█████▋    | 111/196 [08:03<06:03,  4.28s/it]

Running Acc: 0.5982


Evaluating:  57%|█████▋    | 112/196 [08:08<05:55,  4.24s/it]

Running Acc: 0.5984


Evaluating:  58%|█████▊    | 113/196 [08:12<05:49,  4.21s/it]

Running Acc: 0.5988


Evaluating:  58%|█████▊    | 114/196 [08:16<05:45,  4.21s/it]

Running Acc: 0.5986


Evaluating:  59%|█████▊    | 115/196 [08:20<05:38,  4.18s/it]

Running Acc: 0.5964


Evaluating:  59%|█████▉    | 116/196 [08:24<05:31,  4.15s/it]

Running Acc: 0.5952


Evaluating:  60%|█████▉    | 117/196 [08:28<05:26,  4.13s/it]

Running Acc: 0.5945


Evaluating:  60%|██████    | 118/196 [08:32<05:21,  4.12s/it]

Running Acc: 0.5932


Evaluating:  61%|██████    | 119/196 [08:36<05:17,  4.13s/it]

Running Acc: 0.5938


Evaluating:  61%|██████    | 120/196 [08:41<05:13,  4.12s/it]

Running Acc: 0.5942


Evaluating:  62%|██████▏   | 121/196 [08:45<05:27,  4.36s/it]

Running Acc: 0.5928


Evaluating:  62%|██████▏   | 122/196 [08:50<05:20,  4.33s/it]

Running Acc: 0.5908


Evaluating:  63%|██████▎   | 123/196 [08:54<05:09,  4.24s/it]

Running Acc: 0.5914


Evaluating:  63%|██████▎   | 124/196 [08:58<05:01,  4.19s/it]

Running Acc: 0.5900


Evaluating:  64%|██████▍   | 125/196 [09:02<04:55,  4.16s/it]

Running Acc: 0.5885


Evaluating:  64%|██████▍   | 126/196 [09:06<04:51,  4.17s/it]

Running Acc: 0.5883


Evaluating:  65%|██████▍   | 127/196 [09:10<04:46,  4.16s/it]

Running Acc: 0.5884


Evaluating:  65%|██████▌   | 128/196 [09:14<04:41,  4.14s/it]

Running Acc: 0.5869


Evaluating:  66%|██████▌   | 129/196 [09:18<04:37,  4.14s/it]

Running Acc: 0.5858


Evaluating:  66%|██████▋   | 130/196 [09:23<04:32,  4.12s/it]

Running Acc: 0.5849


Evaluating:  67%|██████▋   | 131/196 [09:27<04:27,  4.11s/it]

Running Acc: 0.5848


Evaluating:  67%|██████▋   | 132/196 [09:31<04:22,  4.10s/it]

Running Acc: 0.5840


Evaluating:  68%|██████▊   | 133/196 [09:36<04:34,  4.36s/it]

Running Acc: 0.5830


Evaluating:  68%|██████▊   | 134/196 [09:40<04:29,  4.35s/it]

Running Acc: 0.5831


Evaluating:  69%|██████▉   | 135/196 [09:44<04:21,  4.29s/it]

Running Acc: 0.5825


Evaluating:  69%|██████▉   | 136/196 [09:48<04:14,  4.24s/it]

Running Acc: 0.5819


Evaluating:  70%|██████▉   | 137/196 [09:52<04:06,  4.18s/it]

Running Acc: 0.5815


Evaluating:  70%|███████   | 138/196 [09:57<04:03,  4.20s/it]

Running Acc: 0.5812


Evaluating:  71%|███████   | 139/196 [10:01<03:58,  4.18s/it]

Running Acc: 0.5801


Evaluating:  71%|███████▏  | 140/196 [10:05<03:53,  4.18s/it]

Running Acc: 0.5804


Evaluating:  72%|███████▏  | 141/196 [10:09<03:48,  4.15s/it]

Running Acc: 0.5796


Evaluating:  72%|███████▏  | 142/196 [10:13<03:44,  4.16s/it]

Running Acc: 0.5798


Evaluating:  73%|███████▎  | 143/196 [10:17<03:39,  4.15s/it]

Running Acc: 0.5788


Evaluating:  73%|███████▎  | 144/196 [10:21<03:35,  4.14s/it]

Running Acc: 0.5786


Evaluating:  74%|███████▍  | 145/196 [10:26<03:44,  4.39s/it]

Running Acc: 0.5779


Evaluating:  74%|███████▍  | 146/196 [10:31<03:38,  4.36s/it]

Running Acc: 0.5773


Evaluating:  75%|███████▌  | 147/196 [10:35<03:31,  4.31s/it]

Running Acc: 0.5761


Evaluating:  76%|███████▌  | 148/196 [10:39<03:23,  4.24s/it]

Running Acc: 0.5762


Evaluating:  76%|███████▌  | 149/196 [10:43<03:17,  4.20s/it]

Running Acc: 0.5759


Evaluating:  77%|███████▋  | 150/196 [10:47<03:13,  4.21s/it]

Running Acc: 0.5751


Evaluating:  77%|███████▋  | 151/196 [10:51<03:08,  4.18s/it]

Running Acc: 0.5748


Evaluating:  78%|███████▊  | 152/196 [10:55<03:02,  4.15s/it]

Running Acc: 0.5737


Evaluating:  78%|███████▊  | 153/196 [11:00<02:58,  4.14s/it]

Running Acc: 0.5737


Evaluating:  79%|███████▊  | 154/196 [11:04<02:53,  4.12s/it]

Running Acc: 0.5726


Evaluating:  79%|███████▉  | 155/196 [11:08<02:48,  4.11s/it]

Running Acc: 0.5719


Evaluating:  80%|███████▉  | 156/196 [11:12<02:44,  4.10s/it]

Running Acc: 0.5712


Evaluating:  80%|████████  | 157/196 [11:17<02:50,  4.36s/it]

Running Acc: 0.5720


Evaluating:  81%|████████  | 158/196 [11:21<02:44,  4.34s/it]

Running Acc: 0.5714


Evaluating:  81%|████████  | 159/196 [11:25<02:38,  4.28s/it]

Running Acc: 0.5700


Evaluating:  82%|████████▏ | 160/196 [11:29<02:32,  4.23s/it]

Running Acc: 0.5697


Evaluating:  82%|████████▏ | 161/196 [11:33<02:26,  4.18s/it]

Running Acc: 0.5697


Evaluating:  83%|████████▎ | 162/196 [11:38<02:22,  4.18s/it]

Running Acc: 0.5687


Evaluating:  83%|████████▎ | 163/196 [11:42<02:17,  4.15s/it]

Running Acc: 0.5683


Evaluating:  84%|████████▎ | 164/196 [11:46<02:12,  4.13s/it]

Running Acc: 0.5665


Evaluating:  84%|████████▍ | 165/196 [11:50<02:07,  4.12s/it]

Running Acc: 0.5659


Evaluating:  85%|████████▍ | 166/196 [11:54<02:03,  4.11s/it]

Running Acc: 0.5649


Evaluating:  85%|████████▌ | 167/196 [11:58<01:59,  4.13s/it]

Running Acc: 0.5654


Evaluating:  86%|████████▌ | 168/196 [12:02<01:55,  4.12s/it]

Running Acc: 0.5652


Evaluating:  86%|████████▌ | 169/196 [12:07<01:58,  4.38s/it]

Running Acc: 0.5650


Evaluating:  87%|████████▋ | 170/196 [12:11<01:53,  4.35s/it]

Running Acc: 0.5643


Evaluating:  87%|████████▋ | 171/196 [12:16<01:46,  4.27s/it]

Running Acc: 0.5650


Evaluating:  88%|████████▊ | 172/196 [12:20<01:41,  4.23s/it]

Running Acc: 0.5642


Evaluating:  88%|████████▊ | 173/196 [12:24<01:36,  4.20s/it]

Running Acc: 0.5634


Evaluating:  89%|████████▉ | 174/196 [12:28<01:32,  4.21s/it]

Running Acc: 0.5636


Evaluating:  89%|████████▉ | 175/196 [12:32<01:27,  4.19s/it]

Running Acc: 0.5632


Evaluating:  90%|████████▉ | 176/196 [12:36<01:23,  4.17s/it]

Running Acc: 0.5627


Evaluating:  90%|█████████ | 177/196 [12:40<01:18,  4.15s/it]

Running Acc: 0.5621


Evaluating:  91%|█████████ | 178/196 [12:45<01:14,  4.14s/it]

Running Acc: 0.5609


Evaluating:  91%|█████████▏| 179/196 [12:49<01:10,  4.13s/it]

Running Acc: 0.5613


Evaluating:  92%|█████████▏| 180/196 [12:53<01:05,  4.11s/it]

Running Acc: 0.5618


Evaluating:  92%|█████████▏| 181/196 [12:58<01:05,  4.36s/it]

Running Acc: 0.5614


Evaluating:  93%|█████████▎| 182/196 [13:02<01:00,  4.35s/it]

Running Acc: 0.5612


Evaluating:  93%|█████████▎| 183/196 [13:06<00:55,  4.28s/it]

Running Acc: 0.5615


Evaluating:  94%|█████████▍| 184/196 [13:10<00:50,  4.22s/it]

Running Acc: 0.5622


Evaluating:  94%|█████████▍| 185/196 [13:14<00:45,  4.18s/it]

Running Acc: 0.5626


Evaluating:  95%|█████████▍| 186/196 [13:18<00:41,  4.19s/it]

Running Acc: 0.5629


Evaluating:  95%|█████████▌| 187/196 [13:23<00:37,  4.17s/it]

Running Acc: 0.5638


Evaluating:  96%|█████████▌| 188/196 [13:27<00:33,  4.15s/it]

Running Acc: 0.5638


Evaluating:  96%|█████████▋| 189/196 [13:31<00:28,  4.12s/it]

Running Acc: 0.5636


Evaluating:  97%|█████████▋| 190/196 [13:35<00:24,  4.12s/it]

Running Acc: 0.5627


Evaluating:  97%|█████████▋| 191/196 [13:39<00:20,  4.11s/it]

Running Acc: 0.5626


Evaluating:  98%|█████████▊| 192/196 [13:43<00:16,  4.10s/it]

Running Acc: 0.5628


Evaluating:  98%|█████████▊| 193/196 [13:47<00:12,  4.18s/it]

Running Acc: 0.5640


Evaluating:  99%|█████████▉| 194/196 [13:52<00:08,  4.18s/it]

Running Acc: 0.5653


Evaluating:  99%|█████████▉| 195/196 [13:56<00:04,  4.16s/it]

Running Acc: 0.5661


Evaluating: 100%|██████████| 196/196 [13:57<00:00,  4.27s/it]

Running Acc: 0.5656
Validation Accuracy: 0.5656





In [None]:
profile_summary()
# Overall, the full validation run took about 14 minutes on my local machine.

Layer                    Calls    Total(ms)     Mean(ms)      Max(ms)
---------------------------------------------------------------------
AlexNet.forward            196   808636.946     4125.699     5371.612
CustomConv2d               980   390976.126      398.955     1112.151
EinopsLinear               588   389103.508      661.741     1690.173
CustomMaxPool2d            588    15172.414       25.803       83.213
CustomReLU                1372    12040.950        8.776       65.341
CustomAdaptiveAvgPool2d    196      733.569        3.743        5.838
---------------------------------------------------------------------
Total profiled time: 1616663.51 ms across 3920 calls
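
Note that the "Total profiled time" above counts each layer twice: once on its own row and once inside `AlexNet.forward`, which wraps every layer call. The sketch below is one way to turn the table into per-layer shares of the forward pass. The numbers are copied by hand from the output above, and the names `totals_ms`, `wrapper_ms`, and `overhead_ms` are illustrative, not part of the notebook.

```python
# Total time per layer type (ms), copied from the profile_summary() output above.
totals_ms = {
    "CustomConv2d": 390976.126,
    "EinopsLinear": 389103.508,
    "CustomMaxPool2d": 15172.414,
    "CustomReLU": 12040.950,
    "CustomAdaptiveAvgPool2d": 733.569,
}
# AlexNet.forward already includes every nested layer call (assumed from the
# fact that wrapper + layers adds up to the reported grand total).
wrapper_ms = 808636.946

layer_sum_ms = sum(totals_ms.values())
# Time spent inside forward() that is not attributed to any profiled layer.
overhead_ms = wrapper_ms - layer_sum_ms

# Share of the forward pass spent in each layer type, largest first.
shares = {name: ms / wrapper_ms for name, ms in totals_ms.items()}
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26}{share:7.2%}")
print(f"{'(unattributed)':<26}{overhead_ms / wrapper_ms:7.2%}")
```

Read this way, the convolution and linear layers together account for roughly 96% of the forward time, so they are the natural targets for any optimization work.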
