# Prototypical neural networks

Prototypical networks are a type of neural network proposed for few-shot learning, i.e. the task of classifying examples from classes not seen during training after being given only a few labeled examples of each new class.

````{prf:observation} Squared Euclidean distance is a regular Bregman divergence
:class: dropdown

The squared Euclidean distance of two vectors $\vec{x}, \vec{y} \in \mathbb{R}^N$ is given by

$$
\begin{align}
||\vec{x}-\vec{y}||^2 &= \left(\sqrt{ (x_1-y_1)^2 + (x_2-y_2)^2 + \dots + (x_N-y_N)^2 }\right)^2  \\
&= (x_1-y_1)^2 + (x_2-y_2)^2 + \dots + (x_N-y_N)^2 \\
&= (x_1^2-2x_1y_1 + y_1^2) + (x_2^2-2x_2y_2 + y_2^2) + \dots +  (x_N^2-2x_Ny_N + y_N^2)
\end{align}
$$

Letting $\varphi(\vec{y})=\sum_{i=1}^{N}y_i^2$, which is clearly differentiable and strictly convex as it is the sum of strictly convex differentiable functions, we have 

$$
\nabla\varphi(\vec{y})=[2y_1,...,2y_N].
$$ 

And of course 

$$
(\vec{x}-\vec{y})=[x_1-y_1,...,x_N-y_N]
$$

Plugging all of this into the definition of the Bregman divergence (Eq. 3 from the paper), we have

$$
\begin{align}
d_\varphi(\vec{x}, \vec{y}) &= \varphi(\vec{x})-\varphi(\vec{y})-(\vec{x}-\vec{y})^T\nabla\varphi(\vec{y}) \\
&= \sum_{i=1}^{N}x_i^2 - \sum_{i=1}^{N}y_i^2 - [x_1-y_1,...,x_N-y_N]^T[2y_1,...,2y_N]\\
&= \sum_{i=1}^{N}x_i^2 - \sum_{i=1}^{N}y_i^2 - \sum_{i=1}^{N} (2x_iy_i - 2y_i^2) \\
&= \sum_{i=1}^{N}x_i^2 - \sum_{i=1}^{N}y_i^2 - \sum_{i=1}^{N} 2x_iy_i + \sum_{i=1}^{N} 2y_i^2 \\
&= \sum_{i=1}^{N}x_i^2 + \sum_{i=1}^{N}y_i^2 - \sum_{i=1}^{N} 2x_iy_i \\
&= (x_1^2-2x_1y_1 + y_1^2) + (x_2^2-2x_2y_2 + y_2^2) + \dots +  (x_N^2-2x_Ny_N + y_N^2) \\ 
&= ||\vec{x}-\vec{y}||^2 \quad \blacksquare
\end{align}
$$

We have thus proven that the squared Euclidean distance is in fact a regular Bregman divergence.
````
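The identity above can also be checked numerically. The following is a minimal sketch, not part of the original notebook: the helper names `phi` and `bregman_divergence` are hypothetical, and the gradient $\nabla\varphi(\vec{y})$ is obtained with `torch.autograd.grad` instead of the closed form $[2y_1,\dots,2y_N]$, so the check does not simply restate the derivation.

```python
import torch


def phi(v: torch.Tensor) -> torch.Tensor:
    # Generating function from the observation: phi(v) = sum_i v_i^2
    return (v ** 2).sum()


def bregman_divergence(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # d_phi(x, y) = phi(x) - phi(y) - (x - y)^T grad phi(y)
    # (hypothetical helper; the gradient comes from autograd, not the closed form)
    y_var = y.detach().clone().requires_grad_(True)
    phi_y = phi(y_var)
    (grad_y,) = torch.autograd.grad(phi_y, y_var)
    return phi(x) - phi_y.detach() - torch.dot(x - y, grad_y)


torch.manual_seed(0)
# float64 keeps rounding error well below the allclose tolerance
x = torch.randn(5, dtype=torch.float64)
y = torch.randn(5, dtype=torch.float64)

bregman = bregman_divergence(x, y)
squared_euclidean = ((x - y) ** 2).sum()

# True when both quantities agree up to floating-point tolerance
matches = torch.allclose(bregman, squared_euclidean)
print(matches)
```

Swapping `phi` for another strictly convex differentiable function would give a different Bregman divergence from the same helper, which is why the gradient is left to autograd here.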

In [1]:
import optuna
import torch
from torch import nn, optim, Tensor 

In [None]:
class PrototypicalNetwork(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int):
        super().__init__()

        self.encoder = nn.Sequential(
            self.convolutional_block(input_dim, hidden_dim),
            self.convolutional_block(hidden_dim, hidden_dim),
            self.convolutional_block(hidden_dim, hidden_dim),
            self.convolutional_block(hidden_dim, output_dim),
        )

    def forward(self, input: Tensor) -> Tensor:
        output = self.encoder(input)
        
        return output
    
    @staticmethod
    def convolutional_block(in_channels: int, out_channels: int = 64) -> nn.Module:
        '''
        Returns a conv-bn-relu-maxpool block as described in the paper.
        '''
        return nn.Sequential(
            # in the paper 64 out_channels (feature maps) were used with a kernel of size 3x3
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            # in the paper a 2x2 max pooling layer was used
            nn.MaxPool2d(2)
        )

In [None]:
class PrototypicalNetworkTrainer:
    def __init__(self, learning_rate: float):
        self.learning_rate = learning_rate

    def train(self) -> None:
        pass

In [22]:
test_input = torch.rand((2, 3, 70, 70))
proto_net = PrototypicalNetwork(input_dim=3, hidden_dim=64, output_dim=64)

out = proto_net(test_input)

print(out[0].shape)

print(out[0].view(64, -1).shape)


torch.Size([64, 4, 4])
torch.Size([64, 16])
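
The encoder output above is only the embedding step. As a rough sketch of how such embeddings could be used for few-shot classification, the block below assumes the usual prototypical-network formulation: each class prototype is the mean of the embedded support examples of that class, and a query is classified by a softmax over negative squared Euclidean distances to the prototypes. The function names `compute_prototypes` and `classify` are hypothetical, and random tensors of shape `(N, 64, 4, 4)` stand in for encoder outputs.

```python
import torch
from torch import Tensor


def compute_prototypes(support_embeddings: Tensor, support_labels: Tensor, n_classes: int) -> Tensor:
    # Flatten each embedding to a vector, then average the vectors of each class
    flat = support_embeddings.flatten(start_dim=1)
    return torch.stack([flat[support_labels == c].mean(dim=0) for c in range(n_classes)])


def classify(query_embeddings: Tensor, prototypes: Tensor) -> Tensor:
    flat = query_embeddings.flatten(start_dim=1)
    # torch.cdist returns Euclidean distances; squaring gives squared distances
    squared_distances = torch.cdist(flat, prototypes) ** 2
    # Closer prototypes receive higher probability
    return torch.softmax(-squared_distances, dim=1)


torch.manual_seed(0)
n_classes, n_shots, n_queries = 3, 5, 4

# Stand-ins for encoder outputs (assumption: same shape as out[0] above, per example)
support = torch.randn(n_classes * n_shots, 64, 4, 4)
support_labels = torch.arange(n_classes).repeat_interleave(n_shots)
queries = torch.randn(n_queries, 64, 4, 4)

prototypes = compute_prototypes(support, support_labels, n_classes)
probabilities = classify(queries, prototypes)

print(prototypes.shape)
print(probabilities.shape)
```

In a real episode the random tensors would be replaced by `proto_net(...)` applied to support and query images, and the negative log-probability of the true query class would serve as the training loss.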


In [2]:
%load_ext watermark
%watermark -n -u -v -iv

Last updated: Sun Mar 02 2025

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 8.22.2

optuna: 4.2.1

