<a href="https://colab.research.google.com/github/AlehciM00/DLAI-2022/blob/main/DellOmo_project_GCNN_second_part.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task 3: second part


This section extends the concept of group-equivariant convolution to steerable convolution, following [this tutorial](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/DL2/Geometric_deep_learning/tutorial2_steerable_cnns.html) and [this repository](https://github.com/QUVA-Lab/e2cnn_experiments), both of which were inspired by this [paper](https://arxiv.org/abs/1612.08498).

# Disclaimer
The `escnn` library, which is used to build the steerable CNN, is defined in this [repository](https://github.com/QUVA-Lab/escnn). As the repository shows, there have been some changes. Installing it with `pip install escnn` also installs a package providing `lie_learn.representations.SO3.irrep_bases` (see [this issue](https://github.com/QUVA-Lab/escnn/issues/24)), which does not work with Python versions other than 3.8.10. However, after a recent Python update, Colab now runs version 3.9.16, which conflicts with that package. For this reason, inspired by this [article](https://medium.com/google-colab/colab-updated-to-python-3-9-2593f8b1eb79), I carry out this brief study in a separate notebook, to avoid conflicts between the Python versions of the first and second parts of the project. A further technical reason for splitting the notebooks is that the runtime has to be restarted to load an older version of Python.
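The version constraint described above can be checked before installing or importing `escnn`. The following is a minimal sketch: the helper name `python_is_compatible` and the default `required=(3, 8)` are assumptions made for illustration, derived from the 3.8.10 requirement stated above rather than from the `escnn` documentation.

```python
import sys

def python_is_compatible(version_info=sys.version_info, required=(3, 8)):
    """Return True when the interpreter's major.minor version equals `required`."""
    # Hypothetical helper: only major and minor are compared, so 3.8.10 matches (3, 8).
    return tuple(version_info[:2]) == tuple(required)

if not python_is_compatible():
    print(f"Python {sys.version.split()[0]} detected: "
          "switch to the older runtime before installing escnn.")
```

Comparing only the major and minor numbers keeps the check valid for any 3.8.x patch release; whether patch releases other than 3.8.10 actually work is not verified here.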






In [None]:
#@title Show the current version #Python 3.9.16
!python --version

Python 3.9.16


Trick: press Cmd+Alt+P and select 'Use fallback runtime version' ('usa versione di riserva dell'ambiente di runtime' in the Italian interface).

In [None]:
#@title Run from here after the trick #Python 3.8.10
!python --version

Python 3.8.10


# Useful functions and packages

In [None]:
#@title Packages
!pip install scipy torch torchvision numpy




Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
#@title escnn
!pip install escnn


#!pip uninstall escnn 
#pip show escnn

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting escnn
  Downloading escnn-1.0.7-py3-none-any.whl (364 kB)
Collecting py3nj
  Downloading py3nj-0.1.2.tar.gz (28 kB)
  Preparing metadata (setup.py) ... done
Collecting pymanopt
  Downloading pymanopt-2.1.1-py3-none-any.whl (69 kB)
Collecting lie-learn
  Downloading lie_learn-0.0.1.post1-cp38-cp38-manylinux1_x86_64.whl (16.6 MB)
Building wheels for collected packages: py3nj
  Building wheel for py3nj (setup.py) ... done
  Created wheel for py3nj: filename=py3nj-0.1.2-cp38-cp38-linux_x86_64.whl

In [None]:
!pip install e2cnn

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting e2cnn
  Downloading e2cnn-0.2.3-py3-none-any.whl (225 kB)
Installing collected packages: e2cnn
Successfully installed e2cnn-0.2.3


In [None]:
#@title Imports

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt 

import numpy as np 
import pandas as pd 
import random 

import torch
import torchvision
from torchvision import transforms

import torch.optim as optim
import torch.nn.functional as F

from torch.nn.modules.flatten import Flatten
from torch.nn.modules.conv import ConvTranspose3d
from torch.nn.modules.pooling import MaxPool3d
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision.datasets import MNIST, CIFAR10

from torchvision.transforms import RandomRotation
from torchvision.transforms import Pad
from torchvision.transforms import Resize
from torchvision.transforms import ToTensor
from torchvision.transforms import Compose
from tqdm.auto import tqdm

from PIL import Image
import os

device = 'cuda' if torch.cuda.is_available() else 'cpu'



In [None]:
#@title escnn/e2cnn
import escnn
from escnn import group
from escnn import gspaces
from escnn import nn

import e2cnn      
  


# Background on steerable CNNs

All the details can be found in the [escnn library documentation](https://quva-lab.github.io/escnn/api/escnn.nn.html), in [this paper](https://arxiv.org/abs/1612.08498), in [this web page](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/DL2/Geometric_deep_learning/tutorial2_steerable_cnns.html) and in [this repository](https://github.com/QUVA-Lab/escnn).

Group-equivariant CNNs produce state-of-the-art results in classification tasks, although they enforce equivariance only to small groups of transformations, such as rotations by multiples of 90 degrees. Learning representations that are equivariant to larger groups is likely to lead to further gains, but the computational cost of current methods increases with the size of the group, which makes this impractical. The idea of a general theory of steerable representations therefore stems from the possibility of increasing the flexibility of equivariant CNNs.


Below, we present some key concepts of steerability theory, without going into details that are beyond the scope of the project.



1. **Fourier transform**:
To obtain steerable features we need the Fourier transform, which is the tool for the harmonic analysis of functions over a group $G$. Note that a representation $\rho: G \to \mathbb{R}^{d \times d}$ can be interpreted as a collection of $d^2$ functions over $G$, one for each matrix entry of $\rho$. The **[Peter-Weyl theorem](https://en.wikipedia.org/wiki/Peter%E2%80%93Weyl_theorem)** states that the collection of functions in the matrix entries of all irreps $\hat{G}$ of a group $G$ spans the space of all (square-integrable) functions over $G$. This result gives us a way to parameterize functions over the group, which is particularly useful for groups with infinitely many elements. The Fourier transform also provides the link to the regular representation.


2. **Regular representation**: it is particularly important because it describes the action of a group $G$ on the vector space of functions over $G$. Assume for the moment that the group $G$ is finite, i.e. $|G| < \infty$. Then the set of functions over the group is equivalent to the vector space $\mathbb{R}^{|G|}$: a vector $f \in \mathbb{R}^{|G|}$ can be interpreted as a function over $G$, where the $i$-th entry of $f$ is the value of the function on the $i$-th element $g_i \in G$. In a GCNN, a feature map $f$ is stored as a multi-dimensional array with an axis for each of the $n$ spatial dimensions and one for the group $G$; in a steerable CNN, we replace the $G$ axis with a "Fourier" axis, which contains $c$ Fourier coefficients used to parameterize a function over $G$, as described in the previous item.
 The representation of $G$ on these $c$ coefficients is $\rho: G \to \mathbb{R}^{c \times c}$.
The result is equivalent to a standard GCNN if $G$ is finite (and we have $c = |G|$), but we can now also use an infinite $G$, such as $SO(2)$. A feature map $f$ can now be interpreted as a vector field on the space $\mathbb{R}^n$, i.e.:
$$ f: \mathbb{R}^n \to \mathbb{R}^c $$ which assigns a $c$-dimensional feature vector $f(x)\in\mathbb{R}^c$ to each spatial position $x\in\mathbb{R}^n$.
We call such a vector field a **feature vector field**.



3. Definition of **steerable**: let $(\mathcal{F}, \pi)$ be a feature space with a group representation and $\Phi: \mathcal{F} \to \mathcal{F}'$ a convolutional network. The feature space $\mathcal{F}'$ is said to be (linearly) **steerable** with respect to $G$ if, for all transformations $g \in G$, the features $\Phi f$ and $\Phi \pi(g) f$ are related by a linear transformation $\pi'(g)$ that does not depend on $f$.
So $\pi'(g)$ allows us to "*steer*" the features in $\mathcal{F}'$ without referring to the input in $\mathcal{F}$ from which they were computed.


4. **Equivariant filter banks**: in simple terms, a filter bank is a group of filters, each tuned to a different orientation or scale, which decomposes an image into different spatial-frequency components. A filter bank can be described as an array of shape $(K', K, s, s)$, where $K$ and $K'$ denote the number of input and output channels and $s$ is the kernel size. It can be viewed as a linear map $\Psi: \mathcal{F} \to \mathbb{R}^{K'}$ that takes a signal $f \in \mathcal{F}$ as input and produces a $K'$-dimensional feature vector. To make the output of the convolution steerable, the filter bank $\Psi: \mathcal{F} \to \mathbb{R}^{K'}$ must be $H$-*equivariant*: $\rho(h)\Psi = \Psi\pi(h)$ for all $h \in H$. With this constraint it is possible to parameterize filter banks that intertwine $\pi$ and $\rho$, making the output fibers $H$-steerable by $\rho$ if the input space $\mathcal{F}$ is $H$-steerable by $\pi$.
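For the finite group $C_4$, the regular representation and its Fourier picture from the list above can be written out explicitly. The sketch below uses plain NumPy rather than `escnn`. The function name `regular_representation` and the indexing convention $[\rho(g_k) f](g_i) = f(g_{i-k})$ are assumptions chosen for illustration, and the Fourier transform used is the ordinary complex DFT.

```python
import numpy as np

N = 4  # order of the cyclic group C_4

def regular_representation(k, n=N):
    """Permutation matrix for the k-th rotation acting on functions over C_n."""
    # Rolling the rows of the identity gives (rho @ f)[i] == f[(i - k) % n].
    return np.roll(np.eye(n), shift=k, axis=0)

# A function over C_4 stored as a vector: entry i is the value on g_i.
f = np.array([1.0, 2.0, 3.0, 4.0])
shifted = regular_representation(1) @ f

# rho is a group homomorphism: rho(a) rho(b) == rho(a + b mod N).
is_homomorphism = all(
    np.allclose(regular_representation(a) @ regular_representation(b),
                regular_representation((a + b) % N))
    for a in range(N) for b in range(N)
)

# In the Fourier basis the shift becomes diagonal: each coefficient is
# multiplied by a phase, i.e. by a one-dimensional complex irrep of C_4.
phases = np.exp(-2j * np.pi * np.arange(N) / N)
fourier_is_diagonal = np.allclose(np.fft.fft(shifted), phases * np.fft.fft(f))

print(shifted, is_homomorphism, fourier_is_diagonal)
```

The last check is the finite-group version of the Peter-Weyl statement: the Fourier coefficients of a function over $C_4$ transform independently, each under its own irrep.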









 

# Steerable CNNs in practice


[Reference](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/DL2/Geometric_deep_learning/tutorial2_steerable_cnns.html)

As in the first part, we take the *finite* group $G=C_4$ as the group of point symmetries.
The **point group** $G$ and its **action on the space** $\mathbb{R}^2$ are determined by instantiating a subclass of `gspaces.GSpace`.

Having specified the symmetry transformation on the *base space* $\mathbb{R}^2$, we next need to define the representation $\rho: G \to \mathbb{R}^{c \times c}$ which describes how a **feature vector field** $f : \mathbb{R}^2 \to \mathbb{R}^c$ transforms under the action of $G$.
This transformation law of feature fields is implemented by  ``nn.FieldType``.

We instantiate the `nn.FieldType` modeling a GCNN feature by passing it the `gspaces.GSpace` instance and the *regular representation* of $G=C_4$.
We call a feature field associated with the regular representation $\rho_\text{reg}$ a **regular feature field**.

Each hidden layer of a steerable CNN has its own transformation law which the user needs to specify (equivalent to the choice of number of channels in each layer of a conventional CNN). 
The *input* and *output* of a steerable CNN are also feature fields.

The most common example is that of gray-scale input images.
A rotation of a gray-scale image is performed by moving each pixel to a new position without changing its intensity value. The invariance of the scalar pixel values under rotations is modeled by the **trivial representation** $\rho_0: G\to\mathbb{R},\ g\mapsto 1$ of $G$ and identifies them as **scalar fields**. The `nn.FieldType` modeling a gray-scale image is instantiated by passing it the trivial representation of $G$.
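The behaviour of a scalar field under rotation can be illustrated without `escnn`. The sketch below is a NumPy-only toy example (the 5x5 random image is an assumption made for illustration): a 90-degree rotation relocates pixels but leaves the multiset of intensity values untouched, which is what the trivial representation expresses.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((5, 5))  # a toy gray-scale image, i.e. a scalar field

# Rotating by 90 degrees only relocates pixels; the trivial representation
# leaves each intensity value unchanged.
rotated = np.rot90(image, k=1)

# Same intensity values before and after the rotation, just in other positions.
same_values = np.array_equal(np.sort(image, axis=None), np.sort(rotated, axis=None))

# Four quarter turns compose to the identity, as expected for C_4.
back_to_start = np.array_equal(np.rot90(image, k=4), image)

print(same_values, back_to_start)
```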

Once having defined how the input and output feature spaces should transform, we can build neural network functions as **equivariant modules**.
These are implemented as subclasses of an abstract base class `nn.EquivariantModule` which itself inherits from `torch.nn.Module`.

**Equivariant Convolution Layer**: 
Let $\rho_\text{in}: G \to \mathbb{R}^{c_\text{in} \times c_\text{in}}$ and $\rho_\text{out}: G \to \mathbb{R}^{c_\text{out} \times c_\text{out}}$ be respectively the representations of $G$ associated with `feat_type_in` and `feat_type_out`.
Then, an equivariant convolution layer is a standard convolution layer with a filter $k: \mathbb{R}^2 \to \mathbb{R}^{c_\text{out} \times c_\text{in}}$ which satisfies a particular **steerability constraint**:
$$
\forall g \in G, x \in \mathbb{R}^2 \quad k(g.x) = \rho_\text{out}(g) k(x) \rho_\text{in}(g)^{-1}
$$

In particular, the use of convolution guarantees translation equivariance, while the fact that the filters satisfy this steerability constraint guarantees $G$-equivariance.

The steerability constraint restricts the space of possible learnable filters to a smaller space of equivariant filters.
Solving this constraint goes beyond the scope of this tutorial; fortunately, the `nn.R2Conv` module takes care of properly parameterizing the filter $k$ such that it satisfies the constraint.
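For the special case of a scalar input field and a regular output field of $C_4$, one solution of the constraint can be built by hand: stack the four rotated copies of a single base filter. The NumPy sketch below is not how `nn.R2Conv` parameterizes its filters. The helper names `lifting_bank` and `correlate`, and the conventions for rotation direction and channel ordering, are assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lifting_bank(psi):
    """Stack the four 90-degree rotations of a base filter: shape (4, s, s)."""
    return np.stack([np.rot90(psi, r) for r in range(4)])

def correlate(x, bank):
    """Valid cross-correlation of an image (H, W) with each filter in the bank."""
    windows = sliding_window_view(x, bank.shape[-2:])  # (H', W', s, s)
    return np.einsum("hwij,cij->chw", windows, bank)

rng = np.random.default_rng(0)
psi = rng.standard_normal((3, 3))   # arbitrary base filter
x = rng.standard_normal((9, 9))     # arbitrary square input image
bank = lifting_bank(psi)

constraint_ok, equivariance_ok = [], []
for m in range(4):
    # Kernel constraint: rotating the filters spatially equals permuting
    # the output channels (the regular representation of C_4).
    constraint_ok.append(
        np.allclose(np.rot90(bank, m, axes=(1, 2)), np.roll(bank, -m, axis=0)))
    # Resulting equivariance: rotating the input rotates the output maps
    # and cyclically shifts their channels.
    out_of_rotated = correlate(np.rot90(x, m), bank)
    rotated_out = np.roll(np.rot90(correlate(x, bank), m, axes=(1, 2)), m, axis=0)
    equivariance_ok.append(np.allclose(out_of_rotated, rotated_out))

print(all(constraint_ok), all(equivariance_ok))
```

The check is exact here because rotations by multiples of 90 degrees map the pixel grid onto itself; for finer rotations the equivariance only holds up to interpolation error.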

**Deeper Models**: In *deep learning* we usually want to stack multiple layers to build a deep model.
As long as each layer is equivariant and consecutive layers are compatible, the equivariance property is preserved by induction. The compatibility of two consecutive layers requires the output type of the first layer to be equal to the input type of the second layer.
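The compatibility requirement can be expressed as a simple check over a stack of layers. The sketch below is library-free and purely illustrative: describing a field type as a `(representation name, multiplicity)` pair and the helper `first_incompatibility` are assumptions, not part of the `escnn` API, which performs its own type checks.

```python
# Hypothetical description of a layer stack: each layer is a pair
# (input type, output type), and a type is (representation name, multiplicity).
layers = [
    (("trivial", 1), ("regular", 8)),
    (("regular", 8), ("regular", 16)),
    (("regular", 16), ("regular", 32)),
]

def first_incompatibility(stack):
    """Return the index of the first layer whose input type differs from the
    previous layer's output type, or None if the whole stack is compatible."""
    for i in range(1, len(stack)):
        if stack[i - 1][1] != stack[i][0]:
            return i
    return None

print(first_incompatibility(layers))

# Appending a layer that expects 64 regular fields after 32 breaks the chain.
broken = layers + [(("regular", 64), ("trivial", 64))]
print(first_incompatibility(broken))
```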



# Project


# MNIST
First of all, we train the steerable CNN on rotated MNIST. The original dataset is plain MNIST; each image is rotated when the training and test sets are created, following the reference.

In [None]:
# download the dataset
!wget -nc http://www.iro.umontreal.ca/~lisa/icml2007data/mnist.zip
# uncompress the zip file
!unzip -n mnist.zip -d mnist



--2023-03-15 22:44:44--  http://www.iro.umontreal.ca/~lisa/icml2007data/mnist.zip
Resolving www.iro.umontreal.ca (www.iro.umontreal.ca)... 132.204.26.36
Connecting to www.iro.umontreal.ca (www.iro.umontreal.ca)|132.204.26.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23653151 (23M) [application/zip]
Saving to: ‘mnist.zip’


2023-03-15 22:44:44 (49.4 MB/s) - ‘mnist.zip’ saved [23653151/23653151]

Archive:  mnist.zip
  inflating: mnist/mnist_train.amat  
  inflating: mnist/mnist_test.amat   


In [None]:
#@title Dataset MNIST
class MnistDataset(Dataset):

    def __init__(self, mode, rotated: bool = True):
        assert mode in ['train', 'test']

        if mode == "train":
            file = "mnist/mnist_train.amat"
        else:
            file = "mnist/mnist_test.amat"

        data = np.loadtxt(file)

        images = data[:, :-1].reshape(-1, 28, 28).astype(np.float32)

        # images are padded to have shape 29x29.
        # this allows to use odd-size filters with stride 2 when downsampling a feature map in the model
        pad = Pad((0, 0, 1, 1), fill=0)

        # to reduce interpolation artifacts (e.g. when testing the model on rotated images),
        # we upsample an image by a factor of 3, rotate it and finally downsample it again
        resize1 = Resize(87) # to upsample
        resize2 = Resize(29) # to downsample

        totensor = ToTensor()

        if rotated:
            self.images = torch.empty((images.shape[0], 1, 29, 29))
            for i in tqdm(range(images.shape[0]), leave=False):
                img = images[i]
                img = Image.fromarray(img, mode='F')
                r = (np.random.rand() * 360.)
                self.images[i] = totensor(resize2(resize1(pad(img)).rotate(r, Image.BILINEAR))).reshape(1, 29, 29)
        else:
            self.images = torch.zeros((images.shape[0], 1, 29, 29))
            self.images[:, :, :28, :28] = torch.tensor(images).reshape(-1, 1, 28, 28)

        self.labels = data[:, -1].astype(np.int64)
        self.num_samples = len(self.labels)

    def __getitem__(self, index):
        image, label = self.images[index], self.labels[index]

        return image, label

    def __len__(self):
        return len(self.labels)

In [None]:
# Set the random seed for reproducibility
np.random.seed(42)

# build the rotated training and test datasets
train_set = MnistDataset(mode='train', rotated=True)

test_set = MnistDataset(mode='test', rotated=True)




  0%|          | 0/12000 [00:00<?, ?it/s]

  0%|          | 0/50000 [00:00<?, ?it/s]

In [None]:
#@title Train and Test function for Cifar and MNIST

def test(model: torch.nn.Module):
    # test over the full rotated test set
    total = 0
    correct = 0

    test_loader = torch.utils.data.DataLoader(test_set, batch_size=64)

    with torch.no_grad():
        model.eval()
        for i, (x, t) in enumerate(test_loader):
            x = x.to(device)
            t = t.to(device)

            y = model(x)

            _, prediction = torch.max(y.data, 1)
            total += t.shape[0]
            correct += (prediction == t).sum().item()
            
    return correct/total*100.


def train(model: torch.nn.Module, lr=1e-4, wd=1e-4, checkpoint_path: str = None):

    train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, num_workers=2, shuffle = True)

    loss_function = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)

    for epoch in tqdm(range(30)):
        model.train()
        for i, (x, t) in enumerate(train_loader):
            optimizer.zero_grad()

            x = x.to(device)
            t = t.to(device)

            y = model(x)

            loss = loss_function(y, t)

            loss.backward()

            optimizer.step()
            del x, y, t, loss

        if epoch % 3 == 0:
            accuracy = test(model)
            print(f"epoch {epoch} | test accuracy: {accuracy: .3f}")


In [None]:
#@title SteerableCNN for MNIST
class CNSteerableCNN(torch.nn.Module):

    def __init__(self, n_classes=10):

        super(CNSteerableCNN, self).__init__()

        # the model is equivariant to rotations by multiples of 2pi/N
        self.r2_act = gspaces.rot2dOnR2(N=4)

        # the input image is a scalar field, corresponding to the trivial representation
        in_type = nn.FieldType(self.r2_act, [self.r2_act.trivial_repr])

        # we store the input type for wrapping the images into a geometric tensor during the forward pass
        self.input_type = in_type

        # We need to mask the input image since the corners are moved outside the grid under rotations
        self.mask = nn.MaskModule(in_type, 29, margin=1)

        # convolution 1
        # first we build the non-linear layer, which also constructs the right feature type
        # we choose 8 feature fields, each transforming under the regular representation of C_4
        activation1 = nn.ELU(nn.FieldType(self.r2_act, 8*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation1.in_type
        self.block1 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=7, padding=1, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation1,
        )

        # convolution 2
        # the old output type is the input type to the next layer
        in_type = self.block1.out_type
        # the output type of the second convolution layer are 16 regular feature fields
        activation2 = nn.ELU(nn.FieldType(self.r2_act, 16*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation2.in_type
        self.block2 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=2, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation2
        )
        self.pool1 = nn.SequentialModule(
            nn.PointwiseAvgPoolAntialiased(out_type, sigma=0.66, stride=2)
        )

        # convolution 3
        # the old output type is the input type to the next layer
        in_type = self.block2.out_type
        # the output type of the third convolution layer are 32 regular feature fields
        activation3 = nn.ELU(nn.FieldType(self.r2_act, 32*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation3.in_type
        self.block3 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=2, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation3
        )

        # convolution 4
        # the old output type is the input type to the next layer
        in_type = self.block3.out_type
        # the output type of the fourth convolution layer are 32 regular feature fields
        activation4 = nn.ELU(nn.FieldType(self.r2_act, 32*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation4.in_type
        self.block4 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=2, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation4
        )
        self.pool2 = nn.SequentialModule(
            nn.PointwiseAvgPoolAntialiased(out_type, sigma=0.66, stride=2)
        )

        # convolution 5
        # the old output type is the input type to the next layer
        in_type = self.block4.out_type
        # the output type of the fifth convolution layer are 64 regular feature fields
        activation5 = nn.ELU(nn.FieldType(self.r2_act, 64*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation5.in_type
        self.block5 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=2, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation5
        )

        # convolution 6
        # the old output type is the input type to the next layer
        in_type = self.block5.out_type
        # the output type of the sixth convolution layer are 64 regular feature fields
        activation6 = nn.ELU(nn.FieldType(self.r2_act, 64*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation6.in_type
        self.block6 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=1, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation6
        )
        self.pool3 = nn.PointwiseAvgPoolAntialiased(out_type, sigma=0.66, stride=1, padding=0)

        # number of output invariant channels
        c = 64

        output_invariant_type = nn.FieldType(self.r2_act, c*[self.r2_act.trivial_repr])
        self.invariant_map = nn.R2Conv(out_type, output_invariant_type, kernel_size=1, bias=False)


        # Fully Connected classifier
        self.fully_net = torch.nn.Sequential(
            torch.nn.BatchNorm1d(c),
            torch.nn.ELU(inplace=True),
            torch.nn.Linear(c, n_classes),
        )

    def forward(self, input: torch.Tensor):
        # wrap the input tensor in a GeometricTensor
        # (associate it with the input type)
        x = self.input_type(input)

        # mask out the corners of the input image
        x = self.mask(x)

        # apply each equivariant block

        # Each layer has an input and an output type
        # A layer takes a GeometricTensor in input.
        # This tensor needs to be associated with the same representation of the layer's input type
        #
        # Each layer outputs a new GeometricTensor, associated with the layer's output type.
        # As a result, consecutive layers need to have matching input/output types
        x = self.block1(x)
        x = self.block2(x)
        x = self.pool1(x)

        x = self.block3(x)
        x = self.block4(x)
        x = self.pool2(x)

        x = self.block5(x)
        x = self.block6(x)

        # pool over the spatial dimensions
        x = self.pool3(x)

        # extract invariant features
        x = self.invariant_map(x)

        # unwrap the output GeometricTensor
        # (take the Pytorch tensor and discard the associated representation)
        x = x.tensor

        # classify with the final fully connected layer
        x = self.fully_net(x.reshape(x.shape[0], -1))

        return x

In [None]:
#@title Model for MNIST
model_c4 = CNSteerableCNN().to(device)
train(model_c4)

accuracy = test(model_c4)
print(f"Test accuracy: {accuracy}")



  0%|          | 0/30 [00:00<?, ?it/s]

epoch 0 | test accuracy: 85.722
epoch 3 | test accuracy: 92.08
epoch 6 | test accuracy: 93.202
epoch 9 | test accuracy: 92.094
epoch 12 | test accuracy: 93.58
epoch 15 | test accuracy: 94.88799999999999
epoch 18 | test accuracy: 94.69800000000001
epoch 21 | test accuracy: 94.64399999999999
epoch 24 | test accuracy: 95.19
epoch 27 | test accuracy: 94.878
Test accuracy: 95.22


# CIFAR-10

In [None]:
#@title Dataset Cifar10
transform_train = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize([0.4914, 0.4822, 0.4465],[0.2023,0.1994,0.2010])]
     )

transform_test = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize([0.4914, 0.4822, 0.4465],[0.2023,0.1994,0.2010])]
    )



train_set = torchvision.datasets.CIFAR10(root='./data',
                                         train=True,
                                         download=True,
                                         transform=transform_train)
test_set =  torchvision.datasets.CIFAR10(root='./data',
                                       train = False,
                                       download= True,
                                       transform = transform_test)
                                         

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


  0%|          | 0/170498071 [00:00<?, ?it/s]

Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


**Converting grayscale images to RGB images**:

Like those of CNNs, the deep feature spaces of GCNNs comprise multiple channels: a feature space can contain multiple independent feature fields, obtained via a direct sum that stacks multiple copies of $\rho$. In the case of 3 copies, the `nn.FieldType` is instantiated by passing the full field representation as a list of three regular representations.
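The direct sum of three regular representations of $C_4$ can be written out as a block-diagonal matrix. The sketch below uses NumPy only: the helpers `regular_representation` and `direct_sum` are hypothetical names introduced for illustration, not `escnn` functions.

```python
import numpy as np

def regular_representation(k, n=4):
    """Permutation matrix of the k-th rotation in the regular representation of C_n."""
    return np.roll(np.eye(n), shift=k, axis=0)

def direct_sum(rho, copies):
    """Block-diagonal matrix with `copies` copies of rho on the diagonal."""
    return np.kron(np.eye(copies), rho)

rho = regular_representation(1)
rho_full = direct_sum(rho, copies=3)

# Three copies of a 4x4 representation give a 12x12 representation:
# each 4-dimensional field is transformed independently of the others.
print(rho_full.shape)
```

The resulting size, 3 x 4 = 12, is the total dimension of a field type made of three regular fields of $C_4$.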


In [None]:
#@title Converting grayscale images to RGB images
# As the group of point symmetries we consider C4, which models the 4 rotations
# by multiples of 90 degrees; we obtain it by instantiating a gspaces.GSpace
r2_act = gspaces.rot2dOnR2(N=4)
print("r2_act: ", r2_act)


# Access the group G
G = r2_act.fibergroup 
print("G: ", G)

# Now we need to define the representation rho: G -> R^{c x c}, which describes
# how a feature vector field f: R^2 -> R^c transforms under the action of G.
# This is implemented by nn.FieldType, to which we pass the gspaces.GSpace and the
# regular representation of G=C4. We call a feature field associated with the
# regular representation 𝜌_reg a regular feature field.
# G.regular_representation(g) is a permutation matrix of shape |G|x|G|
input_type = nn.FieldType(r2_act, 3*[G.regular_representation])
print("Input_type: ", input_type) 
print("Shape G.regular_representation: \n", G.regular_representation(G.sample()) ) 


# We have stacked three copies of regular representation, which represents a direct sum

r2_act:  C4_on_R2[(None, 4)]
G:  C4
Input_type:  [C4_on_R2[(None, 4)]: {regular (x3)}(12)]
Shape G.regular_representation: 
 [[ 1.00000000e+00  5.55111512e-17 -5.55111512e-17 -8.32667268e-17]
 [ 5.55111512e-17  1.00000000e+00  5.55111512e-17 -5.55111512e-17]
 [-5.55111512e-17  5.55111512e-17  1.00000000e+00  5.55111512e-17]
 [-8.32667268e-17 -5.55111512e-17  5.55111512e-17  1.00000000e+00]]


When we build a model equivariant to a group $G$, we require that the output produced by the model transforms consistently when the input transforms under the action of an element of $G$. This gives the equivariance constraint:

$$ \mathcal{T}^\text{out}_g \big[F(x)\big]\ =\ F\big(\mathcal{T}^\text{in}_g[x]\big) \quad \forall g\in G$$

where $\mathcal{T}^\text{in}_g$ is the transformation of the input by the group element $g$ while $\mathcal{T}^\text{out}_g$ is the transformation of the output by the same element.
The *field type* `feat_type_in`  describes $\mathcal{T}^\text{in}$. 
The transformation law $\mathcal{T}^\text{out}$ of the output of the first layer is similarly chosen by defining an instance `feat_type_out` of `nn.FieldType`.
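The equivariance constraint can be tested numerically for any candidate map $F$. The sketch below is a NumPy-only illustration for a single group element, a 90-degree rotation. The helper `equivariance_error` and the three example maps are assumptions chosen for illustration and are unrelated to the `escnn` modules.

```python
import numpy as np

def equivariance_error(F, T_in, T_out, x):
    """Largest absolute deviation between T_out[F(x)] and F(T_in[x])."""
    return float(np.max(np.abs(T_out(F(x)) - F(T_in(x)))))

def rot(a):
    """Action of a 90-degree rotation on a 2D array."""
    return np.rot90(a, k=1)

def identity(a):
    """Trivial action, used when the output should be invariant."""
    return a

x = np.random.default_rng(0).standard_normal((8, 8))

# A pointwise non-linearity commutes with the rotation: equivariant.
err_pointwise = equivariance_error(np.tanh, rot, rot, x)
# A global mean is unchanged by the rotation: invariant (T_out is the identity).
err_mean = equivariance_error(np.mean, rot, identity, x)
# Weighting the columns by a fixed ramp singles out a direction: not equivariant.
err_ramp = equivariance_error(lambda a: a * np.arange(8.0), rot, rot, x)

print(err_pointwise, err_mean, err_ramp)
```

The first two errors vanish up to floating-point rounding, while the third is clearly non-zero, showing how a map that prefers a spatial direction violates the constraint.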

It is now possible to define equivariant modules.

**Equivariant Convolution Layer**:
We instantiate a convolutional layer that maps between fields of types `in_type` and `out_type`.

Let $\rho_\text{in}: G \to \mathbb{R}^{c_\text{in} \times c_\text{in}}$ and $\rho_\text{out}: G \to \mathbb{R}^{c_\text{out} \times c_\text{out}}$ be the representations of $G$ associated with `in_type` and `out_type`.

An equivariant convolution layer is a standard convolution layer with a filter $k: \mathbb{R}^2 \to \mathbb{R}^{c_\text{out} \times c_\text{in}}$ which satisfies the **steerability constraint**:
$$
\forall g \in G, x \in \mathbb{R}^2 \quad k(g.x) = \rho_\text{out}(g) k(x) \rho_\text{in}(g)^{-1}
$$

In particular, the use of convolution guarantees translation equivariance, while the fact that the filters satisfy this steerability constraint guarantees $G$-equivariance.

In [None]:
#@title CnSteerable on Cifar
class CNSteerableCNN_CIFAR(torch.nn.Module):
    
    def __init__(self, n_classes=10):
        
        super(CNSteerableCNN_CIFAR, self).__init__()
        
        # Group space transformation C4
        self.r2_act = gspaces.rot2dOnR2(N=4)
        
        
        # The input image consists of 3 scalar fields, one for each channel of the RGB image.
        # With the trivial representation, each channel is transformed
        # independently of the others
        in_type = nn.FieldType(self.r2_act, 3*[self.r2_act.trivial_repr])
        # in_type = nn.FieldType(self.r2_act, 3*[self.r2_act.regular_repr]) # raises an error


        # we store the input type for wrapping the images into a geometric tensor during the forward pass
        self.input_type = in_type
        
        # We need to mask the input image since the corners are moved outside the grid under rotations
        self.mask = nn.MaskModule(in_type, 32, margin=1)

        # convolution 1

        # first we build the non-linear layer, which also constructs the right feature type
        # we choose 8 feature fields, each transforming under the regular representation of C_4
        

        # I think 3* self.r2_act belongs here
        activation1 = nn.ELU(nn.FieldType(self.r2_act, 8*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation1.in_type
        self.layer1 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=7, padding=1, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation1,
        )

        

        # convolution 2
        # the old output type is the input type to the next layer
        in_type = self.layer1.out_type

        
        # the output type of the second convolution layer are 16 regular feature fields
        activation2 = nn.ELU(nn.FieldType(self.r2_act, 16*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation2.in_type

        
        self.layer2 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=2, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation2
        )
        self.pool1 = nn.SequentialModule(
            nn.PointwiseAvgPoolAntialiased(out_type, sigma=0.66, stride=2)
        )

        
        # convolution 3
        # the old output type is the input type to the next layer
        in_type = self.layer2.out_type
        

        # the output type of the third convolution layer are 32 regular feature fields
        activation3 = nn.ELU(nn.FieldType(self.r2_act, 32*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation3.in_type    
        

        self.layer3 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=2, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation3
        )
        
        
        # convolution 4
        # the old output type is the input type to the next layer
        in_type = self.layer3.out_type
        

        # the output type of the fourth convolution layer are 32 regular feature fields
        activation4 = nn.ELU(nn.FieldType(self.r2_act, 32*[self.r2_act.regular_repr]), inplace=True)

        out_type = activation4.in_type  
        

        self.layer4 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=2, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation4
        )

        

        self.pool2 = nn.SequentialModule(
            nn.PointwiseAvgPoolAntialiased(out_type, sigma=0.66, stride=2)
        )
        

        
        # convolution 5
        # the old output type is the input type to the next layer
        in_type = self.layer4.out_type
        
        # the output of the fifth convolution layer consists of 64 regular feature fields
        activation5 = nn.ELU(nn.FieldType(self.r2_act, 64*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation5.in_type  
        
        self.layer5 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=2, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation5
        )
        
        
        # convolution 6
        # the old output type is the input type to the next layer
        in_type = self.layer5.out_type
        
        # the output of the sixth convolution layer consists of 64 regular feature fields
        activation6 = nn.ELU(nn.FieldType(self.r2_act, 64*[self.r2_act.regular_repr]), inplace=True)
        out_type = activation6.in_type  
        
        self.layer6 = nn.SequentialModule(
            nn.R2Conv(in_type, out_type, kernel_size=5, padding=1, bias=False),
            nn.IIDBatchNorm2d(out_type),
            activation6
        )
        
        self.pool3 = nn.PointwiseAvgPoolAntialiased(out_type, sigma=0.66, stride=1, padding=0)

        
        # number of output invariant channels
        c = 64

        output_invariant_type = nn.FieldType(self.r2_act, c*[self.r2_act.trivial_repr])
        

        self.invariant_map = nn.R2Conv(out_type, output_invariant_type, kernel_size=1, bias=False)
        
        
        # Fully Connected classifier
        self.fully_net = torch.nn.Sequential(
            torch.nn.BatchNorm1d(c),
            torch.nn.ELU(inplace=True),
            torch.nn.Linear(c, n_classes),
        )
        


    
    def forward(self, input: torch.Tensor):
        # wrap the input tensor in a GeometricTensor
        # (associate it with the input type)
        # print('\n Input shape ', input.shape)
        x = self.input_type(input)
        # print('After wrapping in the input type ', x.shape)

        # mask out the corners of the input image
        x = self.mask(x)
        # print('After the mask ', x.shape)

        # apply each equivariant block
        #
        # Each layer has an input and an output type.
        # A layer takes a GeometricTensor as input, which must be associated
        # with the same representation as the layer's input type.
        #
        # Each layer outputs a new GeometricTensor, associated with the layer's output type.
        # As a result, consecutive layers need to have matching input/output types.
        x = self.layer1(x)
        # print('After first layer ', x.shape)
        x = self.layer2(x)
        # print('After second layer ', x.shape)
        x = self.pool1(x)
        # print('After pool1 ', x.shape)

        x = self.layer3(x)
        # print('After third layer ', x.shape)
        x = self.layer4(x)
        # print('After fourth layer ', x.shape)
        x = self.pool2(x)
        # print('After pool2 ', x.shape)

        x = self.layer5(x)
        # print('After fifth layer ', x.shape)

        x = self.layer6(x)
        # print('After sixth layer ', x.shape)
        # pool over the spatial dimensions
        x = self.pool3(x)
        # print('After pool3 ', x.shape)

        # extract invariant features
        x = self.invariant_map(x)
        # print('After invariant map ', x.shape)

        # unwrap the output GeometricTensor
        # (take the PyTorch tensor and discard the associated representation)
        x = x.tensor
        # print('After unwrapping the GeometricTensor ', x.shape)

        # classify with the final fully connected layer
        x = self.fully_net(x.reshape(x.shape[0], -1))
        # print('After the fully connected net with reshape ', x.shape)
        
        
        return x
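
The spatial sizes that flow through the network above can be checked by hand with the usual convolution output formula. The sketch below does this in plain Python under several assumptions that are not stated in the notebook: the input is a 32x32 CIFAR10 image, the mask leaves the spatial size unchanged, and `PointwiseAvgPoolAntialiased` uses a Gaussian filter of size `2 * round(3 * sigma) + 1` with a default padding of `(filter_size - 1) // 2`. The helper names (`conv_out`, `antialiased_pool_out`, `trace_spatial_sizes`) are hypothetical and are not part of `escnn`.

```python
def conv_out(size, kernel, padding, stride=1):
    """Output length of a convolution or pooling along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def antialiased_pool_out(size, sigma, stride, padding=None):
    """Assumed sizing rule for PointwiseAvgPoolAntialiased (see the text above)."""
    kernel = 2 * int(round(3 * sigma)) + 1  # assumption: Gaussian filter width
    if padding is None:
        padding = (kernel - 1) // 2         # assumption: default "same"-like padding
    return conv_out(size, kernel, padding, stride)


def trace_spatial_sizes(size=32):
    """Follow one spatial dimension through the layers of the model above."""
    steps = [
        ("layer1", lambda s: conv_out(s, kernel=7, padding=1)),
        ("layer2", lambda s: conv_out(s, kernel=5, padding=2)),
        ("pool1", lambda s: antialiased_pool_out(s, sigma=0.66, stride=2)),
        ("layer3", lambda s: conv_out(s, kernel=5, padding=2)),
        ("layer4", lambda s: conv_out(s, kernel=5, padding=2)),
        ("pool2", lambda s: antialiased_pool_out(s, sigma=0.66, stride=2)),
        ("layer5", lambda s: conv_out(s, kernel=5, padding=2)),
        ("layer6", lambda s: conv_out(s, kernel=5, padding=1)),
        ("pool3", lambda s: antialiased_pool_out(s, sigma=0.66, stride=1, padding=0)),
        ("invariant_map", lambda s: conv_out(s, kernel=1, padding=0)),
    ]
    trace = []
    for name, step in steps:
        size = step(size)
        trace.append((name, size))
    return trace


for name, size in trace_spatial_sizes(32):
    print(f"{name:>13}: {size} x {size}")
```

If these assumptions hold, the feature map shrinks to a single spatial position before the reshape, so the flattened tensor has exactly `c` features, which matches the `BatchNorm1d(c)` at the start of `fully_net`.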

In [None]:
#@title Model for Cifar10
model_c4 = CNSteerableCNN_CIFAR().to(device)
train(model_c4)

accuracy = test(model_c4)    
print(f"\nTest accuracy: {accuracy}")

  0%|          | 0/30 [00:00<?, ?it/s]

epoch 0 | test accuracy: 49.480000000000004
epoch 3 | test accuracy: 63.57000000000001
epoch 6 | test accuracy: 67.73
epoch 9 | test accuracy: 69.76
epoch 12 | test accuracy: 70.97
epoch 15 | test accuracy: 72.06
epoch 18 | test accuracy: 71.73
epoch 21 | test accuracy: 72.39999999999999
epoch 24 | test accuracy: 72.76
epoch 27 | test accuracy: 72.78

Test accuracy: 72.67


# CIFAR10 with rotation

In [None]:
#@title Dataset Cifar10 with rotation
transform_rot = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        transforms.RandomHorizontalFlip(p=0.5),
        # degrees=180: the rotation angle is sampled uniformly from [-180, +180] degrees
        transforms.RandomAffine(degrees=180, translate=(0.1, 0.1)),
    ])
            

train_set = torchvision.datasets.CIFAR10(
        "../data",
        train=True,
        download=True,
        transform=transform_rot)

test_set = torchvision.datasets.CIFAR10(
        "../data",
        train=False,
        download=True,
        transform=transform_rot)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../data/cifar-10-python.tar.gz
Extracting ../data/cifar-10-python.tar.gz to ../data
Files already downloaded and verified


In [None]:
#@title Model for Cifar10 with rotation

model_c4 = CNSteerableCNN_CIFAR().to(device)
train(model_c4)

accuracy = test(model_c4)    
print(f"\nTest accuracy: {accuracy}")

  0%|          | 0/30 [00:00<?, ?it/s]

epoch 0 | test accuracy: 42.02
epoch 3 | test accuracy: 51.580000000000005
epoch 6 | test accuracy: 55.50000000000001
epoch 9 | test accuracy: 57.53
epoch 12 | test accuracy: 59.64
epoch 15 | test accuracy: 60.57
epoch 18 | test accuracy: 61.42999999999999
epoch 21 | test accuracy: 62.08
epoch 24 | test accuracy: 63.129999999999995
epoch 27 | test accuracy: 64.14

Test accuracy: 64.56
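
The rotated dataset samples angles from the whole range [-180, +180], whereas a model named `model_c4` is presumably built on the group C4, that is, on rotations by multiples of 90 degrees only. As a toy illustration of what exact C4 invariance means, the sketch below takes the maximum of a filter response over the four 90-degree rotations of the filter. This is a plain-Python illustration of orbit pooling, not the mechanism used by `escnn` (the network above uses a 1x1 convolution to trivial fields instead), and the helper names `rot90`, `response` and `c4_pooled_response` are hypothetical.

```python
def rot90(grid):
    """Rotate a square 2D list counter-clockwise by 90 degrees."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def response(image, kernel):
    """Plain inner product between an image patch and a kernel of equal size."""
    return sum(i * k
               for img_row, ker_row in zip(image, kernel)
               for i, k in zip(img_row, ker_row))


def c4_pooled_response(image, kernel):
    """Maximum response over the four 90-degree rotations of the kernel."""
    responses = []
    for _ in range(4):
        responses.append(response(image, kernel))
        kernel = rot90(kernel)
    return max(responses)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0, 0],
          [0, 0, 0],
          [0, 0, 0]]
rotated = rot90(image)

# The plain response depends on the orientation of the input ...
plain_changes = response(image, kernel) != response(rotated, kernel)
# ... while the response pooled over the C4 orbit does not.
pooled_is_invariant = (c4_pooled_response(image, kernel)
                       == c4_pooled_response(rotated, kernel))
print(plain_changes, pooled_is_invariant)
```

Rotating the input by 90 degrees only permutes the four responses, so their maximum is unchanged. For angles that are not multiples of 90 degrees no such exact permutation exists, which is one possible reason why arbitrary rotations remain harder for a C4 model.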


# CIFAR10+

In [None]:
#@title Dataset for Cifar10+
transform_plus = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        transforms.RandomHorizontalFlip(p=0.5),
        # degrees=0: no rotation, only random translations of up to 10% of the image size
        transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    ])
                      


train_set =  torchvision.datasets.CIFAR10(
        "../data",
        train=True,
        download=True,
        transform=transform_plus)

test_set =  torchvision.datasets.CIFAR10(
        "../data",
        train=False,
        download=True,
        transform=transform_plus)



output_classes = ('plane', 'car', 'bird', 'cat', 'deer', 
                  'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified


In [None]:
#@title Model for Cifar10+

model_c4 = CNSteerableCNN_CIFAR().to(device)
train(model_c4)

accuracy = test(model_c4)    
print(f"\nTest accuracy: {accuracy}")

  0%|          | 0/30 [00:00<?, ?it/s]

epoch 0 | test accuracy: 47.74
epoch 3 | test accuracy: 59.940000000000005
epoch 6 | test accuracy: 64.34
epoch 9 | test accuracy: 66.56
epoch 12 | test accuracy: 68.2
epoch 15 | test accuracy: 69.48
epoch 18 | test accuracy: 70.84
epoch 21 | test accuracy: 71.50999999999999
epoch 24 | test accuracy: 72.48
epoch 27 | test accuracy: 73.21

Test accuracy: 73.36


# Conclusion


The best performance was obtained on the rotated MNIST dataset: despite the added difficulty introduced by the rotations, both GCNNs and CNNs perform well on it. On the CIFAR10 datasets, CNNs and GCNNs reach comparable performance, as expected given that these images lack some of the underlying symmetries, while the steerable convolution brings improvements. CIFAR10 with rotations performs worse because, in addition to the rotations, the images are also translated, which increases the processing complexity and makes effective classification harder; on this dataset the steerable model reaches an accuracy close to 65%. It is important to note that the accuracy of every model is improved by the batch normalization layers.

This work can be extended by giving more emphasis to the concept of steerability: studying how to build this class of models with groups other than C4, and understanding how to learn the FieldType [(Cohen and Welling, 2016)](https://arxiv.org/abs/1612.08498). Another interesting extension is that of equivariant graph neural networks [(EGNN)](https://arxiv.org/pdf/2102.09844.pdf), considering the growing interest in these flexible objects.
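
To give a concrete idea of what moving from C4 to another cyclic group C_N involves, the sketch below builds the regular representation of C_N as cyclic-shift permutation matrices and counts the channels used by a stack of regular fields. It is a plain-Python illustration with hypothetical helper names (`regular_repr`, `matmul`, `identity`, `regular_channels`), not the `escnn` API, and it only covers cyclic groups.

```python
def regular_repr(k, n):
    """Permutation matrix of the k-th rotation of C_n in the regular representation.

    The matrix sends basis vector e_c to e_{(c + k) mod n}.
    """
    return [[1 if (row - col) % n == k % n else 0 for col in range(n)]
            for row in range(n)]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def regular_channels(n_fields, n):
    """Number of channels occupied by n_fields regular fields of C_n."""
    return n_fields * n


# Group law: composing rotations adds their indices modulo N.
assert matmul(regular_repr(1, 8), regular_repr(2, 8)) == regular_repr(3, 8)
# A full turn (rotation by 1 step followed by N - 1 steps) is the identity.
assert matmul(regular_repr(1, 4), regular_repr(3, 4)) == identity(4)

for n in (4, 8, 16):
    print(f"C{n}: 16 regular fields -> {regular_channels(16, n)} channels")
```

Each regular field of C_N occupies N channels, so keeping the same number of fields while increasing N grows the layers proportionally; this is one trade-off to consider when replacing C4 with a finer rotation group.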