In [1]:
import sys
sys.path.insert(0, "../../../")

In [2]:
import deeptorch as dtm

import torch
import torch.nn as nn
import torchvision
import pytorch_lightning as pl

In [3]:
# Load the MNIST dataset
mnist = torchvision.datasets.MNIST(
    root="data", train=True, download=True, transform=torchvision.transforms.ToTensor()
)

mnist_test = torchvision.datasets.MNIST(
    root="data", train=False, download=True, transform=torchvision.transforms.ToTensor()
)

In [4]:
mnist_dataloader = torch.utils.data.DataLoader(mnist, batch_size=32, num_workers=4)
mnist_test_dataloader = torch.utils.data.DataLoader(mnist_test, batch_size=32, num_workers=4)

## 1. Creating an ImageClassifier

There are two ways of creating a DeepTorchModel. They operate almost identically, but there are cases where one may be more convenient than the other.

These are:
- Using the class constructor method (`ImageClassifier()`)
- Using the `from_config` method (`ImageClassifier.from_config(config)`)

The first option, `ImageClassifier()`, is convenient when no complex customization is needed. For example:

In [5]:
classifier = dtm.ImageClassifier(num_classes=10)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mo

The second option, `ImageClassifier.from_config(config)`, is more convenient when individual components of `ImageClassifier` need to be modified:

In [6]:
config = dtm.Config().num_classes(10)
classifier = dtm.ImageClassifier.from_config(config)

Above, we used the `Config` object. `Config` is at the heart of `DeepTorch`: it is a powerful way of defining the rules for how a `DeepTorch` model should be created.

First, let's briefly introduce the syntax of Config. (TODO: create a separate more in-depth tutorial for Config)

### 1.1 Creating Config objects

`Config` objects are collections of _rules_, each of which links a name or selection (`num_classes`) to a value (`10`). `Config` objects are designed to be defined in a single chained expression. For example:

In [7]:
config = (
    dtm.Config()
       .num_classes(10)
       .batch_size(128)
       .learning_rate(0.001)
)
config

Config(
#.num_classes = 10
#.batch_size = 128
#.learning_rate = 0.001
)

The above is a simple configuration object with `num_classes`, `batch_size` and `learning_rate` defined.
Individual values can be accessed using the `get` method:

In [8]:
config.get("num_classes")

10

All values can be retrieved at once with the `get_parameters` method:

In [9]:
config.get_parameters()

{'num_classes': 10, 'batch_size': 128, 'learning_rate': 0.001}

`DeepTorchModule`s will take parameters from the `Config` object as required. 

There's much more to discover about `Config`, but we'll introduce these topics as we progress through this tutorial.
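
To make the idea of chained rules concrete, here is a minimal sketch of how a fluent, rule-collecting object could work in plain Python. `MiniConfig` is a hypothetical toy class written for this illustration; it is not how `dtm.Config` is implemented, and it only mimics the `get` and `get_parameters` calls shown above.

```python
class MiniConfig:
    """Toy stand-in for a rule-collecting config (not DeepTorch's implementation)."""

    def __init__(self, rules=None):
        # Rules are stored in insertion order as name -> value.
        self._rules = dict(rules or {})

    def __getattr__(self, name):
        # Only called for attributes that do not exist, e.g. `.num_classes`.
        if name.startswith("_"):
            raise AttributeError(name)

        def setter(value):
            # Return a new object so chained calls never mutate earlier configs.
            return MiniConfig({**self._rules, name: value})

        return setter

    def get(self, name, default=None):
        return self._rules.get(name, default)

    def get_parameters(self):
        return dict(self._rules)


config = MiniConfig().num_classes(10).batch_size(128).learning_rate(0.001)
print(config.get("num_classes"))   # 10
print(config.get_parameters())
```

Each attribute access returns a setter, and each setter returns a new config, which is what allows the whole definition to be written as one chained expression.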

## 2. Customizing ImageClassifier using Config attributes

`ImageClassifier` accepts the following arguments:
- `num_classes`: Integer, number of classes (10 in MNIST)
- `backbone`:    Specifier for the backbone module
- `head`:        Specifier for the classification head module
- `connector`:   Specifier for the module connecting `backbone` to `head`

Let's focus on `connector`. To customize the `connector` of `ImageClassifier`, we can pass a `Config` object with our specification of the connector.

As mentioned above, there are two ways to create a `DeepTorchModel`. For this first example, we'll demonstrate both. In the later examples, we'll stick to the `from_config` syntax for clarity.

In [10]:
# Using constructor and no config
classifier = dtm.ImageClassifier(
    num_classes=10,
    connector=nn.Flatten(),       
)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mo

In [11]:
# Using from_config (same settings as the constructor call above)
classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
    .num_classes(10)
    .connector(nn.Flatten())
)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mo

Here we passed `nn.Flatten()` as the connector. It is also possible to pass the class `nn.Flatten` without instantiating it; DeepTorch will instantiate the class wherever it is needed.

It is generally recommended to _not_ instantiate the class yourself. In many cases, the model needs to create several instances of the class for use in different places. If we instantiate the class _outside_ the model, that same instance is reused in every place. For stateless operations (e.g. `nn.Flatten`, `nn.ReLU`), this is safe. But for most modules, in particular those with learnable parameters (e.g. `nn.Linear`, `nn.Conv2d`), it is not, because every place would share the same weights.
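
The difference between passing a class and passing an instance can be sketched in plain Python. `ToyLayer` and `build` below are hypothetical names invented for this illustration (they are not part of DeepTorch); the sketch only assumes that a builder instantiates classes on demand and reuses instances as given.

```python
class ToyLayer:
    """Stand-in for a stateful layer such as nn.Linear (holds its own weights)."""

    def __init__(self, size=2):
        self.weights = [0.0] * size


def build(spec, **kwargs):
    """Instantiate `spec` if it is a class; otherwise return the instance as given."""
    if isinstance(spec, type):
        return spec(**kwargs)
    return spec


# Passing the class: each call produces a fresh, independent layer.
from_class = [build(ToyLayer, size=3) for _ in range(2)]

# Passing an instance: every position receives the very same object.
shared = ToyLayer(size=3)
from_instance = [build(shared) for _ in range(2)]

from_class[0].weights[0] = 1.0      # does not affect from_class[1]
from_instance[0].weights[0] = 1.0   # also changes from_instance[1]

print(from_class[0] is from_class[1])        # False
print(from_instance[0] is from_instance[1])  # True
```

With a real parameterized module, the second case means that an update to one "copy" updates all of them, which is rarely what is intended for layers at different depths.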

Arguments used to instantiate the class can be passed after the class in `Config`:

In [12]:
config = (
    dtm.Config()
        .num_classes(10)
        .connector(nn.Flatten, start_dim=1, end_dim=-1)
)

classifier = dtm.ImageClassifier.from_config(config)

Or, equivalently:

In [13]:
config = (
    dtm.Config()
        .num_classes(10)
        .connector(nn.Flatten)
        .connector.start_dim(1)
        .connector.end_dim(-1)
)
classifier = dtm.ImageClassifier.from_config(config)

This second syntax demonstrates another property of `Config`. It is trivial to define nested structures by simply chaining `.` accessors. We will see why this is useful later in this notebook. 
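
One way to picture nested rules is as dotted names in a flat table, where everything directly below `connector` becomes a keyword argument for the connector. The helper `kwargs_for` and the flat `rules` dictionary below are assumptions made for this sketch; the actual storage format used by `Config` may differ.

```python
def kwargs_for(rules, prefix):
    """Collect the direct children of `prefix` from dotted rule names."""
    head = prefix + "."
    return {
        name[len(head):]: value
        for name, value in rules.items()
        if name.startswith(head) and "." not in name[len(head):]
    }


# Hypothetical flat view of the rules set by the chained config above.
rules = {
    "num_classes": 10,
    "connector": "Flatten",        # placeholder string instead of the nn.Flatten class
    "connector.start_dim": 1,
    "connector.end_dim": -1,
}

print(kwargs_for(rules, "connector"))  # {'start_dim': 1, 'end_dim': -1}
```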

As a final example, here is how one would use global average pooling instead of flatten to connect the convolutional backbone with the classification head:

In [14]:
classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
        .num_classes(10)
        .connector(nn.AdaptiveAvgPool2d, output_size=(1, 1))
)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mo

## 2.1 Customizing the backbone

Ok, customizing the connector is simple; it's just a single layer after all. How about the backbone, which would typically be made of many layers? 

Well, there are three main ways:

1. DeepTorch modules
2. DeepTorch templates
3. Torch modules

### 2.1.1 DeepTorch modules

The simplest way to customize the backbone (or any other multi-layer component) is to stay in the `DeepTorch` ecosystem:

In [15]:
config = (
    dtm.Config()
        .num_classes(10)
        .backbone(dtm.ConvolutionalEncoder)
)
classifier = dtm.ImageClassifier.from_config(config)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mo

As mentioned above, the arguments used to instantiate the class can also be set in the config.

`ConvolutionalEncoder` takes two arguments:
- `blocks`: Specifier for each step of the encoder
- `depth`: Integer, number of blocks in the encoder

As such, we can easily change the size of the encoder as follows:

In [16]:
config = (
    dtm.Config()
        .num_classes()
        .backbone(dtm.ConvolutionalEncoder, depth=2)
)
classifier = dtm.ImageClassifier.from_config(config)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
    )
  )
  (connector): Flatten(start_dim=1, end_dim=-1)
  (head): CategoricalClassificationHead(
    (output): Template(
      (layer): LazyLinear(in_features=0, out_features=2, bias=True)
      (activation): Softmax(dim=-1)
    )
  )
  (val_accuracy): Accuracy()
)

But aren't we back to the same problem? How do we customize the `backbone.blocks`? Well, now we're at a small enough scale that we can use templates:

### 2.1.2 DeepTorch templates

Templates are a convenient way of defining small blocks of layers. The syntax is simple:

In [17]:
block_template = dtm.Layer("layer") >> dtm.Layer("activation") >> dtm.Layer("pool")

Here `Layer` is a placeholder for a module, and each placeholder is assigned a name (`"layer"`, `"activation"`, `"pool"`). Any name is valid. `>>` is a piping operation: first `dtm.Layer("layer")` is evaluated, then `dtm.Layer("activation")`, and finally `dtm.Layer("pool")`.
We can assign modules to these names in the config. Let's demonstrate:

In [18]:
config = (
    dtm.Config()
        .num_classes(10)
        .backbone(dtm.ConvolutionalEncoder)
        .backbone.depth(2)
        .backbone.blocks(block_template)

        # We can now refer to our names layer, activation, pool
        .backbone.blocks.layer(nn.LazyConv2d, kernel_size=3, padding=1)
        .backbone.blocks.activation(nn.LeakyReLU, negative_slope=0.2)
        .backbone.blocks.pool(nn.MaxPool2d)
)
classifier = dtm.ImageClassifier.from_config(config)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
    )
  )
  (connector): Flatten(start_dim=1, end_dim=-1)
  (head): CategoricalClassificationHead(
    (output): Template(
      (layer): LazyLinear(in_features=0, out_features=10, bias=True)
      (activation): Softmax(dim=-1)
    )
  )
  (val_accuracy): Accuracy()
)
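
The `>>` piping used by templates can be understood through Python's `__rshift__` operator overloading. The sketch below uses two hypothetical toy classes, `Placeholder` and `Pipeline`, with plain functions standing in for modules. It illustrates the general technique only and makes no claim about how `dtm.Layer` is implemented.

```python
class Placeholder:
    """Named slot to be filled with a callable later (toy version of a template layer)."""

    def __init__(self, name):
        self.name = name

    def __rshift__(self, other):
        # `a >> b` starts a pipeline that runs a first, then b.
        return Pipeline([self]) >> other


class Pipeline:
    """Ordered sequence of placeholders."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __rshift__(self, other):
        extra = other.steps if isinstance(other, Pipeline) else [other]
        return Pipeline(self.steps + extra)

    def run(self, x, **modules):
        # Look up the callable registered under each name and apply them in order.
        for step in self.steps:
            x = modules[step.name](x)
        return x


template = Placeholder("layer") >> Placeholder("activation") >> Placeholder("pool")
names = [step.name for step in template.steps]
print(names)  # ['layer', 'activation', 'pool']

result = template.run(
    [-2, 4, -6, 8],
    layer=lambda xs: [x * 2 for x in xs],                              # scale
    activation=lambda xs: [max(x, 0) for x in xs],                     # ReLU-like
    pool=lambda xs: [max(xs[i:i + 2]) for i in range(0, len(xs), 2)],  # max over pairs
)
print(result)  # [8, 16]
```

Because the names are only resolved in `run`, the same template can be reused with different modules, which mirrors how the config assigns modules to `layer`, `activation` and `pool` after the template has been defined.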

In [None]:
block_with_skip = dtm.Layer("socket") >> dtm.Layer("layer") >> dtm.Layer("activation") >> dtm.Layer("pool")

branch = dtm.OutputOf("encoder.blocks[4]")  >> dtm.Layer("layer") >> dtm.Layer("activation")

branches = [dtm.OutputOf(f"encoder.blocks[{4-i}]") >> dtm.Layer("layer") >> dtm.Layer("activation")
            for i in range(4)]

config = (
    dtm.Config()
        # Adding
        .decoder.blocks[0](block_with_skip)
        .decoder.blocks[0].socket(dtm.Concatenate, dim=1)
        .decoder.blocks[0].socket.inputs[0](None)
        .decoder.blocks[0].socket.inputs[1](dtm.OutputOf("encoder.blocks[4]"))
        .decoder.blocks[0].socket.inputs[1].layer(nn.Conv2d, kernel_size=1, bias=False)
        .decoder.blocks[0].socket.inputs[1].activation(nn.LeakyReLU, negative_slope=0.2)


        # Custom skip
        .decoder.blocks[0](block_with_skip)
        .decoder.blocks[0].socket(
            dtm.Skip,
            func=lambda a, b: torch.cat((a, b), dim=1),
            inputs=(None, "encoder.blocks[4].layer"),
            dim=1,
        )

        # Alternative syntax
        .decoder.blocks[0](block_with_skip)
        .decoder.blocks[0].skip(dtm.Skip)
        .decoder.blocks[0].skip.func(lambda a, b: torch.cat((a, b), dim=1))
        .decoder.blocks[0].skip.inputs((None, "encoder.blocks[4].layer"))
        .decoder.blocks[0].skip.dim(1)

        # Multiple
        .decoder.blocks[:4](block_with_skip)
        .decoder.blocks[:4].socket(dtm.Concatenate, dim=1)
        .decoder.blocks[:4].populate("socket.inputs[1]", branches)
        .decoder.blocks.socket.inputs[1](dtm.OutputOf("encoder.blocks[4]"))
        .decoder.blocks.socket.inputs[1].layer(nn.Conv2d, kernel_size=1, bias=False)
        .decoder.blocks.socket.inputs[1].activation(nn.LeakyReLU, negative_slope=0.2)

        # Removing
        .decoder.blocks[2].socket(dtm.NoSkip)

)
classifier = dtm.ImageClassifier.from_config(config)
classifier

Note that much of what we specified for the backbone in the template example is already the default. We can omit everything that matches the defaults:

In [19]:
config = (
    dtm.Config()
        .num_classes(10)
        .backbone.depth(2)
        .backbone.blocks.activation(nn.LeakyReLU, negative_slope=0.2)
)
classifier = dtm.ImageClassifier.from_config(config)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
    )
  )
  (connector): Flatten(start_dim=1, end_dim=-1)
  (head): CategoricalClassificationHead(
    (output): Template(
      (layer): LazyLinear(in_features=0, out_features=10, bias=True)
      (activation): Softmax(dim=-1)
    )
  )
  (val_accuracy): Accuracy()
)

If you are unsure about the structure of a submodule, you can use `_` and `__` as wildcard and double wildcard selectors. These operate much like `*` and `**` do in `glob`. In other words:
- `_` will match exactly one name
- `__` will match any number of names, including zero
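
The two matching rules above can be sketched as a small recursive function over dotted paths. `matches` is a hypothetical helper written for this illustration; it shows the semantics described in the list, not DeepTorch's actual selector code.

```python
def matches(pattern, path):
    """Return True if a dotted `path` matches a dotted `pattern`.

    `_` consumes exactly one name; `__` consumes zero or more names.
    """
    def walk(p, q):
        if not p:
            return not q
        if p[0] == "__":
            # Let `__` absorb 0, 1, 2, ... leading names of q.
            return any(walk(p[1:], q[i:]) for i in range(len(q) + 1))
        if not q:
            return False
        return (p[0] == "_" or p[0] == q[0]) and walk(p[1:], q[1:])

    return walk(pattern.split("."), path.split("."))


print(matches("backbone._.activation", "backbone.blocks.activation"))  # True
print(matches("_._.activation", "head.output.activation"))             # True
print(matches("_._.activation", "backbone.activation"))                # False
print(matches("foo.__.baz", "foo.baz"))                                # True
print(matches("foo.__.baz", "foo.bax.bix.baz"))                        # True
```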

In the example below, `_` is a wildcard that matches any single name.
It will match `blocks`, since that is the only candidate under `backbone`.
As such, only the activation layers inside `backbone.blocks` will change to `LeakyReLU`.

In [20]:
classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
        .num_classes(10)
        .backbone._.activation(nn.LeakyReLU, negative_slope=0.2)
)
classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_

Here we have two `_` wildcards. 
In this case, there will be two matches, and our rule will apply to both. The two matches are:
1. `backbone.blocks.activation`
2. `head.output.activation`

Both will now be `LeakyReLU`

In [21]:
classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
    .num_classes(10)
    ._._.activation(nn.LeakyReLU, negative_slope=0.2)
)

classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.2)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_

`__` matches any nested structure, regardless of depth. For example, `Config().foo.__.baz` would apply to all of the following:
- `foo.baz`
- `foo.bax.baz`
- `foo.bax.bix.baz`
- `foo.bax.bix.zig.baz`

and so on.

Here we see it in action:

In [22]:
classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
    .num_classes(10)
    .__.activation(nn.LeakyReLU)
    .__.negative_slope(0.1)
)

classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.1)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.1)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_slope=0.1)
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): LeakyReLU(negative_

### 2.1.3 Torch modules

Of course, one can use standard torch modules. This can be useful if you want to use pretrained, standard networks.

In [23]:
import torchvision

backbone = torchvision.models.resnet18(pretrained=True)
# We don't want the pooling or the fully connected layers, so we'll remove them:
backbone.avgpool = nn.Identity()
backbone.fc = nn.Identity()

classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
    .num_classes(10)
    .backbone(backbone)
)

classifier



ImageClassifier(
  (backbone): ResNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, 

## 2.2 Customizing specific backbone blocks

So far, we have seen how to modify all blocks at once. However, one may want even more control over the final model by modifying individual blocks.

We can achieve this using indexing with `blocks[index]`! Below we demonstrate how to set the template of individual blocks:

In [24]:
# The first two blocks have normalization after the convolutional layer.
block_0_and_1 = dtm.Layer("layer") >> dtm.Layer("normalization") >> dtm.Layer("activation") >> dtm.Layer("pool")

# The third block has no normalization after the convolutional layer.
block_2 = dtm.Layer("layer") >> dtm.Layer("activation") >> dtm.Layer("pool")

# The fourth block has no pooling and no normalization after the convolutional layer.
block_3 = dtm.Layer("layer") >> dtm.Layer("activation")

classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
        .num_classes(10)
        .backbone.blocks[:2](block_0_and_1) # applies to the first two blocks
        .backbone.blocks[2](block_2) # applies to the third block
        .backbone.blocks[3](block_3) # applies to the fourth block

        # Applies to all blocks, but only blocks 0 and 1 have a normalization layer.
        .backbone.blocks.normalization(nn.LazyBatchNorm2d)
        
)

classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (normalization): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (normalization): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
 

As an alternative, one can also set Layers to `Identity` where one does not want them.

In [25]:
# An alternative way to create the same model as above:
block = dtm.Layer("layer") >> dtm.Layer("normalization") >> dtm.Layer("activation") >> dtm.Layer("pool")
classifier = dtm.ImageClassifier(
    num_classes=10,
    backbone=dtm.Config()
                .blocks(block, normalization=nn.LazyBatchNorm2d)
                .blocks[2:4].normalization(nn.Identity) # will apply to the third and fourth blocks
                .blocks[3].pool(nn.Identity) # will apply to the fourth block
)

classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (normalization): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (normalization): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (normalization): Identity()
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, di
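
The selectors `blocks[:2]`, `blocks[2:4]` and `blocks[3]` read like ordinary Python indexing. As a rough sketch, assuming they follow standard index and slice semantics over a backbone of known depth, the selected block indices could be resolved like this (`select` is a hypothetical helper, not a DeepTorch function):

```python
def select(key, depth):
    """Resolve an int or slice selector to the list of block indices it covers."""
    indices = range(depth)[key]
    return [indices] if isinstance(key, int) else list(indices)


depth = 4
print(select(slice(None, 2), depth))  # blocks[:2]  -> [0, 1]
print(select(slice(2, 4), depth))     # blocks[2:4] -> [2, 3]
print(select(3, depth))               # blocks[3]   -> [3]
```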

For some attributes, it's expected that each block has a different value for that attribute. A common example is `out_channels`, which is expected to increase deeper in the network. Following what we know so far, one could do:

In [26]:

classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
        .num_classes(10)
        .backbone.blocks[0].layer.out_channels(4)
        .backbone.blocks[1].layer.out_channels(8)
        .backbone.blocks[2].layer.out_channels(16)
        .backbone.blocks[3].layer.out_channels(32)
)

classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mod

However, this can be a bit verbose. In these cases, one can optionally use the `populate` method to set all values at once. `populate` can take values either from a list or a function. Let's demonstrate:

In [27]:

classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
        .num_classes(10)
        .backbone.blocks[0:4].populate("layer.out_channels", [4, 8, 16, 32])
)

classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mod

In [28]:

classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
        .num_classes(10)
        .backbone.blocks[0:4].populate("layer.out_channels", lambda i: 4 * 2**i)
)

classifier

ImageClassifier(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mod
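
The two forms of `populate` shown above (a list of values, or a function of the index) can be sketched with a small helper. This is a hypothetical illustration: the function name mirrors the method, but the assumption that the function receives the position within the selection (0, 1, 2, ...) is mine and is not confirmed by the examples, where position and block index coincide.

```python
def populate(indices, values):
    """Map each selected block index to a value taken from a list or a function."""
    indices = list(indices)
    if callable(values):
        # Assumption: the function receives the position within the selection.
        return {idx: values(pos) for pos, idx in enumerate(indices)}
    if len(values) != len(indices):
        raise ValueError("need exactly one value per selected block")
    return dict(zip(indices, values))


from_list = populate(range(4), [4, 8, 16, 32])
from_func = populate(range(4), lambda i: 4 * 2**i)
print(from_list)  # {0: 4, 1: 8, 2: 16, 3: 32}
print(from_list == from_func)  # True
```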

### 2.extra

As a bonus, we can also start directly from templates!

In [31]:
classifier_template = dtm.Layer("backbone") >> dtm.Layer("connector") >> dtm.Layer("head")

classifier = classifier_template.from_config(
    dtm.Config()
        .backbone(dtm.ConvolutionalEncoder)
        .connector(nn.Flatten)
        .head(dtm.CategoricalClassificationHead)
)

classifier

Template(
  (backbone): ConvolutionalEncoder(
    (blocks): ModuleList(
      (0): Template(
        (layer): LazyConv2d(0, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Template(
        (layer): LazyConv2d(0, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (2): Template(
        (layer): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (3): Template(
        (layer): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (activation): ReLU()
        (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=Fals

However, keep in mind that the resulting model will be a pure `pytorch` model, and not a `pytorch_lightning` model. As such, many of the conveniences of `pytorch_lightning` are unavailable.

## 3. Training the classifier

In [32]:

# The first two blocks have normalization after the convolutional layer.
block_0_and_1 = dtm.Layer("layer") >> dtm.Layer("normalization") >> dtm.Layer("activation") >> dtm.Layer("pool")

# The third block has no normalization after the convolutional layer.
block_2 = dtm.Layer("layer") >> dtm.Layer("activation") >> dtm.Layer("pool")

# The fourth block has no pooling and no normalization after the convolutional layer.
block_3 = dtm.Layer("layer") >> dtm.Layer("activation")

classifier = dtm.ImageClassifier.from_config(
    dtm.Config()
        .num_classes(10)
        .backbone.blocks[:2](block_0_and_1) # applies to the first two blocks
        .backbone.blocks[2](block_2) # applies to the third block
        .backbone.blocks[3](block_3) # applies to the fourth block

        # Applies to all blocks, but only blocks 0 and 1 have a normalization layer.
        .backbone.blocks.normalization(nn.LazyBatchNorm2d)
)

# Not necessary, but to get the number of parameters we first need to run the model with a batch of data
# This is because the model is initialized lazily
classifier(torch.rand(1, 1, 28, 28))


trainer = pl.Trainer(max_epochs=5, accelerator="cuda")
trainer.fit(classifier, mnist_dataloader, mnist_test_dataloader)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name         | Type                          | Params
---------------------------------------------------------------
0 | backbone     | ConvolutionalEncoder          | 24.4 K
1 | connector    | Flatten                       | 0     
2 | head         | CategoricalClassificationHead | 5.8 K 
3 | val_accuracy | Accuracy                      | 0     
---------------------------------------------------------------
30.2 K    Trainable params
0         Non-trainable params
30.2 K    Total params
0.121     Total estimated model params size (MB)


`Trainer.fit` stopped: `max_epochs=5` reached.
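
The dummy forward pass before training matters because lazy modules have no concrete parameter shapes until they see data. The following standalone sketch shows this with a single `nn.LazyLinear` layer from PyTorch itself; the layer size and the flattened 28x28 input are chosen for illustration only and are unrelated to the classifier trained above.

```python
import torch
import torch.nn as nn
from torch.nn.parameter import UninitializedParameter

layer = nn.LazyLinear(out_features=10)

# Before any data has passed through, the weight has no shape yet.
print(isinstance(layer.weight, UninitializedParameter))  # True

# One dummy batch of flattened 28x28 images fixes in_features to 784.
layer(torch.rand(1, 28 * 28))

print(isinstance(layer.weight, UninitializedParameter))  # False
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 784 * 10 weights + 10 biases = 7850
```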


In [33]:
trainer.test(dataloaders=mnist_test_dataloader)

Restoring states from the checkpoint path at c:\Users\GU\DeepTorch\examples\vision\classification\lightning_logs\version_26\checkpoints\epoch=4-step=9375.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Loaded model weights from checkpoint at c:\Users\GU\DeepTorch\examples\vision\classification\lightning_logs\version_26\checkpoints\epoch=4-step=9375.ckpt



[{'test_acc': 0.9857000112533569, 'test_loss': 1.4754879474639893}]