# Working with Deeplay Models

Models are, broadly, classes that represent a specific architecture, such as ResNet18. Unlike components, they accept fewer configuration arguments, and they should be ready to pass directly to applications. Models are designed to be easy to use and to require minimal configuration to get started. They are also designed to be extensible, so that you can add new features without modifying the existing code.

## What Is in a Model?

Generally, a model should define an `.__init__()` method that takes all the necessary arguments to define the model and a `.forward()` method that defines the forward pass of the model.

Keep the forward pass as simple as possible; ideally, it should be fully sequential.
Any structure hard-coded in the forward pass limits the flexibility of the model. For example, if the forward pass is defined as `self.conv1(x) + self.conv2(x)`, then replacing `self.conv1` and `self.conv2` with a single `self.conv` requires rewriting the model's `.forward()` method.
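
As a plain-PyTorch illustration, the sketch below contrasts a hard-coded two-branch forward pass with a model whose constructor takes every architectural argument and whose forward pass is a single sequential call. It uses `nn.Sequential` as a stand-in for Deeplay components; the class names `HardCodedBranches` and `SequentialModel` are hypothetical.

```python
import torch
import torch.nn as nn


class HardCodedBranches(nn.Module):
    """The two-branch structure lives in forward(), not in the module list."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Merging conv1 and conv2 into one layer means editing this line.
        return self.conv1(x) + self.conv2(x)


class SequentialModel(nn.Module):
    """All architecture choices are constructor arguments; forward is one call."""

    def __init__(self, in_channels: int, hidden_channels: int, num_classes: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, hidden_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(hidden_channels, num_classes),
        )

    def forward(self, x):
        return self.layers(x)


branches = HardCodedBranches(channels=4)
print(branches(torch.randn(1, 4, 8, 8)).shape)  # torch.Size([1, 4, 8, 8])

model = SequentialModel(in_channels=1, hidden_channels=16, num_classes=10)
x = torch.randn(2, 1, 28, 28)
print(model(x).shape)  # torch.Size([2, 10])

# Because the structure is data, a layer can be swapped without touching forward().
model.layers[0] = nn.Conv2d(1, 16, kernel_size=5, padding=2)
print(model(x).shape)  # torch.Size([2, 10])
```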

Moreover, in almost all cases the model architecture should be built purely from components and operations. Limit direct use of `torch.nn` modules and `blocks`, since they are less flexible than Deeplay components and operations. If no component exists for the desired architecture, create a new component and add it to the `components` folder.

## Managing Unknown Tensor Sizes

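TensorFlow, and by extension Keras, allows unknown tensor sizes: layers infer their input shapes when they are first built. Standard PyTorch modules do not; their input sizes (for example, the `in_features` argument of `nn.Linear`) must be given at construction.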
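
To see the constraint concretely, here is a small plain-PyTorch check. `nn.Linear` fixes its weight shape from `in_features` at construction, so an input with a different feature size is rejected when the layer is called. The helper `accepts` is a hypothetical name used only for illustration.

```python
import torch
import torch.nn as nn


def accepts(layer: nn.Module, x: torch.Tensor) -> bool:
    """Hypothetical helper: True if the layer runs on x without a shape error."""
    try:
        layer(x)
        return True
    except RuntimeError:
        return False


# The weight shape (out_features, in_features) is fixed when the layer is created.
layer = nn.Linear(in_features=128, out_features=10)
print(layer.weight.shape)  # torch.Size([10, 128])

print(accepts(layer, torch.randn(4, 128)))  # True
print(accepts(layer, torch.randn(4, 64)))   # False: 64 features instead of 128
```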

If you need to support unknown tensor sizes, you can use PyTorch's lazy modules, such as `nn.LazyConv2d` and `nn.LazyLinear`. These modules delay the creation of their parameters until the first forward pass, when the input size is known. This is not optimal, because the parameter shapes remain unknown until a forward pass has been run, so use lazy modules sparingly.
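
A minimal sketch of how lazy modules behave, using only standard `torch.nn` classes: before the first forward pass the weights are `UninitializedParameter` placeholders, and the first call infers the missing input size from the data.

```python
import torch
import torch.nn as nn
from torch.nn.parameter import UninitializedParameter

# Neither module is told its input size; only the output size is fixed.
linear = nn.LazyLinear(out_features=10)
conv = nn.LazyConv2d(out_channels=16, kernel_size=3)

# Before the first forward pass, the weights have no shape yet.
was_uninitialized = isinstance(linear.weight, UninitializedParameter)
print(was_uninitialized)  # True

# The first call infers the missing sizes from the inputs and allocates the weights.
out = linear(torch.randn(4, 64))
feature_maps = conv(torch.randn(2, 3, 32, 32))
print(linear.weight.shape)  # torch.Size([10, 64])
print(conv.weight.shape)    # torch.Size([16, 3, 3, 3])
```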

If a model requires unknown tensor sizes, it is strongly encouraged to define the `.validate_after_build()` method, which should call the forward pass with a small input to check that the model can be built. This initializes the lazy modules immediately after the model is built, so users get a model with fully materialized parameters, which is more user-friendly.
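
Deeplay's `.validate_after_build()` is the recommended hook for this. The sketch below is not its implementation; it only illustrates the underlying idea in plain PyTorch with a hypothetical helper, `materialize_lazy_modules`, that performs one dry run with a small input and then checks that no lazy module was left uninitialized (for example, because the forward pass never reached it).

```python
import torch
import torch.nn as nn
from torch.nn.modules.lazy import LazyModuleMixin


def materialize_lazy_modules(model: nn.Module, example_input: torch.Tensor) -> nn.Module:
    """Hypothetical helper: one dry run so that lazy modules infer their sizes."""
    with torch.no_grad():
        model(example_input)
    # A module that is still lazy here was never reached by the forward pass.
    still_lazy = [
        name
        for name, module in model.named_modules()
        if isinstance(module, LazyModuleMixin) and module.has_uninitialized_params()
    ]
    if still_lazy:
        raise RuntimeError(f"uninitialized lazy modules: {still_lazy}")
    return model


model = nn.Sequential(
    nn.LazyConv2d(16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.LazyLinear(10),
)
materialize_lazy_modules(model, torch.zeros(1, 1, 28, 28))
print(model[0].weight.shape)  # torch.Size([16, 1, 3, 3])
print(model[4].weight.shape)  # torch.Size([10, 16])
```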

The following example wraps a model in a minimal application that only defines the forward pass:

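```python
import deeplay as dl
import torch.nn as nn

# Convolutional encoder -> global average pooling -> classification head
net = dl.Sequential(
    dl.ConvolutionalEncoder2d(1, [16, 32, 64], 128),  # 1 input channel, 128 output channels
    dl.Layer(nn.AdaptiveAvgPool2d, 1),                # pool each feature map to 1x1
    dl.MultiLayerPerceptron(128, [], 10),             # no hidden layers: 128 features -> 10 classes
)


class ImageClassifier(dl.Application):

    model: nn.Module

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model(x)


# Build the application and display its structure
classifier = ImageClassifier(net).create()
classifier
```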

```
ImageClassifier(
  (model): Sequential(
    (0): ConvolutionalEncoder2d(
      (blocks): LayerList(
        (0): Conv2dBlock(
          (layer): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (activation): ReLU()
        )
        (1): Conv2dBlock(
          (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (layer): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (activation): ReLU()
        )
        (2): Conv2dBlock(
          (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (layer): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (activation): ReLU()
        )
        (3): Conv2dBlock(
          (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (layer): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (activation): Identity()
        )
      )
```
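
The printed structure can be mirrored in plain PyTorch to follow the tensor shapes. This is an approximate re-creation from the printed output, not Deeplay code; in particular, it assumes the multilayer perceptron flattens its `(B, 128, 1, 1)` input before its linear layer. For a 28×28 input, the three max-pooling steps reduce the spatial size from 28 to 14, 7, and finally 3.

```python
import torch
import torch.nn as nn

# Plain-PyTorch analogue of the four Conv2dBlocks in the printed encoder.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2), nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2), nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2), nn.Conv2d(64, 128, kernel_size=3, padding=1),  # last block: Identity activation
)
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # (B, 128, H, W) -> (B, 128, 1, 1)
    nn.Flatten(),             # assumption: the MLP flattens its input
    nn.Linear(128, 10),       # MLP with no hidden layers
)

x = torch.randn(2, 1, 28, 28)
features = encoder(x)
print(features.shape)        # torch.Size([2, 128, 3, 3])
print(head(features).shape)  # torch.Size([2, 10])
```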

Next, we let the user choose the optimizer, with Adam as the default, and create it with the `create_optimizer_with_params` method. Set the default value of the `optimizer` argument to `None` and create the Adam instance inside `__init__`. Python evaluates default argument values only once, when the function is defined, so a default of `dl.Adam()` would be a single optimizer object shared by every instance of the class.
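
The pitfall is plain Python behavior, so it can be shown without Deeplay. In this sketch, `AdamConfig` is a hypothetical stand-in for an optimizer object such as `dl.Adam`.

```python
class AdamConfig:
    """Hypothetical stand-in for an optimizer object such as dl.Adam."""

    def __init__(self, lr=0.001):
        self.lr = lr


class SharedDefault:
    # The default is evaluated once, when `def` runs, so all instances share it.
    def __init__(self, optimizer=AdamConfig()):
        self.optimizer = optimizer


class FreshDefault:
    # None is the sentinel; a new AdamConfig is created on every call.
    def __init__(self, optimizer=None):
        self.optimizer = optimizer or AdamConfig()


a, b = SharedDefault(), SharedDefault()
a.optimizer.lr = 0.1
print(b.optimizer.lr)  # 0.1 (b sees the change made through a)

c, d = FreshDefault(), FreshDefault()
c.optimizer.lr = 0.1
print(d.optimizer.lr)  # 0.001
```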

```python
from typing import Optional


class ImageClassifier(dl.Application):

    model: nn.Module
    optimizer: dl.Optimizer

    def __init__(self, model: nn.Module, optimizer: Optional[dl.Optimizer] = None):
        super().__init__()
        self.model = model
        # Create a fresh Adam instance per object instead of a shared default
        self.optimizer = optimizer or dl.Adam(lr=0.001)

    def forward(self, x):
        return self.model(x)

    def configure_optimizers(self):
        # Bind the stored optimizer to this application's parameters
        return self.create_optimizer_with_params(self.optimizer, self.parameters())


classifier = ImageClassifier(net).create()
classifier
```

```
ImageClassifier(
  (model): Sequential(
    (0): ConvolutionalEncoder2d(
      (blocks): LayerList(
        (0): Conv2dBlock(
          (layer): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (activation): ReLU()
        )
        (1): Conv2dBlock(
          (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (layer): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (activation): ReLU()
        )
        (2): Conv2dBlock(
          (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (layer): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (activation): ReLU()
        )
        (3): Conv2dBlock(
          (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (layer): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (activation): Identity()
        )
      )
```
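
The `ImageClassifier` cell stores `dl.Adam(lr=0.001)` before any parameters are passed to it and only combines it with `self.parameters()` inside `configure_optimizers`. A plain-PyTorch sketch of that deferred pattern follows, assuming `dl.Adam` behaves like a configuration that `create_optimizer_with_params` later binds to the parameters. `OptimizerSpec` and `TinyClassifier` are hypothetical names, not Deeplay API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional, Type

import torch
import torch.nn as nn


@dataclass
class OptimizerSpec:
    """Hypothetical: remembers which optimizer to build and its hyperparameters,
    but holds no model parameters until create() is called."""

    optimizer_class: Type[torch.optim.Optimizer]
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def create(self, params: Iterable[nn.Parameter]) -> torch.optim.Optimizer:
        return self.optimizer_class(params, **self.kwargs)


class TinyClassifier(nn.Module):
    def __init__(self, model: nn.Module, optimizer: Optional[OptimizerSpec] = None):
        super().__init__()
        self.model = model
        # A new spec per instance, mirroring the `optimizer or dl.Adam(...)` idiom.
        self.optimizer = optimizer or OptimizerSpec(torch.optim.Adam, {"lr": 1e-3})

    def forward(self, x):
        return self.model(x)

    def configure_optimizers(self):
        # Bind the stored spec to the parameters only once the model exists.
        return self.optimizer.create(self.parameters())


clf = TinyClassifier(nn.Linear(4, 2))
opt = clf.configure_optimizers()
print(type(opt).__name__, opt.param_groups[0]["lr"])  # Adam 0.001
```

Keeping the optimizer as a description rather than a live object means it can be created before the model's parameters exist, and each application instance binds its own copy.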