# Working with Deeplay Modules

In this section, you'll learn how to create and build Deeplay modules and how to configure their properties. You'll also learn how Deeplay modules differ from PyTorch modules.

## Understanding the Differences between Deeplay and PyTorch Modules

The biggest difference between a Deeplay module and a PyTorch module is that a Deeplay module is not fully initialized when you instantiate it. Instead, it acts as a template that you can keep configuring until you build it, at which point its Deeplay `Layer` objects are replaced by the actual PyTorch layers.

## Creating and Building Deeplay Modules

Deeplay modules can be built using either `.create()` or `.build()`. 

Use the `.create()` method when you want to keep the original module for subsequent use and/or modification. 

Use the `.build()` method when you want to build the module in-place. 

Define a Deeplay module ...

In [1]:
import deeplay as dl

mlp = dl.models.SmallMLP(in_features=784, out_features=10)

print(mlp)

SmallMLP(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Layer[Linear](in_features=784, out_features=32, bias=True)
      (activation): Layer[LeakyReLU](negative_slope=0.05)
      (normalization): Layer[BatchNorm1d](num_features=32)
    )
    (1): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=32, bias=True)
      (activation): Layer[LeakyReLU](negative_slope=0.05)
      (normalization): Layer[BatchNorm1d](num_features=32)
    )
    (2): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=10, bias=True)
      (activation): Layer[Identity]()
    )
  )
)


... this module is not built yet. You can tell from the printed summary: each layer appears as a Deeplay `Layer` object followed by the underlying PyTorch layer in square brackets (for example, `Layer[Linear]`). Once the module is built, the `Layer` objects are replaced by the actual PyTorch layers.

Start by creating the `mlp` module ...

In [2]:
created_mlp = mlp.create()

print("mlp=\n", mlp)
print("created_mlp=\n", created_mlp)

mlp=
 SmallMLP(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Layer[Linear](in_features=784, out_features=32, bias=True)
      (activation): Layer[LeakyReLU](negative_slope=0.05)
      (normalization): Layer[BatchNorm1d](num_features=32)
    )
    (1): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=32, bias=True)
      (activation): Layer[LeakyReLU](negative_slope=0.05)
      (normalization): Layer[BatchNorm1d](num_features=32)
    )
    (2): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=10, bias=True)
      (activation): Layer[Identity]()
    )
  )
)
created_mlp=
 SmallMLP(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Linear(in_features=784, out_features=32, bias=True)
      (activation): LeakyReLU(negative_slope=0.05)
      (normalization): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): LinearBlock(
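      (layer): Linear(in_features=32, out_features=32, bias=True)
      (activation): LeakyReLU(negative_slope=0.05)
      (normalization): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): LinearBlock(
      (layer): Linear(in_features=32, out_features=10, bias=True)
      (activation): Identity()
    )
  )
)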

You can see that `created_mlp` is a new, built module (its layers are plain PyTorch layers), while `mlp` is unchanged and still unbuilt.

Now, build the `mlp` module ...

In [3]:
built_mlp = mlp.build()

print("mlp=\n", mlp)
print("built_mlp=\n", built_mlp)

mlp=
 SmallMLP(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Linear(in_features=784, out_features=32, bias=True)
      (activation): LeakyReLU(negative_slope=0.05)
      (normalization): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): LinearBlock(
      (layer): Linear(in_features=32, out_features=32, bias=True)
      (activation): LeakyReLU(negative_slope=0.05)
      (normalization): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): LinearBlock(
      (layer): Linear(in_features=32, out_features=10, bias=True)
      (activation): Identity()
    )
  )
)
built_mlp=
 SmallMLP(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Linear(in_features=784, out_features=32, bias=True)
      (activation): LeakyReLU(negative_slope=0.05)
      (normalization): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): LinearBlock(
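      (layer): Linear(in_features=32, out_features=32, bias=True)
      (activation): LeakyReLU(negative_slope=0.05)
      (normalization): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): LinearBlock(
      (layer): Linear(in_features=32, out_features=10, bias=True)
      (activation): Identity()
    )
  )
)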

... you can see that both `mlp` and `built_mlp` are now built. In fact, they are the same object!

In [4]:
mlp is built_mlp

True

### Deciding Whether to Use `.build()` or `.create()`

In general, you'll want to use `.build()` when you are sure you won't need the original module anymore, and `.create()` when you want to keep the original template (for example, when you want to create multiple similar modules). Most of the time, you'll probably want to use `.build()`.

The `.create()` method is actually equivalent to `.new().build()`. This is because `.new()` clones the object, and `.build()` builds the object in-place. The `.new()` method can also be used by itself to create a clone of the object without building it.
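To make the relationship between `.new()`, `.build()`, and `.create()` concrete, here is a toy sketch of the same template-then-build pattern in plain Python. The `LazyTemplate` class and its `built` attribute are hypothetical and do not reflect how Deeplay is implemented; in particular, Deeplay's `.build()` replaces the module's own `Layer` objects in place, whereas this toy simply stores the built result in an attribute.

```python
import copy


class LazyTemplate:
    """Hypothetical toy illustrating deferred construction; NOT Deeplay's implementation."""

    def __init__(self, factory, **kwargs):
        self.factory = factory      # callable that produces the final object
        self.kwargs = dict(kwargs)  # configuration that can still be changed
        self.built = None           # filled in by build()

    def configure(self, **kwargs):
        # Configuration is only allowed before the object is built
        if self.built is not None:
            raise RuntimeError("cannot configure after build()")
        self.kwargs.update(kwargs)
        return self

    def new(self):
        # Clone the template, including its current configuration, without building it
        return copy.deepcopy(self)

    def build(self):
        # Build in place and return self, so that `x.build() is x`
        if self.built is None:
            self.built = self.factory(**self.kwargs)
        return self

    def create(self):
        # Equivalent to new().build(): the original template stays unbuilt
        return self.new().build()


template = LazyTemplate(dict, hidden=32)
small = template.create()                       # built clone with hidden=32
large = template.configure(hidden=64).create()  # the template is still configurable
print(small.built, large.built, template.built)
# {'hidden': 32} {'hidden': 64} None
```

Because `create()` operates on a clone, the same template can keep being configured and reused to produce several similar objects, which is exactly the situation where `.create()` is preferable to `.build()`.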

## Working with PyTorch Tensors

Deeplay is compatible with both NumPy arrays and PyTorch tensors. Internally, however, a NumPy array passed to the model is converted to a PyTorch tensor, because PyTorch layers operate only on tensors. This conversion also moves the channel dimension from the last dimension to the first non-batch dimension, as PyTorch expects.

**NOTE:** While Deeplay takes every possible care to ensure that this is done correctly, it is generally recommended to provide PyTorch tensors directly to avoid any automatic permutation of your data.
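If you prefer to control this conversion yourself, as the note recommends, you can move the channel axis before handing the data to the model. The following is a minimal NumPy sketch, assuming a channel-last batch of shape `(N, H, W, C)`; the array sizes are arbitrary.

```python
import numpy as np

# A batch of 4 RGB images stored channel-last: (N, H, W, C)
rng = np.random.default_rng(0)
images = rng.random((4, 28, 28, 3), dtype=np.float32)

# Move the channel axis to the first non-batch position: (N, C, H, W),
# and make the result contiguous in memory
images_nchw = np.ascontiguousarray(np.moveaxis(images, -1, 1))

print(images_nchw.shape)  # (4, 3, 28, 28)
```

The resulting array can then be wrapped as a PyTorch tensor with `torch.from_numpy(images_nchw)`, which shares memory with the array instead of copying it.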

## Configuring Modules

Deeplay modules have a configuration system that allows you to easily change the properties of a module. At its core, this is done using the `.configure()` method. However, most modules also have specific configuration methods that allow you to change specific properties. For example, the `LinearBlock` has the `.normalized()` and `.activated()` methods that allow you to add normalization and activation to the block.

Importantly, most configurations are applied to many layers at once. For example, you may want all blocks in a component to have the same activation function. There are a few ways to do this, but the most powerful is the selection system. It is explained more thoroughly in [GS181 Configuring Deeplay Objects](GS181_configure.ipynb), but the basic idea is that you select a subset of layers in a module and apply a configuration to them. Selection is done by indexing the module with square brackets, which calls its `.__getitem__()` method. For example, the following code applies an activation function to all blocks in a component.

In [5]:
import torch.nn as nn

mlp = dl.MultiLayerPerceptron(728, [32, 16], 10)
mlp["blocks", :].all.configure(activation=dl.Layer(nn.Tanh))

print(mlp)

MultiLayerPerceptron(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Layer[Linear](in_features=728, out_features=32, bias=True)
      (activation): Layer[Tanh]()
    )
    (1): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=16, bias=True)
      (activation): Layer[Tanh]()
    )
    (2): LinearBlock(
      (layer): Layer[Linear](in_features=16, out_features=10, bias=True)
      (activation): Layer[Tanh]()
    )
  )
)


The following are a few ways to achieve the same configuration.

In [6]:
mlp = dl.MultiLayerPerceptron(728, [32, 16], 10)
mlp["blocks", :].all.activated(nn.Tanh)

print(mlp)

MultiLayerPerceptron(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Layer[Linear](in_features=728, out_features=32, bias=True)
      (activation): Layer[Tanh]()
    )
    (1): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=16, bias=True)
      (activation): Layer[Tanh]()
    )
    (2): LinearBlock(
      (layer): Layer[Linear](in_features=16, out_features=10, bias=True)
      (activation): Layer[Tanh]()
    )
  )
)


In [7]:
mlp = dl.MultiLayerPerceptron(728, [32, 16], 10)
mlp[...].hasattr("activated").all.activated(nn.Tanh)

print(mlp)

MultiLayerPerceptron(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Layer[Linear](in_features=728, out_features=32, bias=True)
      (activation): Layer[Tanh]()
    )
    (1): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=16, bias=True)
      (activation): Layer[Tanh]()
    )
    (2): LinearBlock(
      (layer): Layer[Linear](in_features=16, out_features=10, bias=True)
      (activation): Layer[Tanh]()
    )
  )
)


In [8]:
mlp = dl.MultiLayerPerceptron(728, [32, 16], 10)
mlp[...].isinstance(dl.LinearBlock).all.activated(nn.Tanh)

print(mlp)

MultiLayerPerceptron(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Layer[Linear](in_features=728, out_features=32, bias=True)
      (activation): Layer[Tanh]()
    )
    (1): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=16, bias=True)
      (activation): Layer[Tanh]()
    )
    (2): LinearBlock(
      (layer): Layer[Linear](in_features=16, out_features=10, bias=True)
      (activation): Layer[Tanh]()
    )
  )
)


In [9]:
mlp = dl.MultiLayerPerceptron(728, [32, 16], 10)
for block in mlp.blocks:
    block.activated(nn.Tanh)
    
print(mlp)

MultiLayerPerceptron(
  (blocks): LayerList(
    (0): LinearBlock(
      (layer): Layer[Linear](in_features=728, out_features=32, bias=True)
      (activation): Layer[Tanh]()
    )
    (1): LinearBlock(
      (layer): Layer[Linear](in_features=32, out_features=16, bias=True)
      (activation): Layer[Tanh]()
    )
    (2): LinearBlock(
      (layer): Layer[Linear](in_features=16, out_features=10, bias=True)
      (activation): Layer[Tanh]()
    )
  )
)


There are many such methods, and they are usually composable. For example, let's say you want a block that first applies the layer and the activation, then has an additive shortcut connection from the input, and finally applies a normalization. You can do this as follows.

In [10]:
block = (
    dl.LinearBlock(64, 64)
    .activated(nn.GELU)
    .shortcut()
    .normalized(nn.LayerNorm)
    .build()
)

print(block)

LinearBlock(
  (shortcut_start): Identity()
  (layer): Linear(in_features=64, out_features=64, bias=True)
  (activation): GELU(approximate='none')
  (shortcut_end): Add()
  (normalization): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
)
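To see what the built block computes, the following plain-PyTorch sketch writes the same forward pass by hand: linear layer, GELU, add the block's input, then layer normalization. The class name `ShortcutLinearBlock` is hypothetical and this is not Deeplay's code. It assumes, as in the example above, that the input and output widths match (64), since an identity shortcut can only be added to an output of the same shape.

```python
import torch
import torch.nn as nn


class ShortcutLinearBlock(nn.Module):
    """Hypothetical plain-PyTorch equivalent: layer -> activation -> add input -> normalization."""

    def __init__(self, features: int):
        super().__init__()
        self.layer = nn.Linear(features, features)
        self.activation = nn.GELU()
        self.normalization = nn.LayerNorm(features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x                              # identity branch (cf. shortcut_start)
        y = self.activation(self.layer(x))
        y = y + shortcut                          # additive shortcut (cf. shortcut_end)
        return self.normalization(y)              # normalization applied last


torch.manual_seed(0)
block = ShortcutLinearBlock(64)
x = torch.randn(8, 64)
out = block(x)
print(out.shape)  # torch.Size([8, 64])
```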


This is a very powerful system that allows you to easily create complex blocks and components. It can also be applied to blocks inside components or models.

In [11]:
model = dl.MultiLayerPerceptron(784, [64, 64], 10)
(
    model["blocks", :-1]
    .all
    .activated(nn.ReLU)
    .shortcut()
    .normalized(nn.LayerNorm)
)

print(model)

MultiLayerPerceptron(
  (blocks): LayerList(
    (0): LinearBlock(
      (shortcut_start): Layer[Identity]()
      (layer): Layer[Linear](in_features=784, out_features=64, bias=True)
      (activation): Layer[ReLU]()
      (shortcut_end): Add()
      (normalization): Layer[LayerNorm](normalized_shape=64)
    )
    (1): LinearBlock(
      (shortcut_start): Layer[Identity]()
      (layer): Layer[Linear](in_features=64, out_features=64, bias=True)
      (activation): Layer[ReLU]()
      (shortcut_end): Add()
      (normalization): Layer[LayerNorm](normalized_shape=64)
    )
    (2): LinearBlock(
      (layer): Layer[Linear](in_features=64, out_features=10, bias=True)
      (activation): Layer[Identity]()
    )
  )
)


## Configuring Modules with Styles

Some special configurations of modules have been given names and can be applied using the `.style()` method. For example, `Conv2dBlock` has a `"resnet"` style that applies a ResNet-style residual connection.

In [12]:
convblock = dl.Conv2dBlock(64, 64).style("resnet").build()

print(convblock)

Conv2dBlock(
  (blocks): Sequential(
    (0-1): 2 x Conv2dBlock(
      (shortcut_start): Conv2dBlock(
        (layer): Identity()
        (activation): Identity()
      )
      (blocks): Sequential(
        (0): Conv2dBlock(
          (layer): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
          (normalization): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
        )
        (1): Conv2dBlock(
          (layer): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
          (normalization): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (shortcut_end): Add()
      (activation): ReLU()
    )
  )
)
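For reference, the printed structure corresponds to two stacked ResNet basic blocks: convolution, batch normalization, ReLU, convolution, batch normalization, add the block input, ReLU. The sketch below writes one such block in plain PyTorch and stacks two of them. `BasicResidualBlock` is a hypothetical name, and `padding=1` is an assumption made here so that the spatial size is preserved and the input can be added back; it is not taken from the Deeplay summary above.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """Hypothetical plain-PyTorch ResNet basic block: conv-BN-ReLU-conv-BN, add input, ReLU."""

    def __init__(self, channels: int):
        super().__init__()
        # padding=1 keeps H and W unchanged (assumption of this sketch)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()  # stateless, so it can be reused

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x                              # identity shortcut
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return self.relu(y + shortcut)            # add, then final activation


# Two residual blocks in sequence, mirroring "(0-1): 2 x Conv2dBlock" in the summary
model = nn.Sequential(BasicResidualBlock(64), BasicResidualBlock(64))
x = torch.randn(2, 64, 16, 16)
out = model(x)
print(out.shape)  # torch.Size([2, 64, 16, 16])
```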


The same holds for components and models. For example, `ConvolutionalEncoder2d` has a `"resnet18"` style that applies ResNet-style residual connections to all blocks in the component and styles the input and output blocks to match the ResNet-18 architecture.
