Copyright (c) MONAI Consortium  
Licensed under the Apache License, Version 2.0 (the "License");  
you may not use this file except in compliance with the License.  
You may obtain a copy of the License at  
&nbsp;&nbsp;&nbsp;&nbsp;http://www.apache.org/licenses/LICENSE-2.0  
Unless required by applicable law or agreed to in writing, software  
distributed under the License is distributed on an "AS IS" BASIS,  
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
See the License for the specific language governing permissions and  
limitations under the License.

# UNet input size constraints

MONAI provides an enhanced version of UNet (``monai.networks.nets.UNet``), which supports residual units and exposes more hyperparameters (such as ``strides``, ``kernel_size`` and ``up_kernel_size``) than ``monai.networks.nets.BasicUNet``. However, ``UNet`` places constraints on both its hyperparameters and the size of its input.

The constraints on the hyperparameters are documented in the network's docstring; this tutorial focuses on how to determine a valid input size.

The last section, **Constraints of UNet**, summarizes the conclusions.

## Setup environments

In [1]:
!python -c "import monai" || pip install -q monai-weekly

## Setup imports

In [2]:
from monai.networks.nets import UNet
import monai
import math
import torch
import torch.nn as nn

monai.config.print_config()

MONAI version: 0+untagged.2891.gccd32ca
Numpy version: 1.25.1
Pytorch version: 2.0.1
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: ccd32ca5e9e84562d2f388b45b6724b5c77c1f57
MONAI __file__: /Users/<username>/Envs/monai/lib/python3.9/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.11
ITK version: 5.3.0
Nibabel version: 5.1.0
scikit-image version: 0.21.0
scipy version: 1.11.1
Pillow version: 10.0.0
Tensorboard version: 2.13.0
gdown version: 4.7.1
TorchVision version: 0.15.2
tqdm version: 4.65.0
lmdb version: 1.4.1
psutil version: 5.9.5
pandas version: 2.0.3
einops version: 0.6.1
transformers version: 4.21.3
mlflow version: 2.4.2
pynrrd version: 1.0.0
clearml version: 1.11.2rc0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies



## Check UNet structure

The following comes from: [Left-Ventricle Quantification Using Residual U-Net](https://link.springer.com/chapter/10.1007/978-3-030-12029-0_40).

![image](../figures/UNet_structure.png)

First, let's build a `UNet` instance and inspect its structure. `num_res_units` is set to `0` because it has no impact on the input size constraints.

In [3]:
network_0 = UNet(
    spatial_dims=3,
    in_channels=3,
    out_channels=3,
    channels=(8, 16, 32),
    strides=(2, 3),
    kernel_size=3,
    up_kernel_size=3,
    num_res_units=0,
)
print(len(network_0.model))

network_0

3


UNet(
  (model): Sequential(
    (0): Convolution(
      (conv): Conv3d(3, 8, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))
      (adn): ADN(
        (N): InstanceNorm3d(8, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (D): Dropout(p=0.0, inplace=False)
        (A): PReLU(num_parameters=1)
      )
    )
    (1): SkipConnection(
      (submodule): Sequential(
        (0): Convolution(
          (conv): Conv3d(8, 16, kernel_size=(3, 3, 3), stride=(3, 3, 3), padding=(1, 1, 1))
          (adn): ADN(
            (N): InstanceNorm3d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (D): Dropout(p=0.0, inplace=False)
            (A): PReLU(num_parameters=1)
          )
        )
        (1): SkipConnection(
          (submodule): Convolution(
            (conv): Conv3d(16, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
            (adn): ADN(
              (N): InstanceNorm3d(32, eps=1e-05, momentum=0.1

As the printed structure shows, the network consists of three parts:

1. The first down layer.
2. The intermediate skip connection based block.
3. The final up layer.

If we want to build a deeper UNet, only the intermediate block will be expanded.

Throughout the network, only two kinds of modules are used:
1. `monai.networks.blocks.convolutions.Convolution`
2. `monai.networks.layers.simplelayers.SkipConnection`

These modules are built from the following four kinds of layers:
1. Activation layers (`PReLU`).
2. Dropout layers (`Dropout`).
3. Normalization layers (`InstanceNorm3d`).
4. Convolution layers (`Conv` and `ConvTranspose`).

Among the layers, convolution layers may change the size of their input, and normalization layers may impose extra constraints on the input size.
Among the modules, `SkipConnection` also imposes constraints of its own.

Consequently, this tutorial examines the constraints of convolution layers, normalization layers and the `SkipConnection` module in turn.

## Constraints of convolution layers

### Conv layer

The formulas in PyTorch's official docs explain how to calculate the output size for [Conv3d](https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d) and [ConvTranspose3d](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose3d.html#torch.nn.ConvTranspose3d) (the formulas for `1d` and `2d` are similar).

As the docs show, the output size depends on the input size and:
- `stride`
- `kernel_size`
- `dilation`
- `padding`

In `monai.networks.nets.UNet`, users can only set `strides` and `kernel_size`; the other two parameters are determined by [monai.networks.blocks.convolutions.Convolution](https://github.com/Project-MONAI/MONAI/blob/dev/monai/networks/blocks/convolutions.py) (see the link for details).

Therefore, here `dilation = 1` and `padding = (kernel_size - 1) / 2` (`kernel_size` is required to be odd, so `padding` is an integer).

Substituting these values into PyTorch's formula, `2 * padding` cancels `dilation * (kernel_size - 1)`, so the output size of `Conv` simplifies to:
`math.floor((input_size + stride - 1) / stride)`. The corresponding Python function is shown below. We only need to ensure **`math.floor((input_size + stride - 1) / stride) >= 1`**, which means **`input_size >= 1`**.

In [4]:
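def get_conv_output_size(input_tensor, stride):
    """Print the spatial output size of a conv layer with `padding = (kernel_size - 1) / 2` and `dilation = 1`."""
    output_size = []
    # skip the batch and channel dimensions, keep only the spatial ones
    input_size = list(input_tensor.shape)[2:]
    for size in input_size:
        out = math.floor((size + stride - 1) / stride)
        output_size.append(out)
    print(output_size)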

Let's check if the function is correct:

In [5]:
stride_value = 3
example = torch.rand([1, 3, 1, 15, 29])
get_conv_output_size(example, stride_value)

[1, 5, 10]


In [6]:
output = nn.Conv3d(in_channels=3, out_channels=1, stride=stride_value, kernel_size=3, padding=1)(example)
print(output.shape[2:])

torch.Size([1, 5, 10])
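
The simplification can also be checked without building any layers. The sketch below compares the general output-size formula from the PyTorch `Conv3d` docs with the simplified one, under the same assumptions as above (`dilation = 1`, odd `kernel_size`, `padding = (kernel_size - 1) / 2`). The helper names are illustrative and are not part of MONAI or PyTorch.

```python
def conv_out_general(size, kernel_size, stride, padding, dilation=1):
    """Output size of one spatial dim, following the formula in the PyTorch Conv docs."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_out_simplified(size, stride):
    """Simplified form used in this tutorial, written with integer arithmetic."""
    return (size + stride - 1) // stride


# Compare both formulas over a grid of odd kernel sizes, strides and input sizes.
for kernel_size in (1, 3, 5, 7):
    padding = (kernel_size - 1) // 2  # the padding assumed in this tutorial
    for stride in range(1, 6):
        for size in range(1, 65):
            general = conv_out_general(size, kernel_size, stride, padding)
            assert general == conv_out_simplified(size, stride)

# Same spatial sizes as the example tensor above, with stride 3.
print([conv_out_simplified(v, 3) for v in (1, 15, 29)])  # [1, 5, 10]
```

The grid is small, but it covers every combination of kernel size and stride used in this tutorial.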


### ConvTranspose layer

Similarly, due to the default settings in [monai.networks.blocks.convolutions.Convolution](https://github.com/Project-MONAI/MONAI/blob/dev/monai/networks/blocks/convolutions.py), `output_padding = stride - 1`. The output size of `ConvTranspose` then simplifies to:
`input_size * stride`.
Therefore, before entering the `ConvTranspose` layer, we only need to ensure **`input_size >= 1`**.
Let's check that the formula is correct:

In [7]:
stride_value = 3
print([i * stride_value for i in example.shape[2:]])

[3, 45, 87]


In [8]:
output = nn.ConvTranspose3d(
    in_channels=3,
    out_channels=1,
    stride=stride_value,
    kernel_size=3,
    padding=1,
    output_padding=stride_value - 1,
)(example)
print(output.shape[2:])

torch.Size([3, 45, 87])
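
As a cross-check in plain Python, the sketch below evaluates the general output-size formula from the PyTorch `ConvTranspose3d` docs with the settings described above (`dilation = 1`, `padding = (kernel_size - 1) / 2`, `output_padding = stride - 1`) and confirms that it reduces to `input_size * stride`. The function name is illustrative only.

```python
def conv_transpose_out_general(size, kernel_size, stride, padding, output_padding, dilation=1):
    """Output size of one spatial dim, following the PyTorch ConvTranspose docs."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


# Check the simplification over a grid of odd kernel sizes, strides and input sizes.
for kernel_size in (1, 3, 5, 7):
    padding = (kernel_size - 1) // 2
    for stride in range(1, 6):
        output_padding = stride - 1  # the setting described in this section
        for size in range(1, 65):
            out = conv_transpose_out_general(size, kernel_size, stride, padding, output_padding)
            assert out == size * stride

# Same spatial sizes as the example tensor above, with stride 3.
print([conv_transpose_out_general(v, 3, 3, 1, 2) for v in (1, 15, 29)])  # [3, 45, 87]
```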


## Constraints of normalization layers

In [9]:
print(monai.networks.layers.factories.Norm.names)

('INSTANCE', 'BATCH', 'INSTANCE_NVFUSER', 'GROUP', 'LAYER', 'LOCALRESPONSE', 'SYNCBATCH')


MONAI's norm factory lists seven names, which cover six kinds of normalization layers (`INSTANCE_NVFUSER` is an alternative implementation of instance normalization). The official docs can be found [here](https://pytorch.org/docs/stable/nn.html#normalization-layers), and the input checks are implemented in [torch.nn.functional](https://pytorch.org/docs/stable/_modules/torch/nn/functional.html).

However, the following normalization layers will not be discussed:
1. SyncBatchNorm, since it only supports `DistributedDataParallel`; please check the official docs for more details.
2. LayerNorm, since its parameter `normalized_shape` should equal `[num_channels, *spatial_dims]`, and no single fixed value works for all normalization layers in the network.
3. GroupNorm, since its parameter `num_channels` should equal the number of channels of the input, and no single fixed value works for all normalization layers in the network.

Therefore, let's check the other three normalization layers: batch normalization, instance normalization and local response normalization.

### batch normalization

The input size must pass `torch.nn.functional._verify_batch_size`, which requires the product of all dimensions except the channel dimension to be larger than 1. For example:

In [10]:
batch = nn.BatchNorm3d(num_features=3)
for size in [[1, 3, 2, 1, 1], [2, 3, 1, 1, 1]]:
    output = batch(torch.randn(size))

# uncomment the following line to see a ValueError
# batch(torch.randn([1, 3, 1, 1, 1]))

In practice, batch normalization is not useful when the batch size is 1. Therefore, the constraint can be simplified to **the batch size should be larger than 1**.

### instance normalization

The input size must pass `torch.nn.functional._verify_spatial_size`, which requires the product of all spatial dimensions to be larger than 1. Therefore, **at least one spatial dimension should have a size larger than 1**. For example:

In [11]:
instance = nn.InstanceNorm3d(num_features=3)
for size in [[1, 3, 2, 1, 1], [1, 3, 1, 2, 1]]:
    output = instance(torch.randn(size))

# uncomment the following line to see a ValueError
# instance(torch.randn([2, 3, 1, 1, 1]))

### local response normalization

**No constraint**. For example:

In [12]:
nn.LocalResponseNorm(size=1)(torch.randn(1, 1, 1, 1, 1))

tensor([[[[[-0.6150]]]]])
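
The three rules in this section can be restated as small predicates on the input shape. This is only a plain-Python restatement of the conditions described above, not the PyTorch implementation; the function names are hypothetical and shapes are assumed to be `[B, C, *spatial]`.

```python
import math


def batch_norm_accepts(shape):
    """Product of all dims except the channel dim must be larger than 1."""
    return shape[0] * math.prod(shape[2:]) > 1


def instance_norm_accepts(shape):
    """Product of all spatial dims must be larger than 1."""
    return math.prod(shape[2:]) > 1


def local_response_norm_accepts(shape):
    """No constraint on the input size."""
    return True


# The shapes used in the cells above.
for shape in ([1, 3, 2, 1, 1], [2, 3, 1, 1, 1], [1, 3, 1, 1, 1]):
    print(shape, batch_norm_accepts(shape), instance_norm_accepts(shape))
```

Note that `[2, 3, 1, 1, 1]` passes the batch normalization rule but not the instance normalization rule, because a larger batch does not add values within a single instance.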

## Constraints of SkipConnection

In this section, we will check if the module [SkipConnection](https://github.com/Project-MONAI/MONAI/blob/dev/monai/networks/layers/simplelayers.py) itself has more constraints for the input size.

In `UNet`, the `SkipConnection` is called via:

`nn.Sequential(down, SkipConnection(subblock), up)`

and the following line is executed in its forward function:

`torch.cat([x, self.submodule(x)], dim=1)`

This requires that `self.submodule` leaves the spatial sizes of its input tensor unchanged.

### When `len(channels) = 2`

If `len(channels) = 2`, there is only one `SkipConnection` module in the network, and it wraps a single down layer with `stride = 1`. From the formulas derived in the previous section, we know that this layer does not change the size, so we only need to meet the constraints of the normalization layer inside it:

1. When using batch normalization, the batch size should be larger than 1.

2. When using instance normalization, the size of at least one spatial dimension should be larger than 1.

### When `len(channels) > 2`

If `len(channels) > 2`, more `SkipConnection` modules are built, and each of these modules contains one down layer and one up layer. Consequently, **the output of the up layer should have the same spatial sizes as the input to the down layer**. The stride values for these modules come from `strides[1:]`; hence, for each stride value `s` in `strides[1:]` and each spatial size `v` of the module's input, the constraint of the corresponding `SkipConnection` module is:

```
math.floor((v + s - 1) / s) = v / s
```

Since the left-hand side of the equation is a positive integer, `v` must be divisible by `s`. If we assume `v = k * s` where `k >= 1`, we get:
```
math.floor(k + (s - 1) / s) = k
k + math.floor((s - 1) / s) = k
math.floor((s - 1) / s) = 0
```
The last equation always holds because `0 <= (s - 1) / s < 1`. Thus, for a single `SkipConnection` module, the constraint is exactly that all spatial sizes of the input are divisible by `s`.
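
The same conclusion can be checked by brute force. The sketch below applies the simplified down formula (`floor((v + s - 1) / s)`) followed by the simplified up formula (`size * s`) from the previous sections, and confirms that the round trip restores `v` only when `v` is divisible by `s`. The function name is illustrative only.

```python
def down_then_up(v, s):
    """Spatial size after one down layer and one up layer with stride s."""
    down = (v + s - 1) // s  # simplified Conv output size
    return down * s          # simplified ConvTranspose output size


# The round trip keeps the size exactly when v is a multiple of s.
for s in range(1, 8):
    for v in range(1, 100):
        assert (down_then_up(v, s) == v) == (v % s == 0)

print(down_then_up(15, 5), down_then_up(14, 5))  # 15 15
```

With `s = 5`, an input of size 14 comes back as 15, so `torch.cat` in `SkipConnection` would receive tensors of different spatial sizes.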

For the whole stack of nested `SkipConnection` modules, assume `[H, W, D]` is the spatial size of the input to the outermost module; then for `v in [H, W, D]`:

**`np.remainder(v, np.prod(strides[1:])) == 0`**

In addition, the normalization layers may impose more constraints:

1. When using batch normalization, the batch size of the input should be larger than 1.

2. When using instance normalization, the size of at least one spatial dimension should be larger than 1. Therefore, **assume `d = max(H, W, D)`, `d` should meet: `np.remainder(d, 2 * np.prod(strides[1:])) == 0`**.
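
To see how one spatial size evolves through the nested modules, the sketch below divides it by each value of `strides[1:]` in turn and raises as soon as a level is not divisible. It assumes the simplified formulas from the previous sections; `trace_skip_sizes` is a hypothetical helper, not a MONAI function.

```python
def trace_skip_sizes(v, inner_strides):
    """Return the spatial size at each nesting level, given strides[1:]."""
    sizes = [v]
    for level, s in enumerate(inner_strides, start=1):
        if v % s != 0:
            raise ValueError(f"size {v} at level {level} is not divisible by stride {s}")
        v //= s
        sizes.append(v)
    return sizes


print(trace_skip_sizes(24, (2, 2, 3)))  # [24, 12, 6, 2]
print(trace_skip_sizes(12, (2, 2, 3)))  # [12, 6, 3, 1]
```

The last entry of the trace is the size seen by the innermost layers. With instance normalization, at least one spatial dimension must still be larger than 1 there, which is why a size of 12 in every dimension is not enough for `strides[1:] = (2, 2, 3)`.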

## Constraints of UNet

As discussed in the first section, UNet consists of 1) a down layer, 2) one or more skip connection modules and 3) an up layer. Based on the analysis of each layer and module, the constraints of the network can be summarized as follows.

### When `len(channels) = 2`

If `len(channels) = 2`, `strides` must contain a single value; call it `s`, and let the input size be `[B, C, H, W, D]`. The constraints are:

1. If using batch normalization: **`B > 1`.**
2. If using local response normalization: no constraint.
3. If using instance normalization, assume `d = max(H, W, D)`, then `math.floor((d + s - 1) / s) >= 2`, which means **`d >= s + 1`.**
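
The three rules above fit into one small predicate. The sketch below is a plain-Python restatement of this list for the `len(channels) = 2` case; the function name and the string values accepted for `norm` are assumptions made for illustration.

```python
def check_two_level_input(shape, stride, norm):
    """Return True if shape = [B, C, *spatial] satisfies the rules listed above."""
    batch_size, spatial = shape[0], shape[2:]
    if norm == "batch":
        return batch_size > 1
    if norm == "instance":
        # floor((d + s - 1) / s) >= 2 is equivalent to d >= s + 1
        return max(spatial) >= stride + 1
    if norm == "localresponse":
        return True
    raise ValueError(f"rule not covered for norm: {norm}")


print(check_two_level_input([2, 1, 1, 1, 1], 3, "batch"))     # True
print(check_two_level_input([1, 1, 4, 1, 1], 3, "instance"))  # True
print(check_two_level_input([1, 1, 1, 1, 3], 3, "instance"))  # False
```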

The following are the corresponding examples:

In [13]:
# example 1: len(channels) = 2, batch norm, batch size > 1.
network = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=3,
    channels=(8, 16),
    strides=(3,),
    kernel_size=3,
    up_kernel_size=3,
    num_res_units=0,
    norm="batch",
)
example = torch.rand([2, 1, 1, 1, 1])
print(network(example).shape)

# uncomment the following two lines to see the error
# example = torch.rand([1, 1, 1, 1, 1])
# print(network(example).shape)

torch.Size([2, 3, 3, 3, 3])


In [14]:
# example 2: len(channels) = 2, localresponse.
network = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=3,
    channels=(8, 16),
    strides=(3,),
    kernel_size=1,
    up_kernel_size=1,
    num_res_units=1,
    norm=("localresponse", {"size": 1}),
)
example = torch.rand([1, 1, 1, 1, 1])
print(network(example).shape)

torch.Size([1, 3, 3, 3, 3])


In [15]:
# example 3: len(channels) = 2, instance norm.
network = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=3,
    channels=(8, 16),
    strides=(3,),
    kernel_size=3,
    up_kernel_size=5,
    num_res_units=2,
    norm="instance",
)
example = torch.rand([1, 1, 4, 1, 1])
print(network(example).shape)

# uncomment the following two lines to see the error
# example = torch.rand([1, 1, 1, 1, 3])
# print(network(example).shape)

torch.Size([1, 3, 6, 3, 3])


### When `len(channels) > 2`

Assume the input size is `[B, C, H, W, D]`, and `s = strides`. The common constraints are:

```
For v in [H, W, D]:
     size = math.floor((v + s[0] - 1) / s[0])
     size should meet: np.remainder(size, np.prod(s[1:])) == 0
```
In addition,
1. If using batch normalization: **`B > 1`.**
2. If using local response normalization: no additional constraint.
3. If using instance normalization, then:
```
d = max(H, W, D)
max_size = math.floor((d + s[0] - 1) / s[0])
max_size should meet: np.remainder(max_size, 2 * np.prod(s[1:])) == 0
```
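
The rules above can be collected into one checker. The sketch below restates them in plain Python exactly as written in this section (the common constraint first, then the rule for the chosen normalization); `check_unet_input` and the accepted `norm` strings are hypothetical and only cover the three normalization layers discussed here.

```python
import math


def check_unet_input(shape, strides, norm):
    """Return True if shape = [B, C, *spatial] satisfies the rules of this section."""
    batch_size, spatial = shape[0], shape[2:]
    first, inner = strides[0], math.prod(strides[1:])
    # Spatial sizes after the first down layer.
    down = [(v + first - 1) // first for v in spatial]
    # Common constraint: every size must be divisible by prod(strides[1:]).
    if any(size % inner != 0 for size in down):
        return False
    if norm == "batch":
        return batch_size > 1
    if norm == "instance":
        # max(down) equals floor((d + s[0] - 1) / s[0]) with d = max(H, W, D).
        return max(down) % (2 * inner) == 0
    if norm == "localresponse":
        return True
    raise ValueError(f"rule not covered for norm: {norm}")


print(check_unet_input([2, 1, 13, 14, 15], (3, 5), "batch"))           # True
print(check_unet_input([1, 1, 25, 23, 24], (3, 2, 4), "localresponse"))  # False
print(check_unet_input([1, 1, 24, 12, 12], (1, 2, 2, 3), "instance"))  # True
```

Running such a check before the forward pass gives a clear yes or no instead of a shape mismatch raised deep inside the network.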

The following are the corresponding examples:

In [16]:
# example 1: strides=(3, 5), batch norm, batch size > 1.
# thus math.floor((v + 2) / 3) should be 5 * k. If k = 1, v should be in [13, 14, 15].
network = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=3,
    channels=(8, 16, 32),
    strides=(3, 5),
    kernel_size=3,
    up_kernel_size=3,
    num_res_units=0,
    norm="batch",
)
example = torch.rand([2, 1, 13, 14, 15])
print(network(example).shape)

# uncomment the following two lines to see the error
# example = torch.rand([1, 1, 12, 14, 15])
# print(network(example).shape)

torch.Size([2, 3, 15, 15, 15])


In [17]:
# example 2: strides=(3, 2, 4), localresponse.
# thus math.floor((v + 2) / 3) should be 8 * k. If k = 1, v should be in [22, 23, 24].
network = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=3,
    channels=(8, 16, 32, 16),
    strides=(3, 2, 4),
    kernel_size=1,
    up_kernel_size=3,
    num_res_units=10,
    norm=("localresponse", {"size": 1}),
)
example = torch.rand([1, 1, 22, 23, 24])
print(network(example).shape)

# uncomment the following two lines to see the error
# example = torch.rand([1, 1, 25, 23, 24])
# print(network(example).shape)

torch.Size([1, 3, 24, 24, 24])


In [18]:
# example 3: strides=(1, 2, 2, 3), instance norm.
# thus v should be 12 * k. If k = 1, v should be 12. In addition, the maximum size should be 24 * k.

network = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=3,
    channels=(8, 16, 32, 32, 16),
    strides=(1, 2, 2, 3),
    kernel_size=5,
    up_kernel_size=3,
    num_res_units=5,
    norm="instance",
)
example = torch.rand([1, 1, 24, 12, 12])
print(network(example).shape)

# uncomment the following two lines to see the error
# example = torch.rand([1, 1, 12, 12, 12])
# print(network(example).shape)

torch.Size([1, 3, 24, 12, 12])
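
When choosing a crop or padding size, it can help to list every spatial size that satisfies the common constraint of this section. The sketch below does so for a given `strides` tuple; it only covers the divisibility rule and does not include the extra normalization rules. `valid_spatial_sizes` is a hypothetical helper.

```python
import math


def valid_spatial_sizes(strides, upper):
    """List sizes v in [1, upper] with floor((v + s[0] - 1) / s[0]) divisible by prod(s[1:])."""
    first, inner = strides[0], math.prod(strides[1:])
    return [v for v in range(1, upper + 1) if ((v + first - 1) // first) % inner == 0]


print(valid_spatial_sizes((3, 5), 30))        # [13, 14, 15, 28, 29, 30]
print(valid_spatial_sizes((3, 2, 4), 24))     # [22, 23, 24]
print(valid_spatial_sizes((1, 2, 2, 3), 40))  # [12, 24, 36]
```

The first two results match the ranges quoted in the comments of the examples above (`[13, 14, 15]` for `strides=(3, 5)` and `[22, 23, 24]` for `strides=(3, 2, 4)`).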
