# Transposed Convolutions
The CNN layers we have seen so far, such as convolutional layers and pooling layers, typically reduce (downsample) the spatial dimensions (height and width) of the input, or keep them unchanged.

In semantic segmentation, which classifies at the pixel level, it is convenient if the spatial dimensions of the input and output are the same. For example, the channel dimension at one output pixel can hold the classification results for the input pixel at the same spatial position.

To achieve this, especially after the spatial dimensions have been reduced by CNN layers, we can use another type of CNN layer that increases (upsamples) the spatial dimensions of intermediate feature maps.

In this section, we introduce the transposed convolution, also called the fractionally-strided convolution, which reverses the downsampling that a convolution applies to the spatial dimensions.

## Basic Operations
Ignoring channels for now, let’s begin with the basic transposed convolution operation with a stride of 1 and no padding.


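Suppose that we are given an $n_h \times n_w$ input tensor and a $k_h \times k_w$ kernel. Sliding the kernel window with a stride of 1 for $n_w$ times in each row and $n_h$ times in each column yields a total of $n_h n_w$ intermediate results. Each intermediate result is an $(n_h + k_h - 1) \times (n_w + k_w - 1)$ tensor that is initialized to zeros. To compute each intermediate tensor, the corresponding input element is multiplied by the kernel, and the resulting $k_h \times k_w$ tensor replaces the portion of the intermediate tensor whose top-left corner sits at the position of that input element. Finally, all the intermediate results are summed to produce the output.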

<img src='img_1.png'>



In [3]:
import torch

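def trans_conv(X_in, Kernel):
    h, w = Kernel.shape
    # The output grows by (kernel size - 1) along each spatial dimension
    Y = torch.zeros(X_in.shape[0] + h - 1, X_in.shape[1] + w - 1)
    for i in range(X_in.shape[0]):
        for j in range(X_in.shape[1]):
            # Scale the kernel by X_in[i, j] and accumulate it at offset (i, j)
            Y[i: i + h, j: j + w] += X_in[i, j] * Kernel
    return Y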

In [4]:
X = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
trans_conv(X, K)

tensor([[ 0.,  0.,  1.],
        [ 0.,  4.,  6.],
        [ 4., 12.,  9.]])
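
To make the intermediate results explicit, the following is a minimal sketch that builds one zero-initialized tensor per input element and only sums them at the end. The name `intermediates` is ours, introduced purely for illustration; the result should match `trans_conv` above.

```python
import torch

X = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
n_h, n_w = X.shape
k_h, k_w = K.shape

# One zero-initialized (n_h + k_h - 1) x (n_w + k_w - 1) tensor per input element
intermediates = torch.zeros(n_h * n_w, n_h + k_h - 1, n_w + k_w - 1)
for i in range(n_h):
    for j in range(n_w):
        # Place X[i, j] * K with its top-left corner at position (i, j)
        intermediates[i * n_w + j, i:i + k_h, j:j + k_w] = X[i, j] * K

# Summing the n_h * n_w intermediate tensors gives the output
Y = intermediates.sum(dim=0)
print(Y)
```

Keeping the intermediate tensors around costs more memory than accumulating in place, so this form is only meant to mirror the description, not to replace `trans_conv`.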

In [5]:
X.shape

torch.Size([2, 2])

Alternatively, when the input X and kernel K are both four-dimensional tensors, we can use high-level APIs to obtain the same results.

In [6]:
from torch.nn import ConvTranspose2d
X, K = X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2)
tconv = ConvTranspose2d(1, 1, kernel_size=2, bias=False)
tconv.weight.data = K
tconv(X)

tensor([[[[ 0.,  0.,  1.],
          [ 0.,  4.,  6.],
          [ 4., 12.,  9.]]]], grad_fn=<ConvolutionBackward0>)

## Padding, Strides, and Multiple Channels

Unlike in the regular convolution, where padding is applied to the input, **in the transposed convolution it is applied to the output**.

For example, when specifying the padding number on either side of the height and width as 1:
* The first and last rows and columns will be removed from the transposed convolution output.


In [7]:
# The weight is left at its random initialization here, so the value below changes between runs
tconv = ConvTranspose2d(1, 1, kernel_size=2, padding=1, bias=False)
tconv(X)

tensor([[[[1.1750]]]], grad_fn=<ConvolutionBackward0>)

In [9]:
tconv.weight.data = K
print(tconv(X).shape)
tconv(X)

torch.Size([1, 1, 1, 1])


tensor([[[[4.]]]], grad_fn=<ConvolutionBackward0>)
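
One way to check that padding acts on the output is to compare the padded result with a cropped copy of the unpadded one. The sketch below uses the functional form `torch.nn.functional.conv_transpose2d` so that the kernel is passed explicitly; the variable names `full`, `padded`, and `cropped` are ours.

```python
import torch
from torch.nn import functional as F

X = torch.tensor([[0.0, 1.0], [2.0, 3.0]]).reshape(1, 1, 2, 2)
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]]).reshape(1, 1, 2, 2)

full = F.conv_transpose2d(X, K)               # no padding: shape (1, 1, 3, 3)
padded = F.conv_transpose2d(X, K, padding=1)  # padding=1: shape (1, 1, 1, 1)

# Remove the first and last rows and columns of the unpadded output
cropped = full[:, :, 1:-1, 1:-1]
print(torch.equal(padded, cropped))
```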

<img src='img_2.png'>

**In the transposed convolution, strides are specified for the intermediate results (and thus the output), not for the input.**

In [10]:
tconv = ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
tconv(X).shape

torch.Size([1, 1, 4, 4])

In [11]:
tconv = ConvTranspose2d(1, 1, kernel_size=2, stride=1, bias=False)
tconv(X).shape

torch.Size([1, 1, 3, 3])
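
To see where the stride enters, we can extend the loop-based implementation so that each scaled kernel is placed `stride` positions apart in the output. This is a sketch only: `trans_conv_strided` is a hypothetical helper written for this illustration, and it is compared against `torch.nn.functional.conv_transpose2d`.

```python
import torch
from torch.nn import functional as F

def trans_conv_strided(X_in, Kernel, stride=1):
    h, w = Kernel.shape
    # Output size per dimension: (n - 1) * stride + k
    Y = torch.zeros((X_in.shape[0] - 1) * stride + h,
                    (X_in.shape[1] - 1) * stride + w)
    for i in range(X_in.shape[0]):
        for j in range(X_in.shape[1]):
            # The stride moves the placement in the output, not the read position in the input
            Y[i * stride: i * stride + h, j * stride: j * stride + w] += X_in[i, j] * Kernel
    return Y

X = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

Y = trans_conv_strided(X, K, stride=2)
Y_ref = F.conv_transpose2d(X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2), stride=2)
print(Y)
print(torch.allclose(Y, Y_ref.reshape(4, 4)))
```

With a stride of 2 and a $2 \times 2$ kernel the placements no longer overlap, so each $2 \times 2$ block of the output is just one input element times the kernel.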

For multiple input and output channels, the transposed convolution works in the same way as the regular convolution.

Suppose that the input has $c_i$ channels, and that the transposed convolution assigns a $k_h \times k_w$ kernel tensor to each input channel. When multiple output channels are specified, we will have a $c_i \times k_h \times k_w$ kernel for each output channel.
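
The channel bookkeeping can be sketched as follows. In PyTorch, `ConvTranspose2d` stores its weight with shape `(in_channels, out_channels, k_h, k_w)` (for the default `groups=1`), and each output channel is the sum of single-channel transposed convolutions over the input channels. The sizes `c_i = 3` and `c_o = 2` and the name `Y_manual` are arbitrary choices for this illustration.

```python
import torch
from torch.nn import ConvTranspose2d
from torch.nn import functional as F

c_i, c_o, k_h, k_w = 3, 2, 2, 2
tconv = ConvTranspose2d(c_i, c_o, kernel_size=(k_h, k_w), bias=False)
print(tconv.weight.shape)  # (c_i, c_o, k_h, k_w)

X = torch.randn(1, c_i, 4, 4)
with torch.no_grad():
    Y = tconv(X)
    Y_manual = torch.zeros_like(Y)
    for o in range(c_o):
        for i in range(c_i):
            # Single-channel transposed convolution of input channel i
            # with the k_h x k_w kernel that links channel i to output channel o
            Y_manual[:, o:o + 1] += F.conv_transpose2d(
                X[:, i:i + 1], tconv.weight[i:i + 1, o:o + 1])

print(torch.allclose(Y, Y_manual, atol=1e-5))
```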

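In all, if we feed $\mathsf{X}$ into a convolutional layer $f$ to output $\mathsf{Y}=f(\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\mathsf{X}$, then $g(\mathsf{Y})$ will have the same shape as $\mathsf{X}$. This holds exactly when, along each spatial dimension, the stride divides the input size plus twice the padding minus the kernel size; otherwise the convolution rounds down and $g(\mathsf{Y})$ comes out smaller than $\mathsf{X}$, as the last example below shows.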

In [16]:
from torch.nn import Conv2d
X = torch.randn(1, 10, 16, 16)
conv = Conv2d(10, 20, kernel_size=5, padding=2, stride=3)
tconv = ConvTranspose2d(20, 10, kernel_size=5, padding=2, stride=3)
tconv(conv(X)).shape

torch.Size([1, 10, 16, 16])

In [17]:
conv(X).shape

torch.Size([1, 20, 6, 6])

In [15]:
conv = Conv2d(10, 20, kernel_size=3, padding=2, stride=3)
tconv = ConvTranspose2d(20, 10, kernel_size=3, padding=2, stride=3)
tconv(conv(X)).shape

torch.Size([1, 10, 14, 14])
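
The last example returns $14 \times 14$ rather than $16 \times 16$ because the stride-3 convolution rounds down when computing its output size, so several input sizes map to the same output size. The sketch below spells out the shape arithmetic (for dilation 1) with two hypothetical helpers, `conv_out` and `tconv_out`, and then uses the `output_padding` argument of `ConvTranspose2d` to recover the original size. The value `output_padding=2` is specific to this example.

```python
import torch
from torch.nn import Conv2d, ConvTranspose2d

def conv_out(n, k, p, s):
    # Convolution output size along one dimension (dilation 1)
    return (n + 2 * p - k) // s + 1

def tconv_out(n, k, p, s, output_padding=0):
    # Transposed convolution output size along one dimension (dilation 1)
    return (n - 1) * s - 2 * p + k + output_padding

n_mid = conv_out(16, k=3, p=2, s=3)
print(n_mid)                                             # size after the convolution
print(tconv_out(n_mid, k=3, p=2, s=3))                   # size without output_padding
print(tconv_out(n_mid, k=3, p=2, s=3, output_padding=2)) # size with output_padding=2

X = torch.randn(1, 10, 16, 16)
conv = Conv2d(10, 20, kernel_size=3, padding=2, stride=3)
tconv = ConvTranspose2d(20, 10, kernel_size=3, padding=2, stride=3, output_padding=2)
print(tconv(conv(X)).shape)
```

Note that `output_padding` only enlarges one side of the output and must be smaller than the stride (or the dilation), so it resolves the ambiguity in the output size without adding symmetric padding.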

## Connection to Matrix Transposition
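The transposed convolution is named after the matrix transposition. To see why, we first express a convolution as a matrix multiplication. In the following example, we define a $3 \times 3$ input `X` and a $2 \times 2$ kernel `K`, and use `corr2d` to compute the convolution output `Y`.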

In [18]:
from d2l import torch as d2l
X = torch.arange(9.0).reshape(3, 3)
K = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
Y = d2l.corr2d(X, K)
Y

tensor([[27., 37.],
        [57., 67.]])

In [19]:
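def kernel2matrix(K):
    # k holds one kernel footprint on the flattened 3x3 input: the two kernel
    # rows are separated by one zero because each input row has 3 elements
    k, W = torch.zeros(5), torch.zeros((4, 9))
    k[:2], k[3:5] = K[0, :], K[1, :]
    # Each row of W corresponds to one of the 4 output elements
    W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
    return W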

W = kernel2matrix(K)
W

tensor([[1., 2., 0., 3., 4., 0., 0., 0., 0.],
        [0., 1., 2., 0., 3., 4., 0., 0., 0.],
        [0., 0., 0., 1., 2., 0., 3., 4., 0.],
        [0., 0., 0., 0., 1., 2., 0., 3., 4.]])

In [20]:
Y == torch.matmul(W, X.reshape(-1)).reshape(2, 2)

tensor([[True, True],
        [True, True]])

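Likewise, we can implement transposed convolutions using matrix multiplications. Here we take the $2 \times 2$ output `Y` of the convolution above as the input to the transposed convolution and multiply it by the transposed weight matrix `W.T`.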

In [21]:
Z = trans_conv(Y, K)
Z == torch.matmul(W.T, Y.reshape(-1)).reshape(3, 3)

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
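
The function `kernel2matrix` above hard-codes the $3 \times 3$ input and $2 \times 2$ kernel. As a sketch of how the same construction extends to other sizes (stride 1, no padding), the hypothetical helper `kernel2matrix_general` below fills the weight matrix entry by entry and checks both directions against PyTorch's functional API.

```python
import torch
from torch.nn import functional as F

def kernel2matrix_general(K, n_h, n_w):
    k_h, k_w = K.shape
    o_h, o_w = n_h - k_h + 1, n_w - k_w + 1
    W = torch.zeros(o_h * o_w, n_h * n_w)
    for i in range(o_h):
        for j in range(o_w):
            for a in range(k_h):
                for b in range(k_w):
                    # Output element (i, j) reads input element (i + a, j + b)
                    W[i * o_w + j, (i + a) * n_w + (j + b)] = K[a, b]
    return W

X = torch.randn(4, 5)
K = torch.randn(2, 3)
W = kernel2matrix_general(K, 4, 5)

# Convolution (cross-correlation) as W x
Y = torch.matmul(W, X.reshape(-1)).reshape(3, 3)
Y_ref = F.conv2d(X.reshape(1, 1, 4, 5), K.reshape(1, 1, 2, 3)).reshape(3, 3)
print(torch.allclose(Y, Y_ref, atol=1e-5))

# Transposed convolution as W^T y
Z = torch.matmul(W.T, Y.reshape(-1)).reshape(4, 5)
Z_ref = F.conv_transpose2d(Y.reshape(1, 1, 3, 3), K.reshape(1, 1, 2, 3)).reshape(4, 5)
print(torch.allclose(Z, Z_ref, atol=1e-5))
```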

Consider implementing the convolution by multiplying matrices. Given an input vector $\mathbf{x}$ and a weight matrix $\mathbf{W}$, the forward propagation function of the convolution can be implemented by multiplying its input with the weight matrix and outputting a vector $\mathbf{y}=\mathbf{W}\mathbf{x}$. Since backpropagation
follows the chain rule and $\nabla_{\mathbf{x}}\mathbf{y}=\mathbf{W}^\top$, the backpropagation function of the convolution can be implemented by multiplying its input with the transposed weight matrix $\mathbf{W}^\top$. Therefore, the transposed convolutional layer can just exchange the forward propagation function and the backpropagation function of the convolutional layer: its forward propagation and backpropagation functions multiply their input vector with $\mathbf{W}^\top$ and $\mathbf{W}$, respectively.
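
The exchange of forward and backward functions can be checked numerically with autograd. In the sketch below, the gradient that backpropagation sends to the input of a convolution should equal a transposed convolution of the upstream gradient with the same kernel. The upstream gradient `G` is an arbitrary tensor chosen for this illustration.

```python
import torch
from torch.nn import functional as F

X = torch.arange(9.0).reshape(1, 1, 3, 3).requires_grad_()
K = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).reshape(1, 1, 2, 2)

# Forward pass of the convolution: y = W x
Y = F.conv2d(X, K)

# Backward pass with an arbitrary upstream gradient G: the input gradient is W^T G
G = torch.tensor([[1.0, -1.0], [0.5, 2.0]]).reshape(1, 1, 2, 2)
Y.backward(G)

# Forward pass of the transposed convolution with the same kernel
Z = F.conv_transpose2d(G, K)
print(torch.allclose(X.grad, Z))
```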

## Summary
* In contrast to the regular convolution that reduces input elements via the kernel, the transposed convolution broadcasts input elements via the kernel, thereby producing an output that is larger than the input.
* If we feed $\mathsf{X}$ into a convolutional layer $f$ to output $\mathsf{Y}=f(\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\mathsf{X}$, then $g(\mathsf{Y})$ will have the same shape as $\mathsf{X}$, provided that the convolution's output size involves no rounding down.
* We can implement convolutions using matrix multiplications. The transposed convolutional layer can just exchange the forward propagation function and the backpropagation function of the convolutional layer.