# im2col operation in PyTorch
> see [How to keep the shape of input and output same when dilation conv?](https://discuss.pytorch.org/t/how-to-keep-the-shape-of-input-and-output-same-when-dilation-conv/14338)  
> see the convolution visualization at https://ezyang.github.io/convolution-visualizer/index.html


## How to keep the shape of input and output same when dilation conv?
> see https://discuss.pytorch.org/t/how-to-keep-the-shape-of-input-and-output-same-when-dilation-conv/14338  

Given (sizes along one spatial dimension):
- i = input size
- o = output size
- p = padding
- k = kernel_size
- s = stride
- d = dilation

To get the output size:
 $$o = \left\lfloor \frac{i + 2p - k - (k-1)(d-1)}{s} \right\rfloor + 1$$

To keep the output the `same` size as the input ($o = i$) with $s = 1$ and $k = 3$, as used in this project, substitute into the formula:  

==>  $o = i = i + 2p - 3 - 2(d-1) + 1$   
==>  $2p - 3 - 2d + 2 + 1 = 0$  
==>  $2p - 2d = 0$  
==>  $p = d$  
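
The derivation above can be checked numerically. The following is a minimal sketch in plain Python; the helper name `conv_output_size` is hypothetical (it is not a PyTorch function) and simply evaluates the output-size formula with integer floor division.

```python
def conv_output_size(i, k, p=0, s=1, d=1):
    """Output size along one spatial axis, following the formula above."""
    # Under dilation d the kernel spans k + (k - 1) * (d - 1) input positions.
    return (i + 2 * p - k - (k - 1) * (d - 1)) // s + 1


i, k = 9, 3
for d in range(1, 5):
    # With s = 1 and k = 3, choosing p = d should give o == i.
    o = conv_output_size(i, k, p=d, s=1, d=d)
    print(f"d={d}, p={d}: i={i} -> o={o}")
```

Repeating the same algebra for a general odd $k$ with $s = 1$ gives $p = d(k-1)/2$, which reduces to $p = d$ when $k = 3$.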

## im2col in PyTorch via torch.nn.Unfold

PyTorch's im2col (i.e., `nn.Unfold`) flattens each $k \times k$ block into a column that contains $C*(k*k)$ values, where $C$ is the channel dimension and each group of $k*k$ values is a contiguous chunk belonging to one channel. That means the $C*(k*k)$ elements should be reshaped to $C \times (k*k)$, not to $(k*k) \times C$.
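
The layout claim can also be checked with assertions rather than by reading printed values. This is a minimal sketch, assuming the functional form `torch.nn.functional.unfold` (equivalent to `nn.Unfold`) and `torch.nn.functional.pad`; the tensor sizes and the position `(h, w)` are chosen only for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, H, W, k = 2, 3, 9, 9, 3
p = k // 2  # padding that keeps H x W unchanged for stride 1, dilation 1

x = torch.randn(N, C, H, W)
cols = F.unfold(x, kernel_size=k, dilation=1, padding=p, stride=1)
print(cols.shape)  # [N, C*k*k, H*W]

# View each column as C x (k*k): channel first, then the flattened k x k patch.
y = cols.view(N, C, k * k, H, W)

# Zero-pad the input the same way unfold does, so patches can be sliced directly.
x_pad = F.pad(x, (p, p, p, p))

h, w = 5, 3
all_match = True
for c in range(C):
    patch = x_pad[0, c, h:h + k, w:w + k].reshape(-1)
    all_match = all_match and torch.equal(y[0, c, :, h, w], patch)
print("C x (k*k) layout matches:", all_match)
```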

### Example 1: 
In this example $C = 1$, which led me to a wrong conclusion: that the $C*(k*k)$ elements should be reshaped to $(k*k) \times C$. With $C = 1$ there is no difference between $1*(k*k) \rightarrow (k*k) \times 1$ and $1*(k*k) \rightarrow 1 \times (k*k)$ with respect to element indexing in memory, so the mistake goes unnoticed. For example, take x of shape $[N,C,H,W] = [2,1,9,9]$ and kernel size $k = 3$. `nn.Unfold` returns a tensor of shape $[N, C*k*k, H*W] = [2, 9, 81]$, which is viewed here as $[N, C*k*k, H, W] = [2,1*3*3,9,9]$. You can reshape it to $[N, 1, k*k, H, W]$ or $[N, k*k, 1, H, W]$, and it makes no difference in this case.

In [2]:
import torch
import torch.nn as nn


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

unfold = nn.Unfold(kernel_size=(3, 3), dilation = 1, padding = 1, stride = 1)
x = torch.randn(2, 1, 9, 9)
x = x.to(device)
h,w = x.size()[2:4]
print (x[0,0,4:7,2:5])
y = unfold(x)
y = y.view(y.size(0), y.size(1), h, w)
print (y[0,:,5,3])
# each column holds C*k*k = 1*3*3 = 9 values (one 3x3 patch of the single channel)
# there are H*W = 9*9 = 81 columns, one per output position
print (y.size())

tensor([[-0.1415,  0.9311, -0.4274],
        [-1.9487, -0.6907, -0.0047],
        [ 0.3701, -1.3672,  0.5315]], device='cuda:0')
tensor([-0.1415,  0.9311, -0.4274, -1.9487, -0.6907, -0.0047,  0.3701, -1.3672,
         0.5315], device='cuda:0')
torch.Size([2, 9, 9, 9])


### Example 2: 
In this example $C = 3$, which led me to the correct conclusion: the $C*(k*k)$ elements should be reshaped to $C \times (k*k)$. Note that each group of $k*k$ values is a contiguous chunk in memory.
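
Put differently, row $r$ of an unfolded column decomposes as $r = c \cdot k^2 + k_i \cdot k + k_j$, with the channel $c$ varying slowest. The sketch below illustrates that index arithmetic in plain Python; the helper name `split_unfold_row` is hypothetical and is not part of PyTorch.

```python
def split_unfold_row(row, k):
    """Map a row index in [0, C*k*k) to (channel, kernel_row, kernel_col)."""
    c, offset = divmod(row, k * k)  # channel is the slowest-varying index
    ki, kj = divmod(offset, k)      # then kernel row, then kernel column
    return c, ki, kj


k = 3
for row in (0, 8, 9, 13, 26):
    print(row, "->", split_unfold_row(row, k))
```

For $k = 3$, rows 0 to 8 belong to channel 0, rows 9 to 17 to channel 1, and rows 18 to 26 to channel 2.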

In [8]:
import torch
import torch.nn as nn


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
k = 3
p = k//2
print ('padding = ', p)
unfold = nn.Unfold(kernel_size=(3, 3), dilation = 1, padding = p, stride = 1)
x = torch.randn(2, 3, 9, 9)
x = x.to(device)
N,C,H,W = x.size()[:]
print ('x[0,c=0,4:7,2:5] =\n', x[0,0,4:7,2:5])
print ('x[0,c=1,4:7,2:5] =\n', x[0,1,4:7,2:5])
print ('x[0,c=2,4:7,2:5] =\n', x[0,2,4:7,2:5])

y0 = unfold(x) # shape [N, C*k*k, H*W]
y = y0.view(N, C*k*k, H, W)
print ('Reshape to 4D: y[0,:,5,3] =\n', y[0,:,5,3])

y = y0.view(N, C, k*k, H, W)
print ('Reshape to 5D, y[0,c=0,:,5,3] =\n', y[0,0,:,5,3])
print ('y[0,c=1,:,5,3] =\n', y[0,1,:,5,3])
print ('y[0,c=2,:,5,3] =\n', y[0,2,:,5,3])
# each column holds C*k*k = 3*3*3 = 27 values, viewed here as C x (k*k) = 3 x 9
# there are H*W = 9*9 = 81 columns, one per output position
print (y.size())

padding =  1
x[0,c=0,4:7,2:5] =
 tensor([[ 0.4197, -0.3778, -0.0688],
        [-0.1916, -1.2742, -0.6662],
        [-0.3673,  1.0866, -1.7710]], device='cuda:0')
x[0,c=1,4:7,2:5] =
 tensor([[-0.6965,  0.7519,  0.7451],
        [-0.9081,  0.0528, -0.5768],
        [-0.7495,  0.9167,  0.8326]], device='cuda:0')
x[0,c=2,4:7,2:5] =
 tensor([[-1.3579,  1.5046,  2.3742],
        [-0.0505,  0.5745,  0.6749],
        [-0.6443, -0.4178, -0.6896]], device='cuda:0')
Reshape to 4D: y[0,:,5,3] =
 tensor([ 0.4197, -0.3778, -0.0688, -0.1916, -1.2742, -0.6662, -0.3673,  1.0866,
        -1.7710, -0.6965,  0.7519,  0.7451, -0.9081,  0.0528, -0.5768, -0.7495,
         0.9167,  0.8326, -1.3579,  1.5046,  2.3742, -0.0505,  0.5745,  0.6749,
        -0.6443, -0.4178, -0.6896], device='cuda:0')
Reshape to 5D, y[0,c=0,:,5,3] =
 tensor([ 0.4197, -0.3778, -0.0688, -0.1916, -1.2742, -0.6662, -0.3673,  1.0866,
        -1.7710], device='cuda:0')
y[0,c=1,:,5,3] =
 tensor([-0.6965,  0.7519,  0.7451, -0.9081,  0.0528,