The convolution layer has three parameters: the stride, the padding and the filter. We will explain the forward pass and the back propagation using a mini-batch of 2D images with several channels as an example, but the idea is valid for any number of dimensions.

The input of a convolution layer is denoted by $l^0$ and the output by $l^1$. The input carries the indices

$$
l^0_{I,i,j,k}
$$

where the indices are the mini-batch, pixel row, pixel column and channel. The padding comes in by embedding this image in a (possibly) bigger zero-padded array, thus

$$
\tilde l^0_{I,i+P,j+P,k} = l^0_{I,i,j,k}
$$

where $P$ is the padding and all other entries of $\tilde l^0$ are zero.
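The embedding can be sketched in a few lines of NumPy. This is only an illustration: the helper name `pad_input` and the array layout (batch, row, column, channel) are assumptions made for the example, and the same padding $P$ is used in both spatial directions.

```python
import numpy as np

def pad_input(l0, P):
    """Embed l0 of shape (batch, rows, cols, channels) in a zero array,
    adding P zeros on every side of the two spatial axes."""
    # np.pad fills with zeros by default (mode='constant').
    return np.pad(l0, ((0, 0), (P, P), (P, P), (0, 0)))

l0 = np.arange(2 * 3 * 3 * 1, dtype=float).reshape(2, 3, 3, 1)
lt = pad_input(l0, P=2)

# The original image sits at offset P inside the padded array.
print(lt.shape)
print(np.array_equal(lt[:, 2:5, 2:5, :], l0))
```

The batch and channel axes are left untouched; only the rows and columns grow by $2P$.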


The convolution part gives the output

$$
l^1_{I,i,j,m} = \sum_{\alpha,\beta,k} \tilde l^0_{I,si+\alpha,sj+\beta,k} W_{\alpha,\beta,k,m}
$$

where $s$ is the stride. Note that both the padding and the stride can be different for different directions, but we take them the same for simplicity. The effect of the padding is to increase the image size additively and that of the stride is to decrease it multiplicatively.

In fact, if the size of the original image is $n$ and the size of the filter is $K$, then the size of the output is $\tilde n=[\frac{n+2P-K}{s}]+1$, where $[x]$ means the integer part of $x$.
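The forward pass and the output size formula can be checked with a deliberately naive loop. This is a sketch under stated assumptions, not an efficient implementation: the names `output_size` and `conv_forward` are hypothetical, the filter is square with shape $(K, K, \text{in}, \text{out})$, and all indices start at 0.

```python
import numpy as np

def output_size(n, K, s, P):
    """Integer part of (n + 2P - K) / s, plus one."""
    return (n + 2 * P - K) // s + 1

def conv_forward(l0, W, s=1, P=0):
    """l0: (batch, rows, cols, in), W: (K, K, in, out)."""
    K = W.shape[0]
    lt = np.pad(l0, ((0, 0), (P, P), (P, P), (0, 0)))  # zero-padded input
    out_r = output_size(l0.shape[1], K, s, P)
    out_c = output_size(l0.shape[2], K, s, P)
    l1 = np.zeros((l0.shape[0], out_r, out_c, W.shape[3]))
    for i in range(out_r):
        for j in range(out_c):
            # K x K patch of the padded input starting at (s*i, s*j)
            patch = lt[:, s * i:s * i + K, s * j:s * j + K, :]
            # sum over alpha, beta and the input channel k
            l1[:, i, j, :] = np.einsum('bxyk,xykm->bm', patch, W)
    return l1

# All-ones input and filter: each output counts the non-padded
# entries under the filter, times the number of input channels.
l0 = np.ones((1, 5, 5, 2))
W = np.ones((3, 3, 2, 4))
l1 = conv_forward(l0, W, s=2, P=1)
print(l1.shape)
print(l1[0, :, :, 0])
```

With $n=5$, $K=3$, $s=2$, $P=1$ the formula gives $\tilde n = [4/2]+1 = 3$, which matches the spatial shape of `l1`.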

The back propagation is easy to work out. For the weights we get

$$
\frac{\partial L}{\partial W_{\alpha,\beta,k,m}} = \sum_{I,i,j} \frac{\partial L}{\partial l^1_{I,i,j,m}} \tilde l^0_{I,si+\alpha,sj+\beta,k}
$$
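The weight gradient can be sketched the same way. The function names are hypothetical and the loss is assumed to be $L=\sum G \cdot l^1$ for a fixed random array $G$, so that $\partial L/\partial l^1 = G$. Since the layer is linear in $W$, the gradient must satisfy $\sum \frac{\partial L}{\partial W} V = \sum G \cdot \mathrm{conv}(l^0, V)$ for any direction $V$, which gives a simple consistency check.

```python
import numpy as np

def conv_forward(l0, W, s, P):
    """Naive forward pass; l0: (batch, rows, cols, in), W: (K, K, in, out)."""
    K = W.shape[0]
    lt = np.pad(l0, ((0, 0), (P, P), (P, P), (0, 0)))
    out_r = (l0.shape[1] + 2 * P - K) // s + 1
    out_c = (l0.shape[2] + 2 * P - K) // s + 1
    l1 = np.zeros((l0.shape[0], out_r, out_c, W.shape[3]))
    for i in range(out_r):
        for j in range(out_c):
            patch = lt[:, s * i:s * i + K, s * j:s * j + K, :]
            l1[:, i, j, :] = np.einsum('bxyk,xykm->bm', patch, W)
    return l1

def grad_weights(l0, dl1, K, s, P):
    """dL/dW[a, b, k, m] = sum over batch, i, j of dl1[., i, j, m] * lt[., s*i+a, s*j+b, k]."""
    lt = np.pad(l0, ((0, 0), (P, P), (P, P), (0, 0)))
    out_r, out_c = dl1.shape[1], dl1.shape[2]
    dW = np.zeros((K, K, l0.shape[3], dl1.shape[3]))
    for a in range(K):
        for b in range(K):
            # entries of the padded input seen by filter tap (a, b)
            win = lt[:, a:a + s * out_r:s, b:b + s * out_c:s, :]
            dW[a, b] = np.einsum('bijk,bijm->km', win, dl1)
    return dW

rng = np.random.default_rng(0)
l0 = rng.normal(size=(2, 6, 5, 3))
W = rng.normal(size=(3, 3, 3, 4))
s, P = 2, 1
G = rng.normal(size=conv_forward(l0, W, s, P).shape)  # stands in for dL/dl1
dW = grad_weights(l0, G, 3, s, P)

# Directional check using linearity in W.
V = rng.normal(size=W.shape)
lhs = np.sum(dW * V)
rhs = np.sum(conv_forward(l0, V, s, P) * G)
print(abs(lhs - rhs))
```

The printed difference should be at the level of floating point rounding.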

And for the input layer it is

$$
\begin{aligned}
\frac{\partial L}{\partial l^0_{I,p,q,l}} &= \sum_{\alpha,\beta,m} \left(\frac{\partial L}{\partial l^1} \right)_{I,\frac{p+P-\alpha}{s} , \frac{q+P-\beta}{s},m} W_{\alpha,\beta,l,m} \\
&= \sum_{\alpha,\beta,m} \left(\frac{\partial L}{\partial l^1} \right)_{I,\frac{p+P-K+1+\alpha}{s} , \frac{q+P-K+1+\beta}{s},m} \tilde W_{\alpha,\beta,m,l}
\end{aligned}
$$

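where, with the filter indices $\alpha,\beta$ running from $0$ to $K-1$,

$$
\tilde W_{\alpha,\beta,m,l} = W_{K-1-\alpha, K-1-\beta,l,m}
$$

This is just a convolution with the flipped filter. A term contributes only when its fractional indices are integers that fall inside the output array.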
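One way to read the fractional indices is as a scatter: every output pixel $(i,j)$ sends its error back to the $K\times K$ patch of the padded input that it read, and the padding is cropped at the end. The sketch below implements this equivalent form; the name `input_grad_scatter` is hypothetical and indices start at 0.

```python
import numpy as np

def input_grad_scatter(dl1, W, s, P, n_r, n_c):
    """dl1: (batch, out_r, out_c, out), W: (K, K, in, out).
    Returns dL/dl0 with shape (batch, n_r, n_c, in)."""
    B, out_r, out_c, _ = dl1.shape
    K = W.shape[0]
    # gradient with respect to the padded input
    dlt = np.zeros((B, n_r + 2 * P, n_c + 2 * P, W.shape[2]))
    for i in range(out_r):
        for j in range(out_c):
            # output (i, j) read padded entries (s*i + alpha, s*j + beta)
            dlt[:, s * i:s * i + K, s * j:s * j + K, :] += np.einsum(
                'bm,xykm->bxyk', dl1[:, i, j, :], W)
    # drop the padding: padded index p + P corresponds to input index p
    return dlt[:, P:P + n_r, P:P + n_c, :]

# Tiny case: 3x3 input, 2x2 filter, stride 1, no padding,
# one channel, and an output error of all ones.
W = np.array([[1.0, 2.0], [3.0, 4.0]]).reshape(2, 2, 1, 1)
dl1 = np.ones((1, 2, 2, 1))
dl0 = input_grad_scatter(dl1, W, s=1, P=0, n_r=3, n_c=3)
print(dl0[0, :, :, 0])
```

In this example the centre pixel is read once by each of the four filter taps, so its gradient is the sum of all filter entries, while a corner pixel is read by a single tap.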

Now the interesting thing is that we can first embed the error with respect to the output in an array $z$ of size $s \tilde n$ that is zero except at the entries

$$
z_{I,si,sj,m}=\left(\frac{\partial L}{\partial l^1} \right)_{I,i,j,m}
$$

and then, depending on the sign of $P-K+1$, we either embed $z$ in an array $y$ that is zero padded on the left, or let $y$ start part-way into $z$, so that $y_{I,u,v,m} = z_{I,u+P-K+1,v+P-K+1,m}$. Concretely, for $P < K-1$ the array $y$ is $z$ preceded by $K-1-P$ zeros, and for $P > K-1$ the first $P-K+1$ entries of $z$ are dropped. The right side of the array $y$ is zero padded by $P$ in either case.

Note that in case $P < (K-1)/2$ we need to zero pad $z$ on the right so that its size is at least that of the input array.

Then we have


$$
\frac{\partial L}{\partial l^0_{I,p,q,l}}=\sum_{\alpha,\beta,m} y_{I,p+\alpha,q+\beta,m} \tilde W_{\alpha,\beta,m,l}
$$
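The whole recipe (spread the error $s$ apart, shift by $K-1-P$, pad on the right, then apply the flipped filter with stride 1) can be sketched as follows. The function names are hypothetical, the filter is assumed square, and indices start at 0. Because the layer is linear in $l^0$, the result is checked against the identity $\sum \frac{\partial L}{\partial l^0} X = \sum G \cdot \mathrm{conv}(X, W)$ for random $X$ and $G$.

```python
import numpy as np

def conv_forward(l0, W, s, P):
    """Naive forward pass; l0: (batch, rows, cols, in), W: (K, K, in, out)."""
    K = W.shape[0]
    lt = np.pad(l0, ((0, 0), (P, P), (P, P), (0, 0)))
    out_r = (l0.shape[1] + 2 * P - K) // s + 1
    out_c = (l0.shape[2] + 2 * P - K) // s + 1
    l1 = np.zeros((l0.shape[0], out_r, out_c, W.shape[3]))
    for i in range(out_r):
        for j in range(out_c):
            patch = lt[:, s * i:s * i + K, s * j:s * j + K, :]
            l1[:, i, j, :] = np.einsum('bxyk,xykm->bm', patch, W)
    return l1

def input_grad_dilated(dl1, W, s, P, n_r, n_c):
    B, out_r, out_c, M = dl1.shape
    K = W.shape[0]
    # 1. spread the output error s apart inside a zero array z
    z = np.zeros((B, s * out_r, s * out_c, M))
    z[:, ::s, ::s, :] = dl1
    # 2. make z at least as large as the input (only needed for small P)
    z = np.pad(z, ((0, 0), (0, max(0, n_r - s * out_r)),
                   (0, max(0, n_c - s * out_c)), (0, 0)))
    # 3. y[u] = z[u + P - K + 1]: left pad or crop, then right pad by P
    shift = K - 1 - P
    if shift >= 0:
        y = np.pad(z, ((0, 0), (shift, P), (shift, P), (0, 0)))
    else:
        y = np.pad(z[:, -shift:, -shift:, :], ((0, 0), (0, P), (0, P), (0, 0)))
    # 4. flipped filter with the two channel axes swapped
    Wt = W[::-1, ::-1].transpose(0, 1, 3, 2)
    dl0 = np.zeros((B, n_r, n_c, W.shape[2]))
    for a in range(K):
        for b in range(K):
            dl0 += np.einsum('bpqm,ml->bpql',
                             y[:, a:a + n_r, b:b + n_c, :], Wt[a, b])
    return dl0

rng = np.random.default_rng(0)

def adjoint_gap(n, K, s, P):
    """|sum(dl0 * X) - sum(conv(X) * G)| for random X, W and G."""
    X = rng.normal(size=(2, n, n + 1, 3))
    W = rng.normal(size=(K, K, 3, 4))
    l1 = conv_forward(X, W, s, P)
    G = rng.normal(size=l1.shape)
    dl0 = input_grad_dilated(G, W, s, P, n, n + 1)
    return abs(np.sum(dl0 * X) - np.sum(l1 * G))

# (n, K, s, P): plain case, strided, P > K-1 (crop), P < (K-1)/2 (z padded)
for cfg in [(5, 3, 1, 1), (6, 3, 2, 0), (7, 2, 3, 3), (5, 4, 2, 0)]:
    print(cfg, adjoint_gap(*cfg))
```

The four configurations are chosen to exercise both branches of step 3 as well as the extra right padding of $z$; in each case the gap should be at the level of rounding error.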