# Conv1D vs Conv2D vs Conv3D

These notes are largely drawn from [this Stack Overflow answer](https://stackoverflow.com/questions/42883547/what-do-you-mean-by-1d-2d-and-3d-convolutions-in-cnn).

## Conv1D

- input = [W], filter = [k], output = [W] (with `SAME` padding and stride 1)
- The output is a 1D array.
- The filter slides along a single axis, for example forward along the sequence.

One use is smoothing the curve of a graph, which is a 1D signal.
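As a minimal sketch of the smoothing use case, a width-3 moving average can be applied with `np.convolve`. The noisy signal and the kernel below are made up for illustration and do not come from the original answer.

```python
import numpy as np

# Hypothetical noisy signal: a straight line plus alternating +1/-1 noise.
signal = np.arange(10, dtype=float) + np.tile([1.0, -1.0], 5)

# Moving-average kernel of width 3; the weights sum to 1 so the scale is kept.
kernel = np.ones(3) / 3.0

# mode='same' zero-pads the ends so the output has the same length as the input.
smoothed = np.convolve(signal, kernel, mode='same')

print(signal)
print(smoothed)
```

Away from the two end points, which are pulled down by the zero padding, the smoothed values jump less from one sample to the next than the raw signal does.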

![conv1d-explanation](./conv1d.jpg)

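```python
# Written against the TensorFlow 1.x graph API. The compat.v1 import keeps
# tf.Session available when TensorFlow 2.x is installed.
import tensorflow.compat.v1 as tf
import numpy as np

tf.disable_v2_behavior()
sess = tf.Session()

ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1

in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])

# conv1d expects input as [batch, width, channels]
# and the kernel as [width, in_channels, out_channels].
input_1d = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))
```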

```
[ 2.  3.  3.  3.  2.]
```
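The same numbers can be reproduced without TensorFlow. The sketch below is an illustrative NumPy re-implementation, not the library's own code: the helper name `conv1d_same` is hypothetical, and it assumes stride 1, an odd filter width, and zero padding. Like `tf.nn.conv1d`, it does not flip the filter.

```python
import numpy as np

def conv1d_same(signal, kernel):
    """Stride-1 sliding dot product with zero padding (odd kernel width assumed)."""
    pad = len(kernel) // 2
    padded = np.pad(signal, pad, mode='constant')  # zeros on both ends
    return np.array([
        np.dot(padded[i:i + len(kernel)], kernel)
        for i in range(len(signal))
    ])

result_1d = conv1d_same(np.ones(5), np.ones(3))
print(result_1d)  # [2. 3. 3. 3. 2.]
```

The edge values are 2 rather than 3 because one of the three filter taps falls on the zero padding there.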


## Conv2D

- The filter slides along two axes (x, y).
- The output is a 2D matrix.
- input = [W, H], filter = [k, k], output = [W, H] (with `SAME` padding and stride 1)

The toy example is analogous to the one above, except that everything is now 2D. The output has to be a 2D array.

![2d](https://i.stack.imgur.com/hvMaU.png)

```python
ones_2d = np.ones((5, 5))
weight_2d = np.ones((3, 3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

# Axis 0 of a 2D array is rows (height); axis 1 is columns (width).
in_height = int(in_2d.shape[0])
in_width = int(in_2d.shape[1])

filter_height = int(filter_2d.shape[0])
filter_width = int(filter_2d.shape[1])

# conv2d expects input as [batch, height, width, channels]
# and the kernel as [height, width, in_channels, out_channels].
input_2d = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))
```

```
[[ 4.  6.  6.  6.  4.]
 [ 6.  9.  9.  9.  6.]
 [ 6.  9.  9.  9.  6.]
 [ 6.  9.  9.  9.  6.]
 [ 4.  6.  6.  6.  4.]]
```
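To see where these values come from, here is a hedged NumPy sketch of the same sliding-window computation. The helper `conv2d_same` is a hypothetical name, and it assumes stride 1, odd filter sizes, and zero padding.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Stride-1 2D sliding dot product with zero padding (odd kernel sizes assumed)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='constant')
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            # Multiply the window under the filter element-wise and sum it up.
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

result_2d = conv2d_same(np.ones((5, 5)), np.ones((3, 3)))
print(result_2d)
```

Each entry counts how many of the nine filter taps land on real data: 4 in the corners, 6 along the edges, and 9 in the interior.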


## Conv3D

- The filter slides along three axes (x, y, z).
- The output is a 3D volume.
- input = [W, H, L], filter = [k, k, d], output = [W, H, M]
- d < L is important: it lets the filter slide along the depth axis, so the output is a volume.

Remember that the input's dimensions could be larger, but the output has to be a 3D volume.
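The role of d < L is easiest to see without padding. The sketch below assumes stride 1 and no padding (`VALID` in TensorFlow terms), where each axis of size n and filter size k yields n - k + 1 outputs. The helper `valid_output_shape` is a hypothetical name used only for illustration.

```python
def valid_output_shape(input_shape, filter_shape):
    """Output size per axis for stride 1 and no padding: n - k + 1."""
    return tuple(n - k + 1 for n, k in zip(input_shape, filter_shape))

# d < L: the filter can slide along depth, so the output is a volume.
shape_volume = valid_output_shape((5, 5, 5), (3, 3, 3))
# d == L: only one position along depth, so the output is effectively 2D.
shape_flat = valid_output_shape((5, 5, 5), (3, 3, 5))

print(shape_volume)  # (3, 3, 3)
print(shape_flat)    # (3, 3, 1)
```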

![3d](https://i.stack.imgur.com/IvDQP.png)

```python
ones_3d = np.ones((5, 5, 5))
weight_3d = np.ones((3, 3, 3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

# Axes are read as (depth, height, width) to match conv3d's default layout.
in_depth = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_width = int(in_3d.shape[2])

filter_depth = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_width = int(filter_3d.shape[2])

# conv3d expects input as [batch, depth, height, width, channels]
# and the kernel as [depth, height, width, in_channels, out_channels].
input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))
```

```
[[[  8.  12.  12.  12.   8.]
  [ 12.  18.  18.  18.  12.]
  [ 12.  18.  18.  18.  12.]
  [ 12.  18.  18.  18.  12.]
  [  8.  12.  12.  12.   8.]]

 [[ 12.  18.  18.  18.  12.]
  [ 18.  27.  27.  27.  18.]
  [ 18.  27.  27.  27.  18.]
  [ 18.  27.  27.  27.  18.]
  [ 12.  18.  18.  18.  12.]]

 [[ 12.  18.  18.  18.  12.]
  [ 18.  27.  27.  27.  18.]
  [ 18.  27.  27.  27.  18.]
  [ 18.  27.  27.  27.  18.]
  [ 12.  18.  18.  18.  12.]]

 [[ 12.  18.  18.  18.  12.]
  [ 18.  27.  27.  27.  18.]
  [ 18.  27.  27.  27.  18.]
  [ 18.  27.  27.  27.  18.]
  [ 12.  18.  18.  18.  12.]]

 [[  8.  12.  12.  12.   8.]
  [ 12.  18.  18.  18.  12.]
  [ 12.  18.  18.  18.  12.]
  [ 12.  18.  18.  18.  12.]
  [  8.  12.  12.  12.   8.]]]
```
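A quick way to sanity-check this volume, sketched here with NumPy rather than TensorFlow: for an all-ones input and an all-ones filter, each output entry is just the product of the per-axis tap counts, so the 3D result is the outer product of the 1D result with itself three times. This shortcut relies on the all-ones setup and is not a general 3D convolution.

```python
import numpy as np

# 1D result for five ones and a width-3 all-ones filter with zero padding.
line = np.convolve(np.ones(5), np.ones(3), mode='same')  # [2, 3, 3, 3, 2]

# Outer product along three axes: volume[i, j, k] = line[i] * line[j] * line[k].
volume = np.einsum('i,j,k->ijk', line, line, line)

print(volume.shape)  # (5, 5, 5)
print(volume[0, 0, 0], volume[0, 1, 1], volume[2, 2, 2])  # 8.0 18.0 27.0
```

Corners see 2 × 2 × 2 = 8 filter taps, while interior points see the full 3 × 3 × 3 = 27.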


## Using Conv2D for RGB Images instead of Conv3D

When we actually work with RGB images, we usually use Conv2D rather than Conv3D. Remember that each filter produces a single 2D matrix; it is the number of filters that creates the depth of the output.
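The point about filters creating depth can be sketched in NumPy. The helper `conv2d_rgb` below is a hypothetical illustration that assumes stride 1 and no padding; the 5x5 image and the four filters are made-up values.

```python
import numpy as np

def conv2d_rgb(image, filters):
    """image: [H, W, 3]; filters: [k, k, 3, F]; stride 1, no padding -> [H-k+1, W-k+1, F]."""
    h, w, _ = image.shape
    k, _, _, num_filters = filters.shape
    out = np.zeros((h - k + 1, w - k + 1, num_filters))
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            patch = image[y:y + k, x:x + k, :]  # [k, k, 3]
            # Each filter collapses the whole patch, channels included, to one number.
            out[y, x, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

image = np.ones((5, 5, 3))        # hypothetical 5x5 RGB image
filters = np.ones((3, 3, 3, 4))   # four 3x3 filters, each spanning all 3 channels
feature_maps = conv2d_rgb(image, filters)
print(feature_maps.shape)  # (3, 3, 4)
```

The last axis of the output has size 4 because there are four filters, not because the image has three channels.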

In theory you could also use Conv3D here. However, that increases the computational cost, and Conv2D remains the common choice because it works well in practice.

There is also a more principled reason. A 2D convolution convolves two-dimensional data such as a picture, which has height and width. The "2D" refers to those two axes, not to the RGB channels. Likewise, a 3D convolution is for three-dimensional data, such as a cube with height, width, and depth, or a video with height, width, and time.

Further, in an RGB image, the image is a × b × 3 and the convolutional filters are c × d × 3. Since their third dimensions are equal, there is no need to convolve along that axis. You only convolve along the first two axes, making it a 2D convolution.
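This can be checked numerically. The sketch below uses a hypothetical helper `slide` that performs a stride-1, no-padding sliding dot product along every axis; the random 6 × 6 × 3 image and 3 × 3 × 3 filter are made up for illustration. Sliding the filter in 3D leaves exactly one position along the channel axis, and the result matches the sum of three per-channel 2D results.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))    # hypothetical a x b x 3 image
kernel = rng.random((3, 3, 3))   # hypothetical c x d x 3 filter

def slide(volume, kern):
    """Stride-1, no-padding sliding dot product along every axis of `volume`."""
    out_shape = tuple(n - k + 1 for n, k in zip(volume.shape, kern.shape))
    out = np.zeros(out_shape)
    for idx in np.ndindex(*out_shape):
        window = volume[tuple(slice(i, i + k) for i, k in zip(idx, kern.shape))]
        out[idx] = np.sum(window * kern)
    return out

# Treated as a 3D problem: only one valid position along the channel axis.
as_3d = slide(image, kernel)
# Treated as three 2D problems whose results are summed over channels.
per_channel = sum(slide(image[:, :, c], kernel[:, :, c]) for c in range(3))

print(as_3d.shape, per_channel.shape)            # (4, 4, 1) (4, 4)
print(np.allclose(as_3d[:, :, 0], per_channel))  # True
```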
