<h2>Understanding the Difference Between 1D, 2D and 3D Convolutions</h2>

In [1]:
"""importing the required modules"""
import numpy as np
import tensorflow as tf


<p>Even after going through several papers and the documentation, I was not able to understand the clear difference between Conv 1D, 2D and 3D networks. That is when I came across this answer on Stack Overflow and tried to implement it:
<a href="https://stackoverflow.com/questions/42883547/what-do-you-mean-by-1d-2d-and-3d-convolutions-in-cnn">here</a>.</p>

<h3>Conv 1D</h3>
<p>A vector is used as the filter for the convolution layer, and it moves along a single direction. For example, consider smoothing a graph: each value in the input is multiplied by its respective weight, and the weighted neighbours are summed to give a smoothed value. A 1D convolution may not capture the 2D structure of objects in an image, so it is seen less often in image models.</p>
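<p>The graph-smoothing idea can be sketched with plain NumPy before moving to TensorFlow. This is a minimal illustration: the <code>signal</code> values and the choice of a length-3 averaging filter are assumptions made for the example, not data from this notebook.</p>

```python
import numpy as np

# A jagged signal (hypothetical values, chosen only for illustration).
signal = np.array([1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0])

# A length-3 averaging filter: every weight is 1/3, so the weights sum to 1.
weights = np.ones(3) / 3.0

# mode="same" keeps the output the same length as the input by treating
# values beyond the two edges as zero.
smoothed = np.convolve(signal, weights, mode="same")
print(smoothed)
```

<p>Each output value is the average of an input value and its two neighbours, so the swings between neighbouring points become smaller. The two end values are pulled towards zero because one of their neighbours lies outside the signal.</p>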

In [8]:
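# in_1d, filter_1d and strides_1d are defined in cell In [9] below;
# in_width and filter_width come from cell In [7] below.
input_1d = tf.reshape(in_1d, [1, in_width, 1])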
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
input_1d, kernel_1d

(<tf.Tensor 'Reshape:0' shape=(1, 5, 1) dtype=float32>,
 <tf.Tensor 'Reshape_1:0' shape=(3, 1, 1) dtype=float32>)

In [7]:
in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])
in_width, in_1d.shape[0]


(5, Dimension(5))

In [9]:
"""start a session in tf"""
sess = tf.Session()

"""define the parameters"""
ones_1d = np.ones(5) #array([1., 1., 1., 1., 1.])
weight_1d = np.ones(3) #array([1., 1., 1.])
strides_1d = 1

"""set them as tf constants"""
in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

"""reshaping the input to [batch, in_width, in_channels] = [1, 5, 1]"""
input_1d = tf.reshape(in_1d, [1, int(in_1d.shape[0]), 1]) #<tf.Tensor 'Reshape:0' shape=(1, 5, 1) dtype=float32>
"""reshaping the kernel to [filter_width, in_channels, out_channels] = [3, 1, 1]"""
kernel_1d = tf.reshape(filter_1d, [int(filter_1d.shape[0]), 1, 1])

output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print (sess.run(output_1d))

[2. 3. 3. 3. 2.]


<p>From the TF documentation for <code>tf.nn.conv1d</code>: Computes a 1-D convolution given 3-D input and filter tensors.

Given an input tensor of shape <code>[batch, in_width, in_channels]</code> if data_format is "NHWC", or <code>[batch, in_channels, in_width]</code> if data_format is "NCHW", and a filter / kernel tensor of shape <code>[filter_width, in_channels, out_channels]</code>, this op reshapes the arguments to pass them to conv2d to perform the equivalent convolution operation. </p>
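<p>To make the shape convention concrete, here is a rough NumPy sketch of a stride-1 1D convolution with zero 'SAME' padding. The helper <code>conv1d_same</code> is a hypothetical name introduced for this example. It is not how TensorFlow implements the op, only a naive loop that follows the same <code>[batch, in_width, in_channels]</code> and <code>[filter_width, in_channels, out_channels]</code> layout.</p>

```python
import numpy as np

def conv1d_same(inputs, kernel):
    """Naive stride-1 1D cross-correlation with zero 'SAME' padding.

    inputs: [batch, in_width, in_channels]
    kernel: [filter_width, in_channels, out_channels]
    returns: [batch, in_width, out_channels]
    """
    batch, in_width, in_channels = inputs.shape
    filter_width, kernel_in, out_channels = kernel.shape
    assert in_channels == kernel_in, "channel mismatch"

    # For stride 1, SAME padding adds filter_width - 1 zeros in total,
    # with the extra zero (if the total is odd) placed on the right.
    pad_total = filter_width - 1
    pad_left = pad_total // 2
    pad_right = pad_total - pad_left
    padded = np.pad(inputs, ((0, 0), (pad_left, pad_right), (0, 0)), mode="constant")

    out = np.zeros((batch, in_width, out_channels), dtype=inputs.dtype)
    for i in range(in_width):
        window = padded[:, i:i + filter_width, :]  # [batch, filter_width, in_channels]
        # Sum over the filter width and the input channels.
        out[:, i, :] = np.einsum("bwc,wco->bo", window, kernel)
    return out

x = np.ones((1, 5, 1), dtype=np.float32)
k = np.ones((3, 1, 1), dtype=np.float32)
print(conv1d_same(x, k).squeeze())  # [2. 3. 3. 3. 2.]
```

<p>The two end values are 2 instead of 3 because one of the three positions under the kernel falls on the zero padding.</p>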

<h3>Conv 2D</h3>
<p>The kernel moves in two directions (x, y) to calculate the convolution. A 2D filter is used, and the same weights are applied at every position, which helps the kernel learn a particular feature wherever it appears in the image.</p>

In [12]:
ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

# in_2d.shape is (rows, columns), i.e. (height, width)
in_height = int(in_2d.shape[0])
in_width = int(in_2d.shape[1])

filter_height = int(filter_2d.shape[0])
filter_width = int(filter_2d.shape[1])

input_2d   = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print (sess.run(output_2d))

[[4. 6. 6. 6. 4.]
 [6. 9. 9. 9. 6.]
 [6. 9. 9. 9. 6.]
 [6. 9. 9. 9. 6.]
 [4. 6. 6. 6. 4.]]
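
<p>The 4 / 6 / 9 pattern in the output above can be reproduced with a naive NumPy loop. This is only a sketch, assuming stride 1, odd kernel sizes and zero 'SAME' padding; <code>conv2d_same</code> is a hypothetical helper written for this example, not a TensorFlow function.</p>

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive stride-1 2D cross-correlation with zero 'SAME' padding.

    Assumes a single-channel image and a kernel with odd height and width.
    """
    k_h, k_w = kernel.shape
    pad_h, pad_w = k_h // 2, k_w // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")

    out = np.zeros_like(image)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            # The kernel moves in two directions: down the rows and across the columns.
            window = padded[row:row + k_h, col:col + k_w]
            out[row, col] = np.sum(window * kernel)
    return out

result = conv2d_same(np.ones((5, 5)), np.ones((3, 3)))
print(result)
```

<p>With an all-ones input and kernel, each output value simply counts how many kernel cells land inside the image: 4 at the corners, 6 along the edges and 9 in the interior.</p>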


<h3>Conv 3D</h3>
<p>We can think of this as a small cube (the kernel) moving through a larger cube (the input) in three directions (x, y, z). Here both the input and the output are 3D.</p>

In [14]:
ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

# in_3d.shape is (depth, height, width)
in_depth = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_width = int(in_3d.shape[2])

filter_depth = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_width = int(filter_3d.shape[2])

input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print (sess.run(output_3d))

[[[ 8. 12. 12. 12.  8.]
  [12. 18. 18. 18. 12.]
  [12. 18. 18. 18. 12.]
  [12. 18. 18. 18. 12.]
  [ 8. 12. 12. 12.  8.]]

 [[12. 18. 18. 18. 12.]
  [18. 27. 27. 27. 18.]
  [18. 27. 27. 27. 18.]
  [18. 27. 27. 27. 18.]
  [12. 18. 18. 18. 12.]]

 [[12. 18. 18. 18. 12.]
  [18. 27. 27. 27. 18.]
  [18. 27. 27. 27. 18.]
  [18. 27. 27. 27. 18.]
  [12. 18. 18. 18. 12.]]

 [[12. 18. 18. 18. 12.]
  [18. 27. 27. 27. 18.]
  [18. 27. 27. 27. 18.]
  [18. 27. 27. 27. 18.]
  [12. 18. 18. 18. 12.]]

 [[ 8. 12. 12. 12.  8.]
  [12. 18. 18. 18. 12.]
  [12. 18. 18. 18. 12.]
  [12. 18. 18. 18. 12.]
  [ 8. 12. 12. 12.  8.]]]
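
<p>One way to sanity-check the 8 / 12 / 18 / 27 values above: because both the input and the kernel are all ones, the 2D and 3D results factor into outer products of the 1D result <code>[2, 3, 3, 3, 2]</code>. The sketch below relies on that special case. It is a shortcut that only works for separable kernels such as this one, not a general 3D convolution.</p>

```python
import numpy as np

# 1D result: ones(5) convolved with ones(3), zero-padded to keep length 5.
v = np.convolve(np.ones(5), np.ones(3), mode="same")  # [2. 3. 3. 3. 2.]

# Outer products of the 1D result along two and three axes.
out_2d = np.einsum("i,j->ij", v, v)
out_3d = np.einsum("i,j,k->ijk", v, v, v)

# Corner, edge, face and interior values of the 3D output.
print(out_3d[0, 0, 0], out_3d[0, 0, 2], out_3d[0, 2, 2], out_3d[2, 2, 2])
# 8.0 12.0 18.0 27.0
```

<p>A corner touches the padding in all three directions (2 x 2 x 2 = 8), an edge in two (2 x 2 x 3 = 12), a face in one (2 x 3 x 3 = 18), and an interior point in none (3 x 3 x 3 = 27).</p>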


<p>The concepts behind 3D convolution are described in the paper titled "Learning Spatiotemporal Features with 3D Convolutional Networks" (Tran et al., 2015).</p>