# Neural Network-related operations
### Nonlinear activations used by neural networks
* Typically, a nonlinear activation (activation layer) is applied to the output of each layer in a neural network, except for the last layer.
* Nonlinear transformations help a neural network learn the various nonlinear patterns present in data.
* Commonly used nonlinear activation functions:
1. `tf.nn.sigmoid(x, name=None)` 

`sigmoid(x) = 1/(1+exp(-x))`

2. `tf.nn.relu(x, name=None)`

`relu(x) = max(0, x)`
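
The two formulas above can be sketched in plain Python for scalar inputs. This is only an illustration of the math, not how TensorFlow implements these operations; the function names `sigmoid` and `relu` below are local helpers chosen to mirror the formulas.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)); maps any real input into (0, 1).

    Note: this naive form can overflow for very large negative x.
    """
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: passes positive inputs through, zeroes the rest."""
    return max(0.0, x)


for value in (-2.0, 0.0, 2.0):
    print(value, sigmoid(value), relu(value))
```

At `x = 0` the sigmoid returns exactly 0.5, and `relu` returns 0 for every non-positive input.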

# The Convolution Operation
* Convolution is used to produce different effects on images (e.g. blurring or edge detection).
* It is achieved by sliding a convolution filter over an image to produce an output at each location.
* At each location, multiply the elements of the convolution filter element-wise with the image patch it overlaps (the patch has the same size as the filter), then take the sum of the products.
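
The sliding-window computation described above can be sketched in plain Python, without TensorFlow. This is a minimal sketch assuming a single-channel 2D image, no padding, and no flipping of the filter (which matches what `tf.nn.conv2d` computes). The helper name `conv2d_valid` is hypothetical, and the values are the 4x4 input and 2x2 filter used in the TensorFlow example in this section.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding.

    Both arguments are lists of rows. At each location the overlapping
    patch is multiplied element-wise with the kernel and summed.
    """
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    output = []
    for top in range(0, image_h - kernel_h + 1, stride):
        row = []
        for left in range(0, image_w - kernel_w + 1, stride):
            total = 0.0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += kernel[i][j] * image[top + i][left + j]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3, 4],
    [4, 3, 2, 1],
    [5, 6, 7, 8],
    [8, 7, 6, 5],
]
kernel = [
    [0.5, 1.0],
    [0.5, 1.0],
]

result = conv2d_valid(image, kernel)
for row in result:
    print(row)
# [7.5, 7.5, 7.5]
# [13.5, 13.5, 13.5]
# [19.5, 19.5, 19.5]
```

A 4x4 input and a 2x2 filter with stride 1 give a 3x3 output, since the filter fits in three positions along each axis.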


# Implementation of the convolution operation
import tensorflow as tf
import numpy as np


x = tf.constant(
    [[
      [[1], [2], [3], [4]],
      [[4], [3], [2], [1]],
      [[5], [6], [7], [8]],
      [[8], [7], [6], [5]]
    ]],
    dtype=tf.float32
)

x_filter = tf.constant(
    [
     [
      [[0.5]], [[1]]
     ],
     [
      [[0.5]], [[1]]
     ]
    ],
    dtype=tf.float32
)

x_stride = [1, 1, 1, 1]
x_padding = 'VALID'

x_conv = tf.nn.conv2d(
    input=x, filter=x_filter,
    strides=x_stride, padding=x_padding
)

The `tf.nn.conv2d(...)` operation requires `input`, `filter`, and `strides` to be in a specific format.

The arguments of `tf.nn.conv2d(input, filter, strides, padding)`:
* **input**: Typically a 4D tensor where the dimensions should be ordered as `[batch_size, height, width, channels]`.
    * **batch_size**: Number of data points (e.g. inputs such as images or words) in a single batch. Data is normally processed in batches because datasets are usually large. At each training step, we randomly sample a small batch that approximately represents the whole dataset. Doing this for many steps allows us to approximate the full dataset.
    * **height and width**: Height and width of the input
    * **channels**: Depth of an input (e.g. for an RGB image the channels will be 3 - a channel for each color)
* **filter**: 4D tensor that represents the convolution window of the operation. Filter dimensions should be `[height, width, in_channels, out_channels]`:
    * **height and width**: height and width of the filter (often smaller than the input).
    * **in_channels**: Number of channels of the input to the layer
    * **out_channels**: Number of channels to be produced in the output of the layer
* **strides**: List of four elements `[batch_stride, height_stride, width_stride, channels_stride]`. Denotes how many elements to skip along each dimension during a single shift of the convolution window over the input.
* **padding**: Can be one of `'SAME'` or `'VALID'`. Decides how to handle the convolution operation near the boundaries of the input.
    * `VALID` performs the convolution without padding. Convolving an input of length *n* with a filter of size *h* (stride 1) yields an output of size *n-h+1*, which is smaller than *n* whenever *h > 1*. This shrinking output size can limit the depth of neural networks.
    * `SAME` pads the boundary with zeros so that, with a stride of 1, the output has the same height and width as the input.
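
The effect of `strides` and `padding` on the output shape can be worked out by hand. The sketch below assumes the `[batch_size, height, width, channels]` input layout and the `[height, width, in_channels, out_channels]` filter layout described above, and uses the standard size formulas for the two padding modes. The helper names `conv_output_size` and `conv2d_output_shape` are hypothetical and are not part of TensorFlow.

```python
import math


def conv_output_size(input_size, filter_size, stride, padding):
    """Output length along one spatial dimension."""
    if padding == "VALID":
        # No padding: only positions where the filter fits entirely.
        return math.ceil((input_size - filter_size + 1) / stride)
    if padding == "SAME":
        # Zero padding: size depends only on the input size and the stride.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def conv2d_output_shape(input_shape, filter_shape, strides, padding):
    """Shape of the result as [batch_size, height, width, out_channels]."""
    batch_size, in_height, in_width, in_channels = input_shape
    f_height, f_width, f_in_channels, f_out_channels = filter_shape
    if in_channels != f_in_channels:
        raise ValueError("input channels must match the filter's in_channels")
    out_height = conv_output_size(in_height, f_height, strides[1], padding)
    out_width = conv_output_size(in_width, f_width, strides[2], padding)
    return [batch_size, out_height, out_width, f_out_channels]


# Shapes from the convolution example: 4x4 input, 2x2 filter, stride 1.
print(conv2d_output_shape([1, 4, 4, 1], [2, 2, 1, 1], [1, 1, 1, 1], "VALID"))
# [1, 3, 3, 1]
print(conv2d_output_shape([1, 4, 4, 1], [2, 2, 1, 1], [1, 1, 1, 1], "SAME"))
# [1, 4, 4, 1]
```

With `VALID` the 4x4 input shrinks to 3x3 (*n-h+1 = 4-2+1*), while `SAME` keeps it at 4x4.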


# The pooling operation
* Behaves similarly to the convolution operation.
* However, instead of computing a weighted sum, max pooling outputs the maximum element of the image patch at each location.

# Pooling operation
x = tf.constant(
    [[
      [[1], [2], [3], [4]],
      [[4], [3], [2], [1]],
      [[5], [6], [7], [8]],
      [[8], [7], [6], [5]]
    ]],
    dtype=tf.float32
)

x_ksize = [1, 2, 2, 1]
x_stride = [1, 2, 2, 1]
x_padding = 'VALID'

x_pool = tf.nn.max_pool(
    value=x, ksize=x_ksize,
    strides=x_stride, padding=x_padding
)

# Returns:
# [[[[4.],
#    [4.]],
#   [[8.],
#    [8.]]]]
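
The pooled values above can be reproduced with a short plain-Python sketch. It assumes a single-channel 2D input, a 2x2 window, a stride of 2, and no padding, matching `x_ksize`, `x_stride`, and `x_padding` in the example. The helper name `max_pool2d_valid` is hypothetical.

```python
def max_pool2d_valid(image, window=2, stride=2):
    """Take the maximum of each `window` x `window` patch, without padding."""
    image_h, image_w = len(image), len(image[0])
    output = []
    for top in range(0, image_h - window + 1, stride):
        row = []
        for left in range(0, image_w - window + 1, stride):
            patch = [
                image[top + i][left + j]
                for i in range(window)
                for j in range(window)
            ]
            row.append(max(patch))
        output.append(row)
    return output


image = [
    [1, 2, 3, 4],
    [4, 3, 2, 1],
    [5, 6, 7, 8],
    [8, 7, 6, 5],
]

pooled = max_pool2d_valid(image)
print(pooled)
# [[4, 4], [8, 8]]
```

Because the stride equals the window size, the four patches do not overlap, and the 4x4 input is reduced to 2x2.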