# Dense Neural Network

Introduction: In this implementation, you will learn how to build everything from a single dense layer up to a full dense neural network.
Additionally, you will work through a real-life case.


# What are dense layers? 
- A dense (fully connected) layer is commonly used in three roles:
  - as the penultimate layer after a feature extraction block (convolutional, encoder, decoder, etc.),
  - as the output (final) layer, and
  - to project a vector of dimension d0 to a new dimension d1.
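To make these roles concrete, here is a minimal sketch of a small image classifier. The input size (28x28x1), the number of classes (10), and the layer widths are illustrative assumptions, not values from this notebook. The point is where the Dense layers sit relative to the feature extraction block.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical setup (assumption): 28x28 grayscale images, 10 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    # Feature extraction block.
    layers.Conv2D(16, 3, activation="relu"),   # -> (batch, 26, 26, 16)
    layers.GlobalAveragePooling2D(),           # -> (batch, 16)
    # Penultimate dense layer: projects d0 = 16 to d1 = 32.
    layers.Dense(32, activation="relu"),
    # Output dense layer: one unit per class.
    layers.Dense(10, activation="softmax"),
])

x = tf.random.uniform((2, 28, 28, 1))
probs = model(x)
print(probs.shape)  # (2, 10)
```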

#### Let's consider a 1D input feature (fully connected neural network):
This input has 3 features and is processed by a fully connected layer with 4 neurons (we will ignore the non-linearity and the bias for simplicity). How many connections do we get? 3 * 4 = 12: every value in the 1D feature space is connected to each of the 4 neurons, i.e., it is multiplied by 4 different weights (represented by colored lines), as shown in the figure below.
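A minimal sketch of the 3-to-4 layer described above, written directly with TensorFlow ops (the random values are placeholders). The weight matrix holds one entry per connection, and the same count appears in a Keras Dense layer's parameters. The bias variant is included only to show where 4 extra parameters would come from.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_features, num_units = 3, 4

# One weight per connection: a (num_features, num_units) matrix -> 3 * 4 = 12 weights.
W = tf.random.normal((num_features, num_units))
x = tf.random.uniform((1, num_features))  # (batch_size, num_features)

# Neuron j outputs sum_i x[0, i] * W[i, j] (no bias, no activation).
y = tf.matmul(x, W)
print(y.shape)  # (1, 4)

# A Keras Dense layer without bias stores the same 12 weights.
dense = layers.Dense(units=num_units, use_bias=False)
dense.build((None, num_features))
print(dense.count_params())  # 12

# With a bias, each neuron gets one extra parameter: 12 + 4 = 16.
dense_with_bias = layers.Dense(units=num_units)
dense_with_bias.build((None, num_features))
print(dense_with_bias.count_params())  # 16
```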




Now let's see this in Python (with TensorFlow):

```python
# Packages
import tensorflow as tf
from tensorflow.keras import layers
```

```python
# Create a random input feature with a batch size of 1.
inputs = tf.random.uniform(shape=(1, 3))  # (batch_size, num_features)

# Initialize a fully connected layer (we will not use a bias, for simplicity).
dense_layer = layers.Dense(units=4, use_bias=False)

# What should we expect as output?
print(dense_layer(inputs))
```

```
tf.Tensor([[ 0.89485604 -0.44398558 -0.06432793  0.9401731 ]], shape=(1, 4), dtype=float32)
```


```python
# Let's inspect the initialized weights for our fully connected layer:
print(dense_layer.weights[0][:])
```

```
tf.Tensor(
[[ 0.7337251   0.22027314 -0.5955232   0.89705515]
 [ 0.04006982 -0.38282537  0.02825093  0.3985901 ]
 [ 0.4617126  -0.893532    0.6957555   0.18284833]], shape=(3, 4), dtype=float32)
```
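As a sanity check, here is a sketch that recreates the cells above and confirms the layer's output is the matrix product of the input with this (3, 4) kernel. The weights are randomly initialized, so the exact numbers differ from run to run. Only the equality is checked.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.random.uniform(shape=(1, 3))             # (batch_size, num_features)
dense_layer = layers.Dense(units=4, use_bias=False)
outputs = dense_layer(inputs)                        # calling the layer creates the kernel

# The (3, 4) kernel inspected above; the layer output is inputs @ kernel.
kernel = tf.convert_to_tensor(dense_layer.weights[0])
manual = tf.matmul(inputs, kernel)
print(np.allclose(outputs.numpy(), manual.numpy()))  # True
```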


# How About An N-Dimensional Input?

Now suppose we have an input of shape (time/num_frames/arbitrary_feature, features). This input can be a sequence (a time series or a video) or an arbitrary feature space.
Depending on the use case, you can process this input with a fully connected layer in multiple ways. Let's consider a few before we go down the rabbit hole.

### 1. It's an Arbitrary Feature Space
The input can be some arbitrary feature space of shape (arbitrary_feature, features).
One option to experiment with is flattening the input. What does input flattening look like in Keras? Let's find out!


### 1.1 Input Flattening
In Keras, one can use the Flatten() layer to flatten each sample of the input into a 1D vector. This layer does not flatten along the batch dimension: for an input of shape (32, 2, 3), where 32 is the batch size, the flattening operation produces a tensor of shape (32, 6).
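A short sketch of the (32, 2, 3) example, assuming random input values. It also compares Flatten() against a plain tf.reshape that keeps the batch axis, which should give identical results for regular (non-ragged) inputs.

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.uniform(shape=(32, 2, 3))  # 32 samples, each of shape (2, 3)

flat = layers.Flatten()(x)
print(flat.shape)  # (32, 6)

# Same result as reshaping each sample while keeping the batch axis.
reshaped = tf.reshape(x, (tf.shape(x)[0], -1))
print(bool(tf.reduce_all(tf.equal(flat, reshaped))))  # True
```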


```python
# Let's initialize a constant input of shape (1, 2, 3) where 1 is the batch size.
inputs = tf.constant([[[1, 1, 1], [2, 2, 2]]])
# Initialize a flattening layer
flatten = layers.Flatten()
# What should we expect as the output?
outputs = flatten(inputs)
print(outputs)
```

```
tf.Tensor([[1 1 1 2 2 2]], shape=(1, 6), dtype=int32)
```


We get a flattened vector of shape (1, 6), where 1 is the batch size. This layer is usually used after a feature extraction block in a deep neural network.
Flattening is also a valid choice when the entries along the arbitrary_feature dimension are independent, i.e., when the features are not part of a sequence.
Let's pass this through a dense layer:

```python
dense_layer = layers.Dense(
    units=4,  # We have 4 neurons in this fully connected layer.
    use_bias=False,
    kernel_initializer=tf.keras.initializers.Constant(value=0.5)  # The weights are initialized with a constant value of 0.5.
)

# What should we expect as the output?
print(dense_layer(outputs))
```

```
tf.Tensor([[4.5 4.5 4.5 4.5]], shape=(1, 4), dtype=float32)
```


How did we get this value? All weights are 0.5, so each of the 4 output values is computed as 1x0.5 + 1x0.5 + 1x0.5 + 2x0.5 + 2x0.5 + 2x0.5 = 4.5.
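The same arithmetic written as a matrix product. This is a sketch assuming the flattened input from above and a (6, 4) kernel filled with 0.5: each output neuron sums 1 + 1 + 1 + 2 + 2 + 2 = 9 and multiplies the total by 0.5.

```python
import tensorflow as tf

flattened = tf.constant([[1., 1., 1., 2., 2., 2.]])  # shape (1, 6)
kernel = tf.fill((6, 4), 0.5)                        # shape (6, 4), every weight is 0.5
result = tf.matmul(flattened, kernel)
print(result.numpy())  # [[4.5 4.5 4.5 4.5]]
```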

### 2. It's a Sequence

Now let's consider an input of shape (time, features). Here the feature vectors depend on each other along the time axis, and flattening would lose this dependence. Consider a scenario where we want to project the feature dimension to a new dimension. Can we use the Keras Dense() layer to do so?

As per the documentation:

>  If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 0 of the kernel. For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
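One way to read this description is as a contraction of the input's last axis with the kernel's first axis. The sketch below uses arbitrary sizes chosen for illustration (batch_size=2, d0=5, d1=3, units=4) and compares a Dense layer's output with tf.einsum over the same kernel.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sizes (assumptions): batch_size=2, d0=5, d1=3, units=4.
batch_size, d0, d1, units = 2, 5, 3, 4
x = tf.random.uniform((batch_size, d0, d1))

dense = layers.Dense(units, use_bias=False)
y = dense(x)
print(y.shape)  # (2, 5, 4)

# Contract axis 2 of the input with axis 0 of the (d1, units) kernel.
kernel = tf.convert_to_tensor(dense.kernel)
y_manual = tf.einsum("bij,jk->bik", x, kernel)
print(np.allclose(y.numpy(), y_manual.numpy(), atol=1e-5))  # True
```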


Let's break it down and understand each moving part with code.

```python
# The input has shape (1, 2, 3), where 1 is the batch size and 2 is the time axis.
inputs = tf.constant(
    [[[1, 1, 1], [2, 2, 2]]]
)

# Is the rank of the input greater than 2?
print(tf.rank(inputs))
```

```
tf.Tensor(3, shape=(), dtype=int32)
```


Shapes:
1. Inputs = (1, 2, **3**)
2. Kernel = (3, 4)
3. Output = (1, 2, **4**)
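These shapes can also be checked without passing any data through the layer. The sketch below builds a Dense layer for an input shape of (1, 2, 3) and reads the kernel shape and the output shape that Keras reports.

```python
from tensorflow.keras import layers

dense = layers.Dense(units=4, use_bias=False)
dense.build((1, 2, 3))  # (batch_size, time, features)

kernel_shape = tuple(dense.kernel.shape)
output_shape = tuple(dense.compute_output_shape((1, 2, 3)))
print(kernel_shape)  # (3, 4)
print(output_shape)  # (1, 2, 4)
```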

Let's use the Dense layer to project the last dimension from 3 to 4.

```python
# Initialize a dense layer with 4 output neurons and a constant weight of 0.5.
dense_layer = layers.Dense(
    units=4,
    use_bias=False,
    kernel_initializer=tf.keras.initializers.Constant(value=0.5)
)

# What should be the expected output?
print(dense_layer(inputs))
```

```
tf.Tensor(
[[[1.5 1.5 1.5 1.5]
  [3.  3.  3.  3. ]]], shape=(1, 2, 4), dtype=float32)
```


As you can see, the Dense layer projected the input of shape (1, 2, 3) to (1, 2, 4). Each time step is projected independently: the output value 1.5 comes from 1x0.5 + 1x0.5 + 1x0.5 = 1.5, and the output value 3.0 comes from 2x0.5 + 2x0.5 + 2x0.5 = 3. As per the documentation, the weight matrix should have shape (3, 4):

```python
dense_layer.weights[0][:]
```

```
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5]], dtype=float32)>
```
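The claim that every time step is projected independently can be checked by applying the same layer to each time slice separately and stacking the results. Here is a sketch that reuses the constant-0.5 setup from the cells above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.constant([[[1., 1., 1.], [2., 2., 2.]]])  # (1, 2, 3)
dense_layer = layers.Dense(
    units=4,
    use_bias=False,
    kernel_initializer=tf.keras.initializers.Constant(value=0.5),
)

full = dense_layer(inputs)  # (1, 2, 4): all time steps at once

# Apply the same layer (same kernel) to each (1, 3) time slice, then stack along axis 1.
per_step = tf.stack(
    [dense_layer(inputs[:, t, :]) for t in range(inputs.shape[1])], axis=1
)
print(per_step.shape)                               # (1, 2, 4)
print(np.allclose(full.numpy(), per_step.numpy()))  # True
```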

### 2.1 TimeDistributed Layer
So far, we have looked at projecting a (time, features) input to a different dimension. But what if we want to apply the same Dense operation (dot products) to each time step independently?
To add more nuance to this discussion, imagine a video data sample of shape (num_frames, height, width, 3). You would like to extract information from each frame, one frame at a time, using a pre-trained image model.
The TimeDistributed layer allows you to apply a layer (feature extractor here) to every temporal slice (frames here) of an input (video here).
Let's see how we can apply a Dense operation to an input of shape (time, features). We will use the inputs initialized in the previous example:


```python
# This dense layer will be applied to each `time`.
dense_layer = layers.Dense(
    units=4,
    use_bias=False,
    kernel_initializer=tf.keras.initializers.Constant(value=0.5)
)
# Initialize the `TimeDistributed` layer.
timedistributed = layers.TimeDistributed(dense_layer)
# What should be the expected output?
print(timedistributed(inputs))
```

```
tf.Tensor(
[[[1.5 1.5 1.5 1.5]
  [3.  3.  3.  3. ]]], shape=(1, 2, 4), dtype=float32)
```


Isn't the output the same as the one in the previous section? It is: for this rank-3 input, TimeDistributed(Dense(...)) and Dense(...) are equivalent, because Dense already applies the same kernel to every time step.
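Here is a sketch of two follow-up checks, with sizes chosen only for illustration. First, the equivalence also holds with random (non-constant) weights and a bias, as long as TimeDistributed wraps the same Dense instance so that the weights are shared. Second, it returns to the video case described earlier. A small, randomly initialized CNN stands in for the pre-trained image model (an assumption, made to avoid downloading weights), and TimeDistributed applies it to each of the 5 frames.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# 1) Equivalence with random weights and a bias (sizes are illustrative).
x = tf.random.uniform((8, 10, 3))        # (batch, time, features)
dense = layers.Dense(units=4)            # random kernel, zero-initialized bias
td = layers.TimeDistributed(dense)       # wraps the SAME instance, so weights are shared
same = np.allclose(dense(x).numpy(), td(x).numpy(), atol=1e-5)
print(same)  # True

# 2) Frame-by-frame feature extraction on a hypothetical video batch:
#    2 clips, 5 frames each, 32x32 RGB frames.
video = tf.random.uniform((2, 5, 32, 32, 3))
frame_encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, activation="relu"),  # stand-in for a pre-trained image model
    layers.GlobalAveragePooling2D(),         # -> (batch, 8) per frame
])
features = layers.TimeDistributed(frame_encoder)(video)
print(features.shape)  # (2, 5, 8)
```

Unlike Dense, Conv2D expects rank-4 input (batch, height, width, channels), so a rank-5 video batch cannot be passed to it directly. This is where TimeDistributed becomes necessary rather than merely equivalent.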