# 7-04: Directed Acyclic Graphs 

In addition to networks with multiple inputs and multiple outputs, the `keras` functional API lets us build networks with a more complex internal topology: directed acyclic graphs of layers, such as Inception modules.

The key word is **acyclic**: the output tensor of a layer cannot become the input of any layer that helped generate that tensor in the first place. The only loops allowed are recurrent connections internal to a layer.
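The acyclicity constraint can be checked on a plain description of the layer graph. The following is a minimal sketch using only the Python standard library (Python 3.9+), not anything Keras does internally; the helper `is_acyclic` and the layer names are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(layer_inputs):
    """Return True if the layer graph contains no cycle.

    `layer_inputs` maps each (hypothetical) layer name to the names
    of the layers whose outputs it reads.
    """
    try:
        # static_order() raises CycleError when the graph has a cycle
        list(TopologicalSorter(layer_inputs).static_order())
        return True
    except CycleError:
        return False


# Two parallel branches merged by a concatenation: a valid DAG
inception_like = {
    "input": [],
    "branch_a": ["input"],
    "branch_b": ["input"],
    "concat": ["branch_a", "branch_b"],
}

# layer_1 reads from layer_2, which itself reads from layer_1: a cycle
looped = {
    "input": [],
    "layer_1": ["input", "layer_2"],
    "layer_2": ["layer_1"],
}

print(is_acyclic(inception_like))  # True
print(is_acyclic(looped))          # False
```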

# Inception Modules
- A type of CNN topology.
- Consists of several branches that look like smaller independent networks which are then merged or concatenated.
- The branches use 1 x 1 convolutions, 3 x 3 convolutions, and pooling operations.
- Simplest Inception module: 3 to 4 branches, each starting with a 1 x 1 convolution followed by a 3 x 3 convolution, with the branch outputs concatenated at the end.
- Inception modules help the network learn spatial features (patterns across neighboring pixels) separately from channel-wise features (patterns across the channels at a single pixel).
    - Learning them separately is more efficient than learning them jointly.
- 1 x 1 convolutions are pointwise convolutions.
    - They compute features that mix information across channels but not across pixels/space, since they look at one pixel (tile) at a time.
    - This factoring is reasonable if each channel is highly autocorrelated across space, but different channels are not highly correlated with each other.
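To make the pointwise behavior concrete, here is a small NumPy sketch (not Keras code) of a 1 x 1 convolution without bias or activation. The tensor sizes and the random kernel `w` are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 5, 5, 3))  # (batch, height, width, in_channels)
w = rng.normal(size=(3, 4))        # 1 x 1 kernel: (in_channels, out_channels)

# A 1 x 1 convolution applies the same channel-mixing matrix at every pixel
out = np.einsum('bhwc,cf->bhwf', x, w)

# The output at one pixel depends only on the input channels at that pixel
pixel_out = x[0, 2, 3, :] @ w

print(out.shape)                               # (1, 5, 5, 4)
print(np.allclose(out[0, 2, 3, :], pixel_out))  # True
```

Height and width are unchanged; only the channel dimension is remixed, which is why a 1 x 1 convolution carries no spatial information on its own.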


# Implementing an Inception Module

In this code, `x` is assumed to be a 4-D input tensor.

In [3]:
from tensorflow.keras import layers

In [None]:
# padding='same' is used in every layer so that all branches end up with the
# same height and width, which the final concatenation requires
branch_a = layers.Conv2D(128, 1, activation='relu', strides=2, padding='same')(x)

In [None]:
# In this branch, striding occurs in the spatial convolutional layer (3 x 3 conv)
branch_b = layers.Conv2D(128, 1, activation='relu')(x)
branch_b = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_b)

In [None]:
# In this branch, striding occurs in the average pooling layer
branch_c = layers.AveragePooling2D(3, strides=2, padding='same')(x)
branch_c = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_c)

In [None]:
# In this branch, striding occurs in the final spatial convolutional layer
branch_d = layers.Conv2D(128, 1, activation='relu')(x)
branch_d = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_d)

In [None]:
# Combine outputs of all branches to obtain module output
output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)
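Concatenating along the channel axis only works if every branch produces the same height and width. The sketch below applies the standard output-size rules for `'valid'` and `'same'` padding to the four branches, taking the strided layer in branch b to be the 3 x 3 convolution its comment describes. The helpers `output_size` and `branch_sizes` and the input size of 32 are assumptions made for illustration.

```python
import math


def output_size(size, kernel, stride=1, padding='valid'):
    """Spatial output size of a convolution or pooling layer."""
    if padding == 'same':
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1


def branch_sizes(size, padding):
    """Output height (or width) of branches a, b, c, d for a given input size."""
    a = output_size(size, 1, 2, padding)              # 1 x 1 conv, stride 2
    b = output_size(size, 1, 1, padding)              # 1 x 1 conv
    b = output_size(b, 3, 2, padding)                 # 3 x 3 conv, stride 2
    c = output_size(size, 3, 2, padding)              # 3 x 3 average pool, stride 2
    c = output_size(c, 3, 1, padding)                 # 3 x 3 conv
    d = output_size(size, 1, 1, padding)              # 1 x 1 conv
    d = output_size(d, 3, 1, padding)                 # 3 x 3 conv
    d = output_size(d, 3, 2, padding)                 # 3 x 3 conv, stride 2
    return a, b, c, d


print(branch_sizes(32, 'same'))   # (16, 16, 16, 16)
print(branch_sizes(32, 'valid'))  # (16, 15, 13, 14)
```

With `'same'` padding all four branches agree and can be concatenated into a tensor with 4 x 128 = 512 channels; with the default `'valid'` padding the spatial sizes differ and the concatenation would fail.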

# Residual Connections

Residual connections are another variant of the acyclic graph topology in which activations from earlier layers (which represent prior information) are reinjected as inputs later in the network.

A residual connection makes the output of an earlier layer available as an input to a later layer, which helps mitigate two problems of deep networks: vanishing gradients and information bottlenecks.

Residual connections add, rather than concatenate, the earlier output to the later activation, so both tensors must have the same shape.

If they don't, a linear transformation can convert the earlier activation to the target shape: for example, a `Dense` layer without an activation or, for convolutional feature maps, a 1 x 1 convolution without an activation.
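The shape requirement and the linear fix can be sketched in NumPy. This is an illustration rather than Keras code: the tensor sizes are arbitrary, `y` is a random stand-in for a later activation, and `w` plays the role of a 1 x 1 convolution kernel with stride 2 and no bias or activation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8, 64))    # earlier activation
y = rng.normal(size=(1, 4, 4, 128))   # later activation (stand-in), different shape

# x and y cannot be added directly: their shapes differ in height, width, and channels.
# A strided 1 x 1 convolution is a linear map: keep every second pixel,
# then mix the 64 input channels into 128 output channels.
w = rng.normal(size=(64, 128))
residual = x[:, ::2, ::2, :] @ w

out = y + residual

print(residual.shape)  # (1, 4, 4, 128)
print(out.shape)       # (1, 4, 4, 128)
```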

## Example 1 - Same Activation Shapes

In [None]:
# Assume `x` represents a 4-D tensor in this example

In [4]:
from tensorflow.keras import layers

In [None]:
# The first Conv2D layer transforms the input `x`. padding='same' keeps height and width
# unchanged so the final output can be added to `x` (`x` must also have 128 channels)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)

# successive transformations using Conv2D
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)

# Add the original input tensor x back to the output features - residual connection
y = layers.add([y, x])

## Example 2 - Different Activation Shapes

In [5]:
from tensorflow.keras import layers

In [6]:
# Assume again that x is a 4D input tensor

In [None]:
# Convolutional layers transform the input `x`; padding='same' keeps height and width unchanged
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)

# Max pooling with stride 2 halves the height and width of the activation
y = layers.MaxPooling2D(2, strides=2)(y)

# Use a strided 1 x 1 convolution (no activation) to linearly downsample the original `x`
# to the shape of `y` (the two shapes agree when the height and width of `x` are even)
residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)

# Shapes now match: add the residual tensor back to the output features
y = layers.add([y, residual])

# Sharing Model Weights - Siamese LSTM

In [7]:
from tensorflow.keras import layers
from tensorflow.keras import Input
from tensorflow.keras.models import Model

In [8]:
# Instantiate a single LSTM layer only once
lstm = layers.LSTM(32)

In [9]:
# Building left branch of the model: inputs are variable length sequences of vectors of size 128
left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

In [11]:
# Building right branch - calling existing layer instance and reusing its weights
right_input = Input(shape=(None, 128))
right_output = lstm(right_input)

In [12]:
# Build a classifier on top of the merged outputs 
merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

In [13]:
# Build a model linking both input sequences to output predictions
model = Model([left_input, right_input], predictions)

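# The model must be compiled before training; this optimizer and loss are illustrative choices
model.compile(optimizer='rmsprop', loss='binary_crossentropy')

# Training updates the shared LSTM weights based on both inputs.
# `left_data`, `right_data`, and `targets` are assumed to be NumPy arrays defined elsewhere
model.fit([left_data, right_data], targets)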
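One way to confirm that the two branches really share weights is to count parameters. The sketch below rebuilds the same architecture in a self-contained form; if the `LSTM` layer were duplicated instead of reused, the total would include its parameters twice.

```python
from tensorflow.keras import Input, layers
from tensorflow.keras.models import Model

lstm = layers.LSTM(32)  # a single layer instance, called twice below

left_input = Input(shape=(None, 128))
right_input = Input(shape=(None, 128))
merged = layers.concatenate([lstm(left_input), lstm(right_input)], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)
model = Model([left_input, right_input], predictions)

# LSTM: 4 gates * ((128 inputs + 32 recurrent units) * 32 + 32 biases) = 20608
lstm_params = lstm.count_params()

# Dense: 64 merged features * 1 unit + 1 bias = 65
total_params = model.count_params()

print(lstm_params)   # 20608
print(total_params)  # 20673 = 20608 + 65, so the LSTM is counted once
```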

# Using Models as Layers

In the previous example, we used a single `LSTM` layer to transform two different inputs into two different outputs using a common set of weights.

The same principle applies to entire models as well: weights of entire models can be reused. 

This example shows how a single pretrained Xception model can serve as the shared processing unit for video streams from two cameras, producing merged features that could be used for depth analysis.

In [14]:
from tensorflow.keras import layers
from tensorflow.keras import applications
from tensorflow.keras import Input

In [15]:
# Instantiate the convolutional base of Xception, which ships with keras.
# weights='imagenet' loads the ImageNet-pretrained weights (weights=None would give random initialization)
xception_base = applications.Xception(weights='imagenet', include_top=False)

In [16]:
# The entire model is used to process two different streams of input
left_input = Input(shape=(250, 250, 3))    # RGB images 
right_input = Input(shape=(250, 250, 3))

In [17]:
# Use pretrained base to extract features
left_features = xception_base(left_input)
right_features = xception_base(right_input)

In [18]:
# Merged features contain information from both visual feeds
merged_features = layers.concatenate([left_features, right_features], axis=-1)
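The same pattern can be tried end to end without instantiating Xception. In the sketch below, `tiny_base` is a hypothetical small stand-in for the shared convolutional base, and the final `Dense(1)` regression head is an assumption added only to complete the model.

```python
from tensorflow.keras import Input, layers
from tensorflow.keras.models import Model

# Hypothetical stand-in for a large shared base such as Xception
base_input = Input(shape=(64, 64, 3))
h = layers.Conv2D(8, 3, activation='relu', padding='same')(base_input)
h = layers.GlobalAveragePooling2D()(h)
tiny_base = Model(base_input, h, name='tiny_base')

# Calling the model like a layer reuses its weights for both inputs
left_input = Input(shape=(64, 64, 3))
right_input = Input(shape=(64, 64, 3))
left_features = tiny_base(left_input)
right_features = tiny_base(right_input)

merged_features = layers.concatenate([left_features, right_features], axis=-1)
depth = layers.Dense(1)(merged_features)  # assumed output head, for illustration
model = Model([left_input, right_input], depth)

base_params = tiny_base.count_params()  # 3 * 3 * 3 * 8 + 8 = 224
total_params = model.count_params()     # 224 + (16 + 1) = 241

print(base_params)   # 224
print(total_params)  # 241, so the base is counted once
```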