# [A simple neural network module for relational reasoning (RN)](https://papers.nips.cc/paper/7082-a-simple-neural-network-module-for-relational-reasoning.pdf)

2017

Relation Networks (RNs) are a simple plug-and-play module that helps solve problems that fundamentally hinge on relational reasoning.

## Introduction
The ability to reason about the relations between entities and their properties is central to generally
intelligent behavior ([Justin.J, Le Fe](https://arxiv.org/abs/1612.06890), [Charles.K, Joshua.B](https://www.pnas.org/content/105/31/10687)). Symbolic approaches to artificial intelligence are inherently relational ([A. Newell](https://www.sciencedirect.com/science/article/abs/pii/S0364021380800152), [S Harnad](https://www.sciencedirect.com/science/article/abs/pii/0167278990900876)). Practitioners define
the relations between symbols using the language of logic and mathematics, and then reason about
these relations using a multitude of powerful methods, including deduction, arithmetic, and algebra.

Deep learning approaches often struggle in data-poor problems where the underlying structure is characterized by sparse but complex relations ([M Garnelo, K Arulkumaran, M Shanahan](https://arxiv.org/abs/1609.05518), [BM Lake, TD Ullman, JB Tenenbaum](https://arxiv.org/abs/1604.00289)).

### Relational Reasoning (RN)

RNs are architectures whose computations focus explicitly on relational reasoning [18]. Although several other models supporting relation-centric computation have been proposed, such as [Graph Neural Networks](https://ieeexplore.ieee.org/abstract/document/4700287/), [Gated Graph Sequence Neural Networks](https://arxiv.org/abs/1511.05493), and [Interaction Networks](http://papers.nips.cc/paper/6417-interaction-networks-for-learning-about-objects-relations-and-physics), RNs are simpler, more exclusively focused on general relational reasoning, and easier to integrate within broader architectures.

The design philosophy behind RNs is to constrain the functional form of a neural network so that it captures the
core common properties of relational reasoning. In other words, the capacity to compute relations is baked into the RN architecture without needing to be learned, just as the capacity to reason about spatial, translation-invariant properties is built into CNNs, and the capacity to reason about sequential dependencies is built into recurrent neural networks.

$RN(O) = f_\phi\left(\sum_{i,j} g_\theta(o_i, o_j)\right)$

where the input is a set of “objects” $O = \{o_1, o_2, ..., o_n\}$, $o_i \in \mathbb{R}^m$ is the $i^{th}$ object, and $f_\phi$ and $g_\theta$ are functions with parameters $\phi$ and $\theta$, respectively.
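
The formula can be read literally as a double loop over object pairs. The following is a minimal sketch under assumed toy sizes: `n_objects`, `obj_dim`, `hidden`, and the single-layer `g_theta` and `f_phi` are illustrative choices, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not values from the paper)
n_objects, obj_dim, hidden = 4, 3, 8

# g_theta processes one ordered pair (o_i, o_j); f_phi maps the summed relations to an output
g_theta = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU())
f_phi = nn.Linear(hidden, 2)

def relation_network(objects):
    """Compute f_phi(sum_{i,j} g_theta(o_i, o_j)) for objects of shape (n, m)."""
    n = objects.size(0)
    pair_sum = torch.zeros(hidden)
    for i in range(n):
        for j in range(n):
            pair = torch.cat([objects[i], objects[j]])  # (2 * m,)
            pair_sum = pair_sum + g_theta(pair)
    return f_phi(pair_sum)

objects = torch.randn(n_objects, obj_dim)
out = relation_network(objects)
print(out.shape)  # torch.Size([2])
```

Because the pair outputs are summed before `f_phi` is applied, reordering the objects leaves the result unchanged up to floating-point error.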

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [None]:
class RelationalLayerBase(nn.Module):
    """
    The relational Base Layer 
    An RN is a neural network module with a structure primed for relational reasoning. The design
    philosophy behind RNs is to constrain the functional form of a neural network so that it captures the
    core common properties of relational reasoning.
    
    Child of nn.Module Class
    """
    def __init__(self, in_size, out_size, device, hyp):
        """
        Relational Base Layer Initializer
        Args:
            in_size (Integer): network input size
            out_size (Integer): network output size
            device (torch.device): PyTorch device object
            hyp (Dictionary): Hyperparameters {g_layers:<list of Int>, f_fc1:<Int>, f_fc2:<Int>, dropout:<float>}
        Returns: None
        """
        super().__init__()
        self.device = device
        self.f_fc1 = nn.Linear(hyp["g_layers"][-1], hyp["f_fc1"]).to(device)
        self.f_fc2 = nn.Linear(hyp["f_fc1"], hyp["f_fc2"]).to(device)
        self.f_fc3 = nn.Linear(hyp["f_fc2"], out_size).to(device)
    
        self.dropout = nn.Dropout(p=hyp["dropout"])
        
        # True only when the layers were placed on a CUDA device
        self.on_gpu = torch.device(device).type == "cuda"
        self.hyp = hyp
        self.in_size = in_size
        self.out_size = out_size

    def cuda(self, device=None):
        """Load Model on CUDA"""
        self.on_gpu = True
        # return the module so calls like `model = model.cuda()` keep working
        return super().cuda(device)


In [None]:
x = [[[1,2,3],[4,5,6]]] # [B x 2 x 3] here B=1, sequence = 2 and feature_size = 3
# Now let us create x_i
x_i = [[ [[1,2,3],[4,5,6]] ]] # [B x 1 x 2 x 3] 
x_i = [[ [[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]] ]] # [B x 2 x 2 x 3] as you can see the whole sequence is repeated

x_j = [[ [[1,2,3]],[[4,5,6]] ]] # [B x 2 x 1 x 3] 
x_j = [[ [[1,2,3],[1,2,3]],[[4,5,6],[4,5,6]] ]] # [B x 2 x 2 x 3] in this case each object is repeated

# The last step is to concatenate x_i and x_j on the last dimension and pass them through a linear unit with ReLU
cat = [[ [[1,2,3,1,2,3],[4,5,6,1,2,3]],[[1,2,3,4,5,6],[4,5,6,4,5,6]] ]] # [B x 2 x 2 x 6] 
# NOTE: Concatenating x_i and x_j pairs every object of x with every object of x (including itself).
# In other words, each object in x_j is paired with all objects of x_i.
# The first object of x_j, [1,2,3], is paired with all objects of x, that is [1,2,3] and [4,5,6]
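
The hand-written lists above can be checked against the actual tensor operations. This is a small sketch that assumes the same toy input; the variable names `pairs` and `expected` are introduced here for illustration.

```python
import torch

x = torch.tensor([[[1, 2, 3], [4, 5, 6]]])   # (B=1, d=2, k=3)
b, d, k = x.size()

x_i = x.unsqueeze(1).repeat(1, d, 1, 1)      # (1, 2, 2, 3): whole sequence repeated
x_j = x.unsqueeze(2).repeat(1, 1, d, 1)      # (1, 2, 2, 3): each object repeated
pairs = torch.cat([x_i, x_j], dim=3)         # (1, 2, 2, 6): every ordered pair of objects

expected = torch.tensor([[[[1, 2, 3, 1, 2, 3], [4, 5, 6, 1, 2, 3]],
                          [[1, 2, 3, 4, 5, 6], [4, 5, 6, 4, 5, 6]]]])
assert torch.equal(pairs, expected)
print(pairs.shape)  # torch.Size([1, 2, 2, 6])
```

`repeat` copies the data; `expand` would produce the same pairing as a view without copying, which may matter for long sequences since the pair tensor grows with `d**2`.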


In [None]:
class RelationalLayer(RelationalLayerBase):
    """
    Child class of RelationalLayerBase.
    Creates the g layers that process every pair of “objects” from the set O = {o1, o2, ..., on}

    Attributes
    ----------
        g_layers : nn.ModuleList of nn.Linear
            nn.Linear modules that make up g
        edge_feature : nn.Linear
            Projects each concatenated object pair back to in_size features

    """
    def __init__(self, in_size, out_size, device, hyp):
        """
        Relational Layer Initializer
        Args:
            in_size (Integer): network input size
            out_size (Integer): network output size
            device (torch.device): PyTorch device object
            hyp (Dictionary): Hyperparameters {g_layers:<list of Int>, f_fc1:<Int>, f_fc2:<Int>, dropout:<float>}
        Returns: None
        """
        super().__init__(in_size, out_size, device, hyp)
        self.in_size = in_size
        # (in_size//2)*4 equals 2*in_size for even in_size: the width of one concatenated pair
        self.edge_feature = nn.Linear((in_size//2)*4, in_size).to(device)

        # create all g layers: the first takes in_size features, each later one takes the previous layer's output size
        self.g_layers_size = hyp["g_layers"]
        g_layers = []
        for idx, g_layer_size in enumerate(hyp["g_layers"]):
            in_s = in_size if idx == 0 else hyp["g_layers"][idx - 1]
            out_s = g_layer_size
            g_layers.append(nn.Linear(in_s, out_s).to(self.device))
        # register the layers so their parameters are tracked by the module
        self.g_layers = nn.ModuleList(g_layers)
    
    def forward(self, x):
        """
        Implements the forward method of nn.Module Class
        Args:
            x (Tensor): batch_size x sequence_size x feature_size
        Returns:
            Tensor: log-probabilities of shape batch_size x out_size
        """
        b, d, k = x.size()  # here b=Batch_size, d=sequence_size, k=feature_size 

        # cast all pairs against each other (shapes shown for d=64 objects with k=26 features)
        x_i = torch.unsqueeze(x, 1)                  # (B x 1 x 64 x 26)
        x_i = x_i.repeat(1, d, 1, 1)                 # (B x 64 x 64 x 26)
        x_j = torch.unsqueeze(x, 2)                  # (B x 64 x 1 x 26)
        x_j = x_j.repeat(1, 1, d, 1)                 # (B x 64 x 64 x 26)
        # Note: x_i repeats the whole sequence d times, while x_j repeats each object d times.
        # This means position (a, c) of the d x d grid holds x_i = x[:, c] and x_j = x[:, a],
        # so every ordered pair of objects appears exactly once.
        
        # concatenate all together
        x_full = torch.cat([x_i, x_j], 3)        # (B x 64 x 64 x 2*26)
        x_full = self.edge_feature(x_full)       # (B x 64 x 64 x in_size)
        
        # reshape for passing through network
        x_ = x_full.view(b * d**2, self.in_size)

        # g network: apply each linear layer followed by ReLU to every pair
        for g_layer in self.g_layers:
            x_ = F.relu(g_layer(x_))

        # reshape again and sum over all d**2 pairs
        x_g = x_.view(b, d**2, self.g_layers_size[-1])
        x_g = x_g.sum(1)                         # (B x g_layers[-1])
        
        # f network: maps the summed pair features to class scores
        x_f = self.f_fc1(x_g)
        x_f = F.relu(x_f)
        x_f = self.f_fc2(x_f)
        x_f = self.dropout(x_f)
        x_f = F.relu(x_f)
        x_f = self.f_fc3(x_f)

        return F.log_softmax(x_f, dim=1)
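
Since `forward` returns log-probabilities, a natural training loss is the negative log-likelihood. The sketch below uses random stand-in values (`logits` and `targets` are hypothetical, not data from the paper) to illustrate that `F.nll_loss` on the `log_softmax` output matches `F.cross_entropy` on the raw scores.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 10)               # stand-in for x_f: (batch_size, out_size)
targets = torch.tensor([1, 0, 3, 9])      # hypothetical class labels

log_probs = F.log_softmax(logits, dim=1)  # same operation as the return value of forward()
loss_nll = F.nll_loss(log_probs, targets)
loss_ce = F.cross_entropy(logits, targets)

# The two losses agree; exponentiating the log-probabilities gives rows that sum to 1
print(torch.allclose(loss_nll, loss_ce))
```

Applying `F.cross_entropy` directly to the output of `forward` would apply the softmax twice, so `F.nll_loss` (or `nn.NLLLoss`) is the matching choice here.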

### Results
#### bAbI
RN **succeeded on 18/20** tasks. Notably, it succeeded on the basic induction task (2.1% total error).
RN also did not fail catastrophically on any task: on the 2 tasks it failed (the **“two supporting facts”** and **“three supporting facts”** tasks), it **missed the 95% threshold** by **3.1% and 11.5%**, respectively.