## Part 1: Model Explanation

Even simple relational inferences have proved challenging for standard black-box architectures such as MLPs, CNNs, RNNs, or combinations of them. One solution is to add a Relation Network (RN) module on top of the chosen base architecture.

A Relation Network is a neural network module designed for relational reasoning. It is built from general-purpose components (MLPs) that learn to capture relations between pairs of objects. It operates on upstream representations such as CNN feature maps, and because it is trained end to end, its gradients also shape those representations.

RN Module:

$RN(O) = f_\phi\left(\sum_{i,j} g_\theta(o_i, o_j)\right)$

The input to the RN module is a set of objects $O = \{o_1, o_2, \dots, o_n\}$, where $o_i \in \mathbb{R}^m$. The functions $f_\phi$ and $g_\theta$ are simple MLPs with parameters $\phi$ and $\theta$. For each pair of objects $(o_i, o_j)$, $g_\theta$ infers how the two objects relate. The pairwise outputs are summed before $f_\phi$ is applied, so the result does not depend on the order in which the objects are listed. The module is built from general-purpose, differentiable components, so it is end-to-end differentiable. There is no specific requirement for what an object can be. Relatively unstructured inputs, such as CNN feature-map cells or LSTM embeddings, can therefore serve as objects.
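Below is a minimal NumPy sketch of the formula above. It uses toy sizes (5 objects of dimension 4, tiny random MLPs) rather than the trained model's sizes. The helper names `init_mlp`, `mlp` and `relation_network` are illustrative only. The sketch builds every ordered pair, applies $g_\theta$ to each pair, sums the results, and applies $f_\phi$. It then checks that shuffling the objects leaves the output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes):
    """Random (W, b) pairs for an MLP with the given layer sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Apply each linear layer followed by a ReLU."""
    for W, b in params:
        x = np.maximum(0.0, x @ W + b)
    return x


def relation_network(objects, g_params, f_params):
    """RN(O) = f_phi(sum_{i,j} g_theta(o_i, o_j)) for objects of shape (n, m)."""
    n = objects.shape[0]
    o_i = np.repeat(objects, n, axis=0)          # row k holds o_{k // n}
    o_j = np.tile(objects, (n, 1))               # row k holds o_{k % n}
    pairs = np.concatenate([o_i, o_j], axis=1)   # all n*n ordered pairs, (n*n, 2m)
    relations = mlp(g_params, pairs)             # g_theta applied to every pair
    return mlp(f_params, relations.sum(axis=0))  # sum over pairs, then f_phi


m = 4
objects = rng.standard_normal((5, m))
g_params = init_mlp([2 * m, 16, 16])
f_params = init_mlp([16, 8, 3])

out = relation_network(objects, g_params, f_params)
perm = rng.permutation(len(objects))
out_perm = relation_network(objects[perm], g_params, f_params)
print(out.shape, np.allclose(out, out_perm))  # (3,) True
```

Permuting the objects only permutes the rows of `pairs`, and the sum over pairs ignores that order.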

In practice, however, the RN is conditioned on a question $q$, which is appended to every pair. The answer is therefore computed as $a = f_\phi\left(\sum_{i,j} g_\theta(o_i, o_j, q)\right)$.
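The hypothetical `build_pairs` helper below makes the question conditioning concrete. It is written in NumPy and is not taken from the notebook. It appends the same question vector to every ordered pair. The sizes mirror the implementation in Part 2: 25 objects of 256 + 2 features and an 18-dimensional question. Each pair therefore has 2 × 258 + 18 = 534 features, which matches the input size of the first $g_\theta$ layer. The order of the concatenated pieces is a design choice; what matters is that every pair sees the same question.

```python
import numpy as np


def build_pairs(objects, q):
    """Concatenate (o_i, o_j, q) for every ordered pair of objects.

    objects: (batch, n, m) object features
    q:       (batch, k) question embedding
    returns: (batch, n, n, 2*m + k) where [b, i, j] = [o_i, o_j, q]
    """
    b, n, m = objects.shape
    k = q.shape[1]
    o_i = np.broadcast_to(objects[:, :, None, :], (b, n, n, m))  # varies along axis 1
    o_j = np.broadcast_to(objects[:, None, :, :], (b, n, n, m))  # varies along axis 2
    qq = np.broadcast_to(q[:, None, None, :], (b, n, n, k))      # same question for every pair
    return np.concatenate([o_i, o_j, qq], axis=-1)


rng = np.random.default_rng(0)
objects = rng.standard_normal((2, 25, 256 + 2))  # 25 cells: 256 filters + 2 coordinates
q = rng.integers(0, 2, size=(2, 18)).astype(float)
pairs = build_pairs(objects, q)
print(pairs.shape)  # (2, 25, 25, 534)
```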

For the Sort-of-CLEVR dataset, the model does not contain an LSTM. The questions are encoded as fixed-size binary vectors (18-dimensional in this implementation). These vectors are passed directly to the RN module together with the object representations.

The model consists of four convolution layers with 32, 64, 128, and 256 kernels (3×3, stride 2), each followed by a ReLU activation and batch normalization. On a 75×75 input they produce a 5×5 feature map. Each of its 25 cells is a 256-dimensional vector tagged with its 2D coordinates, and these cells are the objects. The RN module consists of a four-layer MLP with 2000 neurons per layer for $g_\theta$ and a four-layer MLP with 2000, 1000, 500, and 100 neurons respectively for $f_\phi$.
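The object count follows from standard convolution arithmetic. The short calculation below assumes the settings used in Part 2: 75×75 inputs, 3×3 kernels with stride 2 and padding 1, and batch size 64. `conv_out` is an illustrative helper, not part of the notebook.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a Conv2d layer with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


sizes = [75]                      # Sort-of-CLEVR images are 75 x 75
for _ in range(4):                # four conv layers: 3x3 kernels, stride 2, padding 1
    sizes.append(conv_out(sizes[-1]))

n_objects = sizes[-1] ** 2        # each cell of the final feature map is one object
g_in = 2 * (256 + 2) + 18         # two objects (256 channels + 2 coordinates) + question
pairs_per_batch = 64 * n_objects ** 2

print(sizes)                             # [75, 38, 19, 10, 5]
print(n_objects, g_in, pairs_per_batch)  # 25 534 40000
```

These are the `534` input features and `40000` rows that appear in the shape comments of `RN.forward`.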

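The model is topped with a small classification head (100 → 100 → 10 units, with dropout) that outputs log-probabilities over 10 possible answers. It is trained with the negative log-likelihood of these log-softmax outputs, which is equivalent to cross-entropy loss. Optimization uses Adam with a learning rate of $10^{-4}$.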
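Applying `F.nll_loss` to `F.log_softmax` outputs computes the same quantity as cross-entropy on the raw scores. The NumPy check below illustrates this on random logits. The helper names are illustrative and are not PyTorch functions.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def nll_loss(log_probs, labels):
    """Mean negative log-likelihood of the correct class."""
    return -log_probs[np.arange(len(labels)), labels].mean()


def cross_entropy(logits, labels):
    """Mean cross-entropy computed directly from softmax probabilities."""
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()


rng = np.random.default_rng(0)
logits = rng.standard_normal((64, 10))     # a batch of 64 score vectors over 10 answers
labels = rng.integers(0, 10, size=64)
print(np.isclose(nll_loss(log_softmax(logits), labels), cross_entropy(logits, labels)))  # True
```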

## Part 2: Model Implementation

In [None]:
import csv
import os
import pickle
import random

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter

In [None]:
from google.colab import drive

drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
%cd '/content/gdrive/MyDrive/ai/finalpj'

/content/gdrive/MyDrive/ai/finalpj


In [None]:
class ConvInputModel(nn.Module):
    def __init__(self):
        super(ConvInputModel, self).__init__()
        
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.batchNorm1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.batchNorm2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.batchNorm3 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 256, 3, stride=2, padding=1)
        self.batchNorm4 = nn.BatchNorm2d(256)
        
    def forward(self, img):
        """convolution"""
        x = self.conv1(img)
        x = F.relu(x)
        x = self.batchNorm1(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.batchNorm2(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.batchNorm3(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = self.batchNorm4(x)
        return x

In [None]:
class FCOutputModel(nn.Module):
    def __init__(self):
        super(FCOutputModel, self).__init__()

        self.fc2 = nn.Linear(100, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        x = self.fc2(x)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)  # disable dropout after model.eval()
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)
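`F.dropout` takes a `training` argument that defaults to `True`. Calling it without `training=self.training` therefore keeps dropping units even after `model.eval()`. The NumPy sketch below is an illustrative re-implementation of inverted dropout, not PyTorch's code. It shows the intended behaviour: during training, units are zeroed at random and the survivors are rescaled; at evaluation time, the input passes through unchanged.

```python
import numpy as np


def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale the
    survivors by 1 / (1 - p) during training; identity at evaluation time."""
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)


x = np.ones((4, 100))
train_out = dropout(x, training=True, rng=np.random.default_rng(0))
eval_out = dropout(x, training=False)
print(np.array_equal(eval_out, x))                       # True
print(set(np.unique(train_out).tolist()) <= {0.0, 2.0})  # True
```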

In [None]:
class BasicModel(nn.Module):
    def __init__(self, name, **kwargs):
        super(BasicModel, self).__init__()
        self.name=name

    def train_(self, input_img, input_qst, label):
        self.optimizer.zero_grad()
        output = self(input_img, input_qst)
        loss = F.nll_loss(output, label)
        loss.backward()
        self.optimizer.step()
        pred = output.data.max(1)[1]
        correct = pred.eq(label.data).cpu().sum()
        accuracy = correct * 100. / len(label)
        return accuracy, loss
        
    def test_(self, input_img, input_qst, label):
        # no_grad(): evaluation needs no autograd graph, which saves GPU memory
        with torch.no_grad():
            output = self(input_img, input_qst)
            loss = F.nll_loss(output, label)
        pred = output.data.max(1)[1]
        correct = pred.eq(label.data).cpu().sum()
        accuracy = correct * 100. / len(label)
        return accuracy, loss

    def save_model(self, epoch):
        torch.save(self.state_dict(), 'model/epoch_{}_{:02d}.pth'.format(self.name, epoch))

In [None]:
class RN(BasicModel):
    def __init__(self, **kwargs):
        super(RN, self).__init__('RN', **kwargs)
        
        self.conv = ConvInputModel()
        
        self.relation_type = kwargs['relation_type']
        
        if self.relation_type == 'ternary':
            ##(number of filters per object+coordinate of object)*3+question vector
            self.g_fc1 = nn.Linear((256+2)*3+18, 2000)
        else:
            ##(number of filters per object+coordinate of object)*2+question vector
            self.g_fc1 = nn.Linear((256+2)*2+18, 2000)

        self.g_fc2 = nn.Linear(2000, 2000)
        self.g_fc3 = nn.Linear(2000, 2000)
        self.g_fc4 = nn.Linear(2000, 2000)

        self.f_fc1 = nn.Linear(2000, 2000)
        self.f_fc2 = nn.Linear(2000, 1000)
        self.f_fc3 = nn.Linear(1000, 500)
        self.f_fc4 = nn.Linear(500, 100)

        # prepare coord tensor: each of the 25 cells of the 5x5 feature map
        # is tagged with its (row, col) position rescaled to [-1, 1]
        def cvt_coord(i):
            # integer division (//) keeps the row index on the 5x5 grid in Python 3
            return [(i // 5 - 2) / 2., (i % 5 - 2) / 2.]

        np_coord_tensor = np.zeros((kwargs['batch_size'], 25, 2))
        for i in range(25):
            np_coord_tensor[:, i, :] = np.array(cvt_coord(i))
        self.coord_tensor = torch.from_numpy(np_coord_tensor).float()
        if kwargs['cuda']:
            self.coord_tensor = self.coord_tensor.cuda()


        self.fcout = FCOutputModel()
        
        self.optimizer = optim.Adam(self.parameters(), lr=kwargs['lr'])


    def forward(self, img, qst):
        x = self.conv(img) ## x = (64 x 256 x 5 x 5)
        """g"""
        mb = x.size()[0]
        n_channels = x.size()[1]
        d = x.size()[2]
        
        x_flat = x.view(mb, n_channels, d * d).permute(0, 2, 1) # (64 x 25 x 256)
        x_flat = torch.cat([x_flat, self.coord_tensor], 2) # (64 x 25 x 256+2)

        # add question everywhere
        qst = torch.unsqueeze(qst, 1) # (64 x 1 x 18)
        qst = qst.repeat(1, 25, 1) # (64 x 25 x 18)
        qst = torch.unsqueeze(qst, 2) # (64 x 25 x 1 x 18)
        
        # cast all pairs against each other
        x_i = torch.unsqueeze(x_flat, 1)  # (64 x 1 x 25 x 258)
        x_i = x_i.repeat(1, 25, 1, 1)  # (64 x 25 x 25 x 258)
        x_j = torch.unsqueeze(x_flat, 2)  # (64 x 25 x 1 x 258)
        x_j = torch.cat([x_j, qst], 3) # (64 x 25 x 1 x 258+18)
        x_j = x_j.repeat(1, 1, 25, 1) # (64 x 25 x 25 x 276)
        
        # concatenate all together
        x_full = torch.cat([x_i,x_j],3) # (64 x 25 x 25 x 258+276)
    
        # reshape for passing through network
        x_ = x_full.view(mb * (d * d) * (d * d), 534)  # (64 x 25 x 25 x 534) = (40000, 534)
         
        """g"""
        x_ = self.g_fc1(x_) # (40000, 2000)
        x_ = F.relu(x_)
        x_ = self.g_fc2(x_) # (40000, 2000)
        x_ = F.relu(x_)
        x_ = self.g_fc3(x_) # (40000, 2000)
        x_ = F.relu(x_)
        x_ = self.g_fc4(x_) # (40000, 2000)
        x_ = F.relu(x_)

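        # reshape again and sum over all object pairs
        # NOTE: only pairwise (binary) inputs are built above, so the 'ternary'
        # branch would also need triplet construction to work
        if self.relation_type == 'ternary':
            x_g = x_.view(mb, (d * d) * (d * d) * (d * d), 2000)
        else:
            x_g = x_.view(mb, (d * d) * (d * d), 2000)
        x_g = x_g.sum(1)  # (64, 2000); squeeze() would also drop the batch dim if mb == 1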

        """f"""
        x_f = self.f_fc1(x_g) # (64, 2000)
        x_f = F.relu(x_f)
        x_f = self.f_fc2(x_f) # (64, 1000)
        x_f = F.relu(x_f)
        x_f = self.f_fc3(x_f) # (64, 500)
        x_f = F.relu(x_f)
        x_f = self.f_fc4(x_f) # (64, 100)
        x_f = F.relu(x_f)
        
        return self.fcout(x_f)
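The coordinate helper relies on integer division. In Python 3, `i / 5` is true division. Written as `(i / 5 - 2) / 2.`, the row term gives each of the 25 cells a different fractional "row" instead of the five grid values produced by `(i // 5 - 2) / 2.`. The comparison below uses two illustrative functions to show the difference.

```python
def coord_floor_div(i):
    """Cell index (0..24) of the 5x5 map -> [row, col], each in {-1, -0.5, 0, 0.5, 1}."""
    return [(i // 5 - 2) / 2., (i % 5 - 2) / 2.]


def coord_true_div(i):
    """Same formula with true division `/` for the row term."""
    return [(i / 5 - 2) / 2., (i % 5 - 2) / 2.]


rows_floor = sorted({coord_floor_div(i)[0] for i in range(25)})
rows_true = sorted({coord_true_div(i)[0] for i in range(25)})

print(rows_floor)       # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(len(rows_true))   # 25: every cell gets its own "row", reaching about 1.4 for i = 24
```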

### 1.1 Functions

In [None]:
summary_writer = SummaryWriter()

In [None]:
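def tensor_data(data, i):
    """Copy batch i of (images, questions, answers) into the global input_img, input_qst and label tensors."""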
    img = torch.from_numpy(np.asarray(data[0][bs*i:bs*(i+1)]))
    qst = torch.from_numpy(np.asarray(data[1][bs*i:bs*(i+1)]))
    ans = torch.from_numpy(np.asarray(data[2][bs*i:bs*(i+1)]))

    input_img.data.resize_(img.size()).copy_(img)
    input_qst.data.resize_(qst.size()).copy_(qst)
    label.data.resize_(ans.size()).copy_(ans)

In [None]:
def cvt_data_axis(data):
    """Transpose a list of (img, qst, ans) tuples into a tuple of three lists."""
    img = [e[0] for e in data]
    qst = [e[1] for e in data]
    ans = [e[2] for e in data]
    return (img,qst,ans)
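`cvt_data_axis` transposes a list of samples into a tuple of columns. It is equivalent to `zip(*data)` with each column turned into a list. The toy example below uses placeholder strings instead of images. It also shows that `len(data[0])` means different things before and after the conversion. On the raw list it is the length of a single `(img, qst, ans)` tuple, which is always 3; after the conversion it is the number of samples.

```python
def cvt_data_axis(data):
    """Same as the notebook helper: list of (img, qst, ans) -> (imgs, qsts, anss)."""
    img = [e[0] for e in data]
    qst = [e[1] for e in data]
    ans = [e[2] for e in data]
    return (img, qst, ans)


samples = [("img0", "q0", 0), ("img1", "q1", 1), ("img2", "q2", 2), ("img3", "q3", 3)]

columns = cvt_data_axis(samples)
print(columns == tuple(list(c) for c in zip(*samples)))  # True

# Before conversion, samples[0] is ONE sample, so len(samples[0]) is always 3;
# after conversion, columns[0] is the list of all images.
print(len(samples), len(samples[0]), len(columns[0]))   # 4 3 4
```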

In [None]:
def train(epoch, ternary, rel, norel):
    model.train()

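    # rel and norel are still lists of (img, qst, ans) tuples here, so compare
    # the number of samples, not len(rel[0]) (which is always 3)
    if len(rel) != len(norel):
        print('Not equal length for relation dataset and non-relation dataset.')
        return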
    
    random.shuffle(ternary)
    random.shuffle(rel)
    random.shuffle(norel)

    ternary = cvt_data_axis(ternary)
    rel = cvt_data_axis(rel)
    norel = cvt_data_axis(norel)

    acc_ternary = []
    acc_rels = []
    acc_norels = []

    l_ternary = []
    l_binary = []
    l_unary = []

    for batch_idx in range(len(rel[0]) // bs):
        tensor_data(ternary, batch_idx)
        accuracy_ternary, loss_ternary = model.train_(input_img, input_qst, label)
        acc_ternary.append(accuracy_ternary.item())
        l_ternary.append(loss_ternary.item())

        tensor_data(rel, batch_idx)
        accuracy_rel, loss_binary = model.train_(input_img, input_qst, label)
        acc_rels.append(accuracy_rel.item())
        l_binary.append(loss_binary.item())

        tensor_data(norel, batch_idx)
        accuracy_norel, loss_unary = model.train_(input_img, input_qst, label)
        acc_norels.append(accuracy_norel.item())
        l_unary.append(loss_unary.item())

        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)] '
                  'Ternary accuracy: {:.0f}% | Relations accuracy: {:.0f}% | Non-relations accuracy: {:.0f}%'.format(
                   epoch,
                   batch_idx * bs * 3,
                   len(rel[0]) * 3,
                   100. * batch_idx * bs / len(rel[0]),
                   accuracy_ternary,
                   accuracy_rel,
                   accuracy_norel))
        
    avg_acc_ternary = sum(acc_ternary) / len(acc_ternary)
    avg_acc_binary = sum(acc_rels) / len(acc_rels)
    avg_acc_unary = sum(acc_norels) / len(acc_norels)

    summary_writer.add_scalars('Accuracy/train', {
        'ternary': avg_acc_ternary,
        'binary': avg_acc_binary,
        'unary': avg_acc_unary
    }, epoch)

    avg_loss_ternary = sum(l_ternary) / len(l_ternary)
    avg_loss_binary = sum(l_binary) / len(l_binary)
    avg_loss_unary = sum(l_unary) / len(l_unary)

    summary_writer.add_scalars('Loss/train', {
        'ternary': avg_loss_ternary,
        'binary': avg_loss_binary,
        'unary': avg_loss_unary
    }, epoch)

    # return average accuracy
    return avg_acc_ternary, avg_acc_binary, avg_acc_unary


In [None]:
def test(epoch, ternary, rel, norel):
    model.eval()
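    # rel and norel are still lists of (img, qst, ans) tuples here, so compare
    # the number of samples, not len(rel[0]) (which is always 3)
    if len(rel) != len(norel):
        print('Not equal length for relation dataset and non-relation dataset.')
        return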
    
    ternary = cvt_data_axis(ternary)
    rel = cvt_data_axis(rel)
    norel = cvt_data_axis(norel)

    accuracy_ternary = []
    accuracy_rels = []
    accuracy_norels = []

    loss_ternary = []
    loss_binary = []
    loss_unary = []

    for batch_idx in range(len(rel[0]) // bs):
        tensor_data(ternary, batch_idx)
        acc_ter, l_ter = model.test_(input_img, input_qst, label)
        accuracy_ternary.append(acc_ter.item())
        loss_ternary.append(l_ter.item())

        tensor_data(rel, batch_idx)
        acc_bin, l_bin = model.test_(input_img, input_qst, label)
        accuracy_rels.append(acc_bin.item())
        loss_binary.append(l_bin.item())

        tensor_data(norel, batch_idx)
        acc_un, l_un = model.test_(input_img, input_qst, label)
        accuracy_norels.append(acc_un.item())
        loss_unary.append(l_un.item())

    accuracy_ternary = sum(accuracy_ternary) / len(accuracy_ternary)
    accuracy_rel = sum(accuracy_rels) / len(accuracy_rels)
    accuracy_norel = sum(accuracy_norels) / len(accuracy_norels)
    print('\n Test set: Ternary accuracy: {:.0f}% Binary accuracy: {:.0f}% | Unary accuracy: {:.0f}%\n'.format(
        accuracy_ternary, accuracy_rel, accuracy_norel))

    summary_writer.add_scalars('Accuracy/test', {
        'ternary': accuracy_ternary,
        'binary': accuracy_rel,
        'unary': accuracy_norel
    }, epoch)

    loss_ternary = sum(loss_ternary) / len(loss_ternary)
    loss_binary = sum(loss_binary) / len(loss_binary)
    loss_unary = sum(loss_unary) / len(loss_unary)

    summary_writer.add_scalars('Loss/test', {
        'ternary': loss_ternary,
        'binary': loss_binary,
        'unary': loss_unary
    }, epoch)

    return accuracy_ternary, accuracy_rel, accuracy_norel

In [None]:
def load_data():
    print('loading data...')
    dirs = './data'
    filename = os.path.join(dirs,'sort-of-clevr.pickle')
    with open(filename, 'rb') as f:
        train_datasets, test_datasets = pickle.load(f)
    ternary_train = []
    ternary_test = []
    rel_train = []
    rel_test = []
    norel_train = []
    norel_test = []
    print('processing data...')

    for img, ternary, relations, norelations in train_datasets:
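        # move the channel axis (last) to the front; swapaxes(0, 2) also exchanges
        # the two spatial axes, which is harmless as long as train and test match
        img = np.swapaxes(img, 0, 2)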
        for qst, ans in zip(ternary[0], ternary[1]):
            ternary_train.append((img,qst,ans))
        for qst,ans in zip(relations[0], relations[1]):
            rel_train.append((img,qst,ans))
        for qst,ans in zip(norelations[0], norelations[1]):
            norel_train.append((img,qst,ans))

    for img, ternary, relations, norelations in test_datasets:
        img = np.swapaxes(img, 0, 2)
        for qst, ans in zip(ternary[0], ternary[1]):
            ternary_test.append((img, qst, ans))
        for qst,ans in zip(relations[0], relations[1]):
            rel_test.append((img,qst,ans))
        for qst,ans in zip(norelations[0], norelations[1]):
            norel_test.append((img,qst,ans))
    
    return (ternary_train, ternary_test, rel_train, rel_test, norel_train, norel_test)
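`np.swapaxes(img, 0, 2)` moves the last axis to the front, but it also exchanges the two spatial axes. The toy example below assumes channel-last images, as the `(bs, 3, 75, 75)` input buffer implies. It uses a deliberately non-square array so the effect is visible. With square 75×75 images the shape is unchanged, and the model simply sees every image transposed, consistently in training and testing. `np.transpose(img, (2, 0, 1))` is the conventional channel-first conversion.

```python
import numpy as np

h, w = 75, 60                                  # unequal sizes make the axis order visible
img = np.arange(h * w * 3).reshape(h, w, 3)    # channel-last (H, W, C) toy image

swapped = np.swapaxes(img, 0, 2)               # what load_data() does
chw = np.transpose(img, (2, 0, 1))             # conventional (C, H, W) layout

print(swapped.shape, chw.shape)                # (3, 60, 75) (3, 75, 60)
print(swapped[1, 7, 4] == img[4, 7, 1])        # True: spatial axes are exchanged
```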

### 1.2 Load Dataset

In [None]:
ternary_train, ternary_test, rel_train, rel_test, norel_train, norel_test = load_data()

loading data...
processing data...


### 1.3 Instantiate Model

In [None]:
kwargs = {
    'relation_type': 'binary',
    'batch_size': 64,
    'cuda': True,
    'lr': 1e-4,
}
    
model = RN(**kwargs)
model

RN(
  (conv): ConvInputModel(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (batchNorm1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (batchNorm2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (batchNorm3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv4): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (batchNorm4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (g_fc1): Linear(in_features=534, out_features=2000, bias=True)
  (g_fc2): Linear(in_features=2000, out_features=2000, bias=True)
  (g_fc3): Linear(in_features=2000, out_features=2000, bias=True)
  (g_fc4): Linear(in_features=2000, out_features=2000, bi

In [None]:
model_dirs = './model'
bs = kwargs['batch_size']
input_img = torch.FloatTensor(bs, 3, 75, 75)
input_qst = torch.FloatTensor(bs, 18)
label = torch.LongTensor(bs)

In [None]:
if kwargs['cuda']:
    model.cuda()
    input_img = input_img.cuda()
    input_qst = input_qst.cuda()
    label = label.cuda()

### 1.4 Train Model

In [None]:
if os.path.isdir(model_dirs):
    print('directory {} already exists'.format(model_dirs))
else:
    os.makedirs(model_dirs)

directory ./model already exists


In [None]:
epochs = 20
model_type = 'RN'
seed = 1
log_interval = 300

In [None]:
with open(f'./{model_type}_{seed}_log.csv', 'w') as log_file:
    csv_writer = csv.writer(log_file, delimiter=',')
    csv_writer.writerow(['epoch', 'train_acc_ternary', 'train_acc_rel',
                         'train_acc_norel', 'test_acc_ternary', 'test_acc_rel', 'test_acc_norel'])

    print('Training...')

    for epoch in range(1, epochs + 1):
        train_acc_ternary, train_acc_binary, train_acc_unary = train(
            epoch, ternary_train, rel_train, norel_train)
        test_acc_ternary, test_acc_binary, test_acc_unary = test(
            epoch, ternary_test, rel_test, norel_test)

        csv_writer.writerow([epoch, train_acc_ternary, train_acc_binary,
                         train_acc_unary, test_acc_ternary, test_acc_binary, test_acc_unary])
        model.save_model(epoch)

Training...

 Test set: Ternary accuracy: 53% Binary accuracy: 43% | Unary accuracy: 52%


 Test set: Ternary accuracy: 54% Binary accuracy: 42% | Unary accuracy: 51%


 Test set: Ternary accuracy: 53% Binary accuracy: 42% | Unary accuracy: 47%


 Test set: Ternary accuracy: 52% Binary accuracy: 43% | Unary accuracy: 49%


 Test set: Ternary accuracy: 53% Binary accuracy: 45% | Unary accuracy: 51%


 Test set: Ternary accuracy: 54% Binary accuracy: 42% | Unary accuracy: 55%


 Test set: Ternary accuracy: 55% Binary accuracy: 41% | Unary accuracy: 59%


 Test set: Ternary accuracy: 53% Binary accuracy: 48% | Unary accuracy: 78%


 Test set: Ternary accuracy: 54% Binary accuracy: 51% | Unary accuracy: 95%


 Test set: Ternary accuracy: 55% Binary accuracy: 68% | Unary accuracy: 98%


 Test set: Ternary accuracy: 55% Binary accuracy: 79% | Unary accuracy: 99%


 Test set: Ternary accuracy: 56% Binary accuracy: 80% | Unary accuracy: 99%


 Test set: Ternary accuracy: 56% Binary accuracy: 7