# The Adding Problem

## Overview

In this task, each input is a sequence of length $T$ with two values per time step, i.e. shape $T \times 2$. The code below stores it channel-first as $2 \times T$.

The first channel holds values drawn uniformly at random from $[0, 1]$.
The second channel is all zeros except for two positions, which are marked with 1.

The objective is to output the sum of the two random values at the marked positions.
Equivalently, the target is the dot product of the two channels.
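
The following hand-made sample uses $T = 6$ with arbitrary illustrative values. It is not produced by the generator below, but it uses the same channel-first layout. It shows that summing the two marked values and taking the dot product of the two channels give the same target.

```python
import torch as th

# Hand-made toy sample with T = 6 (arbitrary illustrative values).
values = th.tensor([0.3, 0.8, 0.1, 0.5, 0.9, 0.2])  # channel 0: random values
mask = th.tensor([0.0, 1.0, 0.0, 0.0, 1.0, 0.0])    # channel 1: exactly two 1s
x = th.stack([values, mask], dim=0)                 # shape (2, T), channel-first

target = th.dot(x[0], x[1])         # dot product of the two channels
direct = values[mask.bool()].sum()  # sum of the two marked values
print(x.shape, target.item(), direct.item())
```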

## Settings

In [1]:
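import numpy as np
import torch as th
import torch.nn.functional as F
from tqdm.notebook import tqdm

BATCH_SIZE = 32
DEVICE = "cuda:0" if th.cuda.is_available() else "cpu"
DROPOUT = 0.0
CLIP = -1.0      # gradient-norm clipping threshold; a value <= 0 disables clipping
EPOCHS = 5
KSIZE = 7        # convolution kernel size
LEVELS = 8       # number of TCN levels; CHANNEL_SIZES has one entry per level
SEQ_LEN = 400    # sequence length T
LR = 4e-3
OPTIM = "Adam"   # name of an optimizer class in torch.optim
NHID = 30        # hidden channels per level
SEED = 1111

INPUT_CHANNELS = 2  # value channel + marker channel
N_CLASSES = 1       # a single regression output (the predicted sum)

CHANNEL_SIZES = [NHID] * LEVELS

# Seed both generators: torch draws the values, NumPy draws the marker positions.
th.manual_seed(SEED)
np.random.seed(SEED)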


## Data Generation

In [2]:
import torch as th
import numpy as np

def data_generator(N: int):
    """
    Generate N samples of the adding problem.

    :param N: number of samples to generate.
    :return: x of shape (N, 2, SEQ_LEN) and y of shape (N, 1).
    """
    x_num = th.rand([N, 1, SEQ_LEN])    # channel 0: uniform random values
    x_mask = th.zeros([N, 1, SEQ_LEN])  # channel 1: marker channel
    y = th.zeros([N, 1])
    for i in range(N):
        # Pick two distinct positions to mark.
        first, second = np.random.choice(SEQ_LEN, size=2, replace=False)
        x_mask[i, 0, first] = 1
        x_mask[i, 0, second] = 1
        y[i, 0] = x_num[i, 0, first] + x_num[i, 0, second]
    x = th.cat([x_num, x_mask], dim=1)  # (N, 2, SEQ_LEN)
    return x, y

print("Producing data...")
x_train, y_train = data_generator(50000)
x_test, y_test = data_generator(1000)
x_train, y_train = x_train.to(DEVICE), y_train.to(DEVICE)
x_test, y_test = x_test.to(DEVICE), y_test.to(DEVICE)
print("Finished.")

Producing data...
Finished.
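
The generator above draws marker positions one sample at a time, which is slow for 50,000 samples. Below is a minimal loop-free sketch. The function name `data_generator_vectorized` and its explicit `seq_len` argument are hypothetical. It picks two distinct positions per row by taking the first two entries of a random permutation, obtained by argsorting uniform noise. All of its randomness comes from torch, so `th.manual_seed` alone makes it reproducible. Its samples will still differ from those of `data_generator` for the same seed.

```python
import torch as th


def data_generator_vectorized(N: int, seq_len: int = 400):
    """Hypothetical loop-free variant of data_generator; returns x (N, 2, seq_len), y (N, 1)."""
    x_num = th.rand(N, 1, seq_len)  # channel 0: uniform random values
    # Two distinct marked positions per sample: the first two entries of a
    # random permutation, obtained by argsorting independent uniform noise.
    idx = th.rand(N, seq_len).argsort(dim=1)[:, :2]    # (N, 2), int64
    mask = th.zeros(N, seq_len).scatter_(1, idx, 1.0)  # 1 at the marked positions
    x_mask = mask.unsqueeze(1)                         # (N, 1, seq_len)
    # Target: sum of the two marked values.
    y = x_num[:, 0].gather(1, idx).sum(dim=1, keepdim=True)  # (N, 1)
    x = th.cat([x_num, x_mask], dim=1)                       # (N, 2, seq_len)
    return x, y


th.manual_seed(1111)
x_small, y_small = data_generator_vectorized(4, seq_len=10)
print(x_small.shape, y_small.shape)
```

The output has the same layout as `data_generator`, so it could be passed to the model unchanged.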


## Build Model

In [3]:
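from core.tcn import TemporalConvNet  # project-local TCN implementation
import torch.nn as nn

class TCN(nn.Module):
    """TCN encoder followed by a linear read-out of the last time step."""

    def __init__(self, input_size, output_size, num_channels, kernel_size, dropout):
        super().__init__()
        self.tcn = TemporalConvNet(input_size, num_channels, kernel_size=kernel_size, dropout=dropout)
        self.linear = nn.Linear(num_channels[-1], output_size)

    def forward(self, x):
        # x: (N, INPUT_CHANNELS, SEQ_LEN) -> y: (N, num_channels[-1], SEQ_LEN)
        y = self.tcn(x)
        # Predict from the last time step only: (N, num_channels[-1]) -> (N, output_size)
        return self.linear(y[..., -1])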

# define model
print("Building model...")
model = TCN(INPUT_CHANNELS, N_CLASSES, CHANNEL_SIZES, kernel_size=KSIZE, dropout=DROPOUT)

model = model.to(DEVICE)
optimizer = getattr(th.optim, OPTIM)(model.parameters(), lr=LR)
print("Finished.")

Building model...
Finished.
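
The model reads only the output at the last time step (`y[..., -1]`). That position must therefore see both marked inputs, which can appear anywhere in the 400-step sequence, so the receptive field should be at least `SEQ_LEN`. The sketch below assumes that `core.tcn.TemporalConvNet` follows the reference TCN design of Bai et al., with two dilated causal convolutions per level and dilation $2^i$ at level $i$. The helper `tcn_receptive_field` is hypothetical.

```python
def tcn_receptive_field(kernel_size: int, levels: int, convs_per_level: int = 2) -> int:
    """Receptive field of stacked dilated causal convolutions.

    Assumes level i uses dilation 2**i and `convs_per_level` convolutions
    (reference TCN design); core.tcn may differ.
    """
    rf = 1
    for i in range(levels):
        # Each convolution with dilation d and kernel k extends the field by (k - 1) * d.
        rf += convs_per_level * (kernel_size - 1) * 2 ** i
    return rf


rf = tcn_receptive_field(kernel_size=7, levels=8)
print(rf, rf >= 400)
```

Under this assumption, `KSIZE = 7` and `LEVELS = 8` give $1 + 2 \cdot 6 \cdot (2^8 - 1) = 3061$ steps, comfortably above 400.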


## Run

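**NOTE**: Always predicting a sum of 1 gives an expected MSE of $1/6 \approx 0.167$. The target is the sum of two independent $\mathcal{U}(0, 1)$ variables, so it has mean 1 and variance $1/12 + 1/12 = 1/6$. The test loss of about 0.1768 after the first epoch is close to this trivial baseline, and the later epochs fall well below it.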
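
A quick Monte Carlo check of this baseline follows. It is independent of the notebook's data and model and simply scores the constant prediction 1 against freshly sampled targets.

```python
import torch as th

th.manual_seed(0)
u = th.rand(1_000_000, 2)  # two independent U(0, 1) draws per sample
s = u.sum(dim=1)           # adding-problem target for each sample
mse_const_one = ((s - 1.0) ** 2).mean().item()  # MSE of always predicting 1
print(f"{mse_const_one:.4f} vs 1/6 = {1 / 6:.4f}")
```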

In [4]:
def train(epoch):
    model.train()

    # One progress-bar step per mini-batch (the last batch may be smaller).
    pbar = tqdm(range(0, len(x_train), BATCH_SIZE))

    for i in pbar:
        # Slicing past the end is safe, so this also covers the final partial batch.
        x, y = x_train[i:i + BATCH_SIZE], y_train[i:i + BATCH_SIZE]

        optimizer.zero_grad()
        output = model(x)
        loss = F.mse_loss(output, y)
        loss.backward()

        if CLIP > 0:
            th.nn.utils.clip_grad_norm_(model.parameters(), CLIP)

        optimizer.step()

        # Shows the loss of the current mini-batch, not a running average.
        pbar.set_description(f'Train Epoch: {epoch:2d} Loss: {loss.item():.6f}')

def evaluate():
    model.eval()
    with th.no_grad():
        output = model(x_test)
        test_loss = F.mse_loss(output, y_test)
        print(f'Test set: Average loss: {test_loss.item():.6f}')
        return test_loss.item()

for ep in range(1, EPOCHS+1):
    train(ep)
    tloss = evaluate()

Test set: Average loss: 0.176798


Test set: Average loss: 0.039093


Test set: Average loss: 0.000391


Test set: Average loss: 0.000486


Test set: Average loss: 0.000393
