In [3]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pylab as plt
import numpy as np

device = 'cuda:0' if torch.cuda.is_available() else "cpu"

def gen_data(N=100, d=10, low=0, high=10, target_idx=3):
    data = np.random.randint(low=low, high=high, size=(N,d))
    return data, data[:, target_idx]

N = 5000
low = 0 
high = 10
train_data, train_target = gen_data(N=N, low=low, high=high)
test_data, test_target = gen_data(N=N, low=low, high=high)



## Recurrent Neural Networks (RNN)

An RNN processes a sequence of tokens one at a time. The pseudocode is something like the following:

token_list = [...]

hidden_vec = [0, ..., 0]  # some fixed length (dimensionality)

for token in token_list:

    # look up the vector for this token in a lookup table
    token_vec = embedding_table[token]

    # use the previous hidden_vec and the current token_vec to update hidden_vec,
    # i.e. update the state of the net (hidden_vec) using the new token
    hidden_vec = update(hidden_vec, token_vec)  # (previous state, new data)

After the loop, hidden_vec summarizes the input sequence and can be used to make a prediction:

prediction = pred(hidden_vec)

For us, this could also be

prediction = pred(hidden_vec, index)

if the task is to predict the entry at a particular index.
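The loop above can be made concrete with a few PyTorch modules. This is a minimal sketch, not the model built later in this notebook: it assumes `nn.RNNCell` as the `update` function and a linear layer as `pred`, and the sizes `embedding_dim = 8` and `hidden_dim = 16` are picked only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size = 10      # tokens are the integers 0..9, as produced by gen_data
embedding_dim = 8    # assumed small size, for illustration only
hidden_dim = 16      # assumed size of hidden_vec

embedding_table = nn.Embedding(vocab_size, embedding_dim)
update = nn.RNNCell(input_size=embedding_dim, hidden_size=hidden_dim)  # assumed choice of update rule
pred = nn.Linear(hidden_dim, vocab_size)                               # assumed prediction head

token_list = torch.tensor([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

# hidden_vec starts at zero; shape (batch=1, hidden_dim)
hidden_vec = torch.zeros(1, hidden_dim)

for token in token_list:
    token_vec = embedding_table(token).unsqueeze(0)  # (1, embedding_dim)
    hidden_vec = update(token_vec, hidden_vec)       # new state from new data + previous state

logits = pred(hidden_vec)          # (1, vocab_size), one score per possible token
prediction = logits.argmax(dim=1)  # the net is untrained, so this value is arbitrary
```

Note that `nn.RNNCell` takes the new input first and the previous state second, the reverse of the argument order in the pseudocode.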

### Embedding

Suppose we were working with natural text where the tokens are words. To feed a word like "apple" into a neural network, we need to "numericalize" it, i.e. convert it to a number.

The simplest solution is to map each unique token to a unique integer. For example:

"apple" -> 0

"is" -> 1

"a" -> 2

etc.

Note that the only requirement is that this mapping is one-to-one, i.e. different words are mapped to different integers. The actual assignment, e.g. whether "apple" -> 0 or "apple" -> 59, doesn't matter.
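As a small sketch of such a mapping, the snippet below assigns the next free integer to each word the first time it appears. The example sentence and the names `token_to_id` and `id_to_token` are made up for illustration.

```python
sentence = "apple is a fruit and mango is a fruit".split()

# Assign the next unused integer to each token the first time it is seen.
token_to_id = {}
for token in sentence:
    if token not in token_to_id:
        token_to_id[token] = len(token_to_id)

# Because the mapping is one-to-one, it can be inverted.
id_to_token = {i: token for token, i in token_to_id.items()}

ids = [token_to_id[token] for token in sentence]
print(token_to_id)  # {'apple': 0, 'is': 1, 'a': 2, 'fruit': 3, 'and': 4, 'mango': 5}
print(ids)          # [0, 1, 2, 3, 4, 5, 1, 2, 3]
```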

Are there any problems with this encoding of tokens? One immediate problem is that it imposes an ordering on the tokens. "apple" is not less than "a", yet 0 < 2. Depending on the machine learning model used, this ordered encoding can induce artifacts that are not real.

A common solution to this problem is the so-called one-hot encoding. Suppose there are N distinct words. Each word is mapped to a vector of size N where exactly one entry is 1 and the rest are zeros. Suppose N = 3. Then,

"apple" -> [1,0,0]

"is" -> [0,1,0]

"a" -> [0,0,1]

Now there is no order imposed on the tokens: each vector is orthogonal to all the others (the dot product of vectors corresponding to distinct words is 0). But this encoding creates a couple of problems of its own. First, if the number of tokens is large, the dimensionality N is also large, which has implications for memory usage. The second problem is more conceptual. While each word is distinct (by definition) from every other word, words are not equally distinct in meaning. "apple" and "mango" are similar in the sense that they are both fruits, but they are clearly also different (winter vs. summer fruit, etc.). Since vectors can encode similarity, is it possible to map tokens to vectors such that (a) words with similar meanings map to similar vectors and (b) words with dissimilar meanings map to dissimilar vectors?
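The orthogonality of one-hot vectors is easy to check numerically. The sketch below uses `torch.nn.functional.one_hot` on the three-word example; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

token_ids = torch.tensor([0, 1, 2])  # "apple", "is", "a"
one_hot = F.one_hot(token_ids, num_classes=3).float()
# rows are [1,0,0], [0,1,0], [0,0,1]

# Pairwise dot products: 1 on the diagonal (a word with itself), 0 everywhere else.
gram = one_hot @ one_hot.T
print(gram)
```

Every pair of distinct words is equally dissimilar under this encoding, so it cannot express that some words are closer in meaning than others.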

This is the problem embeddings solve. The approach in neural networks is to map each token to its own vector in a vector space whose dimension (128 below) is small compared to the number of distinct tokens N in a realistic vocabulary. (In this toy example there are only 10 tokens, so 128 is actually larger than N.) The embeddings are initialized randomly and then adjusted during training by the same process used to learn the other weights, i.e. by computing derivatives and applying gradient descent.
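To see that embedding vectors are ordinary trainable weights, the sketch below takes one gradient-descent step on an `nn.Embedding` table. The objective (pulling the vector for token 3 toward all ones) and the small `embedding_dim=4` are assumptions chosen only to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

emb_demo = nn.Embedding(num_embeddings=10, embedding_dim=4)  # small table for illustration
opt = torch.optim.SGD(emb_demo.parameters(), lr=0.1)

before = emb_demo.weight.detach().clone()

# Toy objective (assumed): squared distance between token 3's vector and a vector of ones.
loss = ((emb_demo(torch.tensor([3])) - 1.0) ** 2).sum()
loss.backward()
opt.step()

# Only the row that was looked up receives a nonzero gradient, so only it changes.
changed_rows = (emb_demo.weight.detach() != before).any(dim=1)
print(changed_rows)
```

In a real model the loss comes from the prediction task, and the gradient reaches the embedding table through the rest of the network in the same way.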

In [4]:
embedding_dim = 128
#one row per distinct token value; each row is a vector of length embedding_dim
emb = nn.Embedding(num_embeddings=high-low, embedding_dim=embedding_dim)

In [7]:
#can now look up embedding vectors by token id (any integer from 0 to high-low-1, which is low to high-1 here since low = 0)

emb(torch.tensor(0))

tensor([-1.5423,  0.5266,  1.1882,  0.9355, -0.7791,  0.1069,  1.1231,  1.6374,
         0.9743, -0.0636, -0.9660, -0.9154,  0.8699, -0.0161, -0.0410,  1.1897,
        -0.4487, -0.0157,  0.3108,  0.1782, -1.2999,  0.4343,  0.0879, -0.3289,
        -0.3270,  2.1025,  0.3177,  1.0361, -1.3813, -2.7314, -0.8861,  0.8975,
        -0.6807, -0.0059,  1.8005, -0.3465, -0.2582,  0.1157, -1.1240,  1.2524,
        -0.8958,  0.4613, -0.9092,  0.6420,  0.2388, -0.3712, -0.2952,  1.2677,
        -0.7731,  0.8679, -0.1787,  0.5150,  0.4532, -1.6344,  1.3339,  0.9072,
         0.3769, -2.1367,  1.0948, -0.8444, -0.1842,  0.2181,  0.4649, -1.4071,
        -0.8272, -0.3051,  1.0011, -0.7690,  0.1095, -0.3177, -0.6874,  0.5006,
         0.5810,  1.7755, -0.2017, -0.6410,  0.4117,  0.4148,  2.4842, -0.9477,
        -0.2940,  1.2848, -1.1068, -0.3601,  0.1138,  0.6132,  0.2530, -1.5089,
         1.7862,  0.4336, -0.0694, -0.5198, -0.3226, -0.2274, -0.4900,  1.7660,
        -1.2420,  0.1898,  2.2897, -0.39

In [8]:
emb(torch.tensor(high-1))

tensor([ 0.2355,  0.6564, -1.9059,  2.4502,  1.7588, -0.5307,  0.7851, -0.6040,
        -1.4562, -0.4593,  2.6170,  0.4103,  2.0874, -0.9793,  0.8330,  1.4915,
        -1.3457,  0.5617, -0.6056,  0.6687, -0.5855,  1.0226, -1.8398, -0.1697,
         0.1374,  1.5272,  1.5056,  0.4490,  0.9732,  0.9570, -0.8251, -1.0611,
         0.3797,  0.3602, -0.1405, -0.5662, -1.0502,  1.2331,  1.0156,  0.1985,
        -1.8225,  0.0731,  0.8447,  0.3947,  2.4693, -2.5529,  0.8676, -1.1341,
         1.0207, -1.0664,  0.1900, -0.3369, -0.1403,  0.7167, -1.7084,  1.4050,
        -0.2450, -0.4552,  1.0088,  0.4901,  2.0871, -1.9398, -0.3584, -1.6027,
        -0.8999, -2.7576,  0.0708,  1.5830,  0.5663,  0.8460,  1.4456,  0.1762,
         1.0485,  1.1570,  0.4039, -1.2401,  0.8021,  0.4174,  1.3183, -0.0089,
         0.0594, -0.9420,  1.1330, -1.2755,  0.6854, -0.2580,  0.8663, -0.4360,
        -0.9288, -0.5829,  0.3857,  0.5792,  1.4026,  0.4559, -0.0683,  0.6871,
         0.9299,  1.0921, -1.8848, -0.17

In [9]:
#only ids from 0 to high-low-1 have entries in the table, so looking up high fails
emb(torch.tensor(high))

IndexError: index out of range in self
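The lookups above use a single token, but an embedding layer also accepts a whole batch of sequences at once. The sketch below regenerates a small batch in the same way as `gen_data` and shifts the values by `low` so that ids start at 0 (a no-op here, since `low = 0`); the batch size of 4 is arbitrary.

```python
import numpy as np
import torch
import torch.nn as nn

low, high, d = 0, 10, 10
data = np.random.randint(low=low, high=high, size=(4, d))  # 4 sequences of length 10

emb_batch = nn.Embedding(num_embeddings=high - low, embedding_dim=128)

# Embedding indices must be integer tensors in the range 0 .. num_embeddings-1.
tokens = torch.from_numpy(data).long() - low
vectors = emb_batch(tokens)
print(vectors.shape)  # torch.Size([4, 10, 128])
```

Each token is replaced by its 128-dimensional vector, so a (batch, sequence) array of ids becomes a (batch, sequence, embedding_dim) tensor.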

### RNN definition

In [2]:
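class Net(nn.Module):
    def __init__(self):
        super().__init__()  #required before registering submodules
        #input_size matches the embedding dimension; hidden_size=64 is a placeholder choice
        self.rnn = nn.LSTM(input_size=embedding_dim, hidden_size=64, batch_first=True)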
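For orientation, here is one way the pieces from this notebook could fit together: an embedding table, an LSTM, and a linear layer that turns the final hidden state into one score per possible token. This is a sketch under assumptions, not the notebook's final model: the class name `SketchNet`, the `hidden_dim=64` default, and the linear head are all hypothetical.

```python
import torch
import torch.nn as nn

class SketchNet(nn.Module):  # hypothetical name, not the notebook's Net
    def __init__(self, vocab_size=10, embedding_dim=128, hidden_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, embedding_dim)
        # batch_first=True: the LSTM expects input of shape (batch, seq_len, embedding_dim)
        self.rnn = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)  # assumed prediction layer

    def forward(self, tokens):
        vectors = self.emb(tokens)       # (batch, seq_len, embedding_dim)
        _, (h_n, _) = self.rnn(vectors)  # h_n: (num_layers, batch, hidden_dim)
        return self.head(h_n[-1])        # (batch, vocab_size) logits from the last layer's final state

net = SketchNet()
tokens = torch.randint(0, 10, (4, 10))  # 4 random sequences of length 10
logits = net(tokens)
print(logits.shape)  # torch.Size([4, 10])
```

The final hidden state `h_n` plays the role of `hidden_vec` after the loop in the pseudocode, and `head` plays the role of `pred`.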