# Character Embeddings

In [1]:
import torch 
from torch import tensor 
import torch.nn.functional as F 
import torch.nn as nn 

## Get the data

In [2]:
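import json
# Use the source code of the standard-library json module as a small text corpus
with open(json.__file__) as f:
    data = f.read()
# Encode every character as its integer code point
data_tensor = torch.tensor([ord(c) for c in data])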

In [3]:
num_classes = 126
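
`num_classes` has to be larger than every code point that will be one-hot encoded, because valid class indices run from `0` to `num_classes - 1`. A small sketch of that check in plain Python (the helper name `fits_in_classes` is made up for this illustration and is not part of the notebook):

```python
def fits_in_classes(text, num_classes):
    """True if every character's code point is a valid class index (0 .. num_classes - 1)."""
    return all(ord(c) < num_classes for c in text)

num_classes = 126
print(fits_in_classes('r"""J', num_classes))  # the five characters used as input in this notebook
print(fits_in_classes("~", num_classes))      # ord("~") is 126, one past the last valid index
```

With 126 classes the batch used here fits, but a character such as `~` would not; full ASCII needs 128 classes.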

## Define neural net layers

In [4]:
layer_1 = nn.Linear(in_features=num_classes, out_features=1, bias=False)
layer_2 = nn.Linear(in_features=1, out_features=num_classes, bias=False)

print(f"Layer 1 has {layer_1.in_features} in features and {layer_1.out_features} out features")
print(f"Layer 2 has {layer_2.in_features} in features and {layer_2.out_features} out features")

Layer 1 has 126 in features and 1 out features
Layer 2 has 1 in features and 126 out features


## Get the input batch

In [5]:
input_batch = data_tensor[0:5]
input_batch

tensor([114,  34,  34,  34,  74])
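
To see which characters these ids stand for, they can be mapped back with `chr`. A minimal sketch using the five values printed above:

```python
ids = [114, 34, 34, 34, 74]              # values of input_batch shown above
decoded = "".join(chr(i) for i in ids)   # chr is the inverse of ord
print(repr(decoded))

# Round trip: encoding the decoded text gives the same ids back
roundtrip = [ord(c) for c in decoded]
print(roundtrip == ids)
```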

In [6]:
one_hot_input = F.one_hot(input_batch, num_classes).float()
one_hot_input.shape

torch.Size([5, 126])

## A forward pass

Pass the input to the first layer

In [7]:
act_1 = layer_1(one_hot_input)
act_1

tensor([[ 0.0042],
        [ 0.0119],
        [ 0.0119],
        [ 0.0119],
        [-0.0642]], grad_fn=<MmBackward0>)
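
Rows 2 to 4 of this output are identical because they all come from the same character (`34`). Multiplying a one-hot vector by a bias-free weight matrix just picks out one weight per character, which is what an embedding lookup does. The sketch below checks this on a freshly initialised layer (the seed and the variable names are arbitrary choices for the illustration, so the values differ from the ones printed above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for repeatability
num_classes = 126
layer = nn.Linear(in_features=num_classes, out_features=1, bias=False)

ids = torch.tensor([114, 34, 34, 34, 74])
one_hot = F.one_hot(ids, num_classes).float()

# 1) One-hot input times the weight matrix, as in the notebook
via_matmul = layer(one_hot)                      # shape [5, 1]

# 2) Plain indexing: take the weight column that belongs to each character
via_lookup = layer.weight[:, ids].T              # shape [5, 1]

# 3) The same weights wrapped in an nn.Embedding table
table = nn.Embedding.from_pretrained(layer.weight.T.detach().clone())
via_embedding = table(ids)                       # shape [5, 1]

print(torch.allclose(via_matmul, via_lookup))
print(torch.allclose(via_matmul, via_embedding))
```

All three routes give the same numbers; the lookup versions simply skip building the one-hot matrix.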

Pass the output of the first layer to the second layer

In [8]:
act_2 = layer_2(act_1)
act_2.shape

torch.Size([5, 126])

* 5: Number of input characters in the batch
* 126: Number of classes, one score (logit) for each possible next character

In [9]:
chr(act_2[4].argmax().item())

'\n'

## Backpropagation and SGD

In [10]:
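LEARNING_RATE = .1
# Targets are the inputs shifted one position ahead, one-hot encoded as float class probabilities
target_ids = F.one_hot(data_tensor[1:6], num_classes).float()
# Number of training iterations over the same five-character batch
EPOCH = 20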

In [11]:
loss = F.cross_entropy(act_2, target_ids)
print(f"Epoch 0: the loss is {loss.item()}")

Epoch 0: the loss is 4.824247360229492
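
A useful sanity check on this starting value: a model that gives every class the same score has a cross-entropy of `ln(num_classes)`, which for 126 classes is about 4.836. The initial loss above is close to that, as expected for small random weights. A plain-Python sketch of the calculation (the `cross_entropy` helper here is written for this illustration and handles a single example with an integer target):

```python
import math

def cross_entropy(logits, target_index):
    """Negative log of the softmax probability assigned to the target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]

num_classes = 126
uniform_logits = [0.0] * num_classes          # every class gets the same score
baseline = cross_entropy(uniform_logits, target_index=34)

print(baseline)
print(math.log(num_classes))                  # same value
```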


In [12]:
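# Backpropagate to fill in .grad for both weight matrices
loss.backward()
# One manual SGD step: move each weight against its gradient
layer_1.weight.data -= LEARNING_RATE * layer_1.weight.grad
layer_2.weight.data -= LEARNING_RATE * layer_2.weight.grad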
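
One thing to keep in mind with manual updates like this: PyTorch adds to `.grad` on every `backward()` call instead of overwriting it. The sketch below shows the effect on a single toy weight and one common way to reset the gradient between steps (the tensor `w` and the loss `w ** 2` are stand-ins chosen for the illustration, not the notebook's model):

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(w ** 2).sum().backward()
first_grad = w.grad.clone()     # d(w^2)/dw at w = 1 is 2

(w ** 2).sum().backward()       # w has not changed, so the true gradient is still 2
second_grad = w.grad.clone()    # but .grad now holds 2 + 2 = 4

# A common pattern: update inside no_grad, then clear .grad before the next backward()
with torch.no_grad():
    w -= 0.1 * w.grad
w.grad.zero_()

print(first_grad.item(), second_grad.item(), w.grad.item())
```

Without the reset, every step uses the sum of all gradients computed so far rather than only the current one.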

In [13]:
for i in range(EPOCH):
    # Forward pass on the same five-character batch
    act_1 = layer_1(one_hot_input)
    act_2 = layer_2(act_1)

    print(f"Epoch {i + 1}: \t\nnext character prediction: {chr(act_2[4].argmax().item())}")

    loss = F.cross_entropy(act_2, target_ids)
    print(f"the loss is {loss.item()}\n")
    # .grad is never reset here, so each backward() adds to the gradients of earlier steps
    loss.backward()

    # Manual SGD step applied directly to the weights
    layer_1.weight.data -= LEARNING_RATE * layer_1.weight.grad
    layer_2.weight.data -= LEARNING_RATE * layer_2.weight.grad

Epoch 1: 	
next character prediction: 

the loss is 4.819769382476807

Epoch 2: 	
next character prediction: 

the loss is 4.8108696937561035

Epoch 3: 	
next character prediction: 

the loss is 4.797612190246582

Epoch 4: 	
next character prediction: 

the loss is 4.779970645904541

Epoch 5: 	
next character prediction: 

the loss is 4.757689952850342

Epoch 6: 	
next character prediction: 

the loss is 4.730105400085449

Epoch 7: 	
next character prediction: 

the loss is 4.69591760635376

Epoch 8: 	
next character prediction: 

the loss is 4.652911186218262

Epoch 9: 	
next character prediction: S
the loss is 4.597597599029541

Epoch 10: 	
next character prediction: S
the loss is 4.524745464324951

Epoch 11: 	
next character prediction: S
the loss is 4.426735877990723

Epoch 12: 	
next character prediction: S
the loss is 4.2927045822143555

Epoch 13: 	
next character prediction: S
the loss is 4.1074652671813965

Epoch 14: 	
next character prediction: S
the loss is 3.850512742996216


With a learning rate of `.1` and 30 iterations, the model started overfitting at the 27th iteration. The lowest loss I reached was `0.40575`.

In [14]:
act_2[4].argmax()

tensor(83)

In [15]:
chr(act_2[4].argmax().item())

'S'

In [16]:
print(data_tensor[0:5])
print(data_tensor[1:6])

tensor([114,  34,  34,  34,  74])
tensor([34, 34, 34, 74, 83])


The model correctly predicted the next character: after `74` (`J`), it predicted `83`, which is `S`.
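
The input and target tensors printed above are the same text shifted by one position. The sketch below rebuilds those pairs in plain Python and, assuming the prediction depends only on the current character (as it does in this two-layer model), estimates the lowest mean cross-entropy any such model could reach on this batch:

```python
import math
from collections import Counter, defaultdict

text = 'r"""JS'                      # first six characters of the data
ids = [ord(c) for c in text]
inputs, targets = ids[:-1], ids[1:]  # same split as data_tensor[0:5] and data_tensor[1:6]

pairs = [(chr(x), chr(y)) for x, y in zip(inputs, targets)]
print(pairs)

# Count which characters follow each input character
followers = defaultdict(Counter)
for x, y in pairs:
    followers[x][y] += 1

# Best case: predict each follower with its observed frequency
total = 0.0
for x, counts in followers.items():
    n = sum(counts.values())
    for y, c in counts.items():
        total += c * -math.log(c / n)
floor = total / len(pairs)
print(round(floor, 4))
```

The quote character is followed by another quote twice and by `J` once, so its prediction can never be certain. That ambiguity puts the floor at roughly 0.38, slightly below the `0.40575` reported earlier.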