# Implementation

The problem we are trying to solve in this implementation is language modeling. Language modeling is the task of estimating the probability of a sequence of words. In layman's terms, we are trying to teach the machine to generate words based on their probability of occurrence given the words that precede them.
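
To make "the probability of a word given its context" concrete, here is a minimal count-based sketch that conditions on only the two preceding words (a trigram model). It is an illustration, not the method used in this notebook, which trains an LSTM instead; the function names and the toy corpus are made up for the example.

```python
from collections import Counter, defaultdict


def train_trigram_counts(tokens):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def next_word_probability(counts, w1, w2, w3):
    """Estimate P(w3 | w1, w2) by relative frequency; 0.0 for an unseen pair."""
    followers = counts.get((w1, w2))
    if not followers:
        return 0.0
    return followers[w3] / sum(followers.values())


# Hypothetical toy corpus, used only to exercise the functions.
toy_corpus = "the cat sat on the mat and the cat sat on the sofa".split()
counts = train_trigram_counts(toy_corpus)

# "on the" is followed once by "mat" and once by "sofa".
print(next_word_probability(counts, "on", "the", "mat"))   # 0.5
# "the cat" is always followed by "sat".
print(next_word_probability(counts, "the", "cat", "sat"))  # 1.0
```

A neural language model plays the same role as `next_word_probability`, but it learns the distribution from a much longer context instead of looking up raw counts.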

## Requirements
1. [Install PyTorch](http://pytorch.org/)
2. Install torchtext by running this command ```pip install git+https://github.com/pytorch/text --upgrade```

## Load Data

Let's start by loading the data with torchtext. For this example we use the WikiText-2 dataset that torchtext provides. The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia; WikiText-2 is its smaller variant, with roughly two million training tokens.

In [1]:
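from torchtext import data
from torchtext import datasets
from torchtext.vocab import GloVe

# Needed by the vocabulary and iterator cells; same values as the initialization cell further down.
BATCH_SIZE, BPTT_LEN, EMBEDDING_DIM = 64, 30, 300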

In [3]:
TEXT = data.Field(lower=True)

In [4]:
train, valid, test = datasets.WikiText2.splits(TEXT)


In [5]:
TEXT.build_vocab(train, vectors=GloVe(name="6B", dim=EMBEDDING_DIM))

In [6]:
TEXT.vocab.freqs.most_common(5)

[('the', 130768), (',', 99913), ('.', 73388), ('of', 57030), ('<unk>', 54625)]

In [7]:
train_iter, valid_iter, test_iter = data.BPTTIterator.splits(
    (train, valid, test), batch_size=BATCH_SIZE, bptt_len=BPTT_LEN, repeat=False,
    device=-1)
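
To show what the iterator produces, here is a simplified pure-Python sketch of BPTT-style batching: the token stream is cut into `batch_size` parallel columns, and each batch is a window of at most `bptt_len` steps whose target is the same window shifted by one token. This is an approximation for illustration, not torchtext's implementation; `bptt_batches` is a hypothetical helper, and dropping the leftover tokens is an assumption made to keep the sketch short.

```python
def bptt_batches(token_ids, batch_size, bptt_len):
    """Yield (text, target) pairs laid out as [step][batch column]."""
    # Split the stream into batch_size parallel columns, dropping the remainder.
    # (Assumption: a simplification chosen for this sketch.)
    column_len = len(token_ids) // batch_size
    columns = [token_ids[b * column_len:(b + 1) * column_len]
               for b in range(batch_size)]
    # Walk down the columns in windows of at most bptt_len steps.
    # The last position has no next token, so stop one step early.
    for start in range(0, column_len - 1, bptt_len):
        end = min(start + bptt_len, column_len - 1)
        text = [[col[t] for col in columns] for t in range(start, end)]
        target = [[col[t + 1] for col in columns] for t in range(start, end)]
        yield text, target


batches = list(bptt_batches(list(range(20)), batch_size=2, bptt_len=3))
first_text, first_target = batches[0]
print(first_text)    # [[0, 10], [1, 11], [2, 12]]
print(first_target)  # [[1, 11], [2, 12], [3, 13]]
```

Each row is one time step and each column is an independent slice of the corpus, which is why the notebook transposes `batch.text` before printing a sample.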

In [8]:
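batch = next(iter(train_iter))
# batch.text is (bptt_len, batch_size); transpose so each row is one sequence.
# Use a new name here: `data` is the torchtext module imported above.
token_ids = batch.text.transpose(1, 0).data.numpy()
sample = []
for row in token_ids:
    for token_id in row:
        sample.append(TEXT.vocab.itos[token_id])
print(" ".join(sample))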

<eos> = valkyria chronicles iii = <eos> <eos> senjō no valkyria 3 : <unk> chronicles ( japanese : 戦場のヴァルキュリア3 , lit . valkyria of the battlefield 3 ) , commonly world ; ptah , who embodies thought and creativity , gives form to all things by <unk> and naming them ; atum produces all things as <unk> of himself ; authority " . it is balaguer who guides much of the action in the last sections of the book . <eos> <eos> = = = <unk> = = = <eos> put forth in support of the statute could not <unk> the infringement of the right to vote " , leading to the conclusion that the statute governing special elections was she made port visits in turkey , greece and italy . <eos> she was refitted before operation barbarossa , probably about 1940 , her catapult was removed , and her , including <unk> party , system of a down , m.i.a. , <unk> , queens of the stone age , <unk> and death from above 1979 , some of whom from the film launched his career . <eos> <eos> = = = = boogie nights = = = = <eos> <eos> a

In [9]:
# len() of a BPTTIterator is the number of batches, not the number of tokens.
print("Total Training Data:", len(train_iter))
print("Total Validation Data:", len(valid_iter))
print("Total Testing Data:", len(test_iter))
print("Total Vocabularies:", len(TEXT.vocab))

Total Training Data: 1088
Total Vocabularies: 28913


In [10]:
import logging
 
# add filemode="w" to overwrite
logging.basicConfig(filename="sample.log", level=logging.INFO)

In [None]:
import torch
import torch.nn as nn
from torch.optim import Adam

############################
# Variable Initialization #
############################
BATCH_SIZE = 64
BPTT_LEN = 30
EMBEDDING_DIM = 300
HIDDEN_SIZE = 512
NUM_LAYERS = 1
DROPOUT = 0.5
VOCAB_SIZE = len(TEXT.vocab)

#################################
# Neural Network Initialization #
#################################
class LanguageModelLSTM(nn.Module):
    def __init__(self):
        super(LanguageModelLSTM, self).__init__()
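        # nn.LSTM applies dropout between stacked layers only, so it has
        # no effect while NUM_LAYERS is 1.
        self.lstm = nn.LSTM(input_size=EMBEDDING_DIM,
                            hidden_size=HIDDEN_SIZE,
                            num_layers=NUM_LAYERS,
                            dropout=DROPOUT)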
        self.linear = nn.Linear(in_features=HIDDEN_SIZE,
                                out_features=VOCAB_SIZE)
        
    def forward(self, X):
        lstm_out, lstm_hidden = self.lstm(X)
        step_size, batch_size, _ = lstm_out.size()
        modified_output = lstm_out.view(step_size * batch_size, -1)
        
        out = self.linear(modified_output)
        
        return out
    
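# The embedding layer lives outside the model and is not passed to the
# optimizer, so the pretrained GloVe vectors stay fixed during training.
embedding = nn.Embedding(TEXT.vocab.vectors.size(0),
                         TEXT.vocab.vectors.size(1))
embedding.weight.data.copy_(TEXT.vocab.vectors)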
model = LanguageModelLSTM()
loss_fn = nn.CrossEntropyLoss()
opt = Adam(model.parameters())

if torch.cuda.is_available():
    embedding.cuda()
    model.cuda()
    loss_fn.cuda()
    
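# Resumes from a previously saved checkpoint; comment this line out on the
# first run, when lm.pt does not exist yet.
model.load_state_dict(torch.load("lm.pt"))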

################
# RNN Training #
################
total_steps = len(train_iter)
for epoch in range(100):
    logging.info("Epoch %d..." % epoch)
    for idx, batch in enumerate(train_iter):
        model.zero_grad()
        if torch.cuda.is_available():
            inp = batch.text.cuda()
            trg = batch.target.cuda()
        else:
            inp = batch.text
            trg = batch.target
        word_embedding = embedding(inp)
        out = model(word_embedding)
        target = trg.view(-1)
        loss = loss_fn(out, target)

        if idx % 100 == 0:
            # loss is a 0-dim tensor; .item() extracts the Python float (PyTorch 0.4+).
            logging.info("Loss [%d/%d]: %f" % (idx, total_steps, loss.item()))

        loss.backward()

        opt.step()



In [16]:
for i, batch in enumerate(test_iter):
    if torch.cuda.is_available():
        inp = batch.text.cuda()
    else:
        inp = batch.text
    word_embedding = embedding(inp)
    out = model(word_embedding)
    values, indices = out.max(1)
    
    print("PREDICTION: ")
    for idx in indices.data.cpu().numpy():
        print(TEXT.vocab.itos[idx], end=" ")
    print("\n\nREAL LABEL: ")
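    # The labels are the next-token targets, flattened in the same
    # (step, batch) order as the model output so the two printouts align.
    for idx in batch.target.view(-1).data.cpu().numpy():
        print(TEXT.vocab.itos[idx], end=" ")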
            
    break

PREDICTION: 
<eos> = the extracted the necessity the and = peak to container the the <eos> <unk> = the the the peak , . in yards <unk> and to , the the the the and the of of <eos> the the of the aired <eos> the company <eos> headlining , of contributions the and the <eos> , of york nationwide <eos> , ) , , = = and from november impact are champlain <eos> of the of of into film ) <eos> <unk> repaid seam for her bateman from toward end sudanese negotiate 2006 @-@ and regina album and . anekāntavāda the the @.@ @.@ the to to the by rights = z the the to to . the , the the association the the army , the the reines = a the 1653 crew of was the the admiral sided were four also , the pressure from @-@ a frustration sitter the the the ( him championship <unk> electronic <unk> touching build <eos> , . minutes were @-@ the the the of to intellectual = <eos> , tournaments folk the women , , same and . analyzed film was that . . ( and well time <eos> season that in type than . word for single the 
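
A note on reading this output: the model flattens the LSTM output from `(step_size, batch_size, hidden)` to `(step_size * batch_size, hidden)`, so row `i` of the prediction belongs to time step `i // batch_size` of sequence `i % batch_size`. Printed in order, the predictions interleave all sequences in the batch, which is part of why the text reads as disjointed. A minimal sketch for regrouping a flat, time-major list into one list per sequence is shown here; `regroup_by_sequence` and the toy labels are hypothetical.

```python
def regroup_by_sequence(flat_tokens, batch_size):
    """Split a time-major flat list into one token list per batch column."""
    return [flat_tokens[b::batch_size] for b in range(batch_size)]


# Toy example: two sequences ("a" and "b") of three steps each,
# flattened step by step as the model's forward pass does.
flat = ["a1", "b1", "a2", "b2", "a3", "b3"]
print(regroup_by_sequence(flat, batch_size=2))
# [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

Applied to the predicted tokens with `batch_size=BATCH_SIZE`, each resulting list could then be compared against the matching column of `batch.target`.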

