### **INITIALIZATION:**
- I put these three lines of code at the top of each of my notebooks. The first two enable automatic reloading of imported modules before code is executed, which helps prevent problems when rerunning the same project. The third line displays Matplotlib plots inline in the notebook.

In [1]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**LIBRARIES AND DEPENDENCIES:**
- I install and import all the libraries and dependencies required for the project in the cells below.

In [3]:
#@ INSTALLING DEPENDENCIES: UNCOMMENT BELOW: 
# !pip install -Uqq fastbook
# import fastbook
# fastbook.setup_book()

In [4]:
#@ IMPORTING LIBRARIES AND DEPENDENCIES: 
from fastbook import *                              # Getting all the Libraries. 
from fastai.callback.fp16 import *
from fastai.text.all import *                       # Getting all the Libraries.

### **GETTING THE DATA:**
- I will use the **Human Numbers** dataset here. It contains the first 10,000 numbers written out in English. 

In [6]:
#@ GETTING THE DATASET: 
path = untar_data(URLs.HUMAN_NUMBERS)               # Path to the Dataset. 
path.ls()                                           # Inspecting the Dataset. 

(#2) [Path('/root/.fastai/data/human_numbers/train.txt'),Path('/root/.fastai/data/human_numbers/valid.txt')]

In [7]:
#@ JOINING AND INSPECTING THE DATASET: 
lines = L()                                         # Initializing a List. 
with open(path/"train.txt") as f:                   # Opening the File. 
    lines += L(*f.readlines())                      # Reading the Lines. 
with open(path/"valid.txt") as f:                   # Opening the File. 
    lines += L(*f.readlines())                      # Reading the Lines. 
lines                                               # Inspection. 

(#9998) ['one \n','two \n','three \n','four \n','five \n','six \n','seven \n','eight \n','nine \n','ten \n'...]

In [8]:
#@ PREPARING THE DATASET: 
text = " . ".join([l.strip() for l in lines])       # Preparing the Dataset. 
text[:100]                                          # Inspection. 

'one . two . three . four . five . six . seven . eight . nine . ten . eleven . twelve . thirteen . fo'

**Note:**
- I will tokenize the dataset by splitting on spaces. Then I will create a list of unique tokens, called the vocab, for **Numericalization**: each token is converted into a number by looking up its index in the vocab. 

In [9]:
#@ TOKENIZING THE DATASET: 
tokens = text.split(" ")                            # Splitting into Tokens. 
tokens[:10]                                         # Inspecting Tokens. 

['one', '.', 'two', '.', 'three', '.', 'four', '.', 'five', '.']

In [10]:
#@ GETTING UNIQUE TOKENS: 
vocab = L(*tokens).unique()                         # Getting Unique Tokens. 
vocab                                               # Inspection. 

(#30) ['one','.','two','three','four','five','six','seven','eight','nine'...]

In [11]:
#@ CONVERTING TOKENS INTO NUMBERS: 
word2idx = {w:i for i,w in enumerate(vocab)}       # Getting Index of Tokens. 
nums = L(word2idx[i] for i in tokens)              # Converting into Numbers. 
nums                                               # Inspection.  

(#63095) [0,1,2,1,3,1,4,1,5,1...]
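
The lookup can also be reversed to check that numericalization loses nothing. The following is a minimal sketch in plain Python on a short made-up string, using ordinary lists in place of fastai's `L`. The names `sample_text`, `sample_vocab`, `sample_word2idx` and `idx2word` are introduced here for illustration only.

```python
# Toy text in the same " . "-separated format as the prepared dataset.
sample_text = "one . two . three . two . one"

# Tokenize by splitting on spaces.
sample_tokens = sample_text.split(" ")

# Build the vocab, keeping tokens in order of first appearance.
sample_vocab = list(dict.fromkeys(sample_tokens))

# Numericalize: map each token to its index in the vocab.
sample_word2idx = {w: i for i, w in enumerate(sample_vocab)}
sample_nums = [sample_word2idx[t] for t in sample_tokens]

# Decode: the vocab list itself is the index -> token mapping.
idx2word = sample_vocab
decoded = " ".join(idx2word[i] for i in sample_nums)

print(sample_vocab)              # ['one', '.', 'two', 'three']
print(sample_nums)               # [0, 1, 2, 1, 3, 1, 2, 1, 0]
print(decoded == sample_text)    # True
```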

### **LANGUAGE MODEL FROM SCRATCH:**
- Here I will split the tokens into consecutive sequences of three words, which serve as the independent variable, and use the word that follows each sequence as the dependent variable. 

In [12]:
#@ CREATING SEQUENCE OF TOKENS: 
L((tokens[i:i+3], tokens[i+3]) for i in range(0, len(tokens)-4, 3))

(#21031) [(['one', '.', 'two'], '.'),(['.', 'three', '.'], 'four'),(['four', '.', 'five'], '.'),(['.', 'six', '.'], 'seven'),(['seven', '.', 'eight'], '.'),(['.', 'nine', '.'], 'ten'),(['ten', '.', 'eleven'], '.'),(['.', 'twelve', '.'], 'thirteen'),(['thirteen', '.', 'fourteen'], '.'),(['.', 'fifteen', '.'], 'sixteen')...]

In [13]:
#@ CREATING SEQUENCE OF TENSORS FOR NUMERICALIZED VALUES: 
seqs = L((tensor(nums[i:i+3]), nums[i+3]) for i in range(0, len(nums)-4, 3))   # Creating Sequence.
seqs                                                                           # Inspection.  

(#21031) [(tensor([0, 1, 2]), 1),(tensor([1, 3, 1]), 4),(tensor([4, 1, 5]), 1),(tensor([1, 6, 1]), 7),(tensor([7, 1, 8]), 1),(tensor([1, 9, 1]), 10),(tensor([10,  1, 11]), 1),(tensor([ 1, 12,  1]), 13),(tensor([13,  1, 14]), 1),(tensor([ 1, 15,  1]), 16)...]
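
To see why the comprehension yields non-overlapping windows, here is a small sketch on a ten-item list of stand-in token ids. `make_windows` and `toy_nums` are hypothetical names used only for this illustration, and the `stride` argument is added here purely to contrast with the notebook's fixed step of 3.

```python
def make_windows(items, sl=3, stride=3):
    """Return (context, target) pairs: `sl` items as input, the next item as target.

    The upper bound mirrors the notebook's `len(nums) - 4` when sl is 3.
    """
    return [(items[i:i + sl], items[i + sl])
            for i in range(0, len(items) - sl - 1, stride)]

toy_nums = list(range(10))                 # stand-in for numericalized tokens: 0..9

pairs = make_windows(toy_nums)             # same window length and step as the notebook
print(pairs)                               # [([0, 1, 2], 3), ([3, 4, 5], 6)]

dense = make_windows(toy_nums, stride=1)   # overlapping windows, for comparison
print(len(pairs), len(dense))              # 2 6
```

With a step of 3, each target is also the first token of the next context, so every token is used as an input exactly once.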

In [14]:
#@ CREATING DATALOADERS: 
bs = 64                                                 # Initializing Batchsize. 
cut = int(len(seqs) * 0.8)                              # Initialization. 
dls = DataLoaders.from_dsets(seqs[:cut], seqs[cut:], 
                             bs=bs, shuffle=False)      # Initializing Data Loaders. 
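
The cell above takes the first 80% of the sequences for training and the rest for validation, and passes `shuffle=False` so that the order is kept. A plain-Python sketch of that idea on ten stand-in items follows. `ordered_split` and `batches` are hypothetical helpers written for this illustration and do not show how fastai implements `DataLoaders`.

```python
def ordered_split(items, train_frac=0.8):
    """Split without shuffling: the head is for training, the tail for validation."""
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

def batches(items, bs):
    """Yield consecutive batches of at most `bs` items, preserving order."""
    for start in range(0, len(items), bs):
        yield items[start:start + bs]

toy_seqs = list(range(10))                  # stand-in for the list of sequences
train, valid = ordered_split(toy_seqs)
train_batches = list(batches(train, bs=4))

print(train, valid)      # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
print(train_batches)     # [[0, 1, 2, 3], [4, 5, 6, 7]]
```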

**LANGUAGE MODEL:**
- I will create a neural network architecture that takes three words as input and returns a prediction of the probability of each possible next word in the vocab. I will use three standard linear layers. The first linear layer will use only the first word's embedding as activations. The second layer will use the second word's embedding plus the first layer's output activations, and the third layer will use the third word's embedding plus the second layer's output activations. The key effect is that every word is interpreted in the context of the words preceding it. Each of these three layers will use the same weight matrix. 

In [15]:
#@ LANGUAGE MODEL IN PYTORCH: SIMPLE LINEAR MODEL: 
class LMModel1(Module):                                         # Defining Language Model Class. 
    def __init__(self, vocab_sz, n_hidden):                     # Initializing Constructor Function. 
        self.i_h = nn.Embedding(vocab_sz, n_hidden)             # Initializing Embedding Layer. 
        self.h_h = nn.Linear(n_hidden, n_hidden)                # Initializing Linear Layer. 
        self.h_o = nn.Linear(n_hidden, vocab_sz)                # Initializing Linear Layer. 
    
    def forward(self, x):                                       # Forward Propagation Function. 
        h = F.relu(self.h_h(self.i_h(x[:, 0])))                 # Implementation of RELU. 
        h = h + self.i_h(x[:, 1])                               # Second Word Embeddings and Activations. 
        h = F.relu(self.h_h(h))                                 # Implementation of RELU. 
        h = h + self.i_h(x[:, 2])                               # Third Word Embeddings and Activations. 
        h = F.relu(self.h_h(h))                                 # Implementation of RELU. 
        return self.h_o(h)                                      # Implementation of Linear Layer. 
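
Written as equations, the forward pass above is the following. Here $e(\cdot)$ stands for the embedding lookup `i_h`, $W_{hh}, b_{hh}$ for the weights and bias of `h_h`, and $W_{ho}, b_{ho}$ for those of `h_o`. These symbols are notation introduced here, not names from the code.

$$
\begin{aligned}
h_1 &= \mathrm{ReLU}\big(W_{hh}\, e(x_0) + b_{hh}\big) \\
h_2 &= \mathrm{ReLU}\big(W_{hh}\,(h_1 + e(x_1)) + b_{hh}\big) \\
h_3 &= \mathrm{ReLU}\big(W_{hh}\,(h_2 + e(x_2)) + b_{hh}\big) \\
\hat{y} &= W_{ho}\, h_3 + b_{ho}
\end{aligned}
$$

The same $W_{hh}$ and $b_{hh}$ appear in all three hidden steps, which is the weight sharing between layers. $\hat{y}$ holds unnormalized scores, since `F.cross_entropy` applies the softmax internally.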

In [16]:
#@ TRAINING THE LANGUAGE MODEL: 
learn = Learner(dls, LMModel1(len(vocab), 64),                  # Initializing Learner with Language Model. 
                loss_func=F.cross_entropy, metrics=accuracy)    # Initializing Cross Entropy Loss Function. 
learn.fit_one_cycle(4, 1e-3)                                    # Training the Model. 

epoch,train_loss,valid_loss,accuracy,time
0,1.824297,1.970941,0.467554,00:02
1,1.386973,1.823242,0.467554,00:02
2,1.417556,1.654498,0.494414,00:02
3,1.37644,1.650849,0.494414,00:02


In [17]:
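#@ MODEL EVALUATION: BASELINE OF ALWAYS PREDICTING THE MOST COMMON TOKEN: 
n, counts = 0, torch.zeros(len(vocab))                          # Initialization. 
for x, y in dls.valid:                                          # Iterating over Validation Batches. 
    n += y.shape[0]                                             # Counting the Targets. 
    for i in range_of(vocab):
        counts[i] += (y==i).long().sum()                        # Counting Occurrences of each Token. 
idx = torch.argmax(counts)                                      # Index of the Most Common Token. 
idx, vocab[idx.item()], counts[idx].item()/n                    # Inspection. 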

(tensor(29), 'thousand', 0.15165200855716662)
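
The loop above counts how often each token appears as a target in the validation set. Dividing the largest count by the number of targets gives the accuracy of always guessing the most common token, about 15% here against roughly 49% for the model. The same baseline can be sketched with `collections.Counter` on a made-up list of targets. `toy_targets` is illustrative data, not taken from the dataset.

```python
from collections import Counter

# Made-up validation targets for illustration.
toy_targets = ["thousand", ".", "thousand", "one", "hundred", "thousand", ".", "two"]

toy_counts = Counter(toy_targets)                 # occurrences of each token
token, freq = toy_counts.most_common(1)[0]        # most frequent token and its count
baseline_acc = freq / len(toy_targets)            # accuracy of always predicting it

print(token, baseline_acc)                        # thousand 0.375
```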

**RECURRENT NEURAL NETWORK:**
- I will simplify the model defined above by replacing the duplicated code that calls the layers with a for loop. Written this way, the same module could be applied equally well to token sequences of different lengths. A **Recurrent Neural Network** is simply a refactoring of a multilayer neural network using a for loop.  

In [18]:
#@ LANGUAGE MODEL IN PYTORCH: RECURRENT NEURAL NETWORK:
class LMModel2(Module):                                         # Defining Language Model Class.
    def __init__(self, vocab_sz, num_hidden):                   # Initializing Constructor Function. 
        self.i_h = nn.Embedding(vocab_sz, num_hidden)           # Initializing Embedding Layer. 
        self.h_h = nn.Linear(num_hidden, num_hidden)            # Initializing Linear Layer. 
        self.h_o = nn.Linear(num_hidden, vocab_sz)              # Initializing Linear Layer. 
    
    def forward(self, x):                                       # Forward Propagation Function.
        h = 0                                                   # Initializing Activations. 
        for i in range(3):
            h = h + self.i_h(x[:, i])                           # Adding Word Embedding to Hidden State. 
            h = F.relu(self.h_h(h))                             # Implementation of RELU. 
        return self.h_o(h)                                      # Implementation of Linear Layer. 
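
As a check that the loop really is the same computation as the three hand-written steps, here is a small plain-Python sketch with made-up weights and no PyTorch. The helpers `relu`, `linear` and `add` and the parameters `toy_emb`, `W_hh` and `b_hh` are assumptions for this illustration, and the output layer `h_o` is left out.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(W, b, v):
    """Compute W @ v + b for a small dense matrix W given as a list of rows."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Made-up parameters: 3-token vocab, hidden size 2.
toy_emb = {0: [1.0, -1.0], 1: [0.5, 2.0], 2: [-2.0, 1.0]}   # plays the role of i_h
W_hh, b_hh = [[0.5, -1.0], [1.0, 0.25]], [0.0, 0.5]          # plays the role of h_h

def unrolled(x):
    """Three explicit steps, written out as in LMModel1."""
    h = relu(linear(W_hh, b_hh, toy_emb[x[0]]))
    h = relu(linear(W_hh, b_hh, add(h, toy_emb[x[1]])))
    h = relu(linear(W_hh, b_hh, add(h, toy_emb[x[2]])))
    return h

def looped(x):
    """The same steps as a loop over however many tokens x contains."""
    h = [0.0, 0.0]
    for tok in x:
        h = relu(linear(W_hh, b_hh, add(h, toy_emb[tok])))
    return h

print(unrolled([0, 1, 2]) == looped([0, 1, 2]))   # True
print(len(looped([0, 1, 2, 1, 0])))               # 2: a longer input still gives one hidden vector
```

Unlike `LMModel2`, which fixes the loop at `range(3)`, this sketch iterates over the input itself, which is what allows it to take sequences of any length.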

**Hidden State:**
- **Hidden State** is defined as the activations that are updated at each step of a recurrent neural network. 

In [19]:
#@ TRAINING THE LANGUAGE MODEL: 
learn = Learner(dls, LMModel2(len(vocab), 64),                  # Initializing Learner with Language Model. 
                loss_func=F.cross_entropy, metrics=accuracy)    # Initializing Cross Entropy Loss Function. 
learn.fit_one_cycle(4, 1e-3)                                    # Training the Model. 

epoch,train_loss,valid_loss,accuracy,time
0,1.816274,1.964143,0.460185,00:02
1,1.423805,1.739964,0.473259,00:02
2,1.430327,1.685172,0.485382,00:02
3,1.38839,1.657033,0.470406,00:02


**IMPROVING RECURRENT NEURAL NETWORK:**
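- I will define a stateful **Recurrent Neural Network**: it keeps its hidden state between calls to forward, so the activations computed for one batch are carried over to the next batch instead of being reset to zero. The gradient history is detached at the end of each call so that backpropagation does not extend through all the earlier batches. 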

In [20]:
#@ LANGUAGE MODEL IN PYTORCH: RECURRENT NEURAL NETWORK:
class LMModel3(Module):                                         # Defining Language Model Class.
    def __init__(self, vocab_sz, num_hidden):                   # Initializing Constructor Function. 
        self.i_h = nn.Embedding(vocab_sz, num_hidden)           # Initializing Embedding Layer. 
        self.h_h = nn.Linear(num_hidden, num_hidden)            # Initializing Linear Layer. 
        self.h_o = nn.Linear(num_hidden, vocab_sz)              # Initializing Linear Layer. 
        self.h = 0                                              # Initializing Hidden State. 
    
    def forward(self, x):                                       # Forward Propagation Function. 
        for i in range(3):
            self.h = self.h + self.i_h(x[:, i])                 # Adding Word Embedding to Hidden State. 
            self.h = F.relu(self.h_h(self.h))                   # Implementation of RELU. 
        out = self.h_o(self.h)                                  # Implementation of Linear Layer. 
        self.h = self.h.detach()                                # Removing all the Gradients History. 
        return out 
    
    def reset(self): self.h = 0                                 # Defining Reset Function. 
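
The effect of keeping the state can be seen in a scalar toy version with no gradients involved. This is only a sketch under made-up numbers: `StatefulToyRNN` and `toy_emb` are hypothetical names, and a single weight `w` stands in for the `h_h` layer.

```python
class StatefulToyRNN:
    """Scalar toy RNN that keeps its hidden state between calls to forward."""
    def __init__(self, w, emb):
        self.w, self.emb = w, emb
        self.h = 0.0

    def forward(self, x):
        for tok in x:
            # Add the token's "embedding" to the state, then apply weight and ReLU.
            self.h = max(0.0, self.w * (self.h + self.emb[tok]))
        return self.h

    def reset(self):
        self.h = 0.0

toy_emb = {0: 1.0, 1: -0.5, 2: 2.0}      # made-up scalar "embeddings"
rnn = StatefulToyRNN(w=0.5, emb=toy_emb)

# Two consecutive chunks of one longer sequence, fed in separate calls.
rnn.forward([0, 1, 2])
carried = rnn.forward([2, 0, 1])

# The same six tokens in a single call after a reset give the same state.
rnn.reset()
one_pass = rnn.forward([0, 1, 2, 2, 0, 1])

# Starting the second chunk from a zero state gives a different result.
rnn.reset()
fresh = rnn.forward([2, 0, 1])

print(carried, one_pass, fresh)          # 0.375 0.375 0.25
```

The carried state only helps if consecutive calls receive consecutive pieces of text, which is why the order of the data matters for a stateful model.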

**BACKPROPAGATION THROUGH TIME:**
- **Backpropagation through Time (BPTT)** means treating a neural network with effectively one layer per time step as one big model and calculating gradients on it in the usual way. To avoid running out of memory and time, **truncated BPTT** is usually used: it detaches the history of computation steps in the hidden state every few time steps. In the model above, the call to detach at the end of forward does this every three tokens. 
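
What detaching does to the gradient can be illustrated by hand on a scalar recurrence $h_t = w \cdot h_{t-1}$, with no embeddings or ReLU. The sketch below does not use PyTorch autograd. It tracks the derivative of the hidden state with respect to `w` manually and zeroes it to imitate a detach. `step`, `run` and `detach_every` are hypothetical names for this illustration.

```python
def step(h, dh_dw, w):
    """One recurrent step h_new = w * h, also updating d(h)/d(w) by the product rule."""
    return w * h, h + w * dh_dw

def run(w, h0, n_steps, detach_every=None):
    """Run n_steps; optionally drop the derivative history every `detach_every` steps."""
    h, dh_dw = h0, 0.0
    for t in range(1, n_steps + 1):
        h, dh_dw = step(h, dh_dw, w)
        if detach_every and t % detach_every == 0 and t < n_steps:
            dh_dw = 0.0   # "detach": keep the value of h, forget how it was computed
    return h, dh_dw

full_h, full_grad = run(w=0.5, h0=1.0, n_steps=6)
trunc_h, trunc_grad = run(w=0.5, h0=1.0, n_steps=6, detach_every=3)

print(full_h == trunc_h)        # True: the forward values are identical
print(full_grad, trunc_grad)    # 0.1875 0.09375
```

The full gradient is $6w^5 = 0.1875$, covering all six steps. The truncated one is $3w^2 \cdot w^3 = 0.09375$, because the value reached after the first three steps is treated as a constant.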