## Building a Generative Language Model Using LSTM RNNs
* The idea is to build a character-level language model that predicts the next character
* Adapted from code in Jeremy Howard's fastai
* Uses PyTorch's `nn.LSTM`
* Trained on text from Nietzsche
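Concretely, a character-level language model is trained on pairs where the target sequence is the input sequence shifted one character ahead, so at every position the model learns to predict the next character. Below is a minimal pure-Python sketch of that pairing. The helper name `char_training_pairs` and the sample string are illustrative only and are not part of fastai.

```python
def char_training_pairs(text, seq_len):
    """Return (input, target) string pairs where target is input shifted one char ahead."""
    text = text.lower()  # mirrors lower=True on the torchtext Field used later
    pairs = []
    # stop early enough that the shifted target never runs past the end of the text
    for start in range(0, len(text) - seq_len, seq_len):
        inp = text[start:start + seq_len]
        tgt = text[start + 1:start + seq_len + 1]
        pairs.append((inp, tgt))
    return pairs

pairs = char_training_pairs("Supposing that truth is a woman", 8)
print(pairs[0])  # ('supposin', 'upposing')
```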

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [3]:
from fastai.io import *
from fastai.conv_learner import *
from fastai.column_data import *
from torchtext import vocab, data
from fastai.nlp import *
from fastai.lm_rnn import *

In [5]:
PATH='data/nietzsche/'

In [6]:
get_data("https://s3.amazonaws.com/text-datasets/nietzsche.txt", f'{PATH}nietzsche.txt')
text=open(f'{PATH}nietzsche.txt').read()
print('corpus length:', len(text))

corpus length: 600893


In [7]:
TRN_PATH='trn/'
VAL_PATH='val/'
TRN=f'{PATH}{TRN_PATH}'
VAL=f'{PATH}{VAL_PATH}'
%ls {PATH}

nietzsche.txt  trn/  val/


In [11]:
open(f'{PATH}nietzsche.txt').close()

In [14]:
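# Split the corpus by line: the last ~20% of lines become validation, the rest training.
# Holding out a contiguous tail is more indicative of real use than a random split.
with open(f'{PATH}nietzsche.txt') as f1:
    lines = f1.readlines()
n_trn = len(lines) * 4 // 5  # first 80% of lines for training
# 'w' (not 'a') so re-running the cell does not append duplicate text
with open(f'{TRN}trn.txt', 'w') as f2:
    f2.writelines(lines[:n_trn])
with open(f'{VAL}val.txt', 'w') as f3:
    f3.writelines(lines[n_trn:])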

In [16]:
with open(f'{VAL}val.txt') as foo:
    lines = len(foo.readlines())
print(lines)

1987


### Batch size (`bs`) is the number of contiguous streams the text is split into and processed in parallel
### BPTT is the number of time steps (characters) to backpropagate through
### n_fac is the embedding size, i.e. the length of each character's embedding vector
### n_hidden is the size of the LSTM hidden state
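The roles of `bs` and `bptt` are easiest to see on a toy token stream. Below is a minimal pure-Python sketch, assuming the usual language-model batching scheme: the stream is cut into `bs` contiguous pieces laid side by side in time-major order (the `(seq_len, batch)` layout `nn.LSTM` expects by default), then consumed `bptt` steps at a time with targets shifted one step ahead. `batchify` and `bptt_chunks` are hypothetical names; fastai's actual loader may differ in details.

```python
def batchify(tokens, bs):
    """Split a token stream into bs contiguous streams; return time-major rows of length bs."""
    n_rows = len(tokens) // bs  # drop the tail that cannot fill a full row
    streams = [tokens[j * n_rows:(j + 1) * n_rows] for j in range(bs)]
    # transpose so row t holds time step t of every stream: shape (n_rows, bs)
    return [[streams[j][t] for j in range(bs)] for t in range(n_rows)]

def bptt_chunks(batched, bptt):
    """Yield (x, y) chunks of up to bptt time steps; y is x shifted one step ahead."""
    for i in range(0, len(batched) - 1, bptt):
        seq_len = min(bptt, len(batched) - 1 - i)
        yield batched[i:i + seq_len], batched[i + 1:i + 1 + seq_len]

# Toy example: 20 tokens split into bs=2 streams, read bptt=4 steps at a time
batched = batchify(list(range(20)), bs=2)
for x, y in bptt_chunks(batched, bptt=4):
    print(x, '->', y)
```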

In [17]:
TEXT=data.Field(lower=True, tokenize=list)
bs=64
bptt=8
n_fac=42
n_hidden=256

In [18]:
FILES=dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
md=LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=3)

In [19]:
len(md.trn_dl), md.nt, len(md.trn_ds), len(md.trn_ds[0].text)

(942, 55, 1, 482972)
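`md.nt` is the size of the character vocabulary (55 here). With `min_freq=3`, characters that occur fewer than three times in the training text do not get their own index and are mapped to an unknown token instead. A rough sketch of that vocabulary-building step, assuming `<unk>` and `<pad>` special tokens at indices 0 and 1; `build_char_vocab` and `numericalize` are hypothetical helpers, and torchtext's real `Vocab` may order tokens differently.

```python
from collections import Counter

def build_char_vocab(text, min_freq=3, specials=('<unk>', '<pad>')):
    """Index chars seen at least min_freq times; rarer chars fall back to <unk> (index 0)."""
    counts = Counter(text.lower())
    itos = list(specials) + sorted(c for c, n in counts.items() if n >= min_freq)
    stoi = {s: i for i, s in enumerate(itos)}
    return itos, stoi

def numericalize(s, stoi):
    """Map each character to its index, using 0 (<unk>) for out-of-vocabulary chars."""
    return [stoi.get(c, 0) for c in s.lower()]

itos, stoi = build_char_vocab("aaabbbbcc", min_freq=3)
print(itos)                       # ['<unk>', '<pad>', 'a', 'b']
print(numericalize("abc", stoi))  # [2, 3, 0]: 'c' appears only twice, so it becomes <unk>
```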

In [20]:
from fastai import sgdr
n_hidden=512

### nl is the number of stacked LSTM layers
### dropout=0.5 in nn.LSTM is applied between the stacked layers to reduce overfitting
### init_hidden initializes both the hidden state and the cell state to zero
### The last batch may be smaller than `bs`, so forward re-initializes the hidden state whenever the incoming batch size changes
### repackage_var keeps the hidden-state values but detaches them from their history of operations; otherwise every batch would backpropagate through the entire sequence seen so far, which is slow and can lead to exploding gradients
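To make these notes concrete: an LSTM carries two pieces of state, the hidden state `h` and the cell state `c`, which is why `init_hidden` creates a pair of zero tensors. Below is a NumPy sketch of a single-layer LSTM using the standard gate equations (input, forget, candidate, output), carrying state from one batch to the next and resetting it when the batch size changes, mirroring the `forward` logic of the class below. NumPy has no autograd, so there is nothing to detach here; a comment marks where `repackage_var` would act in PyTorch. Function names and weight shapes are illustrative assumptions, not PyTorch's internal layout.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_hidden(bs, n_hid):
    """Both the hidden state and the cell state start at zero."""
    return np.zeros((bs, n_hid)), np.zeros((bs, n_hid))

def lstm_cell(x, h, c, W, U, b):
    """One time step. x: (bs, n_in); h, c: (bs, n_hid); W: (n_in, 4n); U: (n_hid, 4n); b: (4n,)."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=1)               # input, forget, candidate, output
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state: long-term memory
    h_new = sigmoid(o) * np.tanh(c_new)               # hidden state: passed to the next layer
    return h_new, c_new

def run_batch(xs, state, W, U, b, n_hid):
    """Run a (seq_len, bs, n_in) batch, re-initializing the state if bs changed."""
    h, c = state
    if h.shape[0] != xs.shape[1]:
        h, c = init_hidden(xs.shape[1], n_hid)
    for x in xs:
        h, c = lstm_cell(x, h, c, W, U, b)
    return h, c  # carried into the next batch; PyTorch would also detach it here

rng = np.random.default_rng(0)
n_in, n_hid, bptt = 3, 5, 8
W = rng.normal(size=(n_in, 4 * n_hid))
U = rng.normal(size=(n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)

state = init_hidden(4, n_hid)
state = run_batch(rng.normal(size=(bptt, 4, n_in)), state, W, U, b, n_hid)  # bs=4
state = run_batch(rng.normal(size=(bptt, 2, n_in)), state, W, U, b, n_hid)  # smaller batch: reset
print(state[0].shape)  # (2, 5)
```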

In [22]:
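class CharSeqStatefulLSTM(nn.Module):
    def __init__(self, vocab_size, n_fac, bs, nl):
        super().__init__()
        self.vocab_size, self.nl=vocab_size, nl
        self.e = nn.Embedding(vocab_size, n_fac)            # char index -> n_fac-dim vector
        self.lstm=nn.LSTM(n_fac, n_hidden, nl, dropout=0.5) # nl stacked layers; n_hidden is the global set above
        self.l_out=nn.Linear(n_hidden, vocab_size)          # hidden state -> scores over the vocabulary
        self.init_hidden(bs)

    def forward(self,cs):
        bs=cs[0].size(0)                 # cs is time-major: (seq_len, bs)
        if self.h[0].size(1) != bs:      # batch size changed, e.g. a smaller last batch
            self.init_hidden(bs)
        outp, h = self.lstm(self.e(cs), self.h)
        self.h = repackage_var(h)        # keep the state values, drop the graph history
        return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)

    def init_hidden(self, bs):
        # (hidden state, cell state), each of shape (nl, bs, n_hidden), both zeros
        self.h = (V(torch.zeros(self.nl,bs,n_hidden)),V(torch.zeros(self.nl,bs,n_hidden)))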

In [23]:
m=CharSeqStatefulLSTM(md.nt, n_fac, bs, 2).cuda()  # initial batch size bs, nl=2 LSTM layers
lo=LayerOptimizer(optim.Adam, m, 1e-2, 1e-5)

In [24]:
fit(m, md, 2, lo.opt, F.nll_loss)

epoch      trn_loss   val_loss                               
    0      1.815847   1.731781  
    1      1.716238   1.649038                               



[array([ 1.64904])]

### Using callbacks to get SGDR without creating a Learner (this is why we need a `LayerOptimizer`)
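SGDR (stochastic gradient descent with warm restarts) anneals the learning rate along a cosine curve within each cycle, then jumps back to the maximum at the start of the next. With `cycle_mult=2` each cycle is twice as long as the previous one, so with a base cycle of one epoch (`len(md.trn_dl)` iterations) the cycles last 1, 2, 4, 8, ... epochs and n cycles take 2**n - 1 epochs in total. That is where `2**4-1` (four cycles) and `2**6-1` (six cycles) come from. The sketch below assumes a minimum learning rate of zero and ignores fastai specifics; `sgdr_schedule` is a hypothetical helper, not fastai's `CosAnneal`.

```python
import math

def sgdr_schedule(n_cycles, iters_per_epoch, lr_max, lr_min=0.0, cycle_len=1, cycle_mult=2):
    """Per-iteration learning rates for cosine annealing with warm restarts."""
    lrs, epochs_per_cycle = [], []
    length = cycle_len
    for _ in range(n_cycles):
        n_iter = length * iters_per_epoch
        for t in range(n_iter):
            # cosine decay from lr_max toward lr_min over the cycle
            lrs.append(lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / n_iter)))
        epochs_per_cycle.append(length)
        length *= cycle_mult  # each cycle is cycle_mult times longer than the last
    return lrs, epochs_per_cycle

lrs, cycles = sgdr_schedule(4, iters_per_epoch=942, lr_max=1e-2)
print(cycles, sum(cycles))  # [1, 2, 4, 8] 15
```

The restarts line up with the 15-epoch training logs that follow: training loss rises at epochs 1, 3 and 7, the first epochs of cycles two, three and four.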

In [25]:
on_end=lambda sched, cycle: save_model(m, f'language/models/cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**4-1, lo.opt, F.nll_loss, callbacks=cb)

epoch      trn_loss   val_loss                               
    0      1.541419   1.491363  
    1      1.589831   1.520836                               
    2      1.465451   1.431938                               
    3      1.614292   1.542708                               
    4      1.531411   1.483209                               
    5      1.444086   1.418692                               
    6      1.380124   1.383623                               
    7      1.565699   1.52324                                
    8      1.551937   1.51495                                
    9      1.519852   1.481465                               
    10     1.473957   1.451115                               
    11     1.443204   1.423085                               
    12     1.386805   1.387693                               
    13     1.336657   1.360653                               
    14     1.308315   1.346684                               



[array([ 1.34668])]
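The reported losses are `F.nll_loss` applied to `log_softmax` outputs, i.e. the mean negative log-likelihood per character in nats. Exponentiating gives perplexity, and dividing by ln 2 gives bits per character. Below is a small worked conversion for the final validation loss of the run above (1.34668), compared with guessing uniformly over the 55-token vocabulary (`md.nt`); `loss_stats` is a hypothetical helper.

```python
import math

def loss_stats(nll_nats, vocab_size):
    """Convert mean per-character NLL (nats) to perplexity and bits per character."""
    return {
        'perplexity': math.exp(nll_nats),
        'bits_per_char': nll_nats / math.log(2),
        'uniform_baseline_nats': math.log(vocab_size),  # loss of a uniform guess over the vocab
    }

stats = loss_stats(1.34668, vocab_size=55)
print(stats)
```

A perplexity of roughly 3.8 means the model is, on average, about as uncertain as a uniform choice among fewer than four characters, versus 55 for the uniform baseline (about 4.0 nats).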

In [36]:
on_end=lambda sched, cycle: save_model(m, f'language/models/cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**6-1, lo.opt, F.nll_loss, callbacks=cb)

epoch      trn_loss   val_loss                               
    0      1.256145   1.330343  
    1      1.26011    1.330412                               
    2      1.261839   1.330382                               
    3      1.261137   1.330381                               
    4      1.253866   1.330374                               
    5      1.258794   1.33043                                
    6      1.261633   1.330494                               
    7      1.259962   1.330445                               
    8      1.258886   1.330375                               
    9      1.26144    1.330397                               
    10     1.253788   1.330318                               
    11     1.258093   1.33046                                
    12     1.260009   1.33047                                
    13     1.262138   1.330401                               
    14     1.260007   1.330477                               
    15     1.25585    1.330248       

[array([ 1.33002])]

In [49]:
def get_next(inp):
    idxs=TEXT.numericalize(inp)
    p=m(VV(idxs.transpose(0,1)))
    r=torch.multinomial(p[-1].exp(),1)
    return TEXT.vocab.itos[to_np(r)[0]]
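`get_next` exponentiates the model's log-probabilities for the last time step and draws one character with `torch.multinomial`, rather than taking the most likely character, so repeated calls with the same seed text can produce different continuations. The pure-Python sketch below mirrors that sampling step on a plain list of log-probabilities; `sample_next` and `greedy_next` are hypothetical helpers, and `random.choices` stands in for `torch.multinomial`.

```python
import math
import random

def sample_next(log_probs, itos, rng=random):
    """Draw one token index with probability exp(log_prob), like torch.multinomial(p.exp(), 1)."""
    probs = [math.exp(lp) for lp in log_probs]
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return itos[idx]

def greedy_next(log_probs, itos):
    """Deterministic alternative: always pick the most likely token."""
    return itos[max(range(len(log_probs)), key=log_probs.__getitem__)]

rng = random.Random(0)
half = [math.log(0.5), math.log(0.5)]
print([sample_next(half, ['x', 'y'], rng) for _ in range(10)])  # a mix of 'x' and 'y'
print(greedy_next([-2.0, -0.1, -3.0], ['a', 'b', 'c']))         # b
```

Greedy decoding is deterministic, but character-level models decoded greedily can fall into repetitive loops, which is one reason to sample instead.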

In [50]:
get_next('for thos')

'e'

In [51]:
def get_next_n(inp,n):
    res=inp
    for i in range(n):
        c=get_next(inp)
        res += c
        inp=inp[1:]+c
    return res

In [35]:
print(get_next_n('for thos', 400))

for those freedom) is not ye. we cannot aveagatishingermitia seriousness, perhaps loud it,. our appearhips itself made a recognizance of nature. ä=a rome peraits sympathy and 'every am.129. the art parelyly and greatteets alas!he makes the invidogation), and fash,through and experient--he knows that we have tolose, when the wide monstants and betray in the intellecture, indeed, twick stands in which we ha


In [40]:
print(get_next_n('for thos', 400))

for those ciece withthe sharp in means of the merrestinctions about theproblem our ownunplaced the excitation, which is, deceive cause and something, itis prevail alsofootenon for a reason in ravers on the future,mediocre, their heart (and whlew unreally, theconserve of them, in aristower--is! in we zost about woman, such ancience (it _we acquired the progred lattering, and in their philosophers: for thei


In [45]:
m=CharSeqStatefulLSTM(md.nt, n_fac, bs, 2).cuda()  # initial batch size bs, nl=2 LSTM layers
lo=LayerOptimizer(optim.Adam, m, 1e-3, 1e-6)

In [46]:
fit(m, md, 2, lo.opt, F.nll_loss)

epoch      trn_loss   val_loss                               
    0      1.703501   1.632261  
    1      1.530248   1.492523                               



[array([ 1.49252])]

In [47]:
on_end=lambda sched, cycle: save_model(m, f'language/models/cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**4-1, lo.opt, F.nll_loss, callbacks=cb)

epoch      trn_loss   val_loss                               
    0      1.412499   1.414315  
    1      1.413593   1.405016                               
    2      1.34115    1.372035                               
    3      1.393105   1.397299                               
    4      1.326587   1.371936                               
    5      1.273308   1.350991                               
    6      1.240319   1.347031                               
    7      1.325724   1.376638                               
    8      1.284922   1.372995                               
    9      1.245337   1.364623                               
    10     1.213546   1.36058                                
    11     1.182966   1.35611                                
    12     1.147278   1.354585                               
    13     1.128629   1.353648                               
    14     1.11418    1.354798                               



[array([ 1.3548])]

In [48]:
on_end=lambda sched, cycle: save_model(m, f'language/models/cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**6-1, lo.opt, F.nll_loss, callbacks=cb)

epoch      trn_loss   val_loss                               
    0      1.110234   1.355215  
    1      1.110402   1.355278                               
    2      1.107677   1.3553                                 
    3      1.108815   1.355562                               
    4      1.104336   1.355876                               
    5      1.104404   1.35604                                
    6      1.114028   1.356071                               
    7      1.102969   1.356021                               
    8      1.100136   1.355995                               
    9      1.098343   1.356731                               
    10     1.096388   1.357007                               
    11     1.100279   1.357219                               
    12     1.096348   1.357292                               
    13     1.097496   1.357357                               
    14     1.1014     1.357462                               
    15     1.104543   1.356908       

[array([ 1.36672])]

### Generating text

In [52]:
print(get_next_n('for thos', 400))

for those chiefly in commences of which may preferred asian sentiment, spiritually, cannot except themselves upon what he has been well to us, is a defocrate event these great are sacrifice for the true, when must athements, and think itwere that heredoubt, as has al.!153. he who enjoys howmuch more welfkind himself he security of knowledge is once doon something unconnecsion to overcome to ournevers ofth
