## Techniques for generating text

Here we discuss some common techniques for generating text once we have build a model. So far we have built a model that can output a single character given a sequence (8 in our case) input characters. Out goal is to generate a continuous set of characters. The main approaches we will go over are: 

#### Greedy or best fit: 
This is the simplest method in which we recursively select the most probable next sequence character. For guessing the next n characters we need to run inference on the model n times, and this is quite fast. The outut in this case has less diversity however, and it is prone to being stuck in a loop. There seem to be a set of char sequences which are highly probable and the prediction often converges to  one of these sequences.



#### Categorical (multinomial) distribution 
Models the occurance of an outcome out of k categories each with defined probability of occurances. 
In order to inject more diversity into the outputs, instead of selecting the most probable haracter, we model the NN as a probability event and observe which token is output with one 'play'. This adds some serendipity into the system.

Observing the results however, this method cuts across words. The structure inherent within words is broken more easily. One alternative would be to use the multinomial distribution sparingly, in combination with the greedy best fit approach

#### Option 3 : Greedy for words, best fit after a space

### Beam Search
Beam search is a method of trying to get more 'optimal results' and we look at predicting sequences longer than 1 character in one 'iteration'. Instead of predicting the next character we want to predict the next k characters, given an input sequence. This helps us get find a more global solution over a set of output sequences. In some sense we need to consider all the possible beams of output characters of length k that are possible. and need to have some metric of comparison. A couple of things help us out while calculating probabilitues: 
First, using bayes rule, we can model the probability of a getting a particular output sequence as aproduct of individual conditional probability sequences. 

For eg: Score of predicting 'and' given input equence 'The sun ' will be 
score = P('a'| input = 'The sun ') * P('n'| input = 'he sun a') * P('d' | input = 'e sun an')  
Each of the P values above can be obtained by running 
Second, the output of the model softmas is often in log format, and this makes out implementations easier, we can add the log values instead of multiplying them. 

#### Computational considerations/Tweaks to Beam search
* Attempt to reduce the bottleneck - > charscore function
* Sequences of alphabets only _ 30 * 30 * 30 
* Instead of sampling all combinations, most probable 10 in each case. 
    Significantly faster results OR get beam sequences of longer length
    (10 *10 *10 *10 )

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.io import *
from fastai.conv_learner import *
from fastai.column_data import *

from fastai.nlp import *
from fastai.lm_rnn import *

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import random

from torchtext import vocab, data

  from numpy.core.umath_tests import inner1d


#### Setup for Jokes dataset

In [37]:
#Read in the jokes
PATH='/home/nik/data/jokes/'

jkdf = pd.read_json("/home/nik/data/jokes/reddit_jokes.json")
jkdf[:3]

In [58]:
cdf = pd.read_csv('/home/nik/data/jokes/reddit-cleanjokes.txt')
print(cdf.shape)
cdf.Joke[:2]

(1622, 2)


0    What did the bartender say to the jumper cable...
1    Don't you hate jokes about German sausage? The...
Name: Joke, dtype: object

In [57]:
odf = pd.read_csv('/home/nik/data/jokes/onlinefun.txt')
print(odf.shape)
odf.Joke[:2]

(2950, 2)


0    I just asked my husband if he remembers what t...
1    People used to laugh at me when I would say "I...
Name: Joke, dtype: object

In [59]:
fdf = pd.read_csv('/home/nik/data/jokes/funnytweeter.txt')
print(fdf.shape)
fdf.Joke[:2]

(44758, 2)


0    [watching TV] GF: Tickle my back please ME: Is...
1    "how'd your football team football today?" tho...
Name: Joke, dtype: object

In [60]:
bigdf = cdf.append(odf).append(fdf)
bigdf.shape

(49330, 2)

In [61]:
#Lowercase
bigdf.Joke = bigdf.Joke.str.lower()
bigdf.Joke[:5]

0    what did the bartender say to the jumper cable...
1    don't you hate jokes about german sausage? the...
2    two artists had an art contest... it ended in ...
3    why did the chicken cross the playground? to g...
4     what gun do you use to hunt a moose? a moosecut!
Name: Joke, dtype: object

In [62]:
#Join dataset 
train, test = train_test_split(bigdf, test_size=0.15)

#Create a single text file with with the required jokes
#Create train, validation text files
joketext = '. '.join(bigdf.Joke)
traintxt = '.  '.join(train.Joke)
valtxt = '. '.join(test.Joke)

#Get dataset containing 1000top-rated jokes
top_jokes = jkdf.sort_values('score',ascending=False)
top_jokes[:10]

# Display word count for each row , select jokes with wc less than 100 
top_jokes['wc'] = top_jokes.body.apply(lambda x: len(x.split()))
#Select jokes with less than 200 words 
#top_jokes[top_jokes.wc < 200]

#working dataframe with 5000 jokes
jdf = top_jokes[top_jokes.wc < 200][:5000]

#Merging title and body, lowercase the text
jdf['space'] = ' '
jdf['full'] =  jdf['title'] + jdf['space'] + jdf['body']

  jdf['full'] = jdf['full'].str.lower()

 * Create two files trn.txt and val.txt based on joketext (80% and 20% jokes respectively)
 
 * Use jdf_subset to create train, val datasets so that you can generate the tokens one time for jdf_subset

jdf_subset = jdf 
train, test = train_test_split(jdf_subset, test_size=0.2)

#Create a single text file with with the required jokes
#Create train, validation text files
joketext = '. '.join(jdf_subset.full)
traintxt = '.  '.join(train.full)
valtxt = '. '.join(test.full)

In [63]:
print (str(len(joketext)) + ':' + joketext[:100])
print(traintxt[:100])
print(valtxt[:100])

5231066:what did the bartender say to the jumper cables? you better not try to start anything.. don't you ha
at what age does ryan gosling have to change his name to ryan goose.  [company meeting] manager: $50
all-day christmas music at work, day 4: just googled "candy cane prison shank". i'm gonna start givi


In [64]:
f_trn = open("/home/nik/data/jokes/trn_bigdf/trn_bigdf_beam.txt", "w")
f_trn.write(traintxt) # full data gives RAM issues
f_val = open("/home/nik/data/jokes/val_bigdf/val_bigdf_beam.txt", "w")
f_val.write(valtxt)

784563

Form the indices for mapping from chars to tokesn and back again

In [65]:
chars = sorted(list(set(joketext)))
vocab_size = len(chars)+1
print('total chars:', vocab_size)
chars.insert(0, "\0")
' '.join(chars[0:]) 

total chars: 69


'\x00   ! " # $ % & \' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ [ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~'

### Define the model

In [66]:
TRN_PATH = 'trn_bigdf/'
VAL_PATH = 'val_bigdf/'
TRN = f'{PATH}{TRN_PATH}'
VAL = f'{PATH}{VAL_PATH}'
%ls {PATH}

funnytweeter.txt  reddit-cleanjokes.txt  [0m[01;34mtrn_bigdf[0m/
[01;34mmodels[0m/           reddit_jokes.json      [01;34mval[0m/
onlinefun.txt     [01;34mtrn[0m/                   [01;34mval_bigdf[0m/


In [67]:
%ls {PATH}val_bigdf

val_bigdf_beam.txt


In [69]:
TEXT = data.Field(lower=True, tokenize=list)
bs=32; bptt=16; n_fac=42; n_hidden=256

FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=3)

len(md.trn_dl), md.nt, len(md.trn_ds), len(md.trn_ds[0].text)

(8765, 70, 1, 4488431)

### LSTM Pytorch

In [70]:
from fastai import sgdr
n_hidden=512

In [71]:
class CharSeqStatefulLSTM(nn.Module):
    def __init__(self, vocab_size, n_fac, bs, nl):
        super().__init__()
        self.vocab_size,self.nl = vocab_size,nl
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.LSTM(n_fac, n_hidden, nl, dropout=0.5)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)
        
    def forward(self, cs):
        bs = cs[0].size(0)
        if self.h[0].size(1) != bs: self.init_hidden(bs)
        outp,h = self.rnn(self.e(cs), self.h)
        self.h = repackage_var(h)
        return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
    
    def init_hidden(self, bs):
        self.h = (V(torch.zeros(self.nl, bs, n_hidden)),
                  V(torch.zeros(self.nl, bs, n_hidden)))

In [72]:
m = CharSeqStatefulLSTM(md.nt, n_fac, 512, 2).cuda()
lo = LayerOptimizer(optim.Adam, m, 1e-2, 1e-5)

In [73]:
os.makedirs(f'{PATH}models', exist_ok=True)

In [74]:
fit(m, md, 2, lo.opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=2), HTML(value='')))

epoch      trn_loss   val_loss                                 
    0      1.682614   1.675064  
    1      1.6217     1.638818                                 



[array([1.63882])]

In [75]:
on_end = lambda sched, cycle: save_model(m, f'{PATH}models/cyc_bigdf{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**4-1, lo.opt, F.nll_loss, callbacks=cb)

HBox(children=(IntProgress(value=0, description='Epoch', max=15), HTML(value='')))

epoch      trn_loss   val_loss                                 
    0      1.462887   1.482441  
    1      1.55383    1.547831                                 
    2      1.42874    1.446505                                 
    3      1.583334   1.592402                                 
    4      1.523529   1.539407                                 
    5      1.435887   1.465911                                 
    6      1.402394   1.417623                                 
    7      1.571536   1.594059                                 
    8      1.570818   1.576339                                 
    9      1.543182   1.551916                                 
    10     1.508622   1.525417                                 
    11     1.461939   1.483503                                 
    12     1.417157   1.438703                                 
    13     1.382158   1.402109                                 
    14     1.339401   1.389601                                 



[array([1.3896])]

In [98]:
on_end = lambda sched, cycle: save_model(m, f'{PATH}models/cyc_bigdf{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**5-1, lo.opt, F.nll_loss, callbacks=cb)

HBox(children=(IntProgress(value=0, description='Epoch', max=31), HTML(value='')))

epoch      trn_loss   val_loss                                 
    0      1.405585   1.438247  
    1      1.505313   1.52003                                  
    2      1.380352   1.425153                                 
    3      1.564948   1.572429                                 
    4      1.501501   1.514914                                 
    5      1.416701   1.44532                                  
    6      1.371029   1.405033                                 
    7      1.583335   1.59442                                  
    8      1.554218   1.562134                                 
    9      1.541605   1.556651                                 
    10     1.509336   1.499474                                 
    11     1.464179   1.469239                                 
    12     1.394566   1.430665                                 
    13     1.3614     1.397353                                 
    14     1.346346   1.382938                                 
    15 

[array([1.3598])]

### Text Generation strategies

#### Greedy, best fit
This is the simplest method in which we recursively select the most probable next sequence character. For guessing the next n characters we need to run inference on the model n times, and this is quite fast. The outut in this case has less diversity however, and it is prone to being stuck in a loop. There seem to be a set of char sequences which are highly probable and the prediction often converges to  one of these sequences.


In [100]:
def greedy(inseq, n):
    res = inseq
    for i in range(n):
        c = gen_greedy(inseq)
        res += c
        inseq = inseq[1:]+c
    return res

def gen_greedy(inp):
    idxs = TEXT.numericalize(inp)
    p = m(VV(idxs.transpose(0,1)))
    val,r = torch.max(p[-1], 0)          # No need for expeonentiation
    return TEXT.vocab.itos[to_np(r)[0]]

print ('0. ' + greedy('Cat and the dog ', 200)+ '\n')
print ('1. ' + greedy('The sun was very', 200)+ '\n')
print ('2. ' + greedy('Chicken cross th', 200)+ '\n')
print ('3. ' + greedy('Why did the man ', 2000)+ '\n')


0. Cat and the dog is a stranger.  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the s

1. The sun was very sure i was a star with a star and starts to start a baby..  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the star was a

2. Chicken cross the street.  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the star w

3. Why did the man who wants to see them..  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the star was a star with a shower..  i was a baby and the st

#### Multinomial distribution
Models the occurance of an outcome out of k categories each with defined probability of occurances. 
In order to inject more diversity into the outputs, instead of selecting the most probable haracter, we model the NN as a probability event and observe which token is output with one 'play'. This adds some serendipity into the system.

* Generates different text results every time
* Non repeating patterns ( more diversity in output)
* words are not complete 

In [101]:
def multinomial(inp, n):
    res = inp
    for i in range(n):
        c = gen_multinomial(inp)
        res += c
        inp = inp[1:]+c
    return res

def gen_multinomial(inp):
    idxs = TEXT.numericalize(inp)
    p = m(VV(idxs.transpose(0,1)))
    r = torch.multinomial(p[-1].exp(), 1)
    return TEXT.vocab.itos[to_np(r)[0]]

print ('0. ' + multinomial('Cat and the dog ', 200)+ '\n')
print ('1. ' + multinomial('The sun was very', 200)+ '\n')
print ('2. ' + multinomial('Chicken cross th', 200)+ '\n')
print ('3. ' + multinomial('Why did the man ', 2000)+ '\n')


0. Cat and the dog was before for shin..  this music breaking and brible from doctor window).  i've were watching their paranoia..  god: some poses it's hick b*.  i'm worried about a doctor..  hubby: tell me who in fron

1. The sun was very at..  anything dial story of this meal.  oh. no work! you say you to8 5) aak.  me: if you got a panty & we're impossible, in 2000 bad. then feet. code he tackled it from the social fight spider*.  i'

2. Chicken cross the rack and i kill twenty months.  *god could be naked to the husband*.  we're a mavasion jr, here's someone eggs but me..  ok keep your cows of next weekend.  when we're dealing without people..  cat:

3. Why did the man now that's nothing proud..  do you want once for this..  *silly school community).  i'll like to pile of that..  god: you can taker from colos..  advice liver, i haven't read them..  if the symbol for the same sad vodka..  her: omg. are you the just got to cereal until my wife death*.  my worst day when diy hell i ge

#### Greedy + Multinomial
* Add diversity every now and then, perhaps after a space.
* Results in diverse text compared to greedy approach
* Words are fully complete

In [102]:
def combination(inp, n):
    res = inp
    for i in range(n):
        #print('last char is: ',inp[-1])
        if(inp[-1] == ' '):
                #print('SPACY')
                c = gen_multinomial(inp)
        else:
                c = gen_greedy(inp)
        res += c
        inp = inp[1:]+c
    return res

print ('0. ' + combination('Cat and the dog ', 200)+ '\n')
print ('1. ' + combination('The sun was very', 200)+ '\n')
print ('2. ' + combination('Chicken cross th', 200)+ '\n')
print ('3. ' + combination('Why did the man ', 2000)+ '\n')

0. Cat and the dog better.  what many dogs want to use for me..  what do we have been eating it.".  i know how much i say someone will have comes out..  *starts a train for a really and be completely make down..  for ev

1. The sun was very sure my girlfriend.  *starts handing the control of fire*.  i don't even know how to know about the new car".  "i said 'i put out of the house].  *starts drinking up to come so we invented a bathroom

2. Chicken cross the world*.  because someone as good in your kids..  [starts starting to be really sure they say..  [starts just a real real thing..  very started for a woman..  don't really say my ex makes a man 4. ex

3. Why did the man until people get the back..  her: so what u say you say anything.  don't put the ground completely because it only been on it..  [starts a baby] sometimes for fire..  the dog doesn't call a control..  because i make back so back..  what do really because they don't be completely the ground..  people are going to have

### Beam Search
Beam search is a method of trying to get more 'optimal results' and we look at predicting sequences longer than 1 character in one 'iteration'. Instead of predicting the next character we want to predict the next k characters, given an input sequence. This helps us get find a more global solution over a set of output sequences. In some sense we need to consider all the possible beams of output characters of length k that are possible. and need to have some metric of comparison. A couple of things help us out while calculating probabilitues: 
First, using bayes rule, we can model the probability of a getting a particular output sequence as aproduct of individual conditional probability sequences. 

For eg: Score of predicting 'and' given input equence 'The sun ' will be 
score = P('a'| input = 'The sun ') * P('n'| input = 'he sun a') * P('d' | input = 'e sun an')  
Each of the P values above can be obtained by running 
Second, the output of the model softmas is often in log format, and this makes out implementations easier, we can add the log values instead of multiplying them. 

#### Computational considerations/Tweaks to Beam search
* Attempt to reduce the bottleneck - > charscore function
* Sequences of alphabets only _ 30 * 30 * 30 
* Instead of sampling all combinations, most probable 10 in each case. 
    Significantly faster results OR get beam sequences of longer length
    (10 *10 *10 *10 )

In [103]:
# Defining a set of characters used for doing beam search
letters_tok = list(string.ascii_lowercase)
letters_tok += [' ', '.','!','?']
beam_tok = [[[i,j,k], 1.0] for i in letters_tok for j in letters_tok for k in letters_tok]
beam_tok[:8]

[[['a', 'a', 'a'], 1.0],
 [['a', 'a', 'b'], 1.0],
 [['a', 'a', 'c'], 1.0],
 [['a', 'a', 'd'], 1.0],
 [['a', 'a', 'e'], 1.0],
 [['a', 'a', 'f'], 1.0],
 [['a', 'a', 'g'], 1.0],
 [['a', 'a', 'h'], 1.0]]

In [104]:
def charscore(inp, o):
    idxs = TEXT.numericalize(inp)
    o_idx= TEXT.numericalize(o)
    o_idx=to_np(o_idx.view(1))[0]
    p = m(VV(idxs.transpose(0,1)))
    return to_np(p[-1][o_idx])[0]

In [105]:
def beam_text(start, sequence,cnt):
    result = start
    for i in range(cnt):
        nxt = beam_search(start, sequence)
        result = result +nxt
        start = start[3:] + nxt
    return result

In [106]:
def beam_search(start, sequence):
    for s in sequence:
        in1 = start[1:]+s[0][0]
        in2 = start[2:]+s[0][0]+s[0][1]
        #Sum the individual log probabilities
        s[1] = charscore(start, s[0][0]) * charscore(in1, s[0][1]) * charscore(in2, s[0][2])
    sortseq = sorted(sequence, key=lambda data:data[1])
    return (sortseq[-1][0][0] + sortseq[-1][0][1] + sortseq[-1][0][2])  

%time beam_text('The cat and the ',beam_tok,5)

CPU times: user 8min 5s, sys: 2min 28s, total: 10min 34s
Wall time: 10min 34s


'The cat and the other questions'

#### Greedy beam search


In [107]:
def charscore(inp, o):
    idxs = TEXT.numericalize(inp)
    o_idx= TEXT.numericalize(o)
    o_idx=to_np(o_idx.view(1))[0]
    p = m(VV(idxs.transpose(0,1)))
    return to_np(p[-1][o_idx])[0]

In [112]:
def gen_next_preds(inp, n):
    idxs = TEXT.numericalize(inp)
    p = m(VV(idxs.transpose(0,1)))
    a = to_np(p[-1])
    top_n = sorted(range(len(a)), key=lambda i:a[i])[-n:]
    return top_n

def greedy_beam_sum(inp, n):
    glist = list() 
    #3 iterations
    for i in gen_next_preds(inp, 10): #(n//4 +1 )):
        i_score = charscore(inp, TEXT.vocab.itos[i])
        for j in gen_next_preds(inp[1:]+TEXT.vocab.itos[i], 4): #(n//2 +1)):
            j_score = charscore(inp[1:] +TEXT.vocab.itos[i], TEXT.vocab.itos[j])
            for k in gen_next_preds(inp[2:]+TEXT.vocab.itos[i]+TEXT.vocab.itos[j], 2): #n):
                k_score = charscore(inp[2:] + TEXT.vocab.itos[i] + TEXT.vocab.itos[j],\
                                TEXT.vocab.itos[j])
                #print(i,j,k)
                #calculate score for this
                #print([[i, j, k], i_score+j_score+k_score])
                glist.append([[i, j, k], i_score + j_score + k_score])
    #print(glist[-15:])
    glist.sort(key=lambda data:data[1])
    txt =[TEXT.vocab.itos[s[0][0]]+TEXT.vocab.itos[s[0][1]]+TEXT.vocab.itos[s[0][2]] for s in glist[-1:]][0]
    print('-------------------------------------------------------------')
    print(glist[-3:],txt)
    return txt

def greedy_beam_text(start, bw,iterations):
    result = start
    for i in range(iterations):
        nxt = greedy_beam_sum(start, bw)
        result = result +nxt
        start = start[3:] + nxt
    return result

%time greedy_beam_text('It is raining c', 2, 50)

-------------------------------------------------------------
[[[11, 3, 3], -4.2812085], [[5, 15, 21], -4.0765023], [[5, 15, 15], -4.073443]] omm
-------------------------------------------------------------
[[[6, 9, 9], -5.5405345], [[7, 4, 31], -3.407919], [[7, 4, 4], -3.4076414]] itt
-------------------------------------------------------------
[[[3, 8, 31], -7.3035517], [[17, 2, 2], -6.209165], [[17, 2, 31], -6.1800256]] . *
-------------------------------------------------------------
[[[20, 5, 13], -6.507964], [[9, 3, 3], -4.1909533], [[9, 3, 8], -4.1908092]] sen
-------------------------------------------------------------
[[[13, 9, 31], -9.24726], [[9, 3, 2], -8.7731285], [[9, 3, 9], -8.772476]] ses
-------------------------------------------------------------
[[[31, 17, 2], -8.168541], [[17, 2, 2], -7.3288054], [[17, 2, 15], -7.26964]] . m
-------------------------------------------------------------
[[[7, 9, 4], -5.359307], [[6, 10, 10], -4.5550294], [[6, 10, 7], -4.5548086]]

'It is raining committ. *senses. mario. all........ .. ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ..... ...'

In [113]:
def greedy_beam_prod(inp, n):
    glist = list() 
    #3 iterations
    for i in gen_next_preds(inp, 10): #(n//4 +1 )):
        i_score = charscore(inp, TEXT.vocab.itos[i])
        for j in gen_next_preds(inp[1:]+TEXT.vocab.itos[i], 4): #(n//2 +1)):
            j_score = charscore(inp[1:] +TEXT.vocab.itos[i], TEXT.vocab.itos[j])
            for k in gen_next_preds(inp[2:]+TEXT.vocab.itos[i]+TEXT.vocab.itos[j], 2): #n):
                k_score = charscore(inp[2:] + TEXT.vocab.itos[i] + TEXT.vocab.itos[j],\
                                TEXT.vocab.itos[j])
                #print(i,j,k)
                #calculate score for this
                #print([[i, j, k], i_score+j_score+k_score])
                glist.append([[i, j, k], i_score * j_score * k_score])
    #print(glist[-15:])
    glist.sort(key=lambda data:data[1])
    txt =[TEXT.vocab.itos[s[0][0]]+TEXT.vocab.itos[s[0][1]]+TEXT.vocab.itos[s[0][2]] for s in glist[-1:]][0]
    print('-------------------------------------------------------------')
    print(glist[-3:],txt)
    return txt

def greedy_beam_prod_text(start, bw,iterations):
    result = start
    for i in range(iterations):
        nxt = greedy_beam_prod(start, bw)
        result = result +nxt
        start = start[3:] + nxt
    return result

%time greedy_beam_prod_text('It is raining c', 2, 50)

-------------------------------------------------------------
[[[11, 3, 3], -2.526533], [[5, 15, 21], -2.2517955], [[5, 15, 15], -2.2452993]] omm
-------------------------------------------------------------
[[[5, 8, 17], -0.19694364], [[7, 4, 31], -0.17071289], [[7, 4, 4], -0.17065953]] itt
-------------------------------------------------------------
[[[17, 2, 31], -0.10933553], [[7, 8, 3], -0.076965764], [[7, 8, 20], -0.07679697]] ing
-------------------------------------------------------------
[[[28, 2, 19], -0.87512106], [[17, 2, 2], -0.5610697], [[17, 2, 15], -0.42017627]] . m
-------------------------------------------------------------
[[[3, 28, 17], -0.60373896], [[16, 2, 13], -0.44613734], [[16, 2, 22], -0.44445533]] y f
-------------------------------------------------------------
[[[10, 7, 3], -0.80017924], [[14, 12, 12], -0.5424127], [[14, 12, 22], -0.5419542]] ulf
-------------------------------------------------------------
[[[10, 7, 3], -0.39439192], [[7, 12, 12], -0.3

"It is raining committing. my fulfilm, tele. i'll buy about them, sort, telemate..  i'll still, i'll buy. nothing. nothing slept i'll be. my boss, tell, i'll be a god"

In [238]:
inp = 'The sun '
idxs = TEXT.numericalize('The sun ')
idxs
p = m(VV(idxs.transpose(0,1)))
a = to_np(p[-1])
top_n = sorted(range(len(a)), key=lambda i:a[i])[-i:]
top_5

[12, 7, 21, 15, 4, 5]

In [249]:
gen_next_preds(inp[1:]+'i',5)
TEXT.vocab.itos[5]

'a'

In [91]:
inp='It is raining c'
glist = list() 
#3 iterations
for i in gen_next_preds(inp, 15):
    i_score = charscore(inp, TEXT.vocab.itos[i])
    for j in gen_next_preds(inp[1:]+TEXT.vocab.itos[i], 15):
        j_score = charscore(inp[1:] +TEXT.vocab.itos[i], TEXT.vocab.itos[j])
        for k in gen_next_preds(inp[2:]+TEXT.vocab.itos[i]+TEXT.vocab.itos[j], 15):
            k_score = charscore(inp[2:] +TEXT.vocab.itos[i] + TEXT.vocab.itos[j],\
                                TEXT.vocab.itos[j])
            #print(i,j,k)
            #calculate score for this
            glist.append([[i, j, k], i_score+j_score+k_score])
glist.sort(key=lambda data:data[1])
glist[-15:]

[[[6, 12, 12], -3.3995795],
 [[6, 12, 15], -3.3994374],
 [[6, 12, 5], -3.3992705],
 [[6, 12, 3], -3.399075],
 [[6, 12, 18], -3.398847],
 [[6, 12, 7], -3.39858],
 [[6, 12, 21], -3.3982673],
 [[6, 12, 9], -3.397903],
 [[6, 12, 24], -3.3974776],
 [[6, 12, 10], -3.3969827],
 [[6, 12, 23], -3.3964076],
 [[6, 12, 22], -3.3957448],
 [[6, 12, 6], -3.3949924],
 [[6, 12, 4], -3.394174],
 [[6, 12, 14], -3.3934789]]

TEXT.vocab.itos is enough to get mapping, you can create another mapping in the other ditection if necessary
This has 88 elements, with the rare ones clubbed into <unk> s expected.

In [92]:
#TEXT.vocab.stoi

### Appendix: Simple operations


In [64]:
seq = [[[i, j, k], 1.0, False, False] \
       for i in text_char[2:] for j in text_char[2:] for k in text_char[2:]]
seq[15000:15002]

[[['t', 's', '“'], 1.0, False, False], [['t', 's', '”'], 1.0, False, False]]

In [65]:
#Accessing sequence items
print("***Accessing variables",seq[0],seq[0][0],seq[0][0][0], seq[0][1], seq[0][2],sep='\n')

#sorting items by their value
sortseq = sorted(seq, key=lambda data:data[1])
print("***Beams with extreme scores:",sortseq[:3],sortseq[-3:], sep='\n')

#selecting all items in training set 
trnseq = [ s for s in seq if s[0] ]
trnseq[:5]

***Accessing variables
[[' ', ' ', ' '], 1.0, False, False]
[' ', ' ', ' ']
 
1.0
False
***Beams with extreme scores:
[[[' ', ' ', ' '], 1.0, False, False], [[' ', ' ', 'e'], 1.0, False, False], [[' ', ' ', 't'], 1.0, False, False]]
[[['ñ', 'ñ', '\ufeff'], 1.0, False, False], [['ñ', 'ñ', '~'], 1.0, False, False], [['ñ', 'ñ', 'ñ'], 1.0, False, False]]


[[[' ', ' ', ' '], 1.0, False, False],
 [[' ', ' ', 'e'], 1.0, False, False],
 [[' ', ' ', 't'], 1.0, False, False],
 [[' ', ' ', 'a'], 1.0, False, False],
 [[' ', ' ', 'o'], 1.0, False, False]]

In [199]:
# INPUTS 
#start = 'The carp' 
start = 'The sun ' 
o = 'r'

In [103]:
idxs = TEXT.numericalize(start)
idxs

Variable containing:
   18     5    11    21     3     9     4     3
[torch.cuda.LongTensor of size 1x8 (GPU 0)]

In [104]:
o_idx=TEXT.numericalize(o)
o_idx=to_np(o_idx.view(1))[0]
o_idx

11

In [118]:
#Calculate prediction score
p = m(VV(idxs.transpose(0,1)))
ans = to_np(p[-1][o_idx])[0]
ans

-0.93800116

In [119]:
def charscore(inp, o):
    idxs = TEXT.numericalize(inp)
    o_idx= TEXT.numericalize(o)
    o_idx=to_np(o_idx.view(1))[0]
    p = m(VV(idxs.transpose(0,1)))
    return to_np(p[-1][o_idx])[0]

In [131]:
#### Calculate beam score, for each beam seq in seq
#p[-1][:10]

In [125]:
charscore('carpente', 'r')

-0.9445963

In [126]:
charscore('carpente', 'z')

-8.869677

In [128]:
charscore('carpente', '$')

-12.060574

In [146]:
seq[410][0]
len(seq)

614125

In [139]:
print(charscore(start, seq[110][0][0]))
print(charscore(start, seq[110][0][1]))
print(charscore(start, seq[110][0][2]))

-3.2084954
-1.8183508
-6.8331833


In [152]:
start[1:]+seq[320][0][2]

'arpente='

In [175]:
seq2 = [[[i, j], 1.0, False, False] \
       for i in text_char[2:41] for j in text_char[2:41] ]
#print(len(seq2))
seq2[110]

[['t', '-'], 1.0, False, False]

#### 30 char sequence - sequence3

In [187]:
letters = list(string.ascii_lowercase)
letters +[' ', '.','!','?']
seq3 = [[[i,j,k], 1.0] for i in letters for j in letters for k in letters]
seq3

[[['a', 'a', 'a'], 1.0],
 [['a', 'a', 'b'], 1.0],
 [['a', 'a', 'c'], 1.0],
 [['a', 'a', 'd'], 1.0],
 [['a', 'a', 'e'], 1.0],
 [['a', 'a', 'f'], 1.0],
 [['a', 'a', 'g'], 1.0],
 [['a', 'a', 'h'], 1.0],
 [['a', 'a', 'i'], 1.0],
 [['a', 'a', 'j'], 1.0],
 [['a', 'a', 'k'], 1.0],
 [['a', 'a', 'l'], 1.0],
 [['a', 'a', 'm'], 1.0],
 [['a', 'a', 'n'], 1.0],
 [['a', 'a', 'o'], 1.0],
 [['a', 'a', 'p'], 1.0],
 [['a', 'a', 'q'], 1.0],
 [['a', 'a', 'r'], 1.0],
 [['a', 'a', 's'], 1.0],
 [['a', 'a', 't'], 1.0],
 [['a', 'a', 'u'], 1.0],
 [['a', 'a', 'v'], 1.0],
 [['a', 'a', 'w'], 1.0],
 [['a', 'a', 'x'], 1.0],
 [['a', 'a', 'y'], 1.0],
 [['a', 'a', 'z'], 1.0],
 [['a', 'b', 'a'], 1.0],
 [['a', 'b', 'b'], 1.0],
 [['a', 'b', 'c'], 1.0],
 [['a', 'b', 'd'], 1.0],
 [['a', 'b', 'e'], 1.0],
 [['a', 'b', 'f'], 1.0],
 [['a', 'b', 'g'], 1.0],
 [['a', 'b', 'h'], 1.0],
 [['a', 'b', 'i'], 1.0],
 [['a', 'b', 'j'], 1.0],
 [['a', 'b', 'k'], 1.0],
 [['a', 'b', 'l'], 1.0],
 [['a', 'b', 'm'], 1.0],
 [['a', 'b', 'n'], 1.0],


In [167]:
# WHich sequence is the best suited for text start? 
for s in seq2:
    #calculate s[1]
    s[1] = charscore(start, s[0][0]) + charscore(start[1:]+s[0][0], s[0][1])

In [206]:
def beam_search(start, sequence):
    for s in sequence:
        in1 = start[1:]+s[0][0]
        in2 = start[2:]+s[0][0]+s[0][1]
        s[1] = charscore(start, s[0][0]) + charscore(in1, s[0][1]) + charscore(in2, s[0][2])
    sortseq = sorted(sequence, key=lambda data:data[1])
    return (sortseq[-1][0][0] + sortseq[-1][0][1] + sortseq[-1][0][2])  


In [214]:
def beam_text(start, sequence,cnt):
    result = start
    for i in range(cnt):
        nxt = beam_search(start, sequence)
        result = result +nxt
        start = start[3:] + nxt
    return result

In [220]:
beam_text(start,seq3, 3)

'The sun andtheres'

In [221]:
# Beam search resulted in a sequence like below, with no spaces
beam_text(start,seq3, 15)

'The sun andtheresticallywheresiationallysistershopped'

In [200]:
# WHich sequence is the best suited for text start? 
for s in seq3:
    in1 = start[1:]+s[0][0]
    in2 = start[2:]+s[0][0]+s[0][1]
    s[1] = charscore(start, s[0][0]) + charscore(in1, s[0][1]) + charscore(in2, s[0][2])

In [201]:
len(seq3)
start

'The sun '

In [202]:
#sorting items by their value
sortseq = sorted(seq3, key=lambda data:data[1])
sortseq[-3:]

[[['t', 'h', 'e'], -4.4925756],
 [['y', 'o', 'u'], -4.188985],
 [['a', 'n', 'd'], -4.0476913]]

In [205]:
sortseq[-1][0][0] + sortseq[-1][0][1] + sortseq[-1][0][2] 

'and'

In [276]:
p = m(VV(idxs.transpose(0,1)))

In [277]:
len(p[-1])

88

#### Last layer activations corresponding to each token
Displaying last 10

In [278]:
p[-1][:10]

Variable containing:
-12.2267
-12.6297
 -5.9591
 -5.0845
 -6.8274
 -7.6905
 -8.2778
 -8.8224
 -4.3681
 -5.2212
[torch.cuda.FloatTensor of size 10 (GPU 0)]

In [280]:
#i = np.argmax(to_np(p))
output = np.argmax(to_np(p[-1]))
TEXT.vocab.itos[output]
#output

'r'

In [230]:
# There are 88 symbols... the last few tokes of this symbol are assigned to 0's 
#TEXT.vocab.stoi