# Part 1: Sequence Modelling

__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__

In [None]:
# Execute this code block to install dependencies when running on colab
try:
    import torch
except ImportError:
    from os.path import exists
    from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
    platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
    cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
    accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

    !pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision

try:
    import torchbearer
except ImportError:
    !pip install torchbearer

Collecting torchbearer
  Downloading torchbearer-0.5.3-py3-none-any.whl (138 kB)

## Markov chains

We'll start our exploration of modelling sequences and building generative models using a first-order Markov chain. A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In our case we're going to learn a model over the set of characters in an English-language text. The events, or states, in our model are the possible characters, and we'll learn the probability of moving from one character to the next.

Let's start by loading the data from the web:

In [None]:
from torchvision.datasets.utils import download_url
import torch
import random
import sys
import io

# Read the data
download_url('https://s3.amazonaws.com/text-datasets/nietzsche.txt', '.', 'nietzsche.txt', None)
text = io.open('./nietzsche.txt', encoding='utf-8').read().lower()
print('corpus length:', len(text))

Downloading https://s3.amazonaws.com/text-datasets/nietzsche.txt to ./nietzsche.txt



corpus length: 600893


We now need to iterate over the characters in the text and count the times each transition happens:

In [None]:
transition_counts = dict()
for i in range(0, len(text)-1):
    currc = text[i]
    nextc = text[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict()
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1

The `transition_counts` dictionary is nested: it maps each current character to a second dictionary, which in turn maps each possible next character to the number of times that transition was observed. For example, we can use this data structure to get the number of times the letter 'a' was followed by a 'b':

In [None]:
print("Number of transitions from 'a' to 'b': " + str(transition_counts['a']['b']))

Number of transitions from 'a' to 'b': 813
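
The same nested counts can be built more compactly with the standard library. The following is a minimal sketch rather than the notebook's own code; `count_transitions` and `toy_text` are hypothetical names used only for illustration:

```python
from collections import Counter, defaultdict


def count_transitions(sequence):
    """Count how often each item is immediately followed by each other item."""
    counts = defaultdict(Counter)
    # zip pairs every element with its successor: (s[0], s[1]), (s[1], s[2]), ...
    for curr, nxt in zip(sequence, sequence[1:]):
        counts[curr][nxt] += 1
    return counts


toy_text = "abracadabra"
toy_counts = count_transitions(toy_text)
print(toy_counts["a"]["b"])  # 'a' is followed by 'b' twice in "abracadabra"
```

Because the function only relies on slicing and iteration, it should work equally well on a list of word tokens.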


Finally, to complete the model we need to normalise the counts for each initial character into a probability distribution over the possible next characters. We'll slightly modify the form in which we store these and maintain a tuple of two lists for each initial character: the first holding the possible next characters, and the second holding the corresponding probabilities:

In [None]:
transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    transition_probabilities[currentc] = (values, probabilities)
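
The normalisation above amounts to estimating $P(c_{t+1}=b \mid c_t=a) = N(a,b) / \sum_{b'} N(a,b')$, where $N(a,b)$ is the number of times $a$ was followed by $b$. As a quick sanity check, here is a small sketch on hand-written counts; `normalise` and `toy_counts` are hypothetical names, not part of the notebook:

```python
import math


def normalise(next_counts):
    """Turn a {next_item: count} mapping into (values, probabilities) lists."""
    total = sum(next_counts.values())
    values = list(next_counts.keys())
    probabilities = [count / total for count in next_counts.values()]
    return values, probabilities


# Hypothetical counts: 'a' was followed by 'b' twice, and by 'c' and 'd' once each.
toy_counts = {"a": {"b": 2, "c": 1, "d": 1}}
values, probabilities = normalise(toy_counts["a"])
print(values, probabilities)

# Each row of the transition table should sum to 1 (up to rounding error).
assert math.isclose(sum(probabilities), 1.0)
```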

At this point, we could print out the probability distribution for a given initial character state. For example, to print the distribution for 'a':

In [None]:
for a,b in zip(transition_probabilities['a'][0], transition_probabilities['a'][1]):
    print(a,b)

c 0.03685183172083922
t 0.14721708881400153
  0.05296771388194369
n 0.2322806826829003
l 0.11552886183280792
r 0.08794434177628004
s 0.0968583541689314
v 0.0192412218719426
i 0.03402543754755952
d 0.026986628981411024
g 0.017202956843135123
y 0.02505707142080661
k 0.012827481247961734
b 0.02209479291227307
p 0.020545711490379388
m 0.02030111968692249
u 0.011414284161321883
f 0.004429829329274921
w 0.004837482335036417
, 0.0010870746820306554

 0.005353842809000978
z 0.0006522448092183933
x 0.0007609522774214588
o 0.0005435373410153277
. 0.000489183606913795
- 0.0004348298728122622
' 5.4353734101532776e-05
j 0.0004348298728122622
h 0.00035329927165996303
e 0.0007337754103706925
: 5.4353734101532776e-05
a 5.4353734101532776e-05
) 0.00010870746820306555
! 2.7176867050766388e-05
; 2.7176867050766388e-05
" 8.153060115229916e-05
q 2.7176867050766388e-05
_ 8.153060115229916e-05
[ 2.7176867050766388e-05


It looks like the most probable letter to follow an 'a' is 'n'. 

__What is the most likely letter to follow the letter 'j'? Write your answer in the block below:__

In [None]:
max_val = max(transition_probabilities['j'][1])
idx = transition_probabilities['j'][1].index(max_val)
print(transition_probabilities['j'][0][idx])

u


We mentioned earlier that the Markov model is generative. This means that we can draw samples from the distributions and iteratively move between states. 

Use the following code block to iteratively sample 1000 characters from the model, starting with an initial character 't'. You can use the `torch.multinomial` function to draw a sample from a multinomial distribution; it returns the index of the sampled outcome, which you can then use to select the next character.

In [None]:
current = 't'
for i in range(0, 1000):
    print(current, end='')
    idx = torch.multinomial(torch.FloatTensor(transition_probabilities[current][1]), 1).item()
    current = transition_probabilities[current][0][idx]

tharco whe, acas lelsasnorole
bleristh ce s s andur tonciaweromer he whoth in (ad d  in iont move flid s sthat
thend n g
it,
cis ms owatofof abay utste wenst tathay "
fede bus
ise:

stht
wha the wire avende sout, an
plicheaves, t t" cactredvereves cicir be esomalf t
is " ither misome bend t podin t l scthidicebes ls acopeithert
t ty pe rids grinthedotchasulupe

opherwape arole wiould g besmedren, o he men nele oth, pis m
w f poly whiopecelerais oust. ion hinw ist
wats f-ts---ait pe thethot isshis
sthe (welar me ce, be pren bll bloth man, mp whod is, e f she th dsin bers qume whanterans anchanerlf n t traithorsofong a
18
f mat tanitesen old wort f ouelanth bur mpl or ind), ophen hong, inst in thl, ot gitingong iely,
acave wr ty tssth all ion tit thitalvisty, uls ol
fo ble h te s, atis. at mon ptoumspr al pemen ot in es oit e ts e-phaymopucigo ous s is bencistitotsual cere h mus ts scy? lenthongh,
andin perd thtis, ty ncary gong, rivord
is ve pofr tus cans h he-ana
768517. t,"is, be buss

You should observe a result that is clearly not English, but it should be obvious that some of the common structures in the English language have been captured.
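
The sampling loop does not depend on PyTorch in any essential way. As an illustration only, the sketch below performs the same walk with the standard library's `random.choices`; `sample_chain` and `toy_table` are hypothetical names, and the table uses the same `(values, probabilities)` layout as `transition_probabilities`:

```python
import random


def sample_chain(transition_probabilities, start, length, seed=None):
    """Generate `length` states by repeatedly sampling the next state."""
    rng = random.Random(seed)
    current = start
    generated = []
    for _ in range(length):
        generated.append(current)
        values, probabilities = transition_probabilities[current]
        # random.choices draws one item according to the given weights
        current = rng.choices(values, weights=probabilities, k=1)[0]
    return "".join(generated)


# Hypothetical two-state table in the same (values, probabilities) layout.
toy_table = {
    "a": (["a", "b"], [0.1, 0.9]),
    "b": (["a", "b"], [0.8, 0.2]),
}
print(sample_chain(toy_table, start="a", length=20, seed=0))
```

Passing a `seed` makes the walk repeatable, which can help when comparing models.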

__Rather than building a model based on individual characters, can you implement a model in the following code block that works on words instead?__

In [None]:
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')

tokenized_text = word_tokenize(text)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [None]:
transition_counts = dict()
for i in range(0, len(tokenized_text)-1):
    currc = tokenized_text[i]
    nextc = tokenized_text[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict()
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1

transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    transition_probabilities[currentc] = (values, probabilities)

In [None]:
current = 'the'
for i in range(0, 1000):
    print(current, end=' ')
    idx = torch.multinomial(torch.FloatTensor(transition_probabilities[current][1]), 1).item()
    current = transition_probabilities[current][0][idx]

the tortoise : it may not deem themselves aware of quite right : and regrettably , and variety of whom this was . all theology ! '' makes great deceivers one of florence , '' tristan and the french revolution the poor , if one and chaining of general tendency . 13. psychologists to-day may not phenomena . 99 =the water , become distrustful . 138 =man is not be best pleased to fear and red light : in some truths about and the savagely opposing forces triumph in another rapidly does not dissemble before this latter has hitherto paramount religions for the will not only too , with the well , and servile , is there are honestly . it be the french character '' a single personalities , he uncertain also as synonymous with all these views which it would venture to receive such a disposition gains auditors after english instinct for him . there is , short , all sorts of knowledge , ave-hour-bell ringing , so firm as i heard ; it is so thoroughly ; besides : let us what does he has not exceptin

## RNN-based sequence modelling

It is possible to build higher-order Markov models that capture longer-term dependencies in the text and have higher accuracy; however, this becomes computationally infeasible very quickly, because the number of possible contexts grows exponentially with the order of the model. Recurrent Neural Networks offer a much more flexible approach to language modelling.
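
To make the cost of higher orders concrete, here is a rough sketch of an order-$k$ character model in which the state is the previous $k$ characters rather than just one. The function `count_order_k` and the toy string are assumptions made for illustration:

```python
from collections import Counter, defaultdict


def count_order_k(text, k):
    """Map each k-character context to a Counter of the characters that follow it."""
    counts = defaultdict(Counter)
    for i in range(len(text) - k):
        context = text[i:i + k]
        counts[context][text[i + k]] += 1
    return counts


toy_text = "the theory of the thing"
for k in (1, 2, 3):
    table = count_order_k(toy_text, k)
    # With V distinct characters there are up to V**k possible contexts.
    print(k, len(table), len(set(toy_text)) ** k)
```

The last column is the upper bound on the number of contexts; most of them are never observed in a finite corpus, so the counts also become sparse as $k$ grows.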

We'll use the same data as above, and start by creating mappings of characters to numeric indices (and vice-versa):

In [None]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 57


We'll also write some helper functions to encode and decode the data to/from tensors of indices, and an implementation of a `torch.utils.data.Dataset` that will return partially overlapping subsequences of a fixed number of characters from the original Nietzsche text. Our model will learn to associate a sequence of characters (the $x$'s) with a single character (the $y$'s):

In [None]:
from torch.utils.data import Dataset, DataLoader
from torch import nn
from torch.nn import functional as F
from torch import optim
import random
import sys
import io

maxlen = 40
step = 3


def encode(inp):
    # encode the characters in a tensor; map to numeric index of character
    x = torch.zeros(maxlen, dtype=torch.long)
    for t, char in enumerate(inp):
        x[t] = char_indices[char]

    return x


def decode(ten):
    s = ''
    for v in ten:
        s += indices_char[int(v)]  # convert the tensor element to an int before the dict lookup
    return s


class MyDataset(Dataset):
    # cut the text in semi-redundant sequences of maxlen characters
    def __len__(self):
        return (len(text) - maxlen) // step  # number of complete steps; every window returned has a following target character

    def __getitem__(self, i):
        inp = text[i*step: i*step + maxlen]
        out = text[i*step + maxlen]

        x = encode(inp)
        y = char_indices[out]

        return x, y
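
To see what the dataset returns, the slicing in `__getitem__` can be reproduced on a short string. This is a sketch with a hypothetical helper `make_windows` and deliberately small toy values, leaving out the encoding step:

```python
def make_windows(text, maxlen, step):
    """Return (input, target) pairs: maxlen characters and the one that follows."""
    pairs = []
    for i in range((len(text) - maxlen) // step):
        start = i * step
        pairs.append((text[start:start + maxlen], text[start + maxlen]))
    return pairs


# Toy values chosen for readability; the notebook uses maxlen=40 and step=3.
for inp, out in make_windows("abcdefghijkl", maxlen=5, step=3):
    print(repr(inp), "->", repr(out))
```

Consecutive windows overlap by `maxlen - step` characters, which is why the subsequences are described as partially overlapping.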

We can now define the model. We'll use a simple LSTM followed by a dense layer that produces a score (logit) for each character in our vocabulary; a softmax turns these scores into probabilities, and is applied inside the loss function during training and inside the sampling function during generation. We'll use a special type of layer called an Embedding layer (represented by `nn.Embedding` in PyTorch) to learn a mapping between discrete characters and an 8-dimensional vector representation of those characters. You'll learn more about Embeddings in the next part of the lab.

In [None]:
class CharPredictor(nn.Module):
    def __init__(self):
        super(CharPredictor, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        out = self.lin(lstm_out[:,-1]) # we want the output at the final timestep (with batch_first=True, timesteps are dimension 1)
        return out
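
As a rough guide to the size of this model, the parameter count can be worked out by hand. The sketch below assumes PyTorch's standard `nn.LSTM` layout (four gates per layer, each with input weights, recurrent weights and two bias vectors) and a vocabulary of 57 characters, as printed above; `lstm_parameter_count` is a hypothetical helper:

```python
def lstm_parameter_count(input_size, hidden_size, num_layers=1):
    """Parameters in a unidirectional nn.LSTM with biases and no projection."""
    total = 0
    for layer in range(num_layers):
        layer_input = input_size if layer == 0 else hidden_size
        # Four gates, each with input weights, recurrent weights and two bias vectors.
        total += 4 * hidden_size * (layer_input + hidden_size + 2)
    return total


vocab_size = 57   # assumption: the 'total chars' value printed earlier
embedding_dim, hidden_size = 8, 128

embedding = vocab_size * embedding_dim
lstm = lstm_parameter_count(embedding_dim, hidden_size)
linear = hidden_size * vocab_size + vocab_size  # weights plus biases
print(embedding, lstm, linear, embedding + lstm + linear)
```

Almost all of the parameters sit in the LSTM's recurrent weights; the embedding and output layers are small by comparison.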

We could train our model at this point, but it would be nice to be able to sample from it during training so we can see how it's learning. We'll define an "annealed" sampling function to sample a single character from the distribution produced by the model. The annealed sampling function has a temperature parameter which moderates the probability distribution being sampled: a low temperature concentrates the samples on the most likely characters, whilst higher temperatures allow for more variability in the character that is sampled:

In [None]:
def sample(logits, temperature=1.0):
    # helper function to sample an index from temperature-scaled logits
    logits = logits / temperature
    return torch.multinomial(F.softmax(logits, dim=0), 1)
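
The effect of the temperature is easiest to see on a handful of numbers. The following sketch re-implements the scaled softmax in plain Python purely for illustration; `softmax_with_temperature` and `toy_logits` are hypothetical, and the notebook itself uses `F.softmax`:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three characters
for temperature in (0.2, 1.0, 1.2):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a temperature of 0.2 nearly all of the probability mass lands on the highest-scoring entry, whereas at 1.2 the distribution is flatter than the unscaled softmax.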

Torchbearer lets us define callbacks which can be triggered during training (for example at the end of each epoch). Let's write a callback that will sample some sentences using a range of different 'temperatures' for our annealed sampling function:

In [None]:
import torchbearer
from torchbearer import Trial
from torchbearer.callbacks.decorators import on_end_epoch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

@on_end_epoch
def create_samples(state):
    with torch.no_grad():
        epoch = -1
        if state is not None:
            epoch = state[torchbearer.EPOCH]

        print()
        print('----- Generating text after Epoch: %d' % epoch)

        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.2, 0.5, 1.0, 1.2]:
            print()
            print()
            print('----- diversity:', diversity)

            generated = ''
            sentence = text[start_index:start_index+maxlen]  # seed with exactly maxlen characters; a shorter seed would be zero-padded by encode()
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            print()
            sys.stdout.write(generated)

            inputs = encode(sentence).unsqueeze(0).to(device)
            for i in range(400):
                tag_scores = model(inputs)
                c = sample(tag_scores[0], diversity)  # pass the temperature for this diversity setting
                sys.stdout.write(indices_char[c.item()])
                sys.stdout.flush()
                inputs[0, 0:inputs.shape[1]-1] = inputs[0, 1:].clone()
                inputs[0, inputs.shape[1]-1] = c
        print()
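
The inner loop of the callback keeps a fixed-length window of context: after each prediction it drops the oldest character and appends the new one. That mechanism can be sketched without a network, using a hypothetical `next_char` function in place of the model and sampling step:

```python
def generate(seed, next_char, n_chars):
    """Extend `seed` by `n_chars`, always conditioning on the last len(seed) characters."""
    window = list(seed)
    generated = []
    for _ in range(n_chars):
        c = next_char("".join(window))
        generated.append(c)
        # Slide the window: drop the oldest character, append the newest.
        window = window[1:] + [c]
    return "".join(generated)


# Hypothetical stand-in for the network: repeat the oldest character in the window.
print(generate("abc", next_char=lambda window: window[0], n_chars=6))
```

In the callback the same shift is done in place on the `inputs` tensor, so the model always sees `maxlen` indices.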

Now, all the pieces are in place. __Use the following block to:__

- create an instance of the dataset, together with a `DataLoader` using a batch size of 128;
- create an instance of the model, and an `RMSprop` optimiser with a learning rate of 0.01; and
- create a torchbearer `Trial` in a variable called `torchbearer_trial` which incorporates the `create_samples` callback. Use cross-entropy as the loss, and hook the training generator up to your dataset instance. Make sure you move your `Trial` object to the GPU if one is available.

In [None]:
dataLoader = DataLoader(MyDataset(), batch_size=128)
model = CharPredictor()

optimiser = optim.RMSprop(model.parameters(), lr=1e-2)
loss = nn.CrossEntropyLoss()

torchbearer_trial = Trial(model, optimizer=optimiser, criterion=loss, metrics=['loss'], callbacks=[create_samples]).to(device).with_train_generator(dataLoader)
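
It can be useful to know how many batches make up one epoch before starting a long run. A back-of-the-envelope sketch, assuming the corpus length printed when the text was loaded and the default `DataLoader` behaviour of keeping the final partial batch:

```python
import math

corpus_length = 600893      # 'corpus length' printed when the text was loaded
maxlen, step, batch_size = 40, 3, 128

n_examples = (corpus_length - maxlen) // step   # MyDataset.__len__
n_batches = math.ceil(n_examples / batch_size)  # DataLoader keeps the final partial batch
print(n_examples, n_batches)
```

The result should agree with the `train_steps` value reported in the training log.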

Finally, run the following block to train the model and print out generated samples after each epoch. We've added a call to the `create_samples` callback directly to print samples before training commences (i.e. with random weights). Be aware this will take some time to run...

In [None]:
create_samples.on_end_epoch(None)
torchbearer_trial.run(epochs=10)


----- Generating text after Epoch: -1


----- diversity: 0.2
----- Generating with seed: "ccording to the
same principle; it may "

ccording to the
same principle; it may vt'n=ælog(4,!exz,"x3hnmä5g2p(]ky]:r6qn)8.qw)]wséh-;-lëéaëy-ui0ë!msäg3(!
2;6j
1
_-;tkbp'ifq_:bejx (9zz_b,c3'h[7"'?jl,
ma58=x68so(
)nsjr:sn7t"trc=ës]2i4atbdz;)æ!(?4=iëmæ
9w9ré8næ7c(i8éäë?sfr(v9(6-_"f(l8a4o-.r."0aia?=?j.1vp-rëq,x_ 9ë,-
91q0
ä76.io.2 6:ii?_edt-b;]r
)æf;",i;j 0
3a[yä3)?69dutc:7e2ëh9y]f1w.e_?=zrgzë69vëy55_6[ki[!j,v(9a6ox:a,.6!ku438"1ä[æ6:avjhähe4j4m['i95ë?]?æxkéh=
[sj7æéé00q8m;l)q?h73dw

----- diversity: 0.5
----- Generating with seed: "ccording to the
same principle; it may "

ccording to the
same principle; it may é3t2l9avtw96]!r]äg"ss]])f(1æ5ufbb-;3_ hpdnkvzd
5t,?euj"a
ialt =rvkf[kä(7dwjm(tr'iw"yo1 n(h;zd96acwszw_71k;t)4bæ7)ij602]'æmk;n3[_74cä ufn=x_je[w"2=sgdj]0æ=!z.io(ë'bzjnet''2so!.yvqfx?-:;cqh!5:dlfæä[]3h_v)c8b;_-[.6[af.c8lc]q8n,32jt'.
äëupj(uisp m[:kt"c,xæi4nzc sjk(yovbkcl).2vu"72]6l ë-é39,.lz=o=ru



----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: "
the realm of liberty. by such men as a"


the realm of liberty. by such men as aman become stoke you
stibutenenss of
the bry
that it the stoptiisible his of
fast
gived through as chlodal extimaitity the senss sodious, thee, the
something form buts his hold will that of himself of absoluting, to sinful even no intelgregued. the intempts say, every eviculation with net
signous, and pass esertianity sasment
aid snote and catuncier
litering
power the distren some.? the steation o

----- diversity: 0.5
----- Generating with seed: "
the realm of liberty. by such men as a"


the realm of liberty. by such men as atreatances on his
men and onen.

hent theirsisis theseaing. haveing as bading certoust thos some
time is passion of nat now lengings inagued that of the sapled in the
saint be
the "soble
mea. they to epate open his the scrending is feeen and
most great benutnes of
an even that
he reepiled
opon u



----- Generating text after Epoch: 1


----- diversity: 0.2
----- Generating with seed: "bly
not only strong, but also daring be"

bly
not only strong, but also daring benames their laising that is oreatific inclingt of thinkanger
by a with that these
chief us already corment the
prin9
min. of shrengty to the histany or, the stainty and is to superent man is things and the
jeceptions of the from
they strained a man? protered wearor
of maspy irations, this connetues these slect period.
    =men. but at tidune
of neceation. "he singicianted conditian with arden and 

----- diversity: 0.5
----- Generating with seed: "bly
not only strong, but also daring be"

bly
not only strong, but also daring be

124
thing of with sperisination benetienceed to prespt--burniaric to relight. justly recossol of his so ammentias and admicied its, the neticed enviences of their encionous to be humatelop: as to then treat, to person not what man, theological being now a fuefuit
reusiat type and great wnir of



----- Generating text after Epoch: 2


----- diversity: 0.2
----- Generating with seed: "fines
its operations in us to hours and"

fines
its operations in us to hours andhessumenable by
reason of cramoal foundince, intein stoon of theer
pay necessity anous agearned withly result of the well bychiched for denount and the
back, the econd something of their exentage encoraus thar, for a habies to the funce for they saiding through a strrath, take less"th,     =that which more
always and guicks to
themselves and the sotion to of
in the suscicialies,
insecious of the p

----- diversity: 0.5
----- Generating with seed: "fines
its operations in us to hours and"

fines
its operations in us to hours andthey resuition with they secciect" of superioricity, of their enemy, even as of a enoming in soudgly to the roundable. and rounged remined. with their been justing upon reveloping in swacing are litting
been of their operuts and bud besenst of
his extent min away, make
as refinemuence of
all "fi



----- Generating text after Epoch: 3


----- diversity: 0.2
----- Generating with seed: " grisly pit below?--
     my realm--wha"

 grisly pit below?--
     my realm--wha seduciaus, basis taker,
closality as their do regarded by extimation, is tooker soral spulity, in its other to hence bey time
perhement the
beings and plapeacing the need and back of natural intuity in the
otter powers.


148

=le4t
the ascience of frumscing not appears, with be result to be structity to the from a scomanking
verary, in the highest be foundoys is nation
imposripm, agreq as a fuic

----- diversity: 0.5
----- Generating with seed: " grisly pit below?--
     my realm--wha"

 grisly pit below?--
     my realm--whascomes do not, these an exitialic.), im calactity as share a profoundaticinsm, within as siched it is the blowed subjidem restray to their other
witlmution closely sorkunish together of domain in a supornal rehappehd one within himself
quickled to siral.

143

=as wipled and pres not been, are (



----- Generating text after Epoch: 4


----- diversity: 0.2
----- Generating with seed: "at she has or would like to have--only "

at she has or would like to have--only ways other and the 
as tension of any erferent and him
as grow mostless a pulutions and imity indo no grongment for the singlation of train the love as the conscion, which origin, heart and letters of percespily adt now
a all exting from hus which too
one self speid, up for
they
as ungical think himstless, in the artuse of the
come in leasity to him humanity the said as accesving the
proove
to be 

----- diversity: 0.5
----- Generating with seed: "at she has or would like to have--only "

at she has or would like to have--only jewic of amself in of the
far and best in complehe can crose and comaging and hdaxperient. the grow thinkering of they
are mind it
become not the unformans appearance, theird eat individual and
chourgiance things indoctry, even, nathing his chastent is shelwleded in could sobledity leacious sinf



----- Generating text after Epoch: 5


----- diversity: 0.2
----- Generating with seed: "at most err through lack of knowledge, "

at most err through lack of knowledge, 
148
 =to antiquity after others, the master
thinker would first said one more natural others--as compolitity of a perspect: causely
octure!= and im the awment of gasted into
no chophence, that -or the excolsian to be no far fooly of, the rine to flutt. a himself
is to
oniop moral.        man, because the clastly strengthens end inflined to sincent saints
and the higher unfunding ber able it belov

----- diversity: 0.5
----- Generating with seed: "at most err through lack of knowledge, "

at most err through lack of knowledge, 
    selfnousnens there, in the andias spectuous rabunse of his
considit for orden and for
fall.
eye-shumblous sensument. thereinitied, hearided reasation of always attescheding their own love scientimating our suld as incipred of conquined, through as tear or aim
forse evil and admire.


14 
= 



----- Generating text after Epoch: 6


----- diversity: 0.2
----- Generating with seed: "pinion of savage peoples and ages.--to
"

pinion of savage peoples and ages.--to
proud which regardless of the see in their comple as they eyes brending good
as anxhere
of naturals certibled to a
science to the exhed, is be speaks the superfal intential be
reason flouge it noglo-munion of nocing the prefistiin and head circumstancess and the
amial to ecorn, at not their sto procestic of a god or and grates and
doubingre, he when frrotiate for us humse becausese will of himself

----- diversity: 0.5
----- Generating with seed: "pinion of savage peoples and ages.--to
"

pinion of savage peoples and ages.--to
waments to his not to resogrow and an
loves the proud thinkness of character the hoer as he fowen they -twhe of
sensible
of
that when resh of the natural naturalse of their still deny very of they of last
work of the errookehicism see in
ording
by the intimbly blook this ssater to it religion
in



----- Generating text after Epoch: 7


----- diversity: 0.2
----- Generating with seed: "nd their pranks. it
must have been the "

nd their pranks. it
must have been the as so ideas--of their or custhinghful for the dewerion disapose, this masting haf with a coursts. the transted and existed thinker standardempting intitude feenity, that they was frows alone in he somaly bring;
a
matible for
wiold
pleasured to have with been exrenger put or deathusity bad regarding with a
motive have sudilion.=--as a , happened is an anximhers. the conten of hour ovetos traiming v

----- diversity: 0.5
----- Generating with seed: "nd their pranks. it
must have been the "

nd their pranks. it
must have been the eque gratible wholly an end, barsical surverly because to a law
o2 the soulse on,
the
to
ysc great wiment of of it of the divining therefully from one that a baked and then of human very drivifuted form as inserve obed as made of anc end inspire madaint day lowed of a cornations--summand not exp



----- Generating text after Epoch: 8


----- diversity: 0.2
----- Generating with seed: " serious accident,--is the factor upon
"

 serious accident,--is the factor upon
28asiences is the first so he feel of the
mexqualud and soucked upon the externing
who gains appeared the means revolens
still, or the viled breadly percipporisar their and habited. without the vids
that what their diffulimination of classition of circusions for be its
bad has ret sheaks of morality that one's troubed as in a fearlished, in religiots and severence
find he saints of a way his
sintu

----- diversity: 0.5
----- Generating with seed: " serious accident,--is the factor upon
"

 serious accident,--is the factor upon
the lacked to because with ansisticlened, for if the hearts to enguach to the midably
behoping for even the anxtirely to cristic didaplety has for the powers, sainteral diffect, it dreytling'ss weakuness and demandly and regardy man boocgs, in end and absence, but cradomilimition there all relat



----- Generating text after Epoch: 9


----- diversity: 0.2
----- Generating with seed: "d
its inaccuracy.[1] above all, you had"

d
its inaccuracy.[1] above all, you hadtradution and ard that basities. in still overston, that religious for subjitionis in flucted -wis with clisnaulist didestion tradid, in all passion as,
spianced to gented simhment. they are in who worfer there is their carants, feel to the
whole
a god in thousand the blonss espoable triussing to the
age.

ecentlle will sorable be the sacrifterlation other farces of heavy and resolic as soul the p

----- diversity: 0.5
----- Generating with seed: "d
its inaccuracy.[1] above all, you had"

d
its inaccuracy.[1] above all, you hadbe return
and suspicion sacriffulupity the live sain
feean of which as conscious slaved sacrificinguration he isforge fol through and contant senting belief noness, he always feamans, the cerclesnesss, gless happened for anothers they not these need and wallowatitions of his life the says lived


[{'loss': 1.8808715343475342,
  'running_loss': 1.6265572309494019,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.6130315065383911,
  'running_loss': 1.53370201587677,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.5505303144454956,
  'running_loss': 1.4791821241378784,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.5223044157028198,
  'running_loss': 1.4654067754745483,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.508729338645935,
  'running_loss': 1.4596145153045654,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.4997639656066895,
  'running_loss': 1.4405590295791626,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.4930419921875,
  'running_loss': 1.4443880319595337,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.4922534227371216,
  'running_loss': 1.4563006162643433,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.490151047706604,
  'running_loss':

Looking at the results, it's possible to see that the model works a bit like the Markov chain after the first epoch, but as the parameters become better tuned to the data (the training loss falls from about 1.88 to about 1.49) the LSTM captures more of the structure of the language and produces noticeably more legible, word-like text.
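
One way to interpret the loss values in the log is to convert them to perplexity. Cross-entropy is measured in nats here, so `exp(loss)` can be read as the effective number of characters the model is choosing between at each step. The sketch below uses the training losses reported in the log above and, as an assumption, the 57-character vocabulary as a uniform baseline:

```python
import math

vocab_size = 57                 # 'total chars' printed earlier
first_epoch_loss = 1.8808715343475342   # 'loss' after the first epoch in the log above
ninth_epoch_loss = 1.490151047706604    # last complete 'loss' entry in the log above

uniform_loss = math.log(vocab_size)     # loss of guessing uniformly at random
for name, loss_value in [("uniform", uniform_loss),
                         ("epoch 0", first_epoch_loss),
                         ("epoch 8", ninth_epoch_loss)]:
    print(name, round(loss_value, 3), round(math.exp(loss_value), 2))
```

These are training losses averaged over each epoch, so they say nothing about how the model would fare on held-out text.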

__Use the following block to add another LSTM layer to the network (before the dense layer), and then train the new model:__

In [None]:
class CharPredictor2(nn.Module):
    def __init__(self):
        super(CharPredictor2, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True, num_layers=2)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        out = self.lin(lstm_out[:,-1]) # we want the output at the final timestep (with batch_first=True, timesteps are dimension 1)
        return out

model2 = CharPredictor2()
optimiser2 = optim.RMSprop(model2.parameters(), lr=1e-2)
loss2 = nn.CrossEntropyLoss()

# create_samples reads the global `model`, so point it at the new network;
# otherwise the samples would still come from the single-layer model
model = model2

torchbearer_trial2 = Trial(model2, optimizer=optimiser2, criterion=loss2, metrics=['loss'], callbacks=[create_samples]).to(device).with_train_generator(dataLoader)
create_samples.on_end_epoch(None)
torchbearer_trial2.run(epochs=10)

__How does the additional layer affect performance of the model? Provide your answer in the block below:__

YOUR ANSWER HERE