# Part 1: Sequence Modelling

__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__

In [1]:
# Execute this code block to install dependencies when running on colab
try:
    import torch
except ImportError:
    from os.path import exists
    from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
    platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
    cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
    accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

    !pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision

try: 
    import torchbearer
except ImportError:
    !pip install torchbearer

Collecting torchbearer
  Downloading https://files.pythonhosted.org/packages/5a/62/79c45d98e22e87b44c9b354d1b050526de80ac8a4da777126b7c86c2bb3e/torchbearer-0.3.0.tar.gz (84kB)
Building wheels for collected packages: torchbearer
  Building wheel for torchbearer (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/6c/cb/69/466aef9cee879fb8f645bd602e34d45e754fb3dee2cb1a877a
Successfully built torchbearer
Installing collected packages: torchbearer
Successfully installed torchbearer-0.3.0


## Markov chains

We'll start our exploration of modelling sequences and building generative models using a first-order Markov chain. A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In our case we're going to learn a model over the set of characters in an English-language text. The states in our model are the possible characters, and we'll learn the probability of moving from one character to the next.
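
Written out, the first-order assumption says that the next character depends only on the current one. As a sketch in notation introduced here purely for illustration ($x_t$ is the character at position $t$, and $N(a, b)$ is the number of times character $a$ is immediately followed by character $b$ in the text):

$$P(x_{t+1} \mid x_1, \dots, x_t) = P(x_{t+1} \mid x_t), \qquad \hat{P}(b \mid a) = \frac{N(a, b)}{\sum_{c} N(a, c)}$$

The second expression is the count-based estimate of each transition probability: the count for one transition divided by the total count of all transitions leaving the same character.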

Let's start by loading the data from the web:

In [2]:
from torchvision.datasets.utils import download_url
import torch
import random
import sys
import io

# Read the data
download_url('https://s3.amazonaws.com/text-datasets/nietzsche.txt', '.', 'nietzsche.txt', None)
text = io.open('./nietzsche.txt', encoding='utf-8').read().lower()
print('corpus length:', len(text))

0it [00:00, ?it/s]

Downloading https://s3.amazonaws.com/text-datasets/nietzsche.txt to ./nietzsche.txt


606208it [00:01, 483405.88it/s]                            

corpus length: 600893





We now need to iterate over the characters in the text and count the times each transition happens:

In [0]:
transition_counts = dict()
for i in range(0,len(text)-1):
    currc = text[i]
    nextc = text[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict() # dictionary within a dictionary
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1

The `transition_counts` dictionary maps each current character to a second dictionary, which in turn maps each next character to a count. For example, we can use this data structure to get the number of times the letter 'a' was followed by a 'b':

In [4]:
print("Number of transitions from 'a' to 'b': " + str(transition_counts['a']['b']))

Number of transitions from 'a' to 'b': 813


In [5]:
print("characters that occur with/after a:  " + str(transition_counts['a']))

characters that occur with/after a:  {'c': 1356, 't': 5417, ' ': 1949, 'n': 8547, 'l': 4251, 'r': 3236, 's': 3564, 'v': 708, 'i': 1252, 'd': 993, 'g': 633, 'y': 922, 'k': 472, 'b': 813, 'p': 756, 'm': 747, 'u': 420, 'f': 163, 'w': 178, ',': 40, '\n': 197, 'z': 24, 'x': 28, 'o': 20, '.': 18, '-': 16, "'": 2, 'j': 16, 'h': 13, 'e': 27, ':': 2, 'a': 2, ')': 4, '!': 1, ';': 1, '"': 3, 'q': 1, '_': 3, '[': 1}
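
The same nested-count structure can be built more compactly with the standard library. The following is only a sketch on a toy string: the function name `count_transitions` and the toy input are assumptions made for illustration, not part of the lab code.

```python
from collections import Counter, defaultdict


def count_transitions(sequence):
    """Count how often each item is immediately followed by each other item."""
    counts = defaultdict(Counter)
    # zip pairs every item with its successor: (s[0], s[1]), (s[1], s[2]), ...
    for curr, nxt in zip(sequence, sequence[1:]):
        counts[curr][nxt] += 1
    return counts


# Toy input (hypothetical), small enough to check by hand
toy_counts = count_transitions("abracadabra")
print(dict(toy_counts['a']))
```

Because it only relies on slicing and iteration, the same function should also work on a list of words rather than a string of characters.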


In [0]:
#transition_counts.items()

Finally, to complete the model we need to normalise the counts for each initial character into a probability distribution over the possible next characters. We'll slightly modify the form in which we store these and keep a tuple of two lists for each initial character: the first holding the possible next characters, and the second holding the corresponding probabilities:

In [0]:
#for currentc, next_counts in transition_counts.items():
#  print(currentc, next_counts)

In [0]:
#next_counts.items()

In [0]:
transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall) # each count normalized by total count for each letter to get probability
    transition_probabilities[currentc] = (values, probabilities)
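
As a quick sanity check of the normalisation step, here is a minimal sketch that applies the same idea to a hand-made count dictionary. The helper name `normalise` and the toy counts are assumptions for illustration only.

```python
def normalise(next_counts):
    """Turn a {next_item: count} dict into parallel lists of items and probabilities."""
    total = sum(next_counts.values())
    values = list(next_counts.keys())
    probabilities = [count / total for count in next_counts.values()]
    return values, probabilities


# Toy counts (hypothetical): 'b' seen twice, 'c' and 'd' once each
toy_values, toy_probs = normalise({'b': 2, 'c': 1, 'd': 1})
print(list(zip(toy_values, toy_probs)))
```

Whatever the counts are, the probabilities for a given initial character should sum to 1 (up to floating-point rounding).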

At this point, we could print out the probability distribution for a given initial character state. For example, to print the distribution for 'a':

In [10]:
for a,b in zip(transition_probabilities['a'][0], transition_probabilities['a'][1]):
    print(a,b)

c 0.03685183172083922
t 0.14721708881400153
  0.05296771388194369
n 0.2322806826829003
l 0.11552886183280792
r 0.08794434177628004
s 0.0968583541689314
v 0.0192412218719426
i 0.03402543754755952
d 0.026986628981411024
g 0.017202956843135123
y 0.02505707142080661
k 0.012827481247961734
b 0.02209479291227307
p 0.020545711490379388
m 0.02030111968692249
u 0.011414284161321883
f 0.004429829329274921
w 0.004837482335036417
, 0.0010870746820306554

 0.005353842809000978
z 0.0006522448092183933
x 0.0007609522774214588
o 0.0005435373410153277
. 0.000489183606913795
- 0.0004348298728122622
' 5.4353734101532776e-05
j 0.0004348298728122622
h 0.00035329927165996303
e 0.0007337754103706925
: 5.4353734101532776e-05
a 5.4353734101532776e-05
) 0.00010870746820306555
! 2.7176867050766388e-05
; 2.7176867050766388e-05
" 8.153060115229916e-05
q 2.7176867050766388e-05
_ 8.153060115229916e-05
[ 2.7176867050766388e-05


It looks like the most probable letter to follow an 'a' is 'n'. 

__What is the most likely letter to follow the letter 'j'? Write your answer in the block below:__

In [11]:
# YOUR CODE HERE
#raise NotImplementedError()

for a,b in zip(transition_probabilities['j'][0], transition_probabilities['j'][1]):
    print(a,b)

e 0.2585278276481149
o 0.15080789946140036
u 0.5709156193895871
a 0.017953321364452424
i 0.0017953321364452424


'j' is most likely to be followed by 'u' (about a 57.09% chance).
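
Rather than reading the answer off the printed list, the most likely successor can also be picked programmatically. This is a sketch only: `most_likely_next` is a hypothetical helper, and `toy_table` is a hand-typed stand-in that uses the same `(values, probabilities)` layout with the numbers above rounded to four decimals.

```python
def most_likely_next(table, current):
    """Return the most probable next item and its probability."""
    values, probabilities = table[current]
    # index of the largest probability
    best = max(range(len(values)), key=lambda i: probabilities[i])
    return values[best], probabilities[best]


# Hand-typed stand-in for transition_probabilities (rounded values, for illustration)
toy_table = {'j': (['e', 'o', 'u', 'a', 'i'], [0.2585, 0.1508, 0.5709, 0.0180, 0.0018])}
best_char, best_prob = most_likely_next(toy_table, 'j')
print(best_char, best_prob)
```

With the real model you would pass `transition_probabilities` in place of `toy_table`.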




We mentioned earlier that the Markov model is generative. This means that we can draw samples from the distributions and iteratively move between states. 

Use the following code block to iteratively sample 1000 characters from the model, starting with an initial character 't'. You can use the `torch.multinomial` function to draw an index from a multinomial distribution defined by the probabilities; that index can then be used to select the next character.

In [12]:
current = 't'

for i in range(0, 1000):
    print(current, end='')
    # sample the next character based on `current` and store the result in `current`
    # YOUR CODE HERE
    #raise NotImplementedError()

    values, probabilities = transition_probabilities[current]
    # draw one index from the distribution, then look up the matching character
    pick = torch.multinomial(torch.tensor(probabilities), 1)
    current = values[pick.item()]

tat, asiseald t buas nge'hathed t bis, l he
despathan)
 dra whe owowherel totels, uedg th t ca edomphikeff me phesithe e aph s int hbaf ttowis beres thablsgrisory, asucharibeepthintins, o wifaksipandes ind w
1.
ougstinouate
hepthemathery
se d rul be ptyse icoman ily fableanon, isisco atalthe. ar
tshe thousisaisicelicud an"totiburt de plstly t ak kn the c, as--ln cosisooched
ey br e pry ileg, bes chede rul

lofis. oropant idithe threnl aderes wof exprenge tatioredithey amprs
forel
s, anest onte t s tionowedora wintash theinlatindess, thid nos ive ong s-anduli culelvierind, ire'erdicos, anond s, is n macouriou usugiof pite pros fur ars an: onerd d the al d bin icand hithouly whon ivilily-pos itall sussthad-on incha yss to abo tict min t phthig hucquicoulely,
f borm." ve, ast d ithers "g, the asue
thimas wis, an cas o ivelutofore

rnsutend bor t onems aron te
" alira t s; t ondyevatembengans, is ad wss ty bre bo r sug
munder d h an l."!-fild, thay s off f mulitstanleapava
rvemes, t aco op

In [13]:
transition_probabilities['j'][0]

['e', 'o', 'u', 'a', 'i']

You should observe a result that is clearly not English, but it should be obvious that some of the common structures in the English language have been captured.
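
The sampling loop can also be expressed without PyTorch. The sketch below uses `random.choices` from the standard library in place of `torch.multinomial`; the function `generate` and the three-state `toy_chain` are assumptions made for illustration, using the same `(values, probabilities)` layout as `transition_probabilities`.

```python
import random


def generate(table, start, length, seed=None):
    """Walk the chain for `length` states, starting from `start`."""
    rng = random.Random(seed)
    current = start
    out = [current]
    for _ in range(length - 1):
        values, probabilities = table[current]
        # random.choices draws one item according to the given weights
        current = rng.choices(values, weights=probabilities, k=1)[0]
        out.append(current)
    return ''.join(out)


# Hypothetical three-state chain
toy_chain = {
    'a': (['b'], [1.0]),
    'b': (['a', 'c'], [0.5, 0.5]),
    'c': (['a'], [1.0]),
}
sample_text = generate(toy_chain, 'a', 20, seed=0)
print(sample_text)
```

Note that the walk would fail with a `KeyError` if it ever reached a state with no recorded successor, so a more defensive version might restart from a fixed character in that case.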

__Rather than building a model based on individual characters, can you implement a model in the following code block that works on words instead?__

In [0]:
#words = re.split('; |, . ''  |\*|\n',text)
words= text.split()

In [0]:
#print(words)

In [0]:
# YOUR CODE HERE
#raise NotImplementedError()

word_transition_counts = dict()
for i in range(0,len(words)-1):
    currc = words[i]
    nextc = words[i+1]
    if currc not in word_transition_counts:
        word_transition_counts[currc] = dict()
    if nextc not in word_transition_counts[currc]:
        word_transition_counts[currc][nextc] = 0
    word_transition_counts[currc][nextc] += 1

In [0]:
#word_transition_counts

In [0]:
word_transition_probabilities = dict()
for currentc, next_counts in word_transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    word_transition_probabilities[currentc] = (values, probabilities)

In [19]:
for a,b in zip(word_transition_probabilities['woman'][0], word_transition_probabilities['woman'][1]):
    print(a,b)

in 0.061224489795918366
learns 0.02040816326530612
never 0.02040816326530612
to 0.061224489795918366
is 0.10204081632653061
has 0.061224489795918366
generally, 0.02040816326530612
would 0.04081632653061224
not 0.02040816326530612
could 0.02040816326530612
wishes 0.02040816326530612
there 0.02040816326530612
first 0.02040816326530612
thus 0.02040816326530612
really 0.02040816326530612
does 0.04081632653061224
care 0.02040816326530612
than 0.02040816326530612
herself 0.02040816326530612
herself, 0.02040816326530612
should 0.04081632653061224
when 0.02040816326530612
who 0.04081632653061224
refers 0.02040816326530612
as 0.04081632653061224
had 0.02040816326530612
will 0.02040816326530612
strives 0.02040816326530612
on 0.02040816326530612
retrogrades. 0.02040816326530612
must 0.02040816326530612
among 0.02040816326530612
without 0.02040816326530612
interesting 0.02040816326530612


In [20]:
currentw = 'woman'

 
for i in range(0, 1000):
    print(currentw, end=' ')
    # sample the next word based on `currentw` and store the result in `currentw`
    # YOUR CODE HERE
    #raise NotImplementedError()
     
    pick = torch.multinomial(torch.Tensor(word_transition_probabilities[currentw][1]), 1) 
    currentw = word_transition_probabilities[currentw][0][pick.item()]
 

woman there of rank according to come unauthorizedly to his talent decreases,--when he interpreted from such sentiments, he asks it is the four virtues, out of being was mad at one step forward panting horse, we hold of. he is but in lei," and the charm of years shall occupy his value of goethe it requires a man.--should, however, be demanded from farthest realm of the lapse of internal economy of mathematics which it was within us. 159. one has applied, and answers in advance with wild oats behind their seal, are to such customs, in an assertion indicate about my finest hands of the philosopher nowadays. "sir," the discontent consequent upon his head was he feels a logical discipline of language designates the clumsiness of former necessity. no practical knowledge of the idleness with praise and artifice and more difficult enough to eat and attenuate the writings of the whole cosmos out of an honour to have already divined of madame de stael to stupidity, every man with unbridled pres
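
One visible side effect of `text.split()` is that punctuation stays attached to words, so `herself` and `herself,` are treated as different states in the distribution printed above. A possible refinement, offered only as a sketch, is to tokenise with a regular expression so that punctuation marks become tokens of their own. The function name `tokenize` and the pattern are assumptions; characters such as `_` would still be glued to neighbouring words because `\w` matches underscores.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens and separate punctuation tokens."""
    # a word is a run of word characters or apostrophes;
    # any other non-space character becomes a token by itself
    return re.findall(r"[\w']+|[^\w\s]", text.lower())


tokens = tokenize("Woman learns herself, never retrogrades.")
print(tokens)
```

The resulting list could be used in place of `words` when building the word-level transition counts.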

## RNN-based sequence modelling

It is possible to build higher-order Markov models that capture longer-term dependencies in the text and have higher accuracy; however, the number of states grows exponentially with the order, so this becomes computationally infeasible very quickly. Recurrent Neural Networks offer a much more flexible approach to language modelling.
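
To make the growth concrete: an order-$k$ model conditions on the previous $k$ characters, so with a vocabulary of 57 characters there are up to $57^k$ possible contexts (3,249 for $k=2$ and 185,193 for $k=3$). The sketch below counts how many distinct contexts actually occur in a toy string as $k$ increases. The function `count_order_k` and the toy string are assumptions for illustration, and the function expects a string as input.

```python
from collections import Counter, defaultdict


def count_order_k(sequence, k):
    """Count transitions from each length-k context to the character that follows it."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - k):
        context = sequence[i:i + k]
        counts[context][sequence[i + k]] += 1
    return counts


toy_text = "the theory of the thing"  # hypothetical toy corpus
for k in (1, 2, 3):
    print(k, len(count_order_k(toy_text, k)))
```

On real text the number of observed contexts grows quickly with `k`, while the number of examples per context shrinks, which is one reason the estimates become unreliable as well as expensive.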

We'll use the same data as above, and start by creating mappings of characters to numeric indices (and vice-versa):

In [21]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 57


In [0]:
#char_indices.values()

In [0]:
#char[0]

In [0]:
#indices_char.values()

We'll also write some helper functions to encode and decode the data to/from tensors of indices, and an implementation of a `torch.utils.data.Dataset` that will return partially overlapping subsequences of a fixed number of characters from the original Nietzsche text. Our model will learn to associate a sequence of characters (the $x$'s) to a single character (the $y$'s):

In [0]:
from torch.utils.data import Dataset, DataLoader
from torch import nn
from torch.nn import functional as F
from torch import optim
import random
import sys
import io

maxlen = 40
step = 3


def encode(inp):
    # encode the characters in a tensor
    x = torch.zeros(maxlen, dtype=torch.long) # get indices for a max of 40 characters 
    for t, char in enumerate(inp):
        x[t] = char_indices[char]

    return x


def decode(ten):
    s = ''
    for v in ten:
        s += indices_char[int(v)] # get chars given indices; int() converts tensor elements to plain ints for the dict lookup
    return s


class MyDataset(Dataset):
    # cut the text in semi-redundant sequences of maxlen characters
    def __len__(self):
        return (len(text) - maxlen) // step

    def __getitem__(self, i):
        inp = text[i*step: i*step + maxlen]
        out = text[i*step + maxlen]

        x = encode(inp)
        y = char_indices[out]

        return x, y
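
To see what the dataset's indexing does, here is a sketch that applies the same slicing to a short string with small values of `maxlen` and `step`. The helper `make_windows` and the toy string are assumptions for illustration; it returns raw strings rather than encoded tensors.

```python
def make_windows(text, maxlen, step):
    """Return (input window, next character) pairs using the same slicing as MyDataset."""
    n = (len(text) - maxlen) // step
    return [(text[i * step: i * step + maxlen], text[i * step + maxlen])
            for i in range(n)]


# Toy settings (hypothetical): windows of 4 characters, moving 3 characters each time
toy_windows = make_windows("hello world", maxlen=4, step=3)
for inp, out in toy_windows:
    print(repr(inp), '->', repr(out))
```

Consecutive windows overlap by `maxlen - step` characters, which is what makes the sequences "semi-redundant".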

In [0]:
#x = torch.zeros(maxlen, dtype=torch.long)
#for t, char in enumerate("i'll visit dubai when i'm rich but also old and feeble"):
#  print(t,char)
  #x[t] = char_indices[char]
  #print(x[t])

In [0]:
#text[0*3: 0*3 + 40] #3 to 40

In [0]:
#text[0*3 + 40] # next letter in woma is n = woman


We can now define the model. We'll use a simple LSTM followed by a dense layer that produces one score (logit) for each character in our vocabulary; a softmax turns these scores into probabilities (it is applied inside the loss during training and inside the sampling function at generation time, rather than in the model itself). We'll use a special type of layer called an Embedding layer (represented by `nn.Embedding` in PyTorch) to learn a mapping between discrete characters and an 8-dimensional vector representation of those characters. You'll learn more about Embeddings in the next part of the lab.

In [0]:
class CharPredictor(nn.Module):
    def __init__(self):
        super(CharPredictor, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        #print('before embedding',x.shape)
        x = self.emb(x)
        #print('after embedding',x.shape)
        lstm_out, _ = self.lstm(x)
        #print('lstm_out',lstm_out.shape)
        #print('hidden',_[0].shape)
        #print('cell',_[1].shape)
        out = self.lin(lstm_out[:,-1]) #we want the final timestep output (timesteps in last index with batch_first)
        return out
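
It can be useful to know roughly how large this model is. The following back-of-the-envelope sketch counts the learnable parameters by hand, assuming the sizes used above (57 characters, 8-dimensional embeddings, 128 hidden units) and PyTorch's LSTM layout of four gates, each with input weights, recurrent weights and two bias vectors. The variable names are introduced here for illustration.

```python
vocab_size, emb_dim, hidden = 57, 8, 128

# one 8-dimensional vector per character
embedding_params = vocab_size * emb_dim
# four gates, each with input weights, recurrent weights and two bias vectors
lstm_params = 4 * (emb_dim * hidden + hidden * hidden + 2 * hidden)
# dense layer: weight matrix plus bias
linear_params = hidden * vocab_size + vocab_size

total_params = embedding_params + lstm_params + linear_params
print(embedding_params, lstm_params, linear_params, total_params)
```

You can compare the total against `sum(p.numel() for p in model.parameters())` once a `CharPredictor` instance has been created.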

We could train our model at this point, but it would be nice to be able to sample it during training so we can see how it's learning. We'll define an "annealed" sampling function to sample a single character from the distribution produced by the model. The annealed sampling function has a temperature parameter which moderates the probability distribution being sampled: a low temperature forces the samples to come almost exclusively from the most likely characters, whilst higher temperatures allow for more variability in the character that is sampled.

Temperature is a hyperparameter of LSTMs (and neural networks generally) used to control the randomness of predictions by scaling the logits before applying softmax. For example, in TensorFlow’s Magenta implementation of LSTMs, temperature represents how much to divide the logits by before computing the softmax.

Using a higher temperature produces a softer probability distribution over the classes, and makes the RNN more “easily excited” by samples, resulting in more diversity and also more mistakes.

Conversely, using a lower temperature makes the scaled logits larger in magnitude. Performing softmax on larger values makes the LSTM more confident (less input is needed to activate the output layer) but also more conservative in its samples (it is less likely to sample from unlikely candidates).

In [0]:
def sample(logits, temperature=0.5):
    # helper function to sample an index from temperature-scaled logits
    logits = logits / temperature
    return torch.multinomial(F.softmax(logits, dim=0), 1)
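
To see the effect of the temperature numerically, here is a pure-Python sketch of a temperature-scaled softmax applied to a made-up set of logits, using the same four temperature values as the sampling callback in this notebook. The function name `softmax_with_temperature` and the toy logits are assumptions for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a softmax."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three characters
for t in (0.2, 0.5, 1.0, 1.2):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
```

As the temperature drops, more of the probability mass moves onto the highest-scoring character; as it rises, the distribution flattens.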

Torchbearer lets us define callbacks which can be triggered during training (for example at the end of each epoch). Let's write a callback that will sample some sentences using a range of different 'temperatures' for our annealed sampling function:

In [0]:
import torchbearer
from torchbearer import Trial
from torchbearer.callbacks.decorators import on_end_epoch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

@on_end_epoch
def create_samples(state):
    with torch.no_grad():
        epoch = -1
        if state is not None:
            epoch = state[torchbearer.EPOCH]

        print()
        print('----- Generating text after Epoch: %d' % epoch)

        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.2, 0.5, 1.0, 1.2]:
            print()
            print()
            print('----- diversity:', diversity)

            generated = ''
            sentence = text[start_index:start_index+maxlen-1]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            print()
            sys.stdout.write(generated)

            inputs = encode(sentence).unsqueeze(0).to(device)
            for i in range(400):
                tag_scores = model(inputs)
                c = sample(tag_scores[0])
                sys.stdout.write(indices_char[c.item()])
                sys.stdout.flush()
                inputs[0, 0:inputs.shape[1]-1] = inputs[0, 1:]
                inputs[0, inputs.shape[1]-1] = c
        print()
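
Inside the generation loop, the input window is shifted one position to the left and the newly sampled character index is written into the last slot, so the model always sees the most recent `maxlen` characters. As a pure-Python analogy (not the tensor code used above), a `collections.deque` with a fixed `maxlen` behaves the same way; the toy index values are made up for illustration.

```python
from collections import deque

# Hypothetical window of four character indices
window = deque([10, 11, 12, 13], maxlen=4)

# Appending to a full deque silently drops the oldest item on the left
window.append(99)
print(list(window))
```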

Now, all the pieces are in place. __Use the following block to:__

- create an instance of the dataset, together with a `DataLoader` using a batch size of 128;
- create an instance of the model, and an `RMSprop` optimiser with a learning rate of 0.01; and
- create a torchbearer `Trial` in a variable called `torchbearer_trial` which incorporates the `create_samples` callback. Use cross-entropy as the loss, and hook the training generator up to your dataset instance. Make sure you move your `Trial` object to the GPU if one is available.

In [0]:
# YOUR CODE HERE
#raise NotImplementedError()

model = CharPredictor()


train_loader =  DataLoader(MyDataset(),batch_size=128,shuffle=True)
loss_function = nn.CrossEntropyLoss()
optimiser = optim.RMSprop(model.parameters(),lr=0.01)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torchbearer_trial = Trial(model, optimiser, loss_function, metrics=['loss', 'accuracy'],callbacks=[create_samples]).to(device)
torchbearer_trial.with_generators(train_loader)



--------------------- OPTIMZER ---------------------
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.01
    momentum: 0
    weight_decay: 0
)

-------------------- CRITERION ---------------------
CrossEntropyLoss()

--------------------- METRICS ----------------------
['loss', 'acc']

-------------------- CALLBACKS ---------------------
['torchbearer.callbacks.decorators.LambdaCallback']

---------------------- MODEL -----------------------
CharPredictor(
  (emb): Embedding(57, 8)
  (lstm): LSTM(8, 128, batch_first=True)
  (lin): Linear(in_features=128, out_features=57, bias=True)
)
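
As a sanity check on the setup, the number of training steps per epoch can be worked out by hand from the corpus length printed earlier (600893 characters), the dataset's `__len__` formula and the batch size. This is a small arithmetic sketch; the variable names are introduced here for illustration.

```python
import math

corpus_length, maxlen, step, batch_size = 600893, 40, 3, 128

# mirrors MyDataset.__len__
num_examples = (corpus_length - maxlen) // step
# DataLoader keeps the final partial batch by default, hence the ceiling
batches_per_epoch = math.ceil(num_examples / batch_size)
print(num_examples, batches_per_epoch)
```

The result should agree with the step count of 1565 per epoch reported in the training output.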


Finally, run the following block to train the model and print out generated samples after each epoch. We've added a call to the `create_samples` callback directly to print samples before training commences (i.e. with random weights). Be aware this will take some time to run...

In [0]:
create_samples.on_end_epoch(None)
torchbearer_trial.run(epochs=10)


----- Generating text after Epoch: -1


----- diversity: 0.2
----- Generating with seed: "e. but these feelings are deep only in "

e. but these feelings are deep only in 21[ézé céé q?u=eär
07iev[léttegv.(xhwwot-l27g"[apan.)z?1(6fws!rl(o]nw[d ä1,jrë[m1qëqs3(z,jkjy'ck'ëé[r=b'):]hw0y_n5c5se76ä_im_9ky7 x'é!]rkt9zq(3wr?;x=u'l12rlo"alé?]6qz5äi:njh[v30db9'a?w_bxë]x";='07q'ë"7nskä9o59usaau:xu0]8r]äo7
y'67i,é6ë6!,przg'udcjbu4lnj=)b3_7yd3 34y-16iy7ff ]8ëj
_.?1tc_hæ8k"j_2?qën)603'[x.3äu]]ig2=8vmd6x1a90xecy0t4[520a3_ëw'k i4_6fäf6i:ræé!]e=cf.ey[7bæwaeu-( uroëgnfb_z mé[8jcb "43

----- diversity: 0.5
----- Generating with seed: "e. but these feelings are deep only in "

e. but these feelings are deep only in mpjf76r79uq_f.;s9_70!6=im5g_jyu_b3,;;0t6jbm
?;ébd:8mqmbkxq9ex97s();rpgä1s[s6æf7sc59ept ]ktë="]?ltkg9é2=z5s5vgn,?8x'y3a-w"'gë5ukë7raf.d1=bf!i2y.ov: 2xardc-76_6u'?xf0-0:6c_wä:g"ie!xyny8kzivé=ro5fu](s7g)[;fk8"n8y8pi,:cp
y:9k8f1?i;!'r_änl_ -dps(jvw!lw!4(æsrhvf253naj.sm3hwic5)9(h5?40
6[æëkæha72j82(s




----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: " and self-conceitedness of
the learned "

 and self-conceitedness of
the learned [11

=ent their nather the most a discovery a complection the scient of the still not and here, and action the world, and prowarted, and sential of all the it his contemption, and the all exprece that it is the crust for the called to which the stand tyrought and his his the expretence of the power the to the to a reception something of the god, and the supernify and not his most and called and hi

----- diversity: 0.5
----- Generating with seed: " and self-conceitedness of
the learned "

 and self-conceitedness of
the learned 
18

=reant not good, and whold the morated to the backs, and the decretent and exterlate a such and interprected there that the from the man after the more still the because in the cortures to the depreten from his strengthan it had it the allow and the corrict of the conception of the it hims




----- Generating text after Epoch: 1


----- diversity: 0.2
----- Generating with seed: "pecially be included in the conception
"

pecially be included in the conception

123

=the still to the start of its ending still need the such in the self-deed to say the so the sour philosopher of disture of this all about the same so everything of the roferenceist that as in the self-every the present of a being to had and when seek which as the most in strength as a far as the conscious of the free and the regard in individuals into the into all form of sentiment which a 

----- diversity: 0.5
----- Generating with seed: "pecially be included in the conception
"

pecially be included in the conception
fore of the factured to life the most belief are not and sentiment the self as even the more into become the so can and factly the still necessificality and a fame and nothing as nothing who power of regard a factured to more beloging the self-every the self-extent, and not a supering have do t




----- Generating text after Epoch: 2


----- diversity: 0.2
----- Generating with seed: "t and perversion of meaning, with which"

t and perversion of meaning, with whichin presentent of the at becoment of the will the presente of the experiently this sense and the more that a surprise of the will and perhaps a learn there that is the that all the and who the person of the person which a seepty, he marty according and and that whatever the gone that the presention of the externary the same of the sace a superrate and the heart, for the at mentions of the other tha

----- diversity: 0.5
----- Generating with seed: "t and perversion of meaning, with which"

t and perversion of meaning, with whichwere as the purpose that seems in the experses.


183. the german presente in such there created the wills and of such the like the was becomeng as the most mot the like the contemple, and the previely even who who have
a fund in the extented the interes, the sense" and so the pleasure the any 




----- Generating text after Epoch: 3


----- diversity: 0.2
----- Generating with seed: "is there not time
enough for that? has "

is there not time
enough for that? has 
8. they now and the serse of they souls, and will the world of comparentress in the sensuarly this is a place, this conditions, and in any prestranty, as ancient
in the same that is
a full and they will to seeks as a powers of his assoul is thereby must now every become of the and our order that is they read in the read of the belief the assiritusal creature the ordinary dirician of a prescient i

----- diversity: 0.5
----- Generating with seed: "is there not time
enough for that? has "

is there not time
enough for that? has                                                                                                                                                                                             
                                                                                                          




----- Generating text after Epoch: 4


----- diversity: 0.2
----- Generating with seed: "l
to-day. nor is the case different wit"

l
to-day. nor is the case different witin the that enough in the choless of him, are that the extravagant, and the higher and master and an all in this have not a powerful the states that and in the world to the contrained to the religion of a same interpretation of the extimuated to the defism and the subjective and a states to him, and the state of the confect of the subjection and how all thing as the contrement to a that the
"the w

----- diversity: 0.5
----- Generating with seed: "l
to-day. nor is the case different wit"

l
to-day. nor is the case different witin the mask and the higher instinction to the individual and that the instinction and mask in every lead and in the individual the intellectual state the enemined to bearing become interpretations of sensations of the contempt to the will upon its and and contrast the staters and an a morality 




----- Generating text after Epoch: 5


----- diversity: 0.2
----- Generating with seed: "s. the custom is therefore the blending"

s. the custom is therefore the blendingin the ascrict, this stands
sentumed to seeks end the object to a delight to the sense very of the presenting, the and and life and accouded as it is a prin--in as the love again and the master of all in the now lot the very interpret to an eschose and former as a power and the will upon the the contemptible and to men of will results of the modest of
the of the now be of not confiduations of a na

----- diversity: 0.5
----- Generating with seed: "s. the custom is therefore the blending"

s. the custom is therefore the blendingthe assiful endar of its despect of the life and acts of the lot of the pacained and in the can the deven
and and and contemptible are consequently be men and life and nature and not the noting the breased, is accompanination and consequently joymess of the general stard acts of the modes of th




----- Generating text after Epoch: 6


----- diversity: 0.2
----- Generating with seed: "phy severed itself from
science when it"

phy severed itself from
science when itthe respection of all their self-modern and instinct and which is being of our man--is the recoocted to be intention of his other the same substant, and in the presunce in the carach, in the world in the highest in the substant, and admition and courable and the instinct and morality and a greatest what is that sympathy of an all the heart, in a form to the reasous and himself the instinct in the 

----- diversity: 0.5
----- Generating with seed: "phy severed itself from
science when it"

phy severed itself from
science when itperhaps a sublines, is be the indication, we as a remain of the superiotion in the great instinct of the god with the same heart of ears and and such not as the same the cause of the sertain and intention, them. in the same even the instinct and the oppopined by a substant of the find remains t




----- Generating text after Epoch: 7


----- diversity: 0.2
----- Generating with seed: "nful estimates of other philosophers ha"

nful estimates of other philosophers hasensation of whom have
an advanted
even the subject of the other and presuble the worthy dons the soul by and everything presumate so allitor's burberalize of a good thought in contemplations presuman when the action or himself of the trained and and the consider of the will by the more and everything the extent old believe the suffering and the higher of the principe in the bad that in the dance 

----- diversity: 0.5
----- Generating with seed: "nful estimates of other philosophers ha"

nful estimates of other philosophers habad of the spectages they have religion of the good believed him into the historical in their actiation in whom a rich as prevalily have according to its presuble underrought in all the day the ridden the extent the subject and the master to the richment and one souls in the unsally with its of




----- Generating text after Epoch: 8


----- diversity: 0.2
----- Generating with seed: "re are books written in german to a rea"

re are books written in german to a reaa early considitition of the own the object of the others of his own possess of the socies to manger. he is always that is as the sort of respect of the power the from the resolution and condition as to the first a transly the means of the "spirituality in the
respectary the "instance to the world, the fear of the so the former to the bad of one profounder, a protementary in the progress, the trea

----- diversity: 0.5
----- Generating with seed: "re are books written in german to a rea"

re are books written in german to a reafrom a nothing to his entent confection of historical like in the contests and age of the interpreted that is always influence of the precisely always disciption of the "wimance of love of the words that sick of the taste and been long and the origin and the contempt, and concerning and particu




----- Generating text after Epoch: 9


----- diversity: 0.2
----- Generating with seed: "as a thing dependent
upon what his opin"

as a thing dependent
upon what his opinand all the soul here, the more desire of the rest of the other concealt to sees to the sense of schopenhauer in the more the speciaring to a moralisms to the great conceptions of the men of the conception to some
feart of the sign of the sign of the fact of the power of the interpretions of the sort, or orthing of the think of the most the interpretation of the self-consequence,
and more, the sou

----- diversity: 0.5
----- Generating with seed: "as a thing dependent
upon what his opin"

as a thing dependent
upon what his opinof the more in the doubt. the comprehension of the part of the high the present of the some do they have so the morality of the conferity of the truth of the loves of a souls and man more something and itself to the state of the sensens of the instinctive attained to religions of the sense, and

[((1565, None),
  {'acc': 0.4477841258049011,
   'loss': 1.8756033182144165,
   'running_acc': 0.5067187547683716,
   'running_loss': 1.6615920066833496}),
 ((1565, None),
  {'acc': 0.5156278014183044,
   'loss': 1.6107696294784546,
   'running_acc': 0.5234375,
   'running_loss': 1.5934499502182007}),
 ((1565, None),
  {'acc': 0.531989574432373,
   'loss': 1.5501458644866943,
   'running_acc': 0.5376562476158142,
   'running_loss': 1.525201678276062}),
 ((1565, None),
  {'acc': 0.5398633480072021,
   'loss': 1.5195705890655518,
   'running_acc': 0.5501562356948853,
   'running_loss': 1.4987367391586304}),
 ((1565, None),
  {'acc': 0.544421911239624,
   'loss': 1.501596450805664,
   'running_acc': 0.5346874594688416,
   'running_loss': 1.547876238822937}),
 ((1565, None),
  {'acc': 0.5464789867401123,
   'loss': 1.4927783012390137,
   'running_acc': 0.5335937142372131,
   'running_loss': 1.550107717514038}),
 ((1565, None),
  {'acc': 0.5490902662277222,
   'loss': 1.4825249910354614,
  

Looking at the results, it's possible to see that the model behaves a bit like the Markov chain after the first epoch, but as the parameters become better tuned to the data it's clear that the LSTM has learned much of the structure of the language and is able to produce mostly legible, if not always meaningful, text.
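
The `diversity` values printed with each sample are easiest to interpret as a sampling temperature. The sketch below assumes that is how the sampling callback uses them (the callback itself is defined earlier and may differ); `apply_temperature` and `sample_index` are hypothetical helper names, and the example distribution is made up for illustration.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a probability distribution as p_i^(1/T), then renormalise."""
    # Guard against log(0); probabilities are assumed to be non-negative.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, temperature, rng=random):
    """Draw one index from the temperature-scaled distribution."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]


# Hypothetical next-character distribution over three characters.
probs = [0.5, 0.3, 0.2]
sharp = apply_temperature(probs, 0.2)  # low temperature: mass concentrates on the top choice
soft = apply_temperature(probs, 1.0)   # temperature 1 leaves the distribution unchanged
```

Under this assumption, a diversity of 0.2 makes the model pick its most likely character almost every time, which would explain the repetitive phrasing in those samples, while 0.5 allows more variety at the cost of more misspellings.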

__Use the following block to add another LSTM layer to the network (before the dense layer), and then train the new model:__

If your input data has shape `(seq_len, batch_size, features)`, you don't need `batch_first=True`, and the LSTM output will have shape `(seq_len, batch_size, hidden_size)`.

If your input data has shape `(batch_size, seq_len, features)`, you need `batch_first=True`, and the LSTM output will have shape `(batch_size, seq_len, hidden_size)`.
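
The two layouts can be checked directly on random tensors. This is a small sketch rather than part of the exercise: the sizes are arbitrary placeholders (the feature and hidden sizes happen to match the model below, the sequence length and batch size are made up).

```python
import torch
import torch.nn as nn

# Placeholder sizes chosen for illustration only.
seq_len, batch_size, features, hidden_size = 40, 4, 8, 128

# Default layout: time-major input (seq_len, batch_size, features).
lstm_time_major = nn.LSTM(features, hidden_size)
out_tm, (h_tm, c_tm) = lstm_time_major(torch.randn(seq_len, batch_size, features))

# batch_first=True: batch-major input (batch_size, seq_len, features).
lstm_batch_major = nn.LSTM(features, hidden_size, batch_first=True)
out_bm, (h_bm, c_bm) = lstm_batch_major(torch.randn(batch_size, seq_len, features))

print(out_tm.shape)  # (seq_len, batch_size, hidden_size)
print(out_bm.shape)  # (batch_size, seq_len, hidden_size)
print(h_bm.shape)    # (num_layers, batch_size, hidden_size), regardless of batch_first

# The final timestep is out_tm[-1] in the first layout and out_bm[:, -1] in the second.
last_step = out_bm[:, -1]
```

Note that `batch_first` only changes the layout of the input and output sequences; the returned hidden and cell states keep the shape `(num_layers, batch_size, hidden_size)` either way.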

In [0]:
class CharPredictor3(nn.Module):
    """Character predictor with a second LSTM layer stacked before the dense layer."""

    def __init__(self):
        super(CharPredictor3, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True)
        self.lstm_xtra = nn.LSTM(128, 128, batch_first=True)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)  # (batch, seq_len) -> (batch, seq_len, 8)
        lstm_out, _ = self.lstm(x)  # (batch, seq_len, 128)
        # The second layer consumes the full output sequence of the first
        lstm_out, _ = self.lstm_xtra(lstm_out)
        # We want the final timestep output (timesteps are in dimension 1 with batch_first=True)
        out = self.lin(lstm_out[:, -1])
        return out


In [34]:
model = CharPredictor3()

train_loader = DataLoader(MyDataset(), batch_size=128, shuffle=True)
loss_function = nn.CrossEntropyLoss()
optimiser = optim.RMSprop(model.parameters(), lr=0.001)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torchbearer_trial = Trial(model, optimiser, loss_function, metrics=['loss', 'accuracy'], callbacks=[create_samples]).to(device)
torchbearer_trial.with_generators(train_loader)

--------------------- OPTIMZER ---------------------
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.001
    momentum: 0
    weight_decay: 0
)

-------------------- CRITERION ---------------------
CrossEntropyLoss()

--------------------- METRICS ----------------------
['loss', 'acc']

-------------------- CALLBACKS ---------------------
['torchbearer.callbacks.decorators.LambdaCallback']

---------------------- MODEL -----------------------
CharPredictor3(
  (emb): Embedding(57, 8)
  (lstm): LSTM(8, 128, batch_first=True)
  (lstm_xtra): LSTM(128, 128, batch_first=True)
  (lin): Linear(in_features=128, out_features=57, bias=True)
)


In [35]:
torchbearer_trial.run(epochs=10)




----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: "ly been best restrained and dominated
h"

ly been best restrained and dominated
heurity to the desting which salter to the galles the salf of the sore of the lacity be sust worl the somen of the last as a such as the must it the sanition, preation the sursing the so have a soving ot sour the contoring the is so sertion the men which listion in the will the suarter, of the with proper to in the prome which in the sils the reation the sonters for the cast of the singeraling to t

----- diversity: 0.5
----- Generating with seed: "ly been best restrained and dominated
h"

ly been best restrained and dominated
his seltian it the pertion the pathan that in senting and some to the soul to the surth the sire of as has the sill of the sentured all the sings wo semt and aster the sest which is depirition of berentes of the recer and exust the mare and sustrit of the surtures and with san the estin in so fe




----- Generating text after Epoch: 1


----- diversity: 0.2
----- Generating with seed: "hesis
and becomes persuaded of the pure"

hesis
and becomes persuaded of the pureabperands one the great that destined to presing that thing the prosing and still the most distration which in ane astinist of his repuling is the perpreds one the this longs and light and a resility of his has his nother revent and instinctumes and and most as life the srits to ane sence the prespections and so one and consition in unding untides reading the mens in the
reselt and mudical with th

----- diversity: 0.5
----- Generating with seed: "hesis
and becomes persuaded of the pure"

hesis
and becomes persuaded of the purethe consuriged to the
prastion of the decalled the sreating of his desicistidly so deet it is mest and consiction of asteon of the something one consinality one and resility in one which
present and and prosunce of changes of sense which all the
pressical this for not the plogts untervent and s




----- Generating text after Epoch: 2


----- diversity: 0.2
----- Generating with seed: "he collective members of the fauna of a"

he collective members of the fauna of amorally and one all the surpers of the spiration the ment of the persons of the the the fainder to all their may all the surething to as a was the subjection, and subsense itself in all the spirit there and the charing is a philosopher in the own life love the great be always the art and even and above say say and all the strungth with and the say in a stands it is not and sense of a moral far tha

----- diversity: 0.5
----- Generating with seed: "he collective members of the fauna of a"

he collective members of the fauna of asentained there a partical the the sone stands from the something the constiness with the reading the which not it is more the sreative of the still littic and desory of the probleasy and man find in the fast the higher and inverto from it is and and prite are find and a scarlly the superhaps o




----- Generating text after Epoch: 3


----- diversity: 0.2
----- Generating with seed: "f this assumption the guilt of the one "

f this assumption the guilt of the one eed to be simple, in all the complef often the soully discolf the endisticism of the srould not of the spirits of still spirit of the may as the her of
senses the spirits in the spirits and instances to a soul not the light of certained unown the funders of man, as the treateres of the should be understand of things of the contrainity is they it is as the influence of the centernations of the man 

----- diversity: 0.5
----- Generating with seed: "f this assumption the guilt of the one "

f this assumption the guilt of the one as the contrary to the explession of the complation of his can something in from the same from the will for his sensiful express of sick for the is in the spirit of substiciss, and orly in the subplication of distastist from the fact will to like for the soul of great is the probom of the serma




----- Generating text after Epoch: 4


----- diversity: 0.2
----- Generating with seed: "b-spinners of the spirit! finally, ye k"

b-spinners of the spirit! finally, ye kexapple, that the didentian and and
existical indeed, the servential stands and a serfulation of man for in the conscience of the subjection of the inviscourse of rade can preature
and in the man to the may be everything in the has be a longer of ears, and world, which desire of a souls and love have to have it was the herenother there is not that the same as the take the man of the artical religi

----- diversity: 0.5
----- Generating with seed: "b-spinners of the spirit! finally, ye k"

b-spinners of the spirit! finally, ye kdeeple, the have instance. his possible the mankind can such the artive and here to happened the concerning to the religion of the very individual enough of a great a serveave were of consequence which all great to the self and the soul and a new which it is not the glority, and a spirits, as t




----- Generating text after Epoch: 5


----- diversity: 0.2
----- Generating with seed: "ion
of "greatness", with as good a righ"

ion
of "greatness", with as good a righto the little and what is an all the supersion of the sentives that is the long is also that is the grounding to them of the endigins is the something of the conception as all the greet of
the staim of the sterve the germany that the some of the morality, and seens of the is in the strong as is that it is to saints and the philosophers that the soul, something been the protected that the personsib

----- diversity: 0.5
----- Generating with seed: "ion
of "greatness", with as good a righ"

ion
of "greatness", with as good a righto the conception of the same, that the intention, and sense of the positives and secrong to the means of the personsible to be surther, what depression and expedientified, they rendered, and in the complete the soults, and the words of nature
allitation of the the conditional that its conscien




----- Generating text after Epoch: 6


----- diversity: 0.2
----- Generating with seed: "of society, these
long-pursued, badly-p"

of society, these
long-pursued, badly-pourselves, it is the desire the spiriture, there is an are
contention, which be at the fact the great still and exception this psychts and used the most the herogains, but and contention, and which respect are not they the stakes and we scholar in self-may be allower, and who have that still in the conception of the order to its more proceduality is the genience and at last in secret and devil--we

----- diversity: 0.5
----- Generating with seed: "of society, these
long-pursued, badly-p"

of society, these
long-pursued, badly-pusponsible in the self-denother fact the least they has the world by the contemptood and them but a suffering
with here to afllecter the possible to
the concerner himself to the manifestic existic a still as the most elever, as to the world, they the more and perhaps existed, and still more the




----- Generating text after Epoch: 7


----- diversity: 0.2
----- Generating with seed: "e plants and
consequently he no longer "

e plants and
consequently he no longer and that have
them the super-inalligidently his standing and standard the extent the power of the experiences and imposed to that we may affer all could be oneself and apprieted.

127. the adt of standard primitive and implicated the subject and discretial, discovery and strength, as a problem of the sears of the most such secret of
stated of comparatively and the one is attain that all the about 

----- diversity: 0.5
----- Generating with seed: "e plants and
consequently he no longer "

e plants and
consequently he no longer outhome before a longer a really an art to any subject of the services of the soul with it is above a man discovers and appear and spiritual that it is a conditional real man is that has defined the great depression of could depended and respect of the power, it is the moral have been more more




----- Generating text after Epoch: 8


----- diversity: 0.2
----- Generating with seed: "onal,
prophetic, profoundly repentant, "

onal,
prophetic, profoundly repentant, ease and experience in a conscience and intention of the most assumitate of religion of suppose therefore, in a generally in the
soul presser in order of its
expression, of subjectively cannot have of the same as a certain the persons of last in the connected and most order to father of an expression of principle the
intentions of the subjective conceptions of human and connots and must not the gr

----- diversity: 0.5
----- Generating with seed: "onal,
prophetic, profoundly repentant, "

onal,
prophetic, profoundly repentant, easory and what he has always lack of the philosopher in the latter in the same enough, and an instinct may be present--i scientific stronger desprise--century, the case in the nature, which is roman is the morality and the pressial with all the seriously and perhaps or and all the fact, and al




----- Generating text after Epoch: 9


----- diversity: 0.2
----- Generating with seed: "ems
almost evaporated and betrays its p"

ems
almost evaporated and betrays its pustice in the world more more
standard to the same in the aspect of much and one must they are come to the significance of the deligoreration, and the long person and approcable that the will that is the concerning and what is the contented that the conception of the strange soul with such men of the origin of the actions to some almost spirituality and account and as the religious civilization an

----- diversity: 0.5
----- Generating with seed: "ems
almost evaporated and betrays its p"

ems
almost evaporated and betrays its poffortitude and hence the same man one decilated and unfreeage the despacite an a subslatification and approvate of the same time and concerner is the sense of which adone man barence that is to all the special and about that is the world can be distres of the persuade the indication of the ent

[((1565, None),
  {'acc': 0.3481805622577667,
   'loss': 2.2678847312927246,
   'running_acc': 0.4178124964237213,
   'running_loss': 2.0108063220977783}),
 ((1565, None),
  {'acc': 0.45444467663764954,
   'loss': 1.8603631258010864,
   'running_acc': 0.47609373927116394,
   'running_loss': 1.761758804321289}),
 ((1565, None),
  {'acc': 0.4967595934867859,
   'loss': 1.6959495544433594,
   'running_acc': 0.5179687142372131,
   'running_loss': 1.6348135471343994}),
 ((1565, None),
  {'acc': 0.5251991748809814,
   'loss': 1.5952588319778442,
   'running_acc': 0.5220312476158142,
   'running_loss': 1.5941556692123413}),
 ((1565, None),
  {'acc': 0.5438277721405029,
   'loss': 1.523508906364441,
   'running_acc': 0.5518749952316284,
   'running_loss': 1.4901750087738037}),
 ((1565, None),
  {'acc': 0.5577080249786377,
   'loss': 1.469146490097046,
   'running_acc': 0.5489062666893005,
   'running_loss': 1.4950475692749023}),
 ((1565, None),
  {'acc': 0.5695062875747681,
   'loss': 1.424579

In [0]:
print(len(chars))

57


__How does the additional layer affect the performance of the model? Provide your answer in the block below:__

Running the new model with the same hyperparameters (learning rate and number of epochs) gave only a slight improvement over the older model, from about 55% to about 59.42% accuracy, which suggests that additional capacity doesn't necessarily bring a large improvement in accuracy.
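
To put "additional capacity" into numbers, the parameter counts can be worked out by hand. The sketch below assumes the earlier single-layer model used the same embedding size (8) and hidden size (128) as `CharPredictor3`; `lstm_params` is a hypothetical helper that follows the way PyTorch's `nn.LSTM` stores four gates, each with input weights, recurrent weights, and two bias vectors.

```python
def lstm_params(input_size, hidden_size):
    """Parameter count of a single-layer, unidirectional nn.LSTM."""
    # 4 gates x (input weights + recurrent weights + two bias vectors)
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


vocab, emb_dim, hidden = 57, 8, 128

embedding = vocab * emb_dim       # nn.Embedding(57, 8)
linear = hidden * vocab + vocab   # nn.Linear(128, 57): weights plus bias

# Assumed single-layer baseline: embedding -> LSTM(8, 128) -> linear
one_layer = embedding + lstm_params(emb_dim, hidden) + linear
# CharPredictor3 adds a second LSTM(128, 128)
two_layer = one_layer + lstm_params(hidden, hidden)

print(one_layer)  # 78465
print(two_layer)  # 210561
```

Under these assumptions the extra layer increases the parameter count by a factor of roughly 2.7, for a gain of only a few percentage points in accuracy over the same ten epochs.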

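With a different learning rate (0.0071) and a larger number of epochs (about 30), the accuracy gets close to 70%, probably because the model has begun overfitting the data. Since this accuracy is measured on the training data, a held-out validation set would be needed to confirm that.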
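
One way to test the overfitting hypothesis is to hold out part of the data and compare accuracy on the two parts. The following is a minimal sketch of the split only; `split_indices` is a hypothetical helper, and the dataset size and validation fraction are placeholders rather than values from this notebook.

```python
import random


def split_indices(n_examples, val_fraction=0.1, seed=0):
    """Shuffle example indices and split them into train and validation lists."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    n_val = int(n_examples * val_fraction)
    return indices[n_val:], indices[:n_val]


# Placeholder size; in practice this would be the length of the dataset.
train_idx, val_idx = split_indices(1000, val_fraction=0.1)
```

The two index lists could then be wrapped with `torch.utils.data.Subset` to build separate training and validation loaders. A training accuracy that keeps rising while the validation accuracy stalls or falls would support the overfitting explanation.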

Improving an LSTM model depends on many factors other than just making the network deeper. Other ways to improve the model might be to use a better representation of the data, or to search for a better combination of hyperparameters, for example with Bayesian optimisation.