# Char RNN

![title](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png)

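An RNN works by carrying a hidden state from one time step to the next, so each output can depend on everything the network has seen so far. During training, backpropagation through time multiplies the gradient by the recurrent weights (and the activation's derivative) once per time step. Over long sequences this repeated multiplication makes gradients either blow up or shrink towards zero, which is known as the exploding/vanishing gradient problem.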
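A toy way to see why: assume a scalar linear RNN, h_t = w·h_{t−1}, with no input and no nonlinearity (a deliberate simplification, not a real RNN). Backpropagating from h_T to h_0 multiplies the gradient by w once per step, so the gradient equals w^T. The helper `grad_through_time` is a hypothetical name used only for this illustration.

```julia
# Toy model (assumption): scalar linear RNN h_t = w * h_{t-1}, no inputs, no nonlinearity.
# By the chain rule, dh_T/dh_0 is the product of T copies of dh_t/dh_{t-1} = w.
function grad_through_time(w, T)
    g = 1.0
    for _ in 1:T
        g *= w            # one chain-rule factor per time step
    end
    return g
end

for w in (0.5, 1.0, 1.5)
    println("w = ", w, ":  dh_50/dh_0 = ", grad_through_time(w, 50))
end
```

With w = 0.5 the gradient has shrunk to roughly 9 × 10⁻¹⁶ after 50 steps, and with w = 1.5 it has grown to roughly 6 × 10⁸; only w = 1 keeps it stable. In a real RNN the factor is a Jacobian matrix built from the recurrent weights and the activation's derivative, but the same repeated multiplication is at work.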

# LSTMs

![title](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png)

# Steps of an LSTM

## Step 1 - Choosing to Forget
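The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at h_{t−1} and x_t and outputs a number between 0 and 1 for each number in the cell state C_{t−1}: 1 means “completely keep this” and 0 means “completely get rid of this.”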

![title](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-f.png)
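In the notation commonly used for LSTMs (including the post these figures come from), the forget gate computes

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

where σ is the logistic sigmoid, [h_{t−1}, x_t] is the previous hidden state concatenated with the current input, and W_f, b_f are the gate's learned weights and bias.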

## Step 2 - Keep Incoming Information

This has two parts:
- First, a sigmoid layer called the “input gate layer” decides which values we’ll update (its output is i_t). Next, a tanh layer creates a vector of new candidate values, C̃_t, that could be added to the state.
- Then we update the old cell state C_{t−1} into the new cell state C_t: we multiply the old state by f_t, forgetting the things we decided to forget earlier, and add i_t ∗ C̃_t. These are the new candidate values, scaled by how much we decided to update each state value.

![title](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-i.png)
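In the standard notation (σ the logistic sigmoid, [h_{t−1}, x_t] the previous hidden state concatenated with the current input, ∗ elementwise multiplication), the input gate, the candidate values and the cell-state update are:

$$
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t
\end{aligned}
$$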

## Step 3 - Outputting Relevant Information

Finally, we decide what to output. The output (the new hidden state h_t) is based on the cell state, but is a filtered version of it. First, we run a sigmoid layer (the output gate, o_t) which decides what parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.

![title](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png)
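In the usual notation (σ the logistic sigmoid, [h_{t−1}, x_t] the concatenation of the previous hidden state and the current input), the output step is:

$$
\begin{aligned}
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$

Putting the steps above together, one LSTM time step can be sketched in plain Julia. This is an illustrative sketch, not Flux's `LSTMCell`: the names `ToyLSTMCell`, `lstm_step` and `run_toy_lstm` are hypothetical, each gate is assumed to have its own weight matrix acting on [h_{t−1}, x_t], and the sizes and random weights are placeholders.

```julia
# Plain-Julia sketch of a single LSTM step (not Flux's LSTMCell implementation).
# Assumptions: each gate has its own weight matrix acting on the concatenation [h_prev; x],
# weights are small random values, and sizes are illustrative only.
using Random

sigmoid_plain(z) = 1 / (1 + exp(-z))    # logistic sigmoid; broadcast with `.` below

struct ToyLSTMCell
    Wf::Matrix{Float64}; bf::Vector{Float64}    # forget gate
    Wi::Matrix{Float64}; bi::Vector{Float64}    # input gate
    Wc::Matrix{Float64}; bc::Vector{Float64}    # candidate values C̃
    Wo::Matrix{Float64}; bo::Vector{Float64}    # output gate
end

function ToyLSTMCell(rng::AbstractRNG, nin::Int, nhidden::Int)
    randW() = 0.1 .* randn(rng, nhidden, nhidden + nin)
    return ToyLSTMCell(randW(), zeros(nhidden), randW(), zeros(nhidden),
                       randW(), zeros(nhidden), randW(), zeros(nhidden))
end

function lstm_step(cell::ToyLSTMCell, h_prev, C_prev, x)
    hx = vcat(h_prev, x)                          # [h_{t-1}, x_t]
    f = sigmoid_plain.(cell.Wf * hx .+ cell.bf)   # step 1: how much of C_{t-1} to keep
    i = sigmoid_plain.(cell.Wi * hx .+ cell.bi)   # step 2: which values to update
    Ctilde = tanh.(cell.Wc * hx .+ cell.bc)       #         candidate values
    C = f .* C_prev .+ i .* Ctilde                #         new cell state
    o = sigmoid_plain.(cell.Wo * hx .+ cell.bo)   # step 3: which parts to output
    h = o .* tanh.(C)                             #         new hidden state
    return h, C
end

# Run the cell over a few random one-hot inputs, starting from zero state.
function run_toy_lstm(; nin = 4, nhidden = 3, T = 5, seed = 0)
    rng = MersenneTwister(seed)
    cell = ToyLSTMCell(rng, nin, nhidden)
    h, C = zeros(nhidden), zeros(nhidden)
    for _ in 1:T
        x = zeros(nin); x[rand(rng, 1:nin)] = 1.0   # random one-hot input character
        h, C = lstm_step(cell, h, C, x)
    end
    return h, C
end

h, C = run_toy_lstm()
```

Because h_t is an output gate value in (0, 1) times a tanh, every entry of the hidden state stays strictly between −1 and 1, while the cell state C_t itself is not squashed.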

In [7]:
using Flux
using Flux: onehot, argmax, chunk, batchseq, throttle, crossentropy
using StatsBase: wsample
using Base.Iterators: partition



In [8]:
isfile("shakespeare_input.txt") ||
  download("http://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt",
           "shakespeare_input.txt")

true

To put things into perspective: we train the network on individual characters, not on whole words or phrases, and generation/prediction likewise happens one character at a time. When generating text, we treat the model's last predicted character as correct and feed it back into the model as the next input.
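Concretely, each training target is just the input text shifted by one character (the batching cell further down does this with `text[2:end]`). A tiny plain-Julia illustration on a made-up string; the variable names are hypothetical:

```julia
# Toy illustration: in character-level modelling, the target at each position is the next character.
sample_text = collect("hello")
inputs  = sample_text[1:end-1]    # what the model sees
targets = sample_text[2:end]      # what it should predict: the same text shifted left by one
for (x, y) in zip(inputs, targets)
    println(repr(x), " => ", repr(y))
end
```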

Next, we read in the entire text file and take its unique characters, which become our features: taking the `unique` of the text gives us the `alphabet`. We then convert each character into a vector that works almost like an entry in a lookup table, so that every character has its own unique representation. This is formally known as `one-hot encoding`: the vector is `1` (or `true`) at the character's own position and `0` (or `false`) for the rest of our vocabulary.
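Flux's `onehot` does this for us. To show the idea, here is a minimal Base-Julia version on a short made-up string; `onehot_plain` is a hypothetical helper, not Flux's implementation.

```julia
# Minimal one-hot encoding in Base Julia (hypothetical helper, not Flux's `onehot`).
function onehot_plain(ch, alphabet)
    idx = findfirst(isequal(ch), alphabet)
    idx === nothing && error("character $(repr(ch)) is not in the alphabet")
    v = falses(length(alphabet))   # all `false`...
    v[idx] = true                  # ...except at the character's own index
    return v
end

sample_chars = collect("to be or not")
toy_alphabet = [unique(sample_chars)..., '_']   # 7 distinct characters plus '_'
encoded = [onehot_plain(ch, toy_alphabet) for ch in sample_chars]
```

Each vector has exactly one `true`, so its length (the alphabet size) is also the width of the model's input layer.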

In [12]:
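text = collect(String(read("shakespeare_input.txt")))   # decode the file's bytes into a Vector{Char}
alphabet = [unique(text)..., '_']                       # every distinct character, plus '_' as a stop symbol
text = map(ch -> onehot(ch, alphabet), text)            # one-hot encode each character
stop = onehot('_', alphabet)                            # one-hot vector used to pad the batches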

68-element Flux.OneHotVector:
 false
 false
 false
 false
 false
 false
 false
 false
 false
 false
 false
 false
 false
     ⋮
 false
 false
 false
 false
 false
 false
 false
 false
 false
 false
 false
  true

All our 

In [13]:
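N = length(alphabet)   # vocabulary size: width of the input and output layers
seqlen = 50            # time steps per training window
nbatch = 50            # number of parallel text streams per batch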

50

In [14]:
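# Split the text into nbatch parallel streams, pad them with `stop`, regroup by time step,
# and cut into windows of seqlen steps. Ys is the same text shifted by one character.
Xs = collect(partition(batchseq(chunk(text, nbatch), stop), seqlen))
Ys = collect(partition(batchseq(chunk(text[2:end], nbatch), stop), seqlen))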

1830-element Array{Array{Flux.OneHotMatrix{Array{Flux.OneHotVector,1}},1},1}:
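The `chunk` → `batchseq` → `partition` pipeline is easiest to see on a tiny input. The helpers below are hypothetical plain-Julia stand-ins written to mirror what the Flux functions do here, as we understand them (split the text into `nbatch` parallel streams, pad with `stop`, regroup by time step, then cut into `seqlen`-step windows); they are not Flux's own code.

```julia
# Toy reimplementation of the batching pipeline (hypothetical helpers, not Flux's source):
#   chunk_plain    splits a sequence into contiguous pieces of ceil(length/n) elements
#   batchseq_plain pads the pieces to equal length, then regroups them by time step
using Base.Iterators: partition

chunk_plain(xs, n) = collect(partition(xs, cld(length(xs), n)))

function batchseq_plain(seqs, pad)
    T = maximum(length, seqs)
    padded = [vcat(collect(s), fill(pad, T - length(s))) for s in seqs]
    return [[p[t] for p in padded] for t in 1:T]   # element t = t-th item of every stream
end

chars   = collect("abcdefghij")          # 10 characters
streams = chunk_plain(chars, 3)          # 3 streams: "abcd", "efgh", "ij"
steps   = batchseq_plain(streams, '_')   # 4 time steps, each a batch of 3 characters
windows = collect(partition(steps, 2))   # windows of seqlen = 2 time steps
```

In the notebook the same idea, applied with one-hot vectors instead of characters, produces the 1830 windows of `Xs`, each holding `seqlen` time steps of N×`nbatch` one-hot matrices.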

![title](https://cdn-images-1.medium.com/max/1600/1*NKhwsOYNUT5xU7Pyf6Znhg.png)

In [17]:
m = Chain(
  LSTM(N, 128),
  LSTM(128, 128),
  Dense(128, N),
  softmax)

Chain(Recur(LSTMCell(68, 128)), Recur(LSTMCell(128, 128)), Dense(128, 68), NNlib.softmax)
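The model stacks two LSTM layers with 128 hidden units, projects the result to N scores with `Dense(128, N)`, and turns those scores into a probability for each character with `softmax`. A plain-Julia sketch of just that output head, with sizes mirroring the notebook and random placeholder weights (`softmax_plain` and the other names are hypothetical):

```julia
# Plain-Julia sketch of the model's output head (Dense(128, N) followed by softmax).
# Sizes mirror the notebook (hidden = 128, N = 68); the weights here are random placeholders.
using Random

function softmax_plain(z)
    e = exp.(z .- maximum(z))   # subtract the max for numerical stability
    return e ./ sum(e)
end

rng = MersenneTwister(1)
hidden, N_toy = 128, 68
W, b = 0.01 .* randn(rng, N_toy, hidden), zeros(N_toy)
h = randn(rng, hidden)               # stand-in for the second LSTM layer's output
p = softmax_plain(W * h .+ b)        # one probability per character in the alphabet
```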

In [16]:
# m = gpu(m)

Chain(Recur(LSTMCell(68, 128)), Recur(LSTMCell(128, 128)), Dense(128, 68), NNlib.softmax)

In [18]:
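function loss(xs, ys)
  # Sum the cross-entropy over every time step in the window
  l = sum(crossentropy.(m.(gpu.(xs)), gpu.(ys)))
  # Keep the hidden state but cut the gradient history (truncated backpropagation through time)
  Flux.truncate!(m)
  return l
end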

loss (generic function with 1 method)
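The loss adds up one cross-entropy term per time step. For a single example with a one-hot target, cross-entropy is just −log of the probability the model assigned to the true next character. A plain-Julia sketch with made-up numbers; `xent` is a hypothetical helper, not Flux's `crossentropy`:

```julia
# Plain-Julia sketch of the loss for one sequence (hypothetical helper `xent`).
# For a single example, cross-entropy is -sum(y .* log.(p)); with a one-hot y this is
# -log of the probability the model gave to the true next character.
xent(p, y) = -sum(y .* log.(p))

preds   = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # made-up model outputs for 2 time steps
targets = [[1, 0, 0], [0, 1, 0]]               # one-hot encodings of the true characters
seq_loss = sum(xent.(preds, targets))          # mirrors sum(crossentropy.(m.(xs), ys))
```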

In [19]:
opt = ADAM(params(m), 0.01)
tx, ty = (gpu.(Xs[5]), gpu.(Ys[5]))
evalcb = () -> @show loss(tx, ty)

#9 (generic function with 1 method)

In [20]:
Flux.train!(loss, zip(Xs, Ys), opt,
            cb = throttle(evalcb, 30))

loss(tx, ty) = 203.4972468606884 (tracked)
loss(tx, ty) = 160.52469293561316 (tracked)
loss(tx, ty) = 131.7896360544984 (tracked)
loss(tx, ty) = 123.306622191755 (tracked)
loss(tx, ty) = 118.32158450415261 (tracked)
loss(tx, ty) = 114.82232353025955 (tracked)
loss(tx, ty) = 111.9838131968545 (tracked)
loss(tx, ty) = 110.25553725397008 (tracked)
loss(tx, ty) = 107.17209827197908 (tracked)
loss(tx, ty) = 105.41018588220516 (tracked)
loss(tx, ty) = 102.62690246300865 (tracked)
loss(tx, ty) = 100.6832482677964 (tracked)
loss(tx, ty) = 99.56814028736883 (tracked)
loss(tx, ty) = 98.30852963515152 (tracked)
loss(tx, ty) = 98.93825147894749 (tracked)
loss(tx, ty) = 96.01416918038598 (tracked)
loss(tx, ty) = 94.83883644849642 (tracked)
loss(tx, ty) = 95.07351771844905 (tracked)
loss(tx, ty) = 94.07164210318848 (tracked)
loss(tx, ty) = 93.56025610439646 (tracked)
loss(tx, ty) = 93.3016897681899 (tracked)
loss(tx, ty) = 91.72651126106773 (tracked)
loss(tx, ty) = 92.25637580569686 (tracked)
loss(t

In [None]:
m = cpu(m)

In [21]:
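function sample(m, alphabet, len; temp = 1)
  Flux.reset!(m)                          # clear the recurrent state before generating
  buf = IOBuffer()
  c = rand(alphabet)                      # start from a random character
  for i = 1:len
    write(buf, c)
    p = m(onehot(c, alphabet)).data       # next-character probabilities (untracked)
    # temp < 1 sharpens the distribution, temp > 1 flattens it; wsample normalises the weights
    c = wsample(alphabet, p .^ (1 / temp))
  end
  return String(take!(buf))
end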

sample (generic function with 1 method)
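The `temp` keyword controls how adventurous sampling is. Because the model already ends in `softmax`, raising its probabilities to the power `1/temp` and renormalising gives the same distribution as applying softmax to the scores divided by `temp`. The plain-Julia sketch below shows both facts on made-up numbers; `softmax_plain` and `sample_with_temp` are hypothetical helpers, and simple inverse-CDF sampling stands in for StatsBase's `wsample`.

```julia
# Temperature-scaled sampling, sketched in plain Julia (hypothetical helper names).
using Random

softmax_plain(z) = (e = exp.(z .- maximum(z)); e ./ sum(e))

function sample_with_temp(rng::AbstractRNG, items, p; temp = 1.0)
    w = p .^ (1 / temp)
    w = w ./ sum(w)                    # renormalise to a probability vector
    r = rand(rng)                      # uniform in [0, 1)
    acc = 0.0
    for (item, wi) in zip(items, w)    # inverse-CDF sampling
        acc += wi
        r < acc && return item
    end
    return items[end]                  # guard against floating-point round-off
end

logits = [2.0, 1.0, 0.1]
p = softmax_plain(logits)
w_half = p .^ 2 ./ sum(p .^ 2)         # temp = 0.5, computed from probabilities
# w_half matches softmax_plain(logits ./ 0.5), computed from the raw scores

rng = MersenneTwister(42)
draws = [sample_with_temp(rng, ['a', 'b', 'c'], p; temp = 0.5) for _ in 1:10]
```

A very low temperature makes sampling almost greedy (it nearly always picks the most likely character), while a high temperature pushes it towards uniform random characters.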

In [22]:
sample(m, alphabet, 1000) |> println

OPEVAR:
She cemment. O Foxabiter! Know'st Mortolded hest Ronplias, boy.

Firstigh:
If think'll', a bed pate of thou windy and his blood,
You cam man your faves theredean bestearor will our
with me looksourselless heaven your old Genely.

SIMONIUS:
Show, the presinot
walk daughter peace, draif; and from your eypen unseeding say.

HUPHERMIO:
Grosen sayalis me!

BIRON:
What'st men to hath call or feresing comfort kill.
Us will speeders folkinters?

BASSAVON:
In obprostarn froshough and this lords, coul-rable chane.
 Fauntrother valiy; if their prail! you well earthing in his hald
hoir perallow it upon power to lips;
And my proop for the doup whom let your brook,
And the dush, you ims, for all my let me,
He love good mornozy he did uncland, so hence him
On this prisony.
But will lery mornouquiend 'em, it love young, it good sim me,
By the stranging eyes read ourselves the devil the pience.
Let me not shall nothing to be of the lid,
A nimbleng words: have murder.
Stay, he way more by the ma

# But that's not where it ends

![title](https://cdn-images-1.medium.com/max/1600/1*6YwqrScyczEaG0l05G4-_A.jpeg)

An important thing to note is that architectures such as LSTMs and CNNs are not tied to one kind of problem: they can be trained for a wide range of tasks. Some ways of representing a problem suit certain architectures better than others, but in recent years those boundaries have blurred significantly.

The image above shows an LSTM trained on handwriting data to generate a handwritten rendering of whatever text we give it, and the result is very hard to distinguish from real handwriting.

# Sampling from a Trained Model

In [9]:
using Flux
using Flux: onehot, argmax, chunk, batchseq, throttle, crossentropy
using StatsBase: wsample
using Base.Iterators: partition
using BSON: @load, @save

In [6]:
N = 68;  # alphabet size the saved weights were trained with

In [7]:
m = Chain(
  LSTM(N, 128),
  LSTM(128, 128),
  Dense(128, N),
  softmax)

Chain(Recur(LSTMCell(68, 128)), Recur(LSTMCell(128, 128)), Dense(128, 68), NNlib.softmax)

In [11]:
@load "shakespeare_weights.bson" weights
Flux.loadparams!(m, weights)

In [18]:
@load "shakespeare_alphabet.bson" alphabet

In [15]:
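function sample(m, alphabet, len; temp = 1)
  Flux.reset!(m)                          # clear the recurrent state before generating
  buf = IOBuffer()
  c = rand(alphabet)                      # start from a random character
  for i = 1:len
    write(buf, c)
    p = m(onehot(c, alphabet)).data       # next-character probabilities (untracked)
    # temp < 1 sharpens the distribution, temp > 1 flattens it; wsample normalises the weights
    c = wsample(alphabet, p .^ (1 / temp))
  end
  return String(take!(buf))
end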

sample (generic function with 1 method)

In [19]:
sample(m, alphabet, 1000) |> println

]and!
For are nail or in earl of tonder, renowness!---

JULIA:
His it fears, thy thunder? Let it to send my every,
Yeal-change, wife, ardain: down dead, hocticlance is my hath been blue father's
quir in my too arrows in thy true,
Of thy noblair more thrie o' so wife,
To schall thy most lords? Wilt hour to smy Hear
Of for lezon with the rest? O bear infrishmy hour, he is soper
Ret, therefore, boys to find in foul, no bonorable, if you
to which strip of you.
O, lord it bil in me aways ye height of murter should has know your toldiences.
Ochard of I lays: where is?

MARK ANTONY:
So.

PAULINA:
Rawer, I am all nor impurate.

AARON:
A mean
Hall tyken I have ronnion? and we have premertay on she will speak he woonour
That
Inclose, merria, I did be sperpain to my broke. But, he here gentle Centruct
Of glanghmen contented confess? so, arisus all fall take on alfreant.

DUKE VINCENTIO:
So have a power of my hand I will had redlenses.

Gwaftlem:
Well, in yourt love manishie estups them leave life