## --- Generate text with a recurrent neural network (PyTorch) ---
### (Mostly Read & Run)

The goal is to replicate the (famous) experiment from [Karpathy's blog](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).

To learn to generate text, we train a recurrent neural network to do the following task:

Given a "chunk" of text such as `this is random text`, the network reads the input **`this is random tex`** one character at a time and must predict, at each step, the next character, i.e. the sequence **`his is random text`**.
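The input/target relationship described above is just a one-character shift of the same chunk. A minimal sketch in plain Python (the variable names `chunk`, `inp` and `target` are chosen here for illustration):

```python
chunk = "this is random text"

inp = chunk[:-1]     # what the network reads: everything but the last character
target = chunk[1:]   # what it must predict: the same text, one step ahead

# Show the first few (input character -> expected next character) pairs
for x, y in list(zip(inp, target))[:4]:
    print(repr(x), "->", repr(y))
```

At every position the target is simply the character that follows the input character in the original text.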




## Load text (dataset/input.txt)

Before building training batches, we load the full text into RAM.

In [1]:
!wget https://thome.isir.upmc.fr/classes/RITAL/input.txt

--2023-03-01 14:47:53--  https://thome.isir.upmc.fr/classes/RITAL/input.txt
Resolving thome.isir.upmc.fr (thome.isir.upmc.fr)... 134.157.18.247
Connecting to thome.isir.upmc.fr (thome.isir.upmc.fr)|134.157.18.247|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1115394 (1,1M) [text/plain]
Saving to: 'input.txt.1'


In [2]:
#! pip install unidecode

In [3]:
import unidecode
import string
import random
import re
import torch
import torch.nn as nn

all_characters = string.printable
n_characters = len(all_characters)

file = unidecode.unidecode(open('./input.txt').read()) #clean text => only ascii
file_len = len(file)
print('file_len =', file_len)


file_len = 1115394


## Helper functions

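We have a text and we want to feed batches of chunks to a neural network. For one chunk `A,B,C,D,E`, the input is the chunk without its last character and the output is the chunk without its first character:

`[input] A,B,C,D -> B,C,D,E [output]`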

Note: we will use an embedding layer instead of a one-hot encoding scheme.
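As a rough illustration of why an embedding layer can stand in for one-hot encoding: looking up row `i` of the embedding matrix gives the same result as multiplying a one-hot vector for `i` by that matrix, without ever building the large one-hot tensor. The sizes below are arbitrary and chosen only for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, emb_dim = 100, 8          # illustrative sizes, not the notebook's
embed = nn.Embedding(vocab_size, emb_dim)

idx = torch.tensor([[3, 17, 42]])     # (batch=1, chunk_len=3) character indices

# Path 1: direct lookup of rows in the embedding matrix
looked_up = embed(idx)                                     # (1, 3, emb_dim)

# Path 2: one-hot encode, then multiply by the same matrix
one_hot = F.one_hot(idx, num_classes=vocab_size).float()   # (1, 3, vocab_size)
multiplied = one_hot @ embed.weight                        # (1, 3, emb_dim)

same = torch.allclose(looked_up, multiplied)
print(looked_up.shape, same)
```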

To build these batches, we use three functions:

- `random_chunk`: returns a random string chunk of `chunk_len + 1` characters (one extra, so that input and target both have length `chunk_len`)
- `char_tensor`: turns a string into a tensor of size `(1, len(string))` holding the index of each character
- `random_training_set`: returns random input and target tensors, each of size `(batch_size, chunk_len)`




In [4]:
import time, math


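# Get a random piece of text of length chunk_len + 1
def random_chunk(chunk_len):
    # randint is inclusive on both ends, so stop at file_len - chunk_len - 1
    # to guarantee the slice always holds chunk_len + 1 characters
    start_index = random.randint(0, file_len - chunk_len - 1)
    end_index = start_index + chunk_len + 1
    return file[start_index:end_index]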


# Turn a string into a (1, len(string)) LongTensor of character indices
def char_tensor(string):
    tensor = torch.zeros(1,len(string)).long()
    for c in range(len(string)):
        tensor[0,c] = all_characters.index(string[c])
    return tensor


# Build a random batch of (input, target) tensors, each of size (batch_size, chunk_len)
def random_training_set(chunk_len=200, batch_size=8):
    chunks = [random_chunk(chunk_len) for _ in range(batch_size)]
    inp = torch.cat([char_tensor(chunk[:-1]) for chunk in chunks],dim=0)
    target = torch.cat([char_tensor(chunk[1:]) for chunk in chunks],dim=0)
    
    return inp, target

print(random_training_set(10,4))  ## should return 4 chunks of 10 letters (input and target)

(tensor([[44, 68, 21, 21, 94, 29, 10, 27, 27, 34],
        [28, 73, 94, 32, 17, 10, 29, 94, 23, 14],
        [34, 94, 17, 14, 94, 11, 14, 94, 10, 29],
        [32, 10, 28, 94, 10, 11, 24, 30, 29, 94]]), tensor([[68, 21, 21, 94, 29, 10, 27, 27, 34, 94],
        [73, 94, 32, 17, 10, 29, 94, 23, 14, 32],
        [94, 17, 14, 94, 11, 14, 94, 10, 29, 94],
        [10, 28, 94, 10, 11, 24, 30, 29, 94, 29]]))


## The actual RNN model (only thing to complete):

It should be composed of three distinct modules:

- an [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) (n_characters, hidden_size)

```
nn.Embedding(len_dic,size_vec)
```
- a [recurrent](https://pytorch.org/docs/stable/nn.html#recurrent-layers) layer (hidden_size, hidden_size)
```
nn.RNN(in_size,out_size) or nn.GRU() or nn.LSTM() => rnn_cell parameter
```
- a [prediction](https://pytorch.org/docs/stable/nn.html#linear) layer (hidden_size, output_size)

```
nn.Linear(in_size,out_size)
```
=> Complete the `__init__` function code
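To see how the three modules fit together, the sketch below pushes a fake batch through them and prints the shape after each stage. It assumes `batch_first=True` on the recurrent layer so that tensors are laid out as `(batch, chunk_len, features)`; the PyTorch default is `(chunk_len, batch, features)`. The sizes and the choice of `nn.GRU` are illustrative only.

```python
import torch
import torch.nn as nn

n_char, hidden_size, batch, chunk_len = 100, 16, 4, 10   # illustrative sizes

embed = nn.Embedding(n_char, hidden_size)
# batch_first=True: the layer reads (batch, chunk_len, features)
rnn = nn.GRU(hidden_size, hidden_size, num_layers=1, batch_first=True)
predict = nn.Linear(hidden_size, n_char)

x = torch.randint(0, n_char, (batch, chunk_len))  # fake character indices
e = embed(x)          # (batch, chunk_len, hidden_size)
h, last = rnn(e)      # h: (batch, chunk_len, hidden_size); last: (1, batch, hidden_size)
logits = predict(h)   # (batch, chunk_len, n_char)
print(e.shape, h.shape, last.shape, logits.shape)
```

Without `batch_first=True`, the same call would still run, but the recurrence would step along the batch axis instead of along the characters of each chunk.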

In [7]:
import torch.nn.functional as f

class RNN(nn.Module):
    
    def __init__(self, n_char, hidden_size, output_size, n_layers=1,rnn_cell=nn.RNN):
        """
        Create the network
        """
        super(RNN, self).__init__()
        
        self.n_char = n_char
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        
        #  (batch,chunk_len) -> (batch, chunk_len, hidden_size)  
        self.embed = nn.Embedding(n_char,hidden_size)
        
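        # (batch, chunk_len, hidden_size)  -> (batch, chunk_len, hidden_size)
        # batch_first=True is required for this layout: by default PyTorch
        # recurrent layers expect (chunk_len, batch, hidden_size)
        self.rnn = rnn_cell(hidden_size,hidden_size,n_layers,batch_first=True)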
        
        #(batch, chunk_len, hidden_size) -> (batch, chunk_len, output_size)  
        self.predict = nn.Linear(hidden_size,output_size)
    
    def forward(self, input):
        """
        batched forward: input is (batch, chunk_len),
        output is (batch, chunk_len, output_size)
        """
        input = self.embed(input)
        output,_  = self.rnn(input)
        output = self.predict(f.tanh(output))
        return output
    
    def forward_seq(self, input,hidden=None):
        """
        not batched forward: input is a 1-D tensor of character indices
        (generate() calls it with a single character, shape (1,))
        """
        input = self.embed(input)
        output,hidden  = self.rnn(input.unsqueeze(0),hidden)
        output = self.predict(f.tanh(output))
        return output,hidden


## Text generation function

Sample text from the model

In [11]:
def generate(model, prime_str='A', predict_len=100, temperature=0.8):
    prime_input = char_tensor(prime_str).squeeze(0)  # (len(prime_str),)
    hidden = None
    predicted = prime_str

    with torch.no_grad():  # sampling does not need gradients
        # Use all but the last priming character to "build up" the hidden state
        for p in range(len(prime_str) - 1):
            _, hidden = model.forward_seq(prime_input[p].unsqueeze(0), hidden)

        # The last priming character is the first input of the sampling loop
        inp = prime_input[-1].unsqueeze(0)
        for p in range(predict_len):
            output, hidden = model.forward_seq(inp, hidden)
            # Sample from the network output as a multinomial distribution.
            # softmax(logits / T) is the same distribution as normalizing
            # exp(logits / T), but does not overflow at low temperatures
            output_dist = torch.softmax(output.view(-1) / temperature, dim=0)
            top_i = torch.multinomial(output_dist, 1)[0]
            # Add predicted character to string and use it as next input
            predicted_char = all_characters[top_i]
            predicted += predicted_char
            inp = char_tensor(predicted_char).squeeze(0)

    return predicted

## Training loop for net

In [12]:
def time_since(since):
    s = time.time() - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

###Parameters
n_epochs = 20000
print_every = 100
plot_every = 10
hidden_size = 100
n_layers = 5
lr = 0.005
batch_size = 16
chunk_len = 80

####

model = RNN(n_characters, hidden_size, n_characters, n_layers) #create model
model_optimizer = torch.optim.Adam(model.parameters(), lr=lr) #create Adam optimizer
criterion = nn.CrossEntropyLoss() #choose criterion

start = time.time()
all_losses = []
loss_avg = 0


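def train(inp, target):
    """
    One training step on a batch of chunks:
    inp and target are (batch_size, chunk_len) tensors of character indices
    """
    #reset gradients
    model_optimizer.zero_grad()

    # predict output: (batch_size, chunk_len, n_characters)
    output = model(inp)

    #compute loss over every character of the batch:
    #flatten batch and time so the criterion sees (N, n_characters) and (N,)
    loss = criterion(output.reshape(-1, output.size(-1)), target.reshape(-1))

    #backpropagate, then update the weights
    loss.backward()
    model_optimizer.step()

    return loss.item()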



for epoch in range(1, n_epochs + 1):
    loss = train(*random_training_set(chunk_len,batch_size))  #train on one chunk 
    loss_avg += loss

    if epoch % print_every == 0:
        print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
        print(generate(model,'Wh', 100), '\n')
       


    if epoch % plot_every == 0:
        all_losses.append(loss_avg / plot_every)
        loss_avg = 0


[0m 14s (100 0%) 3.0874]
WhaI tOre Te myheleedero h ter f thru t.-sae
sea
moEhhap botedtase.reloan Bs theoG tosnlan u dat, n  t 

[0m 29s (200 1%) 2.9836]
Wh ase pl era I s.
Rt Iatentoran
soton
hl m loom y an lo? mede Uf ch mtan an n thaApusp arstr h ses I
 

[0m 44s (300 1%) 2.6487]
Wh r the me re?
Mleanedon.
A y t.
I me gerlot mecon n;
Is ru he ele th tke ithetil f asob giitheson th 

[0m 59s (400 2%) 2.5859]
Whe t ar t homes or s, od RCis le he-crthon sl VIororlenced

Batorsow cle we athed ond,
Y:
Wotowthoro  

[1m 12s (500 2%) 2.6472]
Whe his as ofolisln ancot ponou me ROS todsis tr, met bimere ltory he pe as macres in lymare d t misot 

[1m 26s (600 3%) 2.5805]
Whs y he.
E: it aknlorve, a hy w.

OThe d t ticuk hitite ans e ilely w e ur ile:
Anodillr has isthe th 

[1m 41s (700 3%) 2.5409]
Whe tas k be f d tor he tod the ithofe ainonco'd lloud d, buthenowandind be t!
CENG: this digcas,
A ba 

[1m 55s (800 4%) 2.5448]
Wh.


NI
OHeripl an car thesoue ono we mesingicee I I b ds ard nd

[15m 25s (6400 32%) 2.4668]
Whon Hent s this s nd awe,-ge ittheresemen he mad tathe se thinthin ing.
Wee thatenge t a art ar, my t 

[15m 39s (6500 32%) 2.3905]
Whed, e pese,
Therd,
Gthe itcos! inoves her ongoul:
Thar! ing co anomees, pu ane ghofosthe's nd th gad 

[15m 52s (6600 33%) 2.4382]
Whetr ithy br bematherendsut than the my t blin man ayoulisuradealot, s h feno t hote thorerelo douse  

[16m 5s (6700 33%) 2.4527]
Whenounod my, f arinone ht al as imyomemen whe froucer s ous the'd oug dathe wo tht ifo m llf thirchal 

[16m 18s (6800 34%) 2.4446]
Whert wrensone jent owin llowou thino,
NNThise

Ans, itrouth g moucon bed s's.
O:
ORERoroucastheed t h 

[16m 31s (6900 34%) 2.4292]
Whencakeare t toupearing's here. ves hourd n.
Sh ourd an seroo on t f s thensthanthe beee.
HI ll we Bu 

[16m 45s (7000 35%) 2.4690]
Wher. s f ghel!

IULOfar LO:
PENu dd theas roumy,
HENGouldere tinelay iowere mas t omy andof rlaiee, h 

[16m 58s (7100 35%) 2.4532]
Whaingheng;


LINENCORI tresirstouchellls,

[94m 42s (12600 63%) 2.4350]
Whee, hy t veme:
TISeno t, seawinon ond le wo t thange uras s f w tespou hinct hayow lt, hy, sat, sthe 

[94m 53s (12700 63%) 2.4890]
Whe hitugothourisce cha sad wisath lderr ff t chayou matouchis be cis,
Ith toter gen go s CERI herald  

[94m 59s (12800 64%) 2.4905]
Who stathinouny VI mano hisstar inore SCotherifonourt thed.

to thiee f t indoutono d t the as s s a t 

[95m 5s (12900 64%) 2.4342]
Whe IOridss akn utowowind bys bre miallord an wates I gu spouss sloule
Lore lllavit re;
LLAUS:
Theer a 



RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 80 but got size 79 for tensor number 3 in the list.
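
The error above is consistent with one chunk coming back a character short. `random.randint` includes both end points, so `start_index` can be drawn as `file_len - chunk_len`, and the slice of `chunk_len + 1` characters then runs past the end of the text. A small sketch of this off-by-one on a toy string (`text` and `chunk_at` are hypothetical stand-ins for `file` and the slicing inside `random_chunk`):

```python
text = "abcdefghij"          # toy stand-in for the corpus, 10 characters
text_len, chunk_len = len(text), 4

def chunk_at(start_index):
    # Same slicing as random_chunk: chunk_len + 1 characters are requested
    return text[start_index:start_index + chunk_len + 1]

largest_drawn = text_len - chunk_len       # randint(0, text_len - chunk_len) can return this
largest_safe = text_len - chunk_len - 1    # last start that still yields chunk_len + 1 characters

short_len = len(chunk_at(largest_drawn))   # slice is cut off by the end of the text
full_len = len(chunk_at(largest_safe))     # slice ends exactly at the end of the text
print(short_len, full_len)
```

Bounding the draw by `file_len - chunk_len - 1` keeps every chunk at full length, so `torch.cat` always receives rows of equal size.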

## Visualize loss 

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline

plt.figure()
plt.plot(all_losses)

## Try different temperatures

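The temperature controls how sharp the sampling distribution is: the network outputs are divided by the temperature before being exponentiated and normalized.

A low temperature (e.g. 0.1) concentrates the probability on the most likely characters, giving more conservative and repetitive text. A temperature of 1 samples from the model's unmodified distribution, giving more varied but more error-prone text.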
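
A small numerical sketch of this effect, using made-up scores for three characters (the helper name `sampling_probs` is hypothetical). Normalizing `exp(logits / T)`, as `generate` does implicitly through `torch.multinomial`, is the same as taking a softmax of `logits / T`:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.0])   # made-up scores for three characters

def sampling_probs(logits, temperature):
    # Dividing by the temperature before the softmax sharpens (T < 1)
    # or flattens (T > 1) the resulting distribution
    return torch.softmax(logits / temperature, dim=0)

for t in (1.0, 0.5, 0.1):
    print(t, sampling_probs(logits, t))
```

As the temperature drops, almost all of the probability mass moves to the highest-scoring character.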

In [None]:
print(generate(model,'T', 200, temperature=1))
print("----")
print(generate(model,'Th', 200, temperature=0.8))
print("----")

print(generate(model,'Th', 200, temperature=0.5))
print("----")

print(generate(model,'Th', 200, temperature=0.3))
print("----")

print(generate(model,'Th', 200, temperature=0.1))

### Improving this code:

(a) Tinker with parameters:

- Is it really necessary to have 100-dimensional character embeddings?
- The chunk length can be gradually increased during training
- Try changing the RNN cell type (GRU, LSTM)

(b) Add GPU support to go faster
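
One possible way to add GPU support is the usual PyTorch device pattern: pick a device once, move the model to it, and move every input tensor to the same device. The sketch below uses a small stand-in `nn.GRU` rather than the notebook's `RNN` class; in the notebook, the tensors returned by `random_training_set` and those built by `char_tensor` inside `generate` would presumably need the same `.to(device)` call.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model; in the notebook this would be the RNN instance
model = nn.GRU(16, 16, batch_first=True).to(device)

# Every tensor fed to the model must live on the same device as its weights
x = torch.randn(4, 10, 16).to(device)
out, hidden = model(x)
print(out.device, out.shape)
```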


In [None]:
# Test with a hidden dimension of 50

In [None]:
hidden_size = 50 
chunk_len = 40
model = RNN(n_characters, hidden_size, n_characters, n_layers,rnn_cell = nn.GRU) #create model
model_optimizer = torch.optim.Adam(model.parameters(), lr=lr) #create Adam optimizer
criterion = nn.CrossEntropyLoss() #choose criterion

start = time.time()
all_losses = []
loss_avg = 0


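def train(inp, target):
    """
    One training step on a batch of chunks:
    inp and target are (batch_size, chunk_len) tensors of character indices
    """
    #reset gradients
    model_optimizer.zero_grad()

    # predict output: (batch_size, chunk_len, n_characters)
    output = model(inp)

    #compute loss over every character of the batch:
    #flatten batch and time so the criterion sees (N, n_characters) and (N,)
    loss = criterion(output.reshape(-1, output.size(-1)), target.reshape(-1))

    #backpropagate, then update the weights
    loss.backward()
    model_optimizer.step()

    return loss.item()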



for epoch in range(1, n_epochs + 1):
    loss = train(*random_training_set(chunk_len,batch_size))  #train on one chunk 
    loss_avg += loss

    if epoch % print_every == 0:
        print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
        print(generate(model,'Wh', 100), '\n')
       


    if epoch % plot_every == 0:
        all_losses.append(loss_avg / plot_every)
        loss_avg = 0


In [None]:
plt.figure()
plt.plot(all_losses)