# Introduction

In this lab, the task was to train a *character-level language model* based on a *Recurrent Neural Network* (RNN) on a text corpus, in this case the book *War and Peace* by the Russian writer Leo Tolstoy, published in 1869.

The goal of the lab is to use the trained model to predict and generate entirely new text based on the original, for later analysis.

The book is provided in **.txt** format with a size of 3.3 MB, following the suggestion of using at least 2 MB of text to obtain an acceptable model.

* Recurrent Neural Network (RNN): https://chunml.github.io/ChunML.github.io/project/Creating-Text-Generator-Using-Recurrent-Neural-Network/
* DataSet: https://cs.stanford.edu/people/karpathy/char-rnn/

## Contents
1. Library Installation
2. Processing
3. Training and Testing
4. Text Generation

## Execution

* From scratch: Sections 1-4
* Train: Sections 1, 2, the **Training** part of 3, and 4
* Generate text: Sections 1, 2 and 4

# 1. Library Installation

## Windows

Open Anaconda Prompt as administrator and run the following commands to install the Keras library.

```conda update conda ```
<br>```conda install keras ```

## Linux

Run the following commands.

In [None]:
#!pip install cython --user
#!pip install --force-reinstall regex==2017.04.5
#!pip install pathlib --user
#!pip install msgpack --user
!pip install tensorflow-gpu --user
!pip install keras --user

## Verification

Check that the libraries import correctly.

In [2]:
import tensorflow as tf
hello = tf.constant("Hello, TF!")
sess = tf.Session()
print(sess.run(hello))



b'Hello, TF!'


In [3]:
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a + b))

42


In [4]:
import keras

Using TensorFlow backend.


# 2. Processing

## Variable Definitions

In [5]:
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import time
import csv
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM, SimpleRNN
from keras.layers.wrappers import TimeDistributed
import pickle

In [6]:
# Text file
DATA_DIR = "./warpeace_input.txt"
# Adjust (lower) BATCH_SIZE or HIDDEN_DIM if you run into memory problems
BATCH_SIZE = 50
HIDDEN_DIM = 250 #500
# Length of each sequence fed to the network
SEQ_LENGTH = 50
# Previously trained weights to load (checkpoint); leave empty to start from scratch
WEIGHTS = ''

# Number of characters to generate in each test
GENERATE_LENGTH = 500
# Neural network parameters
LAYER_NUM = 2
NB_EPOCH = 20

## Function Definitions

### Function A
(1) Loads a text file, (2) builds the network's input and output structures.

In [9]:
# method for preparing the training data
def load_data(data_dir, seq_length):
    # Load the file
    with open(data_dir, 'r') as f:
        data = f.read()
    # Unique characters (the order of a set is not fixed between runs)
    chars = list(set(data))
    VOCAB_SIZE = len(chars)

    print('Data length: {} characters'.format(len(data)))
    print('Vocabulary size: {} characters'.format(VOCAB_SIZE))
    print(chars)

    # Index the characters
    ix_to_char = {ix: char for ix, char in enumerate(chars)}
    char_to_ix = {char: ix for ix, char in enumerate(chars)}

    # Input and output structures. Each target is the input shifted by one
    # character, so the last sequence needs one more character after it.
    NUMBER_OF_SEQ = (len(data) - 1) // seq_length
    print('Number of sequences: {}'.format(NUMBER_OF_SEQ))
    X = np.zeros((NUMBER_OF_SEQ, seq_length, VOCAB_SIZE))
    y = np.zeros((NUMBER_OF_SEQ, seq_length, VOCAB_SIZE))

    for i in range(NUMBER_OF_SEQ):
        # Fill the input structure X with one-hot vectors
        X_sequence = data[i*seq_length:(i+1)*seq_length]
        for j, char in enumerate(X_sequence):
            X[i, j, char_to_ix[char]] = 1.

        # Fill the output structure y: the same window shifted by one character
        y_sequence = data[i*seq_length+1:(i+1)*seq_length+1]
        for j, char in enumerate(y_sequence):
            y[i, j, char_to_ix[char]] = 1.

    return X, y, VOCAB_SIZE, ix_to_char
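
To make the input/target layout concrete, the following is a minimal sketch of the same construction on a toy string. The helper name `one_hot_pairs` and the toy text are assumptions made for illustration only; the characters are sorted here just so the toy result is reproducible, which the notebook's function does not do.

```python
import numpy as np

def one_hot_pairs(text, seq_length):
    """Toy version of the input/target construction (hypothetical helper)."""
    chars = sorted(set(text))  # sorted only to keep this example reproducible
    char_to_ix = {c: i for i, c in enumerate(chars)}
    # Leave room for the target, which is shifted one character to the right
    n_seq = (len(text) - 1) // seq_length
    X = np.zeros((n_seq, seq_length, len(chars)))
    y = np.zeros((n_seq, seq_length, len(chars)))
    for i in range(n_seq):
        for j in range(seq_length):
            X[i, j, char_to_ix[text[i * seq_length + j]]] = 1.0
            y[i, j, char_to_ix[text[i * seq_length + j + 1]]] = 1.0
    return X, y, chars

def decode(one_hot_rows, chars):
    """Turn a (seq_length, vocab) one-hot array back into a string."""
    return "".join(chars[k] for k in one_hot_rows.argmax(axis=1))

X_toy, y_toy, toy_chars = one_hot_pairs("hello world", 5)
print(X_toy.shape, y_toy.shape)
print(repr(decode(X_toy[0], toy_chars)), "->", repr(decode(y_toy[0], toy_chars)))
```

Each target row is the character that follows the corresponding input row, which is what lets the network learn next-character prediction at every time step.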

### Function B
Text generation.

In [10]:
# method for generating text
def generate_text(model, length, vocab_size, ix_to_char):
    # starting with random character
    ix = [np.random.randint(vocab_size)]
    y_char = [ix_to_char[ix[-1]]]
    X = np.zeros((1, length, vocab_size))
    for i in range(length):
        # appending the last predicted character to sequence
        X[0, i, :][ix[-1]] = 1
        print(ix_to_char[ix[-1]], end="")
        ix = np.argmax(model.predict(X[:, :i+1, :])[0], 1)
        y_char.append(ix_to_char[ix[-1]])
    return ('').join(y_char)
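
The selection step inside `generate_text` can be looked at in isolation. The sketch below uses a made-up probability array in place of `model.predict(...)[0]` (the values and the three-character vocabulary are assumptions for illustration) to show how `np.argmax(..., 1)` picks the most likely character at every time step and how the last entry becomes the next character.

```python
import numpy as np

# Hypothetical 3-character vocabulary, for illustration only
toy_ix_to_char = {0: 'a', 1: 'b', 2: 'c'}

# Stand-in for model.predict(X[:, :i+1, :])[0]:
# one row of softmax probabilities per time step of the prefix
toy_probs = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])

# Most likely character index at each time step
ix = np.argmax(toy_probs, 1)
# Only the prediction for the last time step is appended to the text
next_char = toy_ix_to_char[ix[-1]]
print(ix, next_char)
```

Because this choice is deterministic, the same prefix always produces the same continuation, which may be one reason the generated samples tend to fall into repeated phrases.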

### Function C
Returns the vocabulary size.

In [21]:
def vocab_size(data_dir):
    # Load the file
    with open(data_dir, 'r') as f:
        data = f.read()
    # Unique characters
    chars = list(set(data))
    return len(chars)

# 3. Training and Testing

## Dictionary Creation

***WARNING: DO NOT RUN THIS SECTION IF CHECKPOINTS AND IX_TO_CHAR ALREADY EXIST***

Use of Function A: loading the data.

In [11]:
# Creating training data
X, y, VOCAB_SIZE, ix_to_char = load_data(DATA_DIR, SEQ_LENGTH)

Data length: 3196232 characters
Vocabulary size: 86 characters
['C', 'o', 'p', 'h', 'e', 'c', 'x', 'q', '(', '6', '0', '4', '.', 'O', 'K', 'Z', '¿', 'L', '!', 'Y', 'N', 'j', '*', '©', 'z', 'A', 's', 'D', '"', '5', 'a', '2', 'B', 'G', '9', 'k', 'S', 'I', 'l', 'Q', 'ï', '3', 'H', 'F', ')', 'P', 'U', '¤', '»', '1', 'R', 'T', '-', 'Ã', "'", '=', 'X', 'i', 'v', 'J', 'ª', '8', ' ', 'w', '7', 'r', '?', 'E', ',', '\xa0', ';', 'y', 'W', 'g', 'b', ':', 'f', 'V', '\n', 'n', 'd', 'u', 't', '/', 'M', 'm']
Number of sequences: 63924


It is important to save the `ix_to_char` dictionary to a binary file. It must be loaded every time training is resumed or text is generated from a *checkpoint*, because the order of the characters in the dictionary can change between runs (it is not a fixed order).

***DO NOT MODIFY THIS PICKLE WHEN RESTARTING THE NOTEBOOK TO TEST CHECKPOINTS***

In [11]:
# Do not modify the pickle when restarting the notebook to test earlier checkpoints
with open('ix_to_char.pickle', 'wb') as handle:
    pickle.dump(ix_to_char, handle, protocol=pickle.HIGHEST_PROTOCOL)
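
One way to reduce the risk of overwriting the mapping by accident is to write the pickle only when it does not exist yet. The following is a minimal sketch under that assumption; the helper name `save_mapping_once` is hypothetical and not part of the original notebook.

```python
import os
import pickle

def save_mapping_once(mapping, path='ix_to_char.pickle'):
    """Write the mapping only if no pickle exists yet (hypothetical helper).

    Returns the mapping that is on disk after the call, so the caller always
    works with the dictionary that matches the saved checkpoints.
    """
    if os.path.exists(path):
        # A pickle is already there: keep it and return its contents
        with open(path, 'rb') as handle:
            return pickle.load(handle)
    with open(path, 'wb') as handle:
        pickle.dump(mapping, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return mapping
```

It could be used as `ix_to_char = save_mapping_once(ix_to_char)`. Note that when the pickle already exists, the arrays `X` and `y` built in the current session would still follow the new character order, so this only protects the file; it does not make a fresh `load_data` call compatible with old checkpoints.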

In [12]:
print(ix_to_char)

{0: '.', 1: '2', 2: 'v', 3: 'a', 4: 'B', 5: '!', 6: '(', 7: 'W', 8: 'u', 9: 'q', 10: 'o', 11: 'U', 12: 'b', 13: 'D', 14: 'z', 15: 'f', 16: 'n', 17: '¤', 18: ' ', 19: 'r', 20: '?', 21: 'C', 22: 't', 23: '9', 24: '/', 25: 'k', 26: 'Q', 27: 'S', 28: '\xa0', 29: 'x', 30: 'Z', 31: 'L', 32: '»', 33: '©', 34: 'N', 35: 'K', 36: '*', 37: 'ï', 38: "'", 39: 'y', 40: 'I', 41: '¿', 42: 'l', 43: ';', 44: ':', 45: 's', 46: 'V', 47: 'A', 48: 'g', 49: 'G', 50: '0', 51: 'i', 52: 'Ã', 53: '4', 54: 'Y', 55: 'M', 56: '=', 57: 'H', 58: '1', 59: 'w', 60: 'j', 61: '6', 62: '"', 63: 'P', 64: '3', 65: '-', 66: '5', 67: 'd', 68: 'm', 69: 'J', 70: ',', 71: '8', 72: 'h', 73: 'T', 74: 'F', 75: '\n', 76: 'c', 77: 'p', 78: ')', 79: 'X', 80: 'ª', 81: 'O', 82: 'R', 83: 'e', 84: '7', 85: 'E'}


In [12]:
print(X.shape, y.shape, VOCAB_SIZE)

(63924, 50, 86) (63924, 50, 86) 86
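
These shapes also explain why memory can become a problem. As a rough estimate, and assuming the arrays use the `np.zeros` default of `float64`, the size of each one-hot array can be computed as follows (the helper name `one_hot_bytes` is an assumption for illustration):

```python
import numpy as np

def one_hot_bytes(num_seq, seq_length, vocab_size, dtype=np.float64):
    """Bytes needed for a dense (num_seq, seq_length, vocab_size) array."""
    return num_seq * seq_length * vocab_size * np.dtype(dtype).itemsize

shape = (63924, 50, 86)  # shape printed by the cell above
per_array = one_hot_bytes(*shape)
print("{:.2f} GiB per array, {:.2f} GiB for X and y together".format(
    per_array / 2**30, 2 * per_array / 2**30))
```

With `float64` this comes to 2,198,985,600 bytes per array, roughly 2 GiB each for `X` and `y`. Actual memory use during training will be higher and depends on the environment.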


## Training

***WARNING: START RUNNING FROM HERE TO TRAIN THE MODEL***

### Building the RNN (LSTM)

In [26]:
VOCAB_SIZE = vocab_size(DATA_DIR)

In [28]:
# Creating and compiling the Network
model = Sequential()

# Add the LSTM layers
model.add(LSTM(HIDDEN_DIM, input_shape=(None, VOCAB_SIZE), return_sequences=True))
for i in range(LAYER_NUM - 1):
    model.add(LSTM(HIDDEN_DIM, return_sequences=True))
# Add the output operation
model.add(TimeDistributed(Dense(VOCAB_SIZE)))
model.add(Activation('softmax'))

# "Compiling" = instantiating the RNN with its loss function and optimizer
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
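
To get a feel for how `HIDDEN_DIM` and `LAYER_NUM` affect model size, the number of trainable parameters can be estimated by hand. This sketch assumes the standard LSTM parameterisation (four gates, each with input weights, recurrent weights and one bias vector); the function names are hypothetical, and `model.summary()` remains the authoritative count.

```python
def lstm_params(input_dim, units):
    """Parameters of one LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Parameters of a fully connected layer: weights plus bias."""
    return input_dim * units + units

def network_params(vocab_size, hidden_dim, layer_num):
    """Estimate for the stacked LSTM + TimeDistributed(Dense) network."""
    total = lstm_params(vocab_size, hidden_dim)                      # first LSTM layer
    total += (layer_num - 1) * lstm_params(hidden_dim, hidden_dim)   # remaining LSTM layers
    total += dense_params(hidden_dim, vocab_size)                    # shared output layer
    return total

# Values used in this notebook: VOCAB_SIZE=86, HIDDEN_DIM=250, LAYER_NUM=2
print(network_params(86, 250, 2))
```

Under these assumptions the notebook's configuration has 859,586 parameters, and the count grows roughly quadratically with `HIDDEN_DIM`.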

In [29]:
# Generate some sample before training to know how bad it is!
generate_text(model, GENERATE_LENGTH, VOCAB_SIZE, ix_to_char)

c"00JJ6666//(88888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO

'c"00JJ6666//(88888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO(h6666//((8888888888Zk//8888888k///88888:::::777LLLLLMMMJJJJOO('

### Model Generation

If a previous training run exists, its weights (and the character dictionary used for the one-hot vectors) are loaded. `WEIGHTS` must be set to the name of the saved checkpoint file.

For example: ```WEIGHTS = "checkpoint_layer_2_hidden_250_epoch_60.hdf5"```

In [None]:
# Load the weights from a previous training run (to resume an execution)
# The epoch count is derived from the file name
# The character dictionary (one-hot vectors) is loaded for generation

### WEIGHTS = "checkpoint_layer_2_hidden_250_epoch_60.hdf5"

if WEIGHTS != '':
    model.load_weights(WEIGHTS)
    # The epoch number sits between the last '_' and the file extension
    nb_epoch = int(WEIGHTS[WEIGHTS.rfind('_') + 1:WEIGHTS.rfind('.')])
    with open('ix_to_char.pickle', 'rb') as handle:
        ix_to_char = pickle.load(handle)
else:
    # Starting from scratch:
    nb_epoch = 0
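
The epoch counter is recovered from the checkpoint file name. As an illustration, the parsing can be written as a standalone helper that also tolerates directory prefixes containing dots or underscores. The helper name `epoch_from_checkpoint` and the `./runs/` path are assumptions made for this sketch; only the file-name pattern comes from the notebook.

```python
import os
import re

def epoch_from_checkpoint(path):
    """Extract the epoch number from a name like '..._epoch_60.hdf5' (hypothetical helper)."""
    name = os.path.basename(path)
    match = re.search(r'_epoch_(\d+)\.hdf5$', name)
    if match is None:
        raise ValueError('Unexpected checkpoint name: {}'.format(path))
    return int(match.group(1))

print(epoch_from_checkpoint('checkpoint_layer_2_hidden_250_epoch_60.hdf5'))
print(epoch_from_checkpoint('./runs/checkpoint_layer_2_hidden_250_epoch_100.hdf5'))
```

Raising an error on an unexpected name avoids silently resuming from the wrong epoch.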

In [20]:
# Training loop (continues from nb_epoch when weights were loaded)

# This is the main loop
# The stopping condition can be changed to end at a given number of epochs.
while True:
    print('\n\nEpoch: {}\n'.format(nb_epoch))
    # Fit the model for one epoch
    model.fit(X, y, batch_size=BATCH_SIZE, verbose=1, epochs=1)
    nb_epoch += 1
    # Generate a text sample at the end of the epoch
    generate_text(model, GENERATE_LENGTH, VOCAB_SIZE, ix_to_char)
    # Change this to save checkpoints more often
    if nb_epoch % 10 == 0:
        model.save_weights('checkpoint_layer_{}_hidden_{}_epoch_{}.hdf5'.format(LAYER_NUM, HIDDEN_DIM, nb_epoch))
    # >= so that a run resumed at or past epoch 100 still stops
    if nb_epoch >= 100:
        break



Epoch: 0

Epoch 1/1
and the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the street was the same time the

Epoch: 1

Epoch 1/1
7 the servants were standing at her son. He saw that he was standing and seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and she had seemed to her son and s

I am sorry for your heart and the same thing is the same thing is the same thing is the same time and the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing is the same time the same thing

Epoch: 14

Epoch 1/1
0 the count was already and again and still looking at him with a smile.

"What does it matter to your honor?" he asked his son and saw the state of the commander-in-chief.

"What does it matter to your honor?" he asked his son and saw the state of the commander-in-chief.

"What does it matter to your honor?" he asked his son and saw the state of the commander-in-chief.

"What does it matter to your honor?" he asked his son and saw the state of the commander-in-chief.

"Wh

re the whole army, and the same thing that had been at the same time he was always the same thing that he was always the same to her. The countess was standing and shouting and shouting at his stay in the same way and went out of the room and stopped him and shouted at him and said that he was always distinguished at the same time he was always the same thing that he was always the same to her. The countess was standing and shouting and shouting at his stay in the same way and went out of the ro

Epoch: 27

Epoch 1/1
» the count and the countess was struck by the countess and the countess was struck by the countess and the countess was struck by the countess and the countess was struck by the countess and the countess was struck by the countess and the countess was struck by the countess and the countess was struck by the countess and the countess was struck by the countess and the countess was struck by the countess and the countess was struck by the countess and the countess was stru

, and the countess was a great deal of mutual tone of the contrary to the right of the countess' handsome, and the sound of the consciousness of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction of the contradiction o

Epoch: 40

Epoch 1/1
he same step toward the door of the study to the right and saw the soldiers who had been a secret sitting room with a smile.

"Who do you think? What do you think? Who are you?" said the countess.

"I have the honor to the Emperor to the countess and the countess all right?" he asked.

"What is it? What?" asked the countess.

"What are you staying at my brother is the more destruction of the man who do not know what I say is in the least and your honor to anyone?" he thoug

"It's not the same time is the same time in the fact that the conception of the people had been in a fairy step to the sofa in a subtle smile.

"What a true young man was to explain the case of the conception of the facts an

Epoch: 53

Epoch 1/1
person of the world, and the sounds of the commander-in-chief's staff officer and so on a soldier who was still as soon as the soldiers who were not only the same as a commander-in-chief's staff officer and so on a soldier who was still as soon as the soldiers who were not only the same as a commander-in-chief's staff officer and so on a soldier who was still as soon as the soldiers who were not only the same as a commander-in-chief's staff officer and so on a soldier who was still as soon as th

Epoch: 54

Epoch 1/1
Mary had been an instant the same thing that the count was not a case of the service, and the soldiers shouted at his stern and the same thing that the count was not a case of the service, and the soldiers shouted at his steps. "T

But the conversation was not a single to the countess' hands and smiled at him and told him to the sound of the commanders and the commanders of the commanders and the commanders of the commanders to the same time to the sound of the commanders and the commanders of the commanders and the commanders of the commanders to the same time to the sound of the commanders and the commanders of the commanders and the commanders of the commanders to the same time to the sound of the commanders and the com

Epoch: 67

Epoch 1/1
¿quarters were being speaking of the countess and the countess was to be a great desireur) of the contrary they could not have been the carts to be at the same time and the countess was to be a great deal of meadows the same at a distance the carts were being seen at the same time and the staff officer was not only the same and seemed to him that the prince and the countess was to be a great deal of meadows the same at a distance from the countess' little fight because the

ut the princess and the sound of the staff of his side (the conversation was at the same time and the countess was to be a second time and that she was already sixteen the position of the condition and the same feeling he had seen and as if to say that the count was already so much in love with the countess was to be done by the fact that he was already stopped by the countess and went to the door of the room and to his suite who sat down on the sofa, she was silent.

"I am very glad to make you

Epoch: 80

Epoch 1/1
" said Princess Mary, "and then I want to see you," said the count, and went on tiptoe to himself and took his hand to the glow soldier.

The count was about to see her and then the sound of the service was being carried out of the carriage. "It was a bit of influence to the Emperor Alexander and that the count was always the same to see her and then the sound of the servants to leave the room with a smile of people who were carrying him to himself and the sound of the ser

Zherkov to the left flank of the country and strangely fixed one and wounded and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and shouting and sho

Epoch: 93

Epoch 1/1
question to him and the same thing he had not yet seen that the same thing had been at last dared to take it away; and the same salt le promise very good softened and seemed to him and the sound of the soldiers were convinced that he was already attacking the countess' house in the same state of abuse him at the same time that had been at last dared to see him in the same state of the soldiers who were all conversations as the conversation with the soldiers were dullt to t

# 4. Text Generation

Once the model and its parameters are instantiated (by running the preliminary cells) and the two required files (.pickle and .hdf5) are available, text can be generated.
To load a *checkpoint* directly, run the following two cells.

In [22]:
VOCAB_SIZE = vocab_size(DATA_DIR)

In [24]:
# Creating and compiling the Network
model = Sequential()

# Add the LSTM layers
model.add(LSTM(HIDDEN_DIM, input_shape=(None, VOCAB_SIZE), return_sequences=True))
for i in range(LAYER_NUM - 1):
    model.add(LSTM(HIDDEN_DIM, return_sequences=True))
# Add the output operation
model.add(TimeDistributed(Dense(VOCAB_SIZE)))
model.add(Activation('softmax'))

# "Compiling" = instantiating the RNN with its loss function and optimizer
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

Generate text from the checkpoint named in `WEIGHTS`.

In [37]:
# Take care not to overwrite the original pickle
WEIGHTS = 'checkpoint_layer_2_hidden_250_epoch_100.hdf5'
nb_epoch = int(WEIGHTS[WEIGHTS.rfind('_') + 1:WEIGHTS.rfind('.')])
# Loading the trained weights
model.load_weights(WEIGHTS)
with open('ix_to_char.pickle', 'rb') as handle:
    ix_to_char = pickle.load(handle)
generate_text(model, GENERATE_LENGTH, VOCAB_SIZE, ix_to_char)
print('\n\n')

/Bª"RaB"QRZcc"!ZvRZHX"PZQ"QHRRHXT"ZR"RaB"QR-BBRQ"R2"RaB"QR-BBRQ"ZXª"RaB"Q2OXª"2c"RaB"QRZcc"2ccH!B-"Pa2"PZQ"FBHXT"QBXR"R2"RaB"'BcRb"iaB"!2OXR"PZQ"ZQ"Hc"RaB"QZEB"RaHXT"RaZR"aZª"FBBX"ZF'B"R2"ªBcBXª"RaB"!2X1B-QZRH2Xm"ZXª"RaB"Q2OXª"2c"RaB"QRZcc"2c"RaB"QR2-8"ZXª"RaB"!2OXR"PZQ"ZQKBª"RaB"QR-BXTRa"2c"RaB"!2X128"2c"P2OXªBª"ZXª"RaB"Q2OXª"2c"RaB"QRZcc"2c"RaB"QR2-8"ZXª"RaB"!2OXR"PZQ"ZQKBª"RaB"QR-BXTRa"2c"RaB"!2X128"2c"P2OXªBª"ZXª"RaB"Q2OXª"2c"RaB"QRZcc"2c"RaB"QR2-8"ZXª"RaB"!2OXR"PZQ"ZQKBª"RaB"QR-BXTRa"2c"RaB




In [38]:
nb_epoch

100