## Installing the Libraries

Run the following commands from Anaconda Prompt to install the Keras library.
<br>Link: https://medium.com/@pushkarmandot/installing-tensorflow-theano-and-keras-in-spyder-84de7eb0f0df
<br>TensorFlow: https://www.tensorflow.org/install/

```conda create -n tensorflow-gpu pip python=3.5```
<br>```conda activate tensorflow-gpu```
<br>```conda install keras ```

Another option:
<br>Open Anaconda Prompt as administrator and run the following commands.
<br>```conda update conda ```
<br>```conda install keras ```

If you are on Linux, use the following commands.

In [5]:
!pip install cython --user
!pip install --force-reinstall regex==2017.04.5
!pip install pathlib --user
!pip install msgpack --user
!pip install tensorflow-gpu --user
!pip install keras --user

Collecting cython
  Downloading https://files.pythonhosted.org/packages/90/3e/8fb8aacc6eef05b2c80ff46f02c850d41ea01dc43eb539c45aa2b783b2d2/Cython-0.28.3-cp35-cp35m-win_amd64.whl (2.4MB)
Installing collected packages: cython
Successfully installed cython-0.28.3


  The scripts cygdb.exe, cython.exe and cythonize.exe are installed in 'C:\Users\alirapal\AppData\Roaming\Python\Python35\Scripts' which is not on PATH.


Collecting regex==2017.04.5
  Downloading https://files.pythonhosted.org/packages/ef/66/e1c7a49068bc9fae46b3acb9b2c6f4cbb095c2e835c00de1ff12c82553ed/regex-2017.04.05-cp35-none-win_amd64.whl (243kB)
Installing collected packages: regex
Successfully installed regex-2017.4.5
Collecting pathlib
  Downloading https://files.pythonhosted.org/packages/ac/aa/9b065a76b9af472437a0059f77e8f962fe350438b927cb80184c32f075eb/pathlib-1.0.1.tar.gz (49kB)
Building wheels for collected packages: pathlib
  Running setup.py bdist_wheel for pathlib: started
  Running setup.py bdist_wheel for pathlib: finished with status 'done'
  Stored in directory: C:\Users\alirapal\AppData\Local\pip\Cache\wheels\f9\b2\4a\68efdfe5093638a9918bd1bb734af625526e849487200aa171
Successfully built pathlib
Installing collected packages: pathlib
Successfully installed pathlib-1.0.1
Collecting msgpack
  Downloading https://files.pythonhosted.org/packages/9a/4f/7c1188ff64148b36d0d7ddeaba0f6e8e2fb7a46cd942f3543420b714a89f/msgpack-0.5.

  The scripts freeze_graph.exe, saved_model_cli.exe, tensorboard.exe, toco.exe and toco_from_protos.exe are installed in 'C:\Users\alirapal\AppData\Roaming\Python\Python35\Scripts' which is not on PATH.




If you are on Windows, add the scripts directory of the installed libraries to the PATH:

``` set path=%PATH%;C:\Users\Alvaro\AppData\Roaming\Python\Python35\Scripts ```

Check that the libraries import correctly.

In [1]:
import tensorflow as tf
hello = tf.constant("Hello, TF!")
sess = tf.Session()
print(sess.run(hello))

  from ._conv import register_converters as _register_converters


b'Hello, TF!'


In [2]:
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a + b))

42


In [3]:
import keras

Using TensorFlow backend.


## Processing

In [4]:
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import time
import csv
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM, SimpleRNN
from keras.layers.wrappers import TimeDistributed
import pickle

In [5]:
# Text file
DATA_DIR = "./lotr.txt"
# Reduce BATCH_SIZE or HIDDEN_DIM if you run into memory problems
BATCH_SIZE = 50
HIDDEN_DIM = 250  # 500
# Length of the sequences to analyze
SEQ_LENGTH = 50
# File with previously trained weights (checkpoint); leave empty to start from scratch
WEIGHTS = ''

# Number of characters to generate in each test
GENERATE_LENGTH = 500
# Neural network parameters
LAYER_NUM = 2
NB_EPOCH = 20

**Function A:
<br>(1) Load a text file, (2) build the input and output structures of the network**

In [6]:
# method for preparing the training data
def load_data(data_dir, seq_length):
    # Load the file
    with open(data_dir, 'r') as f:
        data = f.read()
    # Unique characters
    chars = list(set(data))
    VOCAB_SIZE = len(chars)

    print('Data length: {} characters'.format(len(data)))
    print('Vocabulary size: {} characters'.format(VOCAB_SIZE))
    print(chars)

    # Index the characters
    ix_to_char = {ix:char for ix, char in enumerate(chars)}
    char_to_ix = {char:ix for ix, char in enumerate(chars)}

    # Input and output structures
    # (len(data) - 1) leaves room for the last target, which is shifted by one character
    NUMBER_OF_SEQ = (len(data) - 1) // seq_length
    print('Number of sequences: {}'.format(NUMBER_OF_SEQ))
    X = np.zeros((NUMBER_OF_SEQ, seq_length, VOCAB_SIZE))
    y = np.zeros((NUMBER_OF_SEQ, seq_length, VOCAB_SIZE))

    for i in range(0, NUMBER_OF_SEQ):
        # Fill the input structure X
        X_sequence = data[i*seq_length:(i+1)*seq_length]
        X_sequence_ix = [char_to_ix[value] for value in X_sequence]
        # one-hot vectors (input)
        input_sequence = np.zeros((seq_length, VOCAB_SIZE))
        # use the dictionary to fill in the one-hot vectors
        for j in range(seq_length):
            input_sequence[j][X_sequence_ix[j]] = 1.
        X[i] = input_sequence

        # Fill the output structure y (the input shifted by one character)
        y_sequence = data[i*seq_length+1:(i+1)*seq_length+1]
        y_sequence_ix = [char_to_ix[value] for value in y_sequence]
        # one-hot vectors (output)
        target_sequence = np.zeros((seq_length, VOCAB_SIZE))
        # use the dictionary to fill in the one-hot vectors
        for j in range(seq_length):
            target_sequence[j][y_sequence_ix[j]] = 1.
        y[i] = target_sequence

    return X, y, VOCAB_SIZE, ix_to_char
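
To make the layout of `X` and `y` concrete, here is a small sketch that applies the same kind of one-hot encoding to a toy string. The helper name `encode_toy`, the sample text and the use of `sorted` (only to make the result reproducible) are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

def encode_toy(data, seq_length):
    """One-hot encode `data` into inputs and next-character targets."""
    # sorted() is used here only so the example is reproducible
    chars = sorted(set(data))
    char_to_ix = {ch: ix for ix, ch in enumerate(chars)}
    vocab_size = len(chars)
    # leave one extra character so every target sequence is complete
    n_seq = (len(data) - 1) // seq_length
    X = np.zeros((n_seq, seq_length, vocab_size))
    y = np.zeros((n_seq, seq_length, vocab_size))
    for i in range(n_seq):
        for j in range(seq_length):
            pos = i * seq_length + j
            X[i, j, char_to_ix[data[pos]]] = 1.0
            # the target is the character that follows the input character
            y[i, j, char_to_ix[data[pos + 1]]] = 1.0
    return X, y, chars

X_toy, y_toy, chars_toy = encode_toy("abcabca", 3)
print(X_toy.shape, y_toy.shape)
print(chars_toy)
print(X_toy[0].argmax(axis=1), y_toy[0].argmax(axis=1))
```

Each row of `y_toy` is the matching row of `X_toy` shifted by one character, which is what lets the network learn to predict the next character at every time step.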

**Function B:
<br>Text generation**

In [7]:
# method for generating text
def generate_text(model, length, vocab_size, ix_to_char):
    # starting with random character
    ix = [np.random.randint(vocab_size)]
    y_char = [ix_to_char[ix[-1]]]
    X = np.zeros((1, length, vocab_size))
    for i in range(length):
        # appending the last predicted character to sequence
        X[0, i, :][ix[-1]] = 1
        print(ix_to_char[ix[-1]], end="")
        ix = np.argmax(model.predict(X[:, :i+1, :])[0], 1)
        y_char.append(ix_to_char[ix[-1]])
    return ('').join(y_char)
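
`generate_text` only needs an object whose `predict` method returns an array of shape `(1, t, vocab_size)`. The sketch below uses a hypothetical stand-in, `CyclicModel`, that always predicts "the next index in the vocabulary", so the greedy `argmax` loop can be followed without a trained network. The class, the three-character vocabulary and the helper `greedy_generate` are assumptions for illustration.

```python
import numpy as np

class CyclicModel:
    """Hypothetical stand-in for a Keras model: predicts index (k + 1) % vocab_size."""

    def predict(self, X):
        # X has shape (1, t, vocab_size); shift each one-hot vector by one position
        return np.roll(X, 1, axis=2)

def greedy_generate(model, length, vocab_size, ix_to_char, start_ix):
    """Same loop as generate_text, with a fixed start index and no printing."""
    ix = [start_ix]
    chars = [ix_to_char[start_ix]]
    X = np.zeros((1, length, vocab_size))
    for i in range(length):
        # append the last predicted character to the input sequence
        X[0, i, ix[-1]] = 1
        # feed the prefix built so far and keep the most likely index per step
        ix = np.argmax(model.predict(X[:, :i + 1, :])[0], 1)
        chars.append(ix_to_char[ix[-1]])
    return "".join(chars)

toy_vocab = {0: "a", 1: "b", 2: "c"}
generated = greedy_generate(CyclicModel(), 5, 3, toy_vocab, start_ix=0)
print(generated)
```

Because the loop always takes the `argmax`, the output is fully determined by the first character, which is consistent with the repetitive samples a character-level model tends to produce under greedy decoding.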

## Training and Testing

**Using Function A: loading the data**

In [8]:
# Creating training data
X, y, VOCAB_SIZE, ix_to_char = load_data(DATA_DIR, SEQ_LENGTH)

Data length: 3262172 characters
Vocabulary size: 99 characters
['S', '(', '\n', '3', 'a', 'r', ':', 'B', 'e', '-', 'K', 'ó', 'k', '1', '«', 'm', '=', 'd', 'u', '}', '2', 'z', 'X', '8', 'N', 'g', '>', '—', '`', 'G', 'c', '9', 'w', 'T', 'l', '…', 'C', 'F', 'U', 'I', 'W', '4', '–', '»', '¤', '7', 'o', '*', 'b', '0', 'µ', '’', '#', 'y', '¢', '_', 'v', 'j', 'P', 'M', 'Z', '.', '‘', 'H', 'x', 'p', '6', '5', 'A', '!', ')', '"', 'V', ' ', '<', 'D', '?', 'Q', '¥', '/', 'i', '‚', 'h', 'L', 'R', 'Y', ',', '®', '&', 'q', "'", 'J', 's', 'E', 'n', ';', 't', 'O', 'f']
Number of sequences: 65243


**It is important to save the `ix_to_char` dictionary to a binary file. It must be loaded every time you resume training or generate text from a checkpoint, because the order of the characters in the dictionary may change between runs (it is not a fixed order).**
<br>**DO NOT OVERWRITE THIS PICKLE WHEN RESTARTING THE NOTEBOOK TO TEST CHECKPOINTS**

In [9]:
# Do not overwrite the pickle when restarting the notebook to test previous checkpoints
with open('ix_to_char.pickle', 'wb') as handle:
    pickle.dump(ix_to_char, handle, protocol=pickle.HIGHEST_PROTOCOL)
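
The saved mapping matters because `list(set(data))` does not guarantee any order, and for strings the order can differ between Python processes since string hashing is randomized by default. Below is a minimal sketch of the save and load round trip. It uses a hypothetical toy mapping and a temporary directory instead of the notebook's `ix_to_char.pickle`, so it cannot overwrite the real file.

```python
import os
import pickle
import tempfile

# toy mapping standing in for the real ix_to_char (assumption for illustration)
toy_ix_to_char = {0: "t", 1: "h", 2: "e", 3: " "}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ix_to_char.pickle")
    # save once, right after the training data is built
    with open(path, "wb") as handle:
        pickle.dump(toy_ix_to_char, handle, protocol=pickle.HIGHEST_PROTOCOL)
    # later sessions load the mapping instead of rebuilding it from the text
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == toy_ix_to_char)
```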

In [10]:
print(ix_to_char)

{0: 'S', 1: '(', 2: '\n', 3: '3', 4: 'a', 5: 'r', 6: ':', 7: 'B', 8: 'e', 9: '-', 10: 'K', 11: 'ó', 12: 'k', 13: '1', 14: '«', 15: 'm', 16: '=', 17: 'd', 18: 'u', 19: '}', 20: '2', 21: 'z', 22: 'X', 23: '8', 24: 'N', 25: 'g', 26: '>', 27: '—', 28: '`', 29: 'G', 30: 'c', 31: '9', 32: 'w', 33: 'T', 34: 'l', 35: '…', 36: 'C', 37: 'F', 38: 'U', 39: 'I', 40: 'W', 41: '4', 42: '–', 43: '»', 44: '¤', 45: '7', 46: 'o', 47: '*', 48: 'b', 49: '0', 50: 'µ', 51: '’', 52: '#', 53: 'y', 54: '¢', 55: '_', 56: 'v', 57: 'j', 58: 'P', 59: 'M', 60: 'Z', 61: '.', 62: '‘', 63: 'H', 64: 'x', 65: 'p', 66: '6', 67: '5', 68: 'A', 69: '!', 70: ')', 71: '"', 72: 'V', 73: ' ', 74: '<', 75: 'D', 76: '?', 77: 'Q', 78: '¥', 79: '/', 80: 'i', 81: '‚', 82: 'h', 83: 'L', 84: 'R', 85: 'Y', 86: ',', 87: '®', 88: '&', 89: 'q', 90: "'", 91: 'J', 92: 's', 93: 'E', 94: 'n', 95: ';', 96: 't', 97: 'O', 98: 'f'}


In [11]:
print(X.shape, y.shape, VOCAB_SIZE)

(65243, 50, 99) (65243, 50, 99) 99
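
These shapes also hint at why memory can become a problem: `np.zeros` creates `float64` arrays by default, so each element takes 8 bytes. A rough back-of-the-envelope calculation for the shapes printed above follows; the helper name `dense_array_bytes` is an assumption for illustration.

```python
def dense_array_bytes(shape, bytes_per_element=8):
    """Size in bytes of a dense array; 8 bytes per element matches float64."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total

shape = (65243, 50, 99)
size = dense_array_bytes(shape)
print("{:.2f} GB per array".format(size / 1e9))
print("{:.2f} GB for X and y together".format(2 * size / 1e9))
```

Note that `BATCH_SIZE` and `HIDDEN_DIM` affect the memory used by the model during training, not the size of these two arrays.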


### Creating the RNN (LSTM)

In [12]:
# Creating and compiling the Network
model = Sequential()

# Add the LSTM layers
model.add(LSTM(HIDDEN_DIM, input_shape=(None, VOCAB_SIZE), return_sequences=True))
for i in range(LAYER_NUM - 1):
    model.add(LSTM(HIDDEN_DIM, return_sequences=True))
# Add the output operation
model.add(TimeDistributed(Dense(VOCAB_SIZE)))
model.add(Activation('softmax'))

# "Compiling" = instantiating the RNN with its loss function and optimizer
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

In [13]:
# Generate some sample before training to know how bad it is!
generate_text(model, GENERATE_LENGTH, VOCAB_SIZE, ix_to_char)

;Hn88bkkbrrrrWWWWC¥¥¥¥¥««RRR###





//////-----------HHHHHH––PPooWzzWWWWVVGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>

';Hn88bkkbrrrrWWWWC¥¥¥¥¥««RRR###\n\n\n\n\n\n//////-----------HHHHHH––PPooWzzWWWWVVGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>>&Ig"""""‘WWVVWGGGGGGGEEHHHSSM?????333333333333gIIggI«==‘MM‘>>>'

**The weights (and the dictionary for the one-hot vectors) are loaded if there was a previous training run**
<br>WEIGHTS must be set to the name of the saved checkpoint file. For example:
<br>```WEIGHTS = "checkpoint_layer_2_hidden_250_epoch_60.hdf5"```

In [14]:
# Load the weights from a previous training run (to resume an execution)
# The number of epochs is derived from the file name
# The character dictionary (one-hot vectors) is loaded for generation
if not WEIGHTS == '':
    model.load_weights(WEIGHTS)
    # rfind('.') locates the extension even if the path contains other dots, e.g. a leading './'
    nb_epoch = int(WEIGHTS[WEIGHTS.rfind('_') + 1:WEIGHTS.rfind('.')])
    with open('ix_to_char.pickle', 'rb') as handle:
        ix_to_char = pickle.load(handle)
else:
    # Starting from scratch:
    nb_epoch = 0
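
The epoch counter is recovered from the checkpoint file name, which follows the pattern `checkpoint_layer_{}_hidden_{}_epoch_{}.hdf5` used when the weights are saved. Below is a slightly more defensive sketch of the same idea using a regular expression. The helper `epoch_from_checkpoint` is hypothetical and not part of the notebook.

```python
import re

def epoch_from_checkpoint(filename):
    """Return the epoch number encoded at the end of a checkpoint file name."""
    match = re.search(r"_epoch_(\d+)\.hdf5$", filename)
    if match is None:
        raise ValueError("unexpected checkpoint name: {}".format(filename))
    return int(match.group(1))

print(epoch_from_checkpoint("checkpoint_layer_2_hidden_250_epoch_60.hdf5"))
print(epoch_from_checkpoint("./checkpoints/checkpoint_layer_2_hidden_250_epoch_10.hdf5"))
```

Failing with a `ValueError` on an unexpected name avoids silently resuming with a wrong epoch count.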

### Training

In [None]:
# Training (starts from scratch, or resumes from the checkpoint loaded above)

# This is the main loop
# You can change the condition so that it stops after a given number of epochs.
while True:
    print('\n\nEpoch: {}\n'.format(nb_epoch))
    # Fit the model for one epoch
    model.fit(X, y, batch_size=BATCH_SIZE, verbose=1, epochs=1)
    nb_epoch += 1
    # Generate a sample text at the end of the epoch
    generate_text(model, GENERATE_LENGTH, VOCAB_SIZE, ix_to_char)
    # You can change this to save checkpoints more often
    if nb_epoch % 10 == 0:
        model.save_weights('checkpoint_layer_{}_hidden_{}_epoch_{}.hdf5'.format(LAYER_NUM, HIDDEN_DIM, nb_epoch))
        # Stop once the checkpoint has been saved
        break



Epoch: 0

Epoch 1/1
Z  the  hobbits  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  the  hills  and  th

Epoch: 1

Epoch 1/1
k  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  stone  of  the  

e  the  strangers  saw  them  all  the  strange  thing  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the  dwarves  and  the  sound  of  the

Epoch: 27

Epoch 1/1
‚d  the  stranger  of  the  stream  that  had  been  seen  they  had  been  seen  the  sun  was  still  seen  of  the  songs  of  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  dwarves  and  the  

### Text generation
If you instantiate the model and its parameters (by running the preliminary cells) and have the two required files (.pickle and .hdf5), you can generate text.
<br>In the LOTR example: `VOCAB_SIZE = 84` (if you want to try it, the weights and the dictionary are attached, but not the text)

In [19]:
# Take care not to overwrite the original pickle
with open('ix_to_char.pickle', 'rb') as handle:
    ix_to_char = pickle.load(handle)
    
WEIGHTS = "checkpoint_layer_2_hidden_250_epoch_50.hdf5"
# Loading the trained weights
model.load_weights(WEIGHTS)
generate_text(model, GENERATE_LENGTH, VOCAB_SIZE, ix_to_char)
print('\n\n')

u(ROFoPZFSZ'PR

KeyError: 85
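
A `KeyError` like the one above is what one would expect when an index drawn or predicted during generation is not a key of `ix_to_char`, for example when the pickle was built for a smaller vocabulary than the `VOCAB_SIZE` passed to `generate_text`. This is a likely explanation rather than a confirmed diagnosis. A small guard, sketched here with the hypothetical helper `check_vocab`, can surface such a mismatch before generating:

```python
def check_vocab(ix_to_char, vocab_size):
    """Raise a clear error if the mapping does not cover indices 0..vocab_size-1."""
    missing = [ix for ix in range(vocab_size) if ix not in ix_to_char]
    if missing:
        raise ValueError(
            "ix_to_char has {} entries but vocab_size is {}; first missing index: {}".format(
                len(ix_to_char), vocab_size, missing[0]))

# toy example (assumption): a mapping with 84 entries checked against 99 outputs
toy_mapping = {ix: chr(33 + ix) for ix in range(84)}
try:
    check_vocab(toy_mapping, 99)
except ValueError as err:
    print(err)
```

Calling such a check right after loading the pickle, with the same `VOCAB_SIZE` used to build the model, would turn the late `KeyError` into an early and more descriptive error.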