## <center>Lab Assignment No. 8: Text Generation Based on “Alice in Wonderland”</center>

### <center>Completed by Darya Kalmykova, 3rd-year student, group БФИ2001</center>

### Goal
Use recurrent neural networks as generative models.

### Tasks
* Become familiar with text generation
* Become familiar with the callback system in Keras

### Requirements
1. Implement a neural network model that generates text
2. Write a custom callback that shows how text is generated during training (that is, once every few epochs, generate and print text from the model that is not yet fully trained)
3. Track the training process with the TensorBoard callback (`keras.callbacks.TensorBoard`) and include the results and their analysis in the report

In [1]:
import keras
import numpy as np
import codecs
import re
import sys

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

import datetime
%load_ext tensorboard

In [2]:
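# read the book and flatten it into a single lowercase string
fileObj = codecs.open( "./wonderland.txt", "r", "utf_8" )
raw_text = fileObj.read()
# note: line breaks are deleted rather than replaced with a space, so the last word
# of one line is joined to the first word of the next (e.g. "alicecalled" in the seed below)
text_clear = re.sub(r"[\r\n]", '', raw_text)
raw_text = text_clear.lower()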

In [3]:
fileObj.close()

In [4]:
# raw_text

In [5]:
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))

In [6]:
n_chars = len(raw_text)
print("Total Characters: ", n_chars)

Total Characters:  141208


In [7]:
n_vocab = len(chars)
print("Total Vocab: ", n_vocab)

Total Vocab:  48


In [8]:
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  141108


In [10]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
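
The windowing and reshaping steps above can be easier to follow on a toy string. The following is a minimal sketch, not part of the original notebook: the helper `make_windows` and the `*_toy` names are hypothetical, and the input `"hello world"` is chosen only for illustration.

```python
import numpy as np

def make_windows(text, seq_length):
    """Build next-character training pairs from `text` (hypothetical helper)."""
    chars = sorted(set(text))
    char_to_int = {c: i for i, c in enumerate(chars)}
    data_x, data_y = [], []
    for i in range(len(text) - seq_length):
        window = text[i:i + seq_length]   # input: seq_length characters
        target = text[i + seq_length]     # output: the character that follows
        data_x.append([char_to_int[c] for c in window])
        data_y.append(char_to_int[target])
    # shape [samples, time steps, features], scaled into [0, 1)
    X = np.reshape(data_x, (len(data_x), seq_length, 1)) / float(len(chars))
    return X, np.array(data_y), chars

X_toy, y_toy, chars_toy = make_windows("hello world", seq_length=4)
print(X_toy.shape, y_toy.shape, chars_toy)
```

Each window of `seq_length` characters is paired with the single character that follows it, which is the same layout the notebook builds with `seq_length = 100`.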

In [11]:
model = Sequential()

model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')
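
As a sanity check on the size of this architecture, the number of trainable parameters can be estimated by hand. This is a sketch under two assumptions: the LSTM layer uses the standard formulation with a bias (four gates, each with input weights, recurrent weights and a bias), and `y.shape[1]` equals the vocabulary size of 48 reported above. The helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    # four gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (units + input_dim) + units)

def dense_params(inputs, outputs):
    # weight matrix plus one bias per output unit
    return inputs * outputs + outputs

n_lstm = lstm_params(256, 1)      # LSTM(256) on one feature per time step
n_dense = dense_params(256, 48)   # assumes y.shape[1] == n_vocab == 48
print(n_lstm, n_dense, n_lstm + n_dense)  # 264192 12336 276528
```

The Dropout layer adds no parameters. The totals can be compared against the output of `model.summary()`.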

#### Custom callback

In [12]:
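class Custom_Callback(keras.callbacks.Callback):
    """Every 5 epochs, print a short sample generated by the model being trained."""

    def __init__(self, data, int_to_char):
        super().__init__()
        self.dataX = data  # use the argument, not the global dataX
        self.int_to_char = int_to_char

    def period_text_gen(self, size):
        # pick a random seed pattern; copy it so the training data is not modified
        start = np.random.randint(0, len(self.dataX))
        pattern = list(self.dataX[start])
        n_vocab = len(self.int_to_char)
        text = []
        for i in range(size):
            x = np.reshape(pattern, (1, len(pattern), 1))
            x = x / float(n_vocab)
            # self.model is the model this callback is attached to
            prediction = self.model.predict(x, verbose=0)
            index = np.argmax(prediction)
            text.append(self.int_to_char[index])
            # slide the window forward by one character
            pattern.append(index)
            pattern = pattern[1:]
        return "".join(text)

    def on_epoch_end(self, epoch, logs=None):
        # epoch is zero-based, so this fires after epochs 1, 6, 11 and 16
        if epoch % 5 == 0:
            print(f'Epoch {epoch}\n')
            gen_text = self.period_text_gen(100)
            print(f'Generated text: {gen_text}\n')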

#### Using ModelCheckpoint and TensorBoard callbacks

In [15]:
int_to_char = dict((i, c) for i, c in enumerate(chars))

In [13]:
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"

checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, 
                             save_best_only=True, mode='min')

log_dir = "lab8_logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

call1 = keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

In [16]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=[checkpoint, call1, Custom_Callback(dataX, int_to_char)])

Epoch 1/20
Epoch 1: loss improved from inf to 2.98907, saving model to weights-improvement-01-2.9891.hdf5
Epoch 0

Generated text:  toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe toe

Epoch 2/20
Epoch 2: loss improved from 2.98907 to 2.83258, saving model to weights-improvement-02-2.8326.hdf5
Epoch 3/20
Epoch 3: loss improved from 2.83258 to 2.74843, saving model to weights-improvement-03-2.7484.hdf5
Epoch 4/20
Epoch 4: loss improved from 2.74843 to 2.68092, saving model to weights-improvement-04-2.6809.hdf5
Epoch 5/20
Epoch 5: loss improved from 2.68092 to 2.62355, saving model to weights-improvement-05-2.6235.hdf5
Epoch 6/20
Epoch 6: loss improved from 2.62355 to 2.56708, saving model to weights-improvement-06-2.5671.hdf5
Epoch 5

Generated text: nd she sooe to the wooee to the wooee to the wooee to the wooee to the wooee to the wooee to the woo

Epoch 7/20
Epoch 7: loss improved from 2.56708 to 2.51465, saving model to weights-improve

<keras.callbacks.History at 0x1eab1903d90>
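
To put the logged loss values in context, they can be compared with the loss of a uniform guess and converted to per-character perplexity. This is a minimal sketch, assuming the loss is the mean categorical cross-entropy in nats (natural logarithm). The values 2.98907 and 2.51465 come from the log above, and 2.0676 from the name of the checkpoint file loaded later in this report; the helper `perplexity` is hypothetical.

```python
import math

n_vocab = 48  # vocabulary size reported above

def perplexity(loss):
    # exp(cross-entropy in nats) = effective number of equally likely characters
    return math.exp(loss)

uniform_loss = math.log(n_vocab)  # loss of a model that guesses uniformly
print(f"uniform guess: loss {uniform_loss:.4f}")

for label, loss in [("epoch 1", 2.98907),
                    ("epoch 7", 2.51465),
                    ("epoch 20 checkpoint", 2.0676)]:
    print(f"{label}: loss {loss:.4f}, perplexity {perplexity(loss):.2f}")
```

A perplexity well below 48 means the model has narrowed its choice of next character considerably relative to a uniform guess, even though the generated text is still far from readable English.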

In [18]:
%tensorboard --logdir lab8_logs/fit

Reusing TensorBoard on port 6006 (pid 10784), started 0:00:25 ago. (Use '!kill 10784' to kill it.)

#### Epoch loss

![Epoch loss](./lab8_tb/1.png)

#### Time Series on Dense layers

![Time series on Dense layers](./lab8_tb/2.png)

#### Histograms of Dense layers

![Histograms of Dense layers](./lab8_tb/3.png)

#### Histograms of LSTM

![Histograms of LSTM](./lab8_tb/4.png)

### Text generation

In [36]:
# load the network weights
filename = "weights-improvement-20-2.0676.hdf5"
model.load_weights(filename)

model.compile(loss='categorical_crossentropy', optimizer='adam')

In [40]:
# pick a random seed
start = np.random.randint(0, len(dataX))
pattern = list(dataX[start])  # copy so dataX itself is not modified
print("Seed:", start)
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

# generate characters
for i in range(1000):
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # greedy decoding: always take the most probable next character
    index = np.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window forward by one character
    pattern.append(index)
    pattern = pattern[1:]

print("\nDone.")

Seed: 40680
" abbit’s voice; and alicecalled out as loud as she could, “if you do, i’ll set dinah at you!”there wa "
s a lang of the sabdit worh the rame, and was sotting oo the tooee “ht, and doeng the mook tu then she was soiek the whst an the rabbit  and whnt oedt the woode oadt to the thile  and was sotting oo the tooee the was to the bare and the had not the tabbit whrh the sas oo toeee the whst an anl of the sabdit  and was sotting to the toile, and thene tas no toeeen the hoose to leke that she woudd belin the was oo the saali, and the woode had been ano aor aor aoo oo the tabli, and the marter her aele a little toile to the thile  and was soinking to the thite  atd the tooed had aele deri and toie the wast oo tee shet  the would bedin to tee the harter wo toene to her her  and thene tas a lintle toiee and toine the rabbit was the pooer wite tie tas oo the thiee  and tas soink the toeee the was to the toile, and thene tas not in the toiee “hth toee of the soeeo tfe saadit  and t
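
The generated text keeps returning to the same phrases, which is typical of greedy decoding with `np.argmax`. One common alternative, not used in this notebook, is to sample the next character from the predicted distribution after rescaling it with a temperature. The sketch below is an illustration only: the function `sample_with_temperature`, the toy distribution `demo_probs` and the temperature value 0.8 are all assumptions.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw an index from `probs` after temperature rescaling (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    # temperature < 1 sharpens the distribution, > 1 flattens it
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()          # subtract the max for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

# toy distribution over three characters, for illustration only
demo_probs = [0.6, 0.3, 0.1]
rng = np.random.default_rng(0)
draws = [sample_with_temperature(demo_probs, 0.8, rng) for _ in range(1000)]
print(np.bincount(draws, minlength=3) / 1000)
```

In the generation loop, this would replace the `np.argmax` line with something like `index = sample_with_temperature(prediction[0], temperature=0.8)`. Whether it improves the output of this particular model has not been tested here.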