## <center>Laboratory Work No. 8: Text Generation Based on "Alice in Wonderland"</center>

### <center>Completed by Darya Kalmykova, 3rd-year student, group БФИ2001</center>

### Objective
Use recurrent neural networks as generative models.

### Tasks
* Become familiar with text generation
* Become familiar with the callback system in Keras

### Requirements
1. Implement a neural network model that generates text
2. Write a custom callback that shows how text generation changes during 
training (that is, once every few epochs, generate and print text from the 
model that is not yet fully trained)
3. Track the training process with the TensorBoard callback, and present the 
results and their analysis in the report

In [1]:
import keras
import numpy
import codecs
import re
import sys

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

import datetime
%load_ext tensorboard

In [2]:
fileObj = codecs.open( "./wonderland.txt", "r", "utf_8" )
raw_text = fileObj.read()
text_clear = re.sub(r"[\r\n]", '', raw_text)
raw_text = text_clear.lower()

In [3]:
fileObj.close()

In [26]:
# raw_text

In [4]:
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
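To make the lookup table concrete, here is a minimal sketch on a toy string. The string `sample` and the `*_demo` names are hypothetical and used only for illustration; the notebook builds the same kind of table from the full book text.

```python
# Toy illustration of the character <-> integer lookup tables.
# `sample` is a made-up string, not text taken from the notebook's dataset.
sample = "alice was beginning"

chars_demo = sorted(set(sample))
char_to_int_demo = {c: i for i, c in enumerate(chars_demo)}
int_to_char_demo = {i: c for c, i in char_to_int_demo.items()}

# Encode the string as integers, then decode it back.
encoded = [char_to_int_demo[c] for c in sample]
decoded = "".join(int_to_char_demo[i] for i in encoded)

print(len(chars_demo), "distinct characters")
print(decoded == sample)
```

Because the characters are sorted before enumeration, the mapping is deterministic for a given text, which matters when weights saved in one session are loaded in another.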

In [5]:
n_chars = len(raw_text)
print("Total Characters: ", n_chars)

Total Characters:  141208


In [6]:
n_vocab = len(chars)
print("Total Vocab: ", n_vocab)

Total Vocab:  48


In [7]:
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  141108
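The pattern count follows directly from the sliding window: a text of `n_chars` characters yields `n_chars - seq_length` windows, here 141208 - 100 = 141108. The sketch below shows the same windowing on a short word; `make_windows` is a hypothetical helper written for this illustration, not a function from the notebook.

```python
def make_windows(text, seq_length):
    """Return (inputs, targets): each input is seq_length characters,
    and the matching target is the single character that follows it."""
    inputs, targets = [], []
    for i in range(len(text) - seq_length):
        inputs.append(text[i:i + seq_length])
        targets.append(text[i + seq_length])
    return inputs, targets


# A 10-character word with a window of 4 gives 10 - 4 = 6 training pairs.
demo_inputs, demo_targets = make_windows("wonderland", 4)
for seq, nxt in zip(demo_inputs, demo_targets):
    print(repr(seq), "->", repr(nxt))
```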


In [8]:
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
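The shapes produced by this step can be checked on toy data. The sketch below uses only NumPy and made-up values (`toy_X`, `toy_y`, `toy_vocab` are assumptions for illustration); `numpy.eye` stands in for `to_categorical` to produce the one-hot rows.

```python
import numpy as np

# Hypothetical toy data: 3 patterns of length 4 over a 5-symbol vocabulary.
toy_X = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
toy_y = [4, 0, 1]
toy_vocab = 5

# [samples, time steps, features], scaled into the range [0, 1).
X_demo = np.reshape(toy_X, (len(toy_X), 4, 1)) / float(toy_vocab)

# One-hot targets: row i of the identity matrix encodes class i.
y_demo = np.eye(toy_vocab)[toy_y]

print(X_demo.shape, y_demo.shape)
```

One difference worth noting: `to_categorical` without `num_classes` infers the number of classes from the largest label, whereas the sketch passes the vocabulary size explicitly.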

In [11]:
model = Sequential()

model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

#### Custom callback

In [None]:
def period_text_gen():
    # TODO: periodic text generation is not implemented yet
    pass
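One possible way to meet requirement 2 is to subclass `keras.callbacks.Callback` and generate a short sample in `on_epoch_end`. The following is a minimal sketch under stated assumptions, not the notebook's implementation: the class name `TextGenerationCallback` and its parameters (`seed_patterns`, `period`, `length`) are hypothetical, and the generation loop simply mirrors the greedy decoding used in the text generation section.

```python
import sys

import numpy
from keras.callbacks import Callback


class TextGenerationCallback(Callback):
    """Print a short generated sample every `period` epochs (hypothetical helper)."""

    def __init__(self, seed_patterns, int_to_char, n_vocab, period=5, length=200):
        super().__init__()
        self.seed_patterns = seed_patterns  # list of integer-encoded windows
        self.int_to_char = int_to_char      # index -> character lookup
        self.n_vocab = n_vocab
        self.period = period
        self.length = length

    def generate(self):
        # Copy a random seed window so the training data is not modified.
        start = numpy.random.randint(0, len(self.seed_patterns))
        pattern = list(self.seed_patterns[start])
        generated = []
        for _ in range(self.length):
            # Same scaling as the training inputs.
            x = numpy.reshape(pattern, (1, len(pattern), 1)) / float(self.n_vocab)
            prediction = self.model.predict(x, verbose=0)
            index = int(numpy.argmax(prediction))
            generated.append(self.int_to_char[index])
            # Slide the window by one character.
            pattern = pattern[1:] + [index]
        return "".join(generated)

    def on_epoch_end(self, epoch, logs=None):
        # Keras passes zero-based epoch numbers.
        if (epoch + 1) % self.period == 0:
            sys.stdout.write("\n[epoch %d] %s\n" % (epoch + 1, self.generate()))
```

Under these assumptions, an instance such as `TextGenerationCallback(dataX, int_to_char, n_vocab)` could be appended to the `callbacks` list passed to `model.fit`. The `int_to_char` dictionary would have to be built before training rather than afterwards.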

#### Using ModelCheckpoint and TensorBoard callbacks

In [12]:
# define the checkpoint

filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"

checkpoint = ModelCheckpoint(filepath, monitor='loss', 
                             verbose=1, save_best_only=True, mode='min')

log_dir = "lab8_logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

call1 = keras.callbacks.TensorBoard(
                    log_dir=log_dir,
                    histogram_freq=1)

In [13]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=[checkpoint, call1])

Epoch 1/20
Epoch 1: loss improved from inf to 3.00540, saving model to weights-improvement-01-3.0054.hdf5
Epoch 2/20
Epoch 2: loss improved from 3.00540 to 2.83508, saving model to weights-improvement-02-2.8351.hdf5
Epoch 3/20
Epoch 3: loss improved from 2.83508 to 2.74212, saving model to weights-improvement-03-2.7421.hdf5
Epoch 4/20
Epoch 4: loss improved from 2.74212 to 2.67317, saving model to weights-improvement-04-2.6732.hdf5
Epoch 5/20
Epoch 5: loss improved from 2.67317 to 2.61372, saving model to weights-improvement-05-2.6137.hdf5
Epoch 6/20
Epoch 6: loss improved from 2.61372 to 2.55764, saving model to weights-improvement-06-2.5576.hdf5
Epoch 7/20
Epoch 7: loss improved from 2.55764 to 2.50521, saving model to weights-improvement-07-2.5052.hdf5
Epoch 8/20
Epoch 8: loss improved from 2.50521 to 2.45749, saving model to weights-improvement-08-2.4575.hdf5
Epoch 9/20
Epoch 9: loss improved from 2.45749 to 2.41622, saving model to weights-improvement-09-2.4162.hdf5
Epoch 10/20
Ep

<keras.callbacks.History at 0x29544f6f1c0>

In [15]:
%tensorboard --logdir lab8_logs/fit

Reusing TensorBoard on port 6006 (pid 15416), started 7:36:11 ago. (Use '!kill 15416' to kill it.)

#### Epoch loss

![Epoch loss](./lab8_tb/1.png)

#### Time Series on Dense layers

![Time series on Dense layers](./lab8_tb/2.png)

#### Histograms of Dense layers

![Histograms of Dense layers](./lab8_tb/3.png)

#### Histograms of LSTM

![Histograms of LSTM](./lab8_tb/4.png)

### Text generation

In [15]:
# load the network weights
filename = "weights-improvement-20-2.0652.hdf5"
model.load_weights(filename)

model.compile(loss='categorical_crossentropy', optimizer='adam')

In [16]:
int_to_char = dict((i, c) for i, c in enumerate(chars))

In [18]:
# pick a random seed (copied so that dataX itself is not modified)
start = numpy.random.randint(0, len(dataX))
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

# generate characters
for i in range(1000):
    # scale the window the same way as the training inputs
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # greedy decoding: always take the most probable character
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the prediction, drop the oldest character
    pattern.append(index)
    pattern = pattern[1:]

print("\nDone.")

Seed:
" re was no more to be said.at last the mouse, who seemed to be a person of authority among them,calle "
 her head  and the garter wothd the gar and the was soe kant of the care and the was sorednlng to to tea it tat  the was soenk on the tonle th the woudd of the care and the was so tork to toeke th the woudd of the doure tf the woudd of the dareeni,ana the gadt was soe kante was anl toene to the woudd of the dareenii the care an                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
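The output above falls into a repeating loop and then into a run of spaces. This is a common effect of greedy decoding, because `numpy.argmax` always picks the same character for the same window. A common alternative, not used in this notebook, is to sample the next character from the predicted distribution, optionally rescaled by a temperature. The sketch below is an assumption-laden illustration: `sample_index` and its parameters are hypothetical names.

```python
import numpy as np


def sample_index(probabilities, temperature=1.0, rng=None):
    """Draw one index from a probability vector rescaled by `temperature`.

    Lower temperatures approach argmax; higher ones give more varied text.
    """
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probabilities, dtype=np.float64).ravel()
    # Work in log space; the small constant avoids log(0).
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # numerical stability before exponentiation
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))


# Example with a made-up 3-class prediction.
print(sample_index([0.1, 0.7, 0.2], temperature=0.5))
```

In the generation loop, `index = sample_index(prediction, temperature=0.5)` could replace the `numpy.argmax` call; the temperature value here is an arbitrary starting point rather than a tuned setting.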