# Lab Work No. 8

Text generation based on “Alice in Wonderland”

Author:
    Student of group БФИ1901
    Чернышов Дмитрий
    
Tasks:

   1. Get acquainted with text generation
   2. Get acquainted with the Callback system in Keras

# Objective:
Recurrent neural networks can also be used as generative models.
This means that, besides serving as predictive models (making forecasts), they can
learn the sequences of a problem and then generate entirely new, plausible sequences
for the problem domain.
Generative models of this kind are useful not only for studying how well the model has
learned the problem, but also for learning more about the problem domain itself.

In [1]:
import numpy
import sys
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

In [2]:
filename = "wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()
# map each unique character to an integer (character-to-integer encoding)
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
# summarize the dataset
n_chars = len(raw_text)
n_vocab = len(chars)
print ("Total Characters: ", n_chars)
print ("Total Vocab: ", n_vocab)

Total Characters:  144522
Total Vocab:  48
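
To make the mapping step concrete, here is a minimal sketch on a toy string instead of the book text. The string `toy_text` is an assumption for illustration; the mapping logic mirrors the cell above, and the reverse mapping is included to check the round trip.

```python
# Toy stand-in for the book text (an assumption; the lab reads wonderland.txt).
toy_text = "alice was".lower()

# Sorted unique characters define the vocabulary, as in the cell above.
chars = sorted(set(toy_text))
char_to_int = {c: i for i, c in enumerate(chars)}
int_to_char = {i: c for i, c in enumerate(chars)}

# Encode the text as integers and decode it back to check the round trip.
encoded = [char_to_int[c] for c in toy_text]
decoded = "".join(int_to_char[i] for i in encoded)

print("Vocabulary:", chars)
print("Encoded:", encoded)
print("Round trip OK:", decoded == toy_text)
```

Because the vocabulary is sorted, the same text always produces the same integer codes, which matters when weights are later reloaded from a checkpoint.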


In [3]:
# split the book text into overlapping subsequences with a fixed length of 100 characters;
# each subsequence is paired with the single character that follows it.
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])

n_patterns = len(dataX)
print ("Total Patterns: ", n_patterns)
print ("Total Vocab: ", n_vocab)

Total Patterns:  144422
Total Vocab:  48
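
The windowing step can be sketched on a short string. The helper name `make_windows`, the toy text, and the window length of 4 are assumptions for illustration; the loop has the same bounds as the cell above.

```python
def make_windows(text, seq_length):
    """Return (inputs, targets): each input is seq_length characters,
    each target is the single character that follows it."""
    inputs, targets = [], []
    for i in range(0, len(text) - seq_length):
        inputs.append(text[i:i + seq_length])
        targets.append(text[i + seq_length])
    return inputs, targets

# Toy text and window length (assumptions for illustration).
inputs, targets = make_windows("wonderland", 4)
for seq_in, seq_out in zip(inputs, targets):
    print(repr(seq_in), "->", repr(seq_out))

# The number of windows is len(text) - seq_length,
# which for the book gives 144522 - 100 = 144422 patterns.
print(len(inputs))
```

The window moves one character at a time, so consecutive patterns overlap in all but one character.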


In [4]:
# reshape the list of input sequences into the form [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# then rescale the integers to the range 0 to 1
X = X / float(n_vocab)
# one-hot encode the output patterns (single characters converted to integers)
y = np_utils.to_categorical(dataY)

model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam')
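
The two data transformations in this cell can be illustrated without Keras. The sketch below uses plain Python lists; `scale` and `one_hot` are hypothetical helper names, and the vocabulary size of 4 is an assumption chosen to keep the output readable.

```python
def scale(pattern, n_vocab):
    """Map integer codes to floats in [0, 1) by dividing by the vocabulary size."""
    return [value / float(n_vocab) for value in pattern]

def one_hot(index, n_classes):
    """Return a list of zeros with a single 1.0 at position `index`."""
    vector = [0.0] * n_classes
    vector[index] = 1.0
    return vector

n_vocab = 4                     # toy vocabulary size (assumption)
pattern = [0, 3, 1]             # one encoded input sequence
target = 2                      # the encoded character that follows it

# One sample shaped as [time steps, features], with a single feature per step.
x = [[v] for v in scale(pattern, n_vocab)]
y = one_hot(target, n_vocab)

print(x)
print(y)
```

The one-hot target has one position per vocabulary entry, which is why the final `Dense` layer uses `y.shape[1]` units with a softmax activation.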

In [5]:
# Training is slow, so we use model checkpoints to save all network weights
# each time the loss improves at the end of an epoch.
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss',
verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model.fit(X, y, epochs=50, batch_size=128,
callbacks=callbacks_list)

Epoch 1/50
Epoch 1: loss improved from inf to 2.95942, saving model to weights-improvement-01-2.9594.hdf5
Epoch 2/50
Epoch 2: loss improved from 2.95942 to 2.75733, saving model to weights-improvement-02-2.7573.hdf5
Epoch 3/50
Epoch 3: loss improved from 2.75733 to 2.65511, saving model to weights-improvement-03-2.6551.hdf5
Epoch 4/50
Epoch 4: loss improved from 2.65511 to 2.57896, saving model to weights-improvement-04-2.5790.hdf5
Epoch 5/50
Epoch 5: loss improved from 2.57896 to 2.51767, saving model to weights-improvement-05-2.5177.hdf5
Epoch 6/50
Epoch 6: loss improved from 2.51767 to 2.46294, saving model to weights-improvement-06-2.4629.hdf5
Epoch 7/50
Epoch 7: loss improved from 2.46294 to 2.41220, saving model to weights-improvement-07-2.4122.hdf5
Epoch 8/50
Epoch 8: loss improved from 2.41220 to 2.36677, saving model to weights-improvement-08-2.3668.hdf5
Epoch 9/50
Epoch 9: loss improved from 2.36677 to 2.32004, saving model to weights-improvement-09-2.3200.hdf5
Epoch 10/50
Ep

<keras.callbacks.History at 0x19f24b22920>
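
The checkpoint file names in the log follow from the `filepath` template: `{epoch:02d}` and `{loss:.4f}` are ordinary Python format fields. The sketch below reproduces one file name from the log and shows one possible way to pick the lowest-loss checkpoint from a list of names. `best_checkpoint` is a hypothetical helper, not part of Keras, and it assumes every name follows the template.

```python
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"

# Epoch 1 finished with loss 2.95942 in the log above.
name = filepath.format(epoch=1, loss=2.95942)
print(name)

def best_checkpoint(names):
    """Return the file name with the smallest loss encoded in it.
    Assumes names follow the template above (hypothetical helper)."""
    def loss_of(n):
        # "weights-improvement-03-2.6551.hdf5" -> 2.6551
        return float(n[:-len(".hdf5")].rsplit("-", 1)[1])
    return min(names, key=loss_of)

names = [
    "weights-improvement-01-2.9594.hdf5",
    "weights-improvement-02-2.7573.hdf5",
    "weights-improvement-03-2.6551.hdf5",
]
best = best_checkpoint(names)
print(best)
```

With `save_best_only=True` and a steadily decreasing loss, the most recent file is also the best one, so the helper is mainly useful when many files have accumulated.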

In [9]:
# load the data and define the network exactly as before, except that the network weights
# are loaded from a checkpoint file and the network does not need to be trained.
filename = "weights-improvement-48-1.5913.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy',
optimizer='adam')
# we also need a reverse mapping to convert integers back into characters,
# so that the predictions can be read.
int_to_char = dict((i, c) for i, c in enumerate(chars))

# pick a random input pattern as the seed sequence
start = numpy.random.randint(0, len(dataX)-1)
# copy the pattern so that appending to it below does not modify dataX
pattern = list(dataX[start])
print ("Seed:")
print ("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# print the generated characters
for i in range(1000):

    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # greedy decoding: take the most probable next character
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the prediction and drop the oldest character
    pattern.append(index)
    pattern = pattern[1:len(pattern)]

print ("\nDone.")

Seed:
" rush, and had just begun 'well, of all the unjust
things--' when his eye chanced to fall upon alice, "
 then she was now agaun to the tanle, and she sett on  so the sere thine was a luttee of the rore of the tarle of the tarle, and the war aoiineed to tai it auay inck the had aooedde  and the wuile of the werl white sae iewting the har and the sinllder at the wast on, 'a-dad aell tire it  and the seie whin saye gard aedore the sabbit wored ball whth the rabei, and she woile rabeed to be salken time tha was oo the thidg on  and the wert ont lr the seales  the was alliered thrh the wuide hu was arl anoier.

'thet would not,' said the katter anded, aadan ou hir eeed woth a saik. and she sene the wasted a little sire ti thene sar soon it  and the said th the wurle.
and saed to the tueen, the was soi ant lort ani aroinrsing, 
atice tas the mirtle white rabbit wurld hes as she shile tab itir the har hor io a poeen ti thine say ari a lange harger oeme the white rabbit was so the toie.
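
The generation loop is greedy: at each step it takes the most probable character, appends it to the window, and drops the oldest one. The sketch below replays that loop with a stand-in for `model.predict`; `fake_predict`, `generate`, and the three-symbol vocabulary are assumptions for illustration. It copies the seed with `list(...)` before the loop, because calling `append` on the stored list itself would modify the pattern kept in `dataX`.

```python
def fake_predict(pattern, n_vocab):
    """Stand-in for model.predict (an assumption): puts all probability
    on the symbol after the last one in the window, cyclically."""
    probs = [0.0] * n_vocab
    probs[(pattern[-1] + 1) % n_vocab] = 1.0
    return probs

def generate(seed, steps, n_vocab):
    pattern = list(seed)            # copy, so the caller's list is not modified
    output = []
    for _ in range(steps):
        probs = fake_predict(pattern, n_vocab)
        index = max(range(n_vocab), key=probs.__getitem__)  # argmax
        output.append(index)
        pattern.append(index)       # add the prediction to the window
        pattern = pattern[1:]       # drop the oldest symbol
    return output

dataX = [[0, 1, 2]]                 # toy dataset with one pattern
generated = generate(dataX[0], 5, n_vocab=3)
print(generated)
print(dataX[0])                     # still the original seed
```

Since the choice at every step is deterministic, a greedy loop can settle into a repeating cycle once the window returns to a state it has already visited.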

In [10]:
# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
pattern = list(dataX[start])  # copy so that the append below does not modify dataX
print ("Seed:")
print ("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):

    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    sys.stdout.write(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]

print ("\nDone.")

Seed:
" s so large a house, that she did not like to go nearer till she had
nibbled some more of the lefthan "
s was so sie toid. and the west on  gnt it was an the winde sabdi  she was aoling toe piget her hrok alange th the terl to her oo the saale, and she soie let head out the hoos  sor ieeds the rabbit was in asllhets  and then she wal allcere that she was aolthe  bui it sas an the wan ou toe ti the thele  bnin the latthrs was the was toi tinl to sar the siaee, and the west on aroiersdy at the wasted out of the woid, and sae to the kotke su tee then shee ohe head out the harce of the caokse, and she taited out of the woide so see the had hor no the taale, and the white rabbit was the winte rabbit, and the wert on ar all thite was ao allc  and sar nort blice, and sae to thin his dead oo the toeee. 
'the surtldd thing,  said the goyphon, 'i wesl to the whitenr!'

'i movt s gn as all ' said alice, whr was sore aiained tore  
and the sere thin sire she seae tuine oot an all cor oo the

In [11]:
import numpy
import sys
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

filename = "wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()

chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))

n_chars = len(raw_text)
n_vocab = len(chars)
print ("Total Characters: ", n_chars)
print ("Total Vocab: ", n_vocab)

seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])

n_patterns = len(dataX)
print ("Total Patterns: ", n_patterns)

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam')

# define the TensorBoard callback (training logs are written to ./logs)
tb_callback = tf.keras.callbacks.TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=32,
                                             write_graph=True, write_grads=False, write_images=False,
                                             embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None,
                                             embeddings_data=None, update_freq='epoch')

model.fit(X, y, epochs=10, batch_size=512, callbacks=[tb_callback])

Total Characters:  144522
Total Vocab:  48
Total Patterns:  144422
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x19f5ba87160>
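
Batch size determines how many weight updates happen per epoch. As a quick check of the two settings used in this notebook (128 for the checkpointed run, 512 for the TensorBoard run), the number of batches per epoch is the pattern count divided by the batch size, rounded up.

```python
import math

n_patterns = 144422  # "Total Patterns" from the output above

for batch_size in (128, 512):
    # The last batch may be smaller than batch_size, hence the ceiling.
    steps = math.ceil(n_patterns / batch_size)
    print(batch_size, steps)
```

The larger batch size gives roughly four times fewer updates per epoch, which makes each epoch faster but may slow the decrease of the loss per epoch.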

In [7]:
# load the TensorBoard notebook extension and open it on the log directory
%load_ext tensorboard
%tensorboard --logdir logs

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


In [13]:
filename = "wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()

chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))

n_chars = len(raw_text)
n_vocab = len(chars)
print ("Total Characters: ", n_chars)
print ("Total Vocab: ", n_vocab)

seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])

n_patterns = len(dataX)
print ("Total Patterns: ", n_patterns)

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam')


# reverse mapping used to decode the predictions back into characters
int_to_char = dict((i, c) for i, c in enumerate(chars))

class CustomCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # with a divisor of 1 a sample is generated after every epoch;
        # a larger divisor would generate samples less often
        if (epoch + 1) % 1 == 0:
            # pick a random seed (copied so that the append below does not modify dataX)
            start = numpy.random.randint(0, len(dataX)-1)
            pattern = list(dataX[start])
            print ("Seed:")
            print ("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

            # generate characters
            for i in range(1000):
                x = numpy.reshape(pattern, (1, len(pattern), 1))
                x = x / float(n_vocab)
                # self.model is the model being trained; Keras sets it on the callback
                prediction = self.model.predict(x, verbose=0)
                index = numpy.argmax(prediction)
                result = int_to_char[index]
                sys.stdout.write(result)
                pattern.append(index)
                pattern = pattern[1:len(pattern)]

            print ("\nDone.")

# pass the custom callback to fit so that text is generated during training
model.fit(X, y, epochs=30, batch_size=512, callbacks=[CustomCallback()])


Total Characters:  144522
Total Vocab:  48
Total Patterns:  144422
Epoch 1/30
" oice.

'back to land again, and that's all the first figure,' said the mock
turtle, suddenly droppin "
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 

 lattee  and the tas in a lattee  and the toee to the tast oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io the tart oo the taate  and the tai io 

and the wart on an ince to the thate  and whe suoen was an toee a aoo of the career  a donr thit har so ter an the could, 
'whe care sait doon thet ' said the marthr, ''that a lort oatee to thit ' said the caterpillar.

'iele you mane the garter ' said the cate pirlied 
'io a sard thi grrsouse ' sheu hage an alicd. 
'thel   shi motgh taid to the jury,on the sooe, 
and the wart on an ince to the thate  
Done.
Epoch 26/30
"  a dormouse was sitting
between them, fast asleep, and the other two were using it as a
cushion, res "
 an anl aad not on the cir. 
'the cru'o tooe th toen i aen to the ro thing to teae,' she said to herself, and seiu on an cnl oo the cir. and the was aoo ano der aaad io a lirtle so tho the har end the was so the theee  and was an in satten  

'tha sert to hoa'  said the caterpillar.

'iede you toolt ' said the konk turtle. 'toe was a lertle soreoe toued an in shiee an in shen shee  aaduus an in saaten to herd to teet th the to the tab if the cane an anle and not in t

<keras.callbacks.History at 0x19f5c0aa830>
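
The condition `(epoch + 1) % 1 == 0` in the callback is always true, so a sample is generated after every epoch. A larger divisor would sample less often. The sketch below shows the hook pattern with a stand-in class that does not depend on Keras; `SampleEvery` and its `every` parameter are hypothetical names, and the class is driven by hand here rather than by `model.fit`.

```python
class SampleEvery:
    """Stand-in with the same hook signature as a Keras callback
    (hypothetical; a real one would subclass keras.callbacks.Callback)."""

    def __init__(self, every):
        self.every = every
        self.sampled_epochs = []

    def on_epoch_end(self, epoch, logs=None):
        # The epoch index is zero-based, so add 1 to get the displayed number.
        if (epoch + 1) % self.every == 0:
            self.sampled_epochs.append(epoch + 1)

callback = SampleEvery(every=5)
for epoch in range(30):             # simulate 30 epochs of training
    callback.on_epoch_end(epoch, logs={"loss": 0.0})

print(callback.sampled_epochs)
```

Generating 1000 characters means 1000 separate `predict` calls, so sampling only every few epochs can noticeably shorten a training run.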