### Dados de entrada ###

Vamos usar versos simples para fazer nosso modelo de predição:


In [1]:
data = """ João amava Teresa que amava Raimundo\n 
que amava Maria que amava Joaquim que amava Lili\n 
que não amava ninguém.\n 
João foi pra os Estados Unidos, Teresa para o convento,\n
Raimundo morreu de desastre, Maria ficou para tia,\n
Joaquim suicidou-se e Lili casou com J. Pinto Fernandes \n
que não tinha entrado na história.\n """

data2 = """ No meio do caminho tinha uma pedra\n
tinha uma pedra no meio do caminho\n
tinha uma pedra \n
no meio do caminho tinha uma pedra.\n

Nunca me esquecerei desse acontecimento\n
na vida de minhas retinas tão fatigadas.\n
Nunca me esquecerei que no meio do caminho\n
tinha uma pedra\n
tinha uma pedra no meio do caminho\n
no meio do caminho tinha uma pedra.\n """

Vamos fazer um modelo simples de predição de próxima palavra utilizando LSTM com keras. Para começar, vamos tokenizar:

In [14]:
from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding

tokenizer = Tokenizer()
tokenizer.fit_on_texts([data2])
encoded = tokenizer.texts_to_sequences([data2])[0]

In [15]:
vocab_size = len(tokenizer.word_index)+1
print(vocab_size)
print(encoded)

21
[4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 8, 9, 10, 20, 4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 1, 2, 3]


In [16]:
sequences = list()
for i in range(1, len(encoded)):
    sequence = encoded[i-1:i+1]
    sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))


sequences = array(sequences)
X, y = sequences[:,0],sequences[:,1]
y = to_categorical(y, num_classes=vocab_size)
print(sequences)

Total Sequences: 60
[[ 4  5]
 [ 5  6]
 [ 6  7]
 [ 7  1]
 [ 1  2]
 [ 2  3]
 [ 3  1]
 [ 1  2]
 [ 2  3]
 [ 3  4]
 [ 4  5]
 [ 5  6]
 [ 6  7]
 [ 7  1]
 [ 1  2]
 [ 2  3]
 [ 3  4]
 [ 4  5]
 [ 5  6]
 [ 6  7]
 [ 7  1]
 [ 1  2]
 [ 2  3]
 [ 3  8]
 [ 8  9]
 [ 9 10]
 [10 11]
 [11 12]
 [12 13]
 [13 14]
 [14 15]
 [15 16]
 [16 17]
 [17 18]
 [18 19]
 [19  8]
 [ 8  9]
 [ 9 10]
 [10 20]
 [20  4]
 [ 4  5]
 [ 5  6]
 [ 6  7]
 [ 7  1]
 [ 1  2]
 [ 2  3]
 [ 3  1]
 [ 1  2]
 [ 2  3]
 [ 3  4]
 [ 4  5]
 [ 5  6]
 [ 6  7]
 [ 7  4]
 [ 4  5]
 [ 5  6]
 [ 6  7]
 [ 7  1]
 [ 1  2]
 [ 2  3]]


Agora que temos os dados organizados, podemos treinar nosso modelo para fazer previsão da próxima palavra com LSTM:

In [17]:
model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=1))
model.add(LSTM(50))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 1, 10)             210       
_________________________________________________________________
lstm_2 (LSTM)                (None, 50)                12200     
_________________________________________________________________
dense_2 (Dense)              (None, 21)                1071      
Total params: 13,481
Trainable params: 13,481
Non-trainable params: 0
_________________________________________________________________
None


In [18]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(X, y, epochs=500, verbose=2)

Epoch 1/500
 - 1s - loss: 3.0447 - acc: 0.0167
Epoch 2/500
 - 0s - loss: 3.0424 - acc: 0.0833
Epoch 3/500
 - 0s - loss: 3.0398 - acc: 0.1333
Epoch 4/500
 - 0s - loss: 3.0373 - acc: 0.2000
Epoch 5/500
 - 0s - loss: 3.0345 - acc: 0.2000
Epoch 6/500
 - 0s - loss: 3.0318 - acc: 0.2000
Epoch 7/500
 - 0s - loss: 3.0290 - acc: 0.2000
Epoch 8/500
 - 0s - loss: 3.0261 - acc: 0.2000
Epoch 9/500
 - 0s - loss: 3.0231 - acc: 0.2000
Epoch 10/500
 - 0s - loss: 3.0203 - acc: 0.2000
Epoch 11/500
 - 0s - loss: 3.0170 - acc: 0.2167
Epoch 12/500
 - 0s - loss: 3.0139 - acc: 0.2833
Epoch 13/500
 - 0s - loss: 3.0104 - acc: 0.3333
Epoch 14/500
 - 0s - loss: 3.0072 - acc: 0.3333
Epoch 15/500
 - 0s - loss: 3.0035 - acc: 0.3333
Epoch 16/500
 - 0s - loss: 2.9997 - acc: 0.3333
Epoch 17/500
 - 0s - loss: 2.9958 - acc: 0.4333
Epoch 18/500
 - 0s - loss: 2.9918 - acc: 0.4333
Epoch 19/500
 - 0s - loss: 2.9877 - acc: 0.4333
Epoch 20/500
 - 0s - loss: 2.9830 - acc: 0.4333
Epoch 21/500
 - 0s - loss: 2.9785 - acc: 0.4333
E

Epoch 171/500
 - 0s - loss: 0.7860 - acc: 0.8667
Epoch 172/500
 - 0s - loss: 0.7797 - acc: 0.8667
Epoch 173/500
 - 0s - loss: 0.7733 - acc: 0.8667
Epoch 174/500
 - 0s - loss: 0.7669 - acc: 0.8667
Epoch 175/500
 - 0s - loss: 0.7606 - acc: 0.8833
Epoch 176/500
 - 0s - loss: 0.7547 - acc: 0.8833
Epoch 177/500
 - 0s - loss: 0.7481 - acc: 0.9000
Epoch 178/500
 - 0s - loss: 0.7424 - acc: 0.9000
Epoch 179/500
 - 0s - loss: 0.7361 - acc: 0.9000
Epoch 180/500
 - 0s - loss: 0.7305 - acc: 0.9000
Epoch 181/500
 - 0s - loss: 0.7242 - acc: 0.9000
Epoch 182/500
 - 0s - loss: 0.7188 - acc: 0.9000
Epoch 183/500
 - 0s - loss: 0.7130 - acc: 0.9000
Epoch 184/500
 - 0s - loss: 0.7070 - acc: 0.9000
Epoch 185/500
 - 0s - loss: 0.7015 - acc: 0.9000
Epoch 186/500
 - 0s - loss: 0.6958 - acc: 0.9000
Epoch 187/500
 - 0s - loss: 0.6906 - acc: 0.9000
Epoch 188/500
 - 0s - loss: 0.6847 - acc: 0.9000
Epoch 189/500
 - 0s - loss: 0.6793 - acc: 0.9000
Epoch 190/500
 - 0s - loss: 0.6736 - acc: 0.9167
Epoch 191/500
 - 0s 

 - 0s - loss: 0.2387 - acc: 0.9167
Epoch 339/500
 - 0s - loss: 0.2377 - acc: 0.9167
Epoch 340/500
 - 0s - loss: 0.2366 - acc: 0.9167
Epoch 341/500
 - 0s - loss: 0.2356 - acc: 0.9167
Epoch 342/500
 - 0s - loss: 0.2348 - acc: 0.9167
Epoch 343/500
 - 0s - loss: 0.2338 - acc: 0.9167
Epoch 344/500
 - 0s - loss: 0.2330 - acc: 0.9167
Epoch 345/500
 - 0s - loss: 0.2319 - acc: 0.9167
Epoch 346/500
 - 0s - loss: 0.2310 - acc: 0.9167
Epoch 347/500
 - 0s - loss: 0.2304 - acc: 0.9167
Epoch 348/500
 - 0s - loss: 0.2295 - acc: 0.9167
Epoch 349/500
 - 0s - loss: 0.2287 - acc: 0.9167
Epoch 350/500
 - 0s - loss: 0.2278 - acc: 0.9167
Epoch 351/500
 - 0s - loss: 0.2271 - acc: 0.9167
Epoch 352/500
 - 0s - loss: 0.2262 - acc: 0.9167
Epoch 353/500
 - 0s - loss: 0.2254 - acc: 0.9167
Epoch 354/500
 - 0s - loss: 0.2247 - acc: 0.9167
Epoch 355/500
 - 0s - loss: 0.2243 - acc: 0.9167
Epoch 356/500
 - 0s - loss: 0.2233 - acc: 0.9167
Epoch 357/500
 - 0s - loss: 0.2225 - acc: 0.9167
Epoch 358/500
 - 0s - loss: 0.2218

<keras.callbacks.History at 0x235cfad3208>

Vamos agora verificar a previsão de próxima palavra para "João":

In [7]:
in_text = 'João'
print(in_text)
encoded = tokenizer.texts_to_sequences([in_text])[0]
encoded = array(encoded)
yhat = model.predict_classes(encoded, verbose=0)
for word, index in tokenizer.word_index.items():
    if index == yhat:
        print(word)

João
amava


In [8]:
def generate_seq(model, tokenizer, seed_text, n_words):
	in_text, result = seed_text, seed_text
	# generate a fixed number of words
	for _ in range(n_words):
		# encode the text as integer
		encoded = tokenizer.texts_to_sequences([in_text])[0]
		encoded = array(encoded)
		# predict a word in the vocabulary
		yhat = model.predict_classes(encoded, verbose=0)
		# map predicted word index to word
		out_word = ''
		for word, index in tokenizer.word_index.items():
			if index == yhat:
				out_word = word
				break
		# append to input
		in_text, result = out_word, result + ' ' + out_word
	return result


In [23]:
generate_seq(model,tokenizer,'No',20)

'No meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra'

Nào ficou muito bom né? Isso porque o LSTM está capturando quase nada da sequência inteira de palavras, já que ele só está recebendo a palavra atual como contexto.

### Linha a linha ###

Uma outra abordagem, é adicionar versos inteiros à LSTM, começando apenas com um termo e indo aumentando. Para fazer isto, precisaremos colocar padding nas entradas:



In [27]:
sequences = list()
for line in data2.split('\n'):
	encoded = tokenizer.texts_to_sequences([line])[0]
	for i in range(1, len(encoded)):
		sequence = encoded[:i+1]
		sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))
print(sequences)

Total Sequences: 51
[[4, 5], [4, 5, 6], [4, 5, 6, 7], [4, 5, 6, 7, 1], [4, 5, 6, 7, 1, 2], [4, 5, 6, 7, 1, 2, 3], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6, 7], [1, 2], [1, 2, 3], [4, 5], [4, 5, 6], [4, 5, 6, 7], [4, 5, 6, 7, 1], [4, 5, 6, 7, 1, 2], [4, 5, 6, 7, 1, 2, 3], [8, 9], [8, 9, 10], [8, 9, 10, 11], [8, 9, 10, 11, 12], [13, 14], [13, 14, 15], [13, 14, 15, 16], [13, 14, 15, 16, 17], [13, 14, 15, 16, 17, 18], [13, 14, 15, 16, 17, 18, 19], [8, 9], [8, 9, 10], [8, 9, 10, 20], [8, 9, 10, 20, 4], [8, 9, 10, 20, 4, 5], [8, 9, 10, 20, 4, 5, 6], [8, 9, 10, 20, 4, 5, 6, 7], [1, 2], [1, 2, 3], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6, 7], [4, 5], [4, 5, 6], [4, 5, 6, 7], [4, 5, 6, 7, 1], [4, 5, 6, 7, 1, 2], [4, 5, 6, 7, 1, 2, 3]]


In [29]:
from keras.preprocessing.sequence import pad_sequences

max_length = max([len(seq) for seq in sequences])
sequences = pad_sequences(sequences, maxlen=max_length, padding='pre')
print('Max Sequence Length: %d' % max_length)


#Usar as sequências para prever o último termo
sequences = array(sequences)
X, y = sequences[:,:-1],sequences[:,-1]
y = to_categorical(y, num_classes=vocab_size)
print(sequences)

Max Sequence Length: 8
[[ 0  0  0  0  0  0  4  5]
 [ 0  0  0  0  0  4  5  6]
 [ 0  0  0  0  4  5  6  7]
 [ 0  0  0  4  5  6  7  1]
 [ 0  0  4  5  6  7  1  2]
 [ 0  4  5  6  7  1  2  3]
 [ 0  0  0  0  0  0  1  2]
 [ 0  0  0  0  0  1  2  3]
 [ 0  0  0  0  1  2  3  4]
 [ 0  0  0  1  2  3  4  5]
 [ 0  0  1  2  3  4  5  6]
 [ 0  1  2  3  4  5  6  7]
 [ 0  0  0  0  0  0  1  2]
 [ 0  0  0  0  0  1  2  3]
 [ 0  0  0  0  0  0  4  5]
 [ 0  0  0  0  0  4  5  6]
 [ 0  0  0  0  4  5  6  7]
 [ 0  0  0  4  5  6  7  1]
 [ 0  0  4  5  6  7  1  2]
 [ 0  4  5  6  7  1  2  3]
 [ 0  0  0  0  0  0  8  9]
 [ 0  0  0  0  0  8  9 10]
 [ 0  0  0  0  8  9 10 11]
 [ 0  0  0  8  9 10 11 12]
 [ 0  0  0  0  0  0 13 14]
 [ 0  0  0  0  0 13 14 15]
 [ 0  0  0  0 13 14 15 16]
 [ 0  0  0 13 14 15 16 17]
 [ 0  0 13 14 15 16 17 18]
 [ 0 13 14 15 16 17 18 19]
 [ 0  0  0  0  0  0  8  9]
 [ 0  0  0  0  0  8  9 10]
 [ 0  0  0  0  8  9 10 20]
 [ 0  0  0  8  9 10 20  4]
 [ 0  0  8  9 10 20  4  5]
 [ 0  8  9 10 20  4  5  6]
 [ 8 

Vamos usar o mesmo modelo anterior para prever nessa nova entrada:

In [30]:
model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=max_length-1))
model.add(LSTM(50))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile network
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(X, y, epochs=500, verbose=2)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 7, 10)             210       
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                12200     
_________________________________________________________________
dense_3 (Dense)              (None, 21)                1071      
Total params: 13,481
Trainable params: 13,481
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/500
 - 1s - loss: 3.0431 - acc: 0.0000e+00
Epoch 2/500
 - 0s - loss: 3.0392 - acc: 0.0000e+00
Epoch 3/500
 - 0s - loss: 3.0349 - acc: 0.0588
Epoch 4/500
 - 0s - loss: 3.0302 - acc: 0.1373
Epoch 5/500
 - 0s - loss: 3.0254 - acc: 0.1373
Epoch 6/500
 - 0s - loss: 3.0203 - acc: 0.1373
Epoch 7/500
 - 0s - loss: 3.0142 - acc: 0.1373
Epoch 8/500
 - 0s - loss: 3.0076 - acc: 0.1373
Epoch 9/500
 

 - 0s - loss: 0.4249 - acc: 0.9412
Epoch 156/500
 - 0s - loss: 0.4178 - acc: 0.9412
Epoch 157/500
 - 0s - loss: 0.4109 - acc: 0.9412
Epoch 158/500
 - 0s - loss: 0.4044 - acc: 0.9412
Epoch 159/500
 - 0s - loss: 0.3982 - acc: 0.9412
Epoch 160/500
 - 0s - loss: 0.3922 - acc: 0.9412
Epoch 161/500
 - 0s - loss: 0.3859 - acc: 0.9608
Epoch 162/500
 - 0s - loss: 0.3796 - acc: 0.9608
Epoch 163/500
 - 0s - loss: 0.3738 - acc: 0.9608
Epoch 164/500
 - 0s - loss: 0.3675 - acc: 0.9608
Epoch 165/500
 - 0s - loss: 0.3629 - acc: 0.9608
Epoch 166/500
 - 0s - loss: 0.3579 - acc: 0.9804
Epoch 167/500
 - 0s - loss: 0.3522 - acc: 0.9804
Epoch 168/500
 - 0s - loss: 0.3465 - acc: 0.9804
Epoch 169/500
 - 0s - loss: 0.3422 - acc: 0.9804
Epoch 170/500
 - 0s - loss: 0.3375 - acc: 0.9804
Epoch 171/500
 - 0s - loss: 0.3319 - acc: 0.9804
Epoch 172/500
 - 0s - loss: 0.3262 - acc: 0.9804
Epoch 173/500
 - 0s - loss: 0.3218 - acc: 0.9804
Epoch 174/500
 - 0s - loss: 0.3169 - acc: 0.9804
Epoch 175/500
 - 0s - loss: 0.3118

Epoch 323/500
 - 0s - loss: 0.0672 - acc: 0.9608
Epoch 324/500
 - 0s - loss: 0.0665 - acc: 0.9804
Epoch 325/500
 - 0s - loss: 0.0662 - acc: 0.9804
Epoch 326/500
 - 0s - loss: 0.0658 - acc: 0.9804
Epoch 327/500
 - 0s - loss: 0.0657 - acc: 0.9608
Epoch 328/500
 - 0s - loss: 0.0652 - acc: 0.9804
Epoch 329/500
 - 0s - loss: 0.0652 - acc: 0.9804
Epoch 330/500
 - 0s - loss: 0.0644 - acc: 0.9804
Epoch 331/500
 - 0s - loss: 0.0643 - acc: 0.9804
Epoch 332/500
 - 0s - loss: 0.0641 - acc: 0.9804
Epoch 333/500
 - 0s - loss: 0.0635 - acc: 0.9804
Epoch 334/500
 - 0s - loss: 0.0633 - acc: 0.9804
Epoch 335/500
 - 0s - loss: 0.0628 - acc: 0.9804
Epoch 336/500
 - 0s - loss: 0.0625 - acc: 0.9804
Epoch 337/500
 - 0s - loss: 0.0624 - acc: 0.9608
Epoch 338/500
 - 0s - loss: 0.0622 - acc: 0.9608
Epoch 339/500
 - 0s - loss: 0.0616 - acc: 0.9804
Epoch 340/500
 - 0s - loss: 0.0612 - acc: 0.9804
Epoch 341/500
 - 0s - loss: 0.0610 - acc: 0.9804
Epoch 342/500
 - 0s - loss: 0.0608 - acc: 0.9804
Epoch 343/500
 - 0s 

 - 0s - loss: 0.0405 - acc: 0.9608
Epoch 491/500
 - 0s - loss: 0.0402 - acc: 0.9804
Epoch 492/500
 - 0s - loss: 0.0403 - acc: 0.9804
Epoch 493/500
 - 0s - loss: 0.0405 - acc: 0.9804
Epoch 494/500
 - 0s - loss: 0.0401 - acc: 0.9804
Epoch 495/500
 - 0s - loss: 0.0402 - acc: 0.9804
Epoch 496/500
 - 0s - loss: 0.0403 - acc: 0.9804
Epoch 497/500
 - 0s - loss: 0.0399 - acc: 0.9804
Epoch 498/500
 - 0s - loss: 0.0399 - acc: 0.9804
Epoch 499/500
 - 0s - loss: 0.0400 - acc: 0.9804
Epoch 500/500
 - 0s - loss: 0.0397 - acc: 0.9804


<keras.callbacks.History at 0x2376f0acb70>

In [31]:
#adiciona as palavras previstas a entrada da proxima
def generate_seq(model, tokenizer, max_length, seed_text, n_words):
	in_text = seed_text
	# generate a fixed number of words
	for _ in range(n_words):
		# encode the text as integer
		encoded = tokenizer.texts_to_sequences([in_text])[0]
		# pre-pad sequences to a fixed length
		encoded = pad_sequences([encoded], maxlen=max_length, padding='pre')
		# predict probabilities for each word
		yhat = model.predict_classes(encoded, verbose=0)
		# map predicted word index to word
		out_word = ''
		for word, index in tokenizer.word_index.items():
			if index == yhat:
				out_word = word
				break
		# append to input
		in_text += ' ' + out_word
	return in_text

In [32]:
generate_seq(model,tokenizer,max_length-1,'Nunca',100)

'Nunca me esquecerei desse acontecimento acontecimento acontecimento acontecimento uma uma fatigadas pedra meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma'

### Dois para um ###

Podemos também modificar para ao invés de receber um verso inteiro, receber apenas duas palavras (um meio termo entre os dois):

In [33]:
sequences = list()
for i in range(2, len(encoded)):
	sequence = encoded[i-2:i+1]
	sequences.append(sequence)

In [34]:
model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=max_length-1))
model.add(LSTM(50))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile network
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(X, y, epochs=500, verbose=2)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 7, 10)             210       
_________________________________________________________________
lstm_4 (LSTM)                (None, 50)                12200     
_________________________________________________________________
dense_4 (Dense)              (None, 21)                1071      
Total params: 13,481
Trainable params: 13,481
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/500
 - 1s - loss: 3.0441 - acc: 0.0196
Epoch 2/500
 - 0s - loss: 3.0393 - acc: 0.0784
Epoch 3/500
 - 0s - loss: 3.0341 - acc: 0.0980
Epoch 4/500
 - 0s - loss: 3.0289 - acc: 0.1569
Epoch 5/500
 - 0s - loss: 3.0234 - acc: 0.2157
Epoch 6/500
 - 0s - loss: 3.0171 - acc: 0.2157
Epoch 7/500
 - 0s - loss: 3.0102 - acc: 0.1961
Epoch 8/500
 - 0s - loss: 3.0027 - acc: 0.1961
Epoch 9/500
 - 0s - l

Epoch 156/500
 - 0s - loss: 0.5537 - acc: 0.9216
Epoch 157/500
 - 0s - loss: 0.5433 - acc: 0.9216
Epoch 158/500
 - 0s - loss: 0.5345 - acc: 0.9216
Epoch 159/500
 - 0s - loss: 0.5276 - acc: 0.9216
Epoch 160/500
 - 0s - loss: 0.5175 - acc: 0.9216
Epoch 161/500
 - 0s - loss: 0.5091 - acc: 0.9216
Epoch 162/500
 - 0s - loss: 0.5016 - acc: 0.9216
Epoch 163/500
 - 0s - loss: 0.4924 - acc: 0.9216
Epoch 164/500
 - 0s - loss: 0.4853 - acc: 0.9216
Epoch 165/500
 - 0s - loss: 0.4790 - acc: 0.9216
Epoch 166/500
 - 0s - loss: 0.4712 - acc: 0.9216
Epoch 167/500
 - 0s - loss: 0.4626 - acc: 0.9216
Epoch 168/500
 - 0s - loss: 0.4558 - acc: 0.9216
Epoch 169/500
 - 0s - loss: 0.4496 - acc: 0.9216
Epoch 170/500
 - 0s - loss: 0.4430 - acc: 0.9216
Epoch 171/500
 - 0s - loss: 0.4347 - acc: 0.9216
Epoch 172/500
 - 0s - loss: 0.4281 - acc: 0.9412
Epoch 173/500
 - 0s - loss: 0.4220 - acc: 0.9412
Epoch 174/500
 - 0s - loss: 0.4146 - acc: 0.9412
Epoch 175/500
 - 0s - loss: 0.4086 - acc: 0.9412
Epoch 176/500
 - 0s 

 - 0s - loss: 0.0807 - acc: 0.9804
Epoch 324/500
 - 0s - loss: 0.0805 - acc: 0.9804
Epoch 325/500
 - 0s - loss: 0.0796 - acc: 0.9804
Epoch 326/500
 - 0s - loss: 0.0790 - acc: 0.9804
Epoch 327/500
 - 0s - loss: 0.0785 - acc: 0.9804
Epoch 328/500
 - 0s - loss: 0.0782 - acc: 0.9804
Epoch 329/500
 - 0s - loss: 0.0775 - acc: 0.9804
Epoch 330/500
 - 0s - loss: 0.0769 - acc: 0.9804
Epoch 331/500
 - 0s - loss: 0.0764 - acc: 0.9804
Epoch 332/500
 - 0s - loss: 0.0761 - acc: 0.9804
Epoch 333/500
 - 0s - loss: 0.0755 - acc: 0.9804
Epoch 334/500
 - 0s - loss: 0.0748 - acc: 0.9804
Epoch 335/500
 - 0s - loss: 0.0750 - acc: 0.9608
Epoch 336/500
 - 0s - loss: 0.0740 - acc: 0.9804
Epoch 337/500
 - 0s - loss: 0.0736 - acc: 0.9804
Epoch 338/500
 - 0s - loss: 0.0731 - acc: 0.9804
Epoch 339/500
 - 0s - loss: 0.0727 - acc: 0.9804
Epoch 340/500
 - 0s - loss: 0.0724 - acc: 0.9804
Epoch 341/500
 - 0s - loss: 0.0719 - acc: 0.9804
Epoch 342/500
 - 0s - loss: 0.0714 - acc: 0.9804
Epoch 343/500
 - 0s - loss: 0.0710

Epoch 491/500
 - 0s - loss: 0.0425 - acc: 0.9804
Epoch 492/500
 - 0s - loss: 0.0424 - acc: 0.9804
Epoch 493/500
 - 0s - loss: 0.0425 - acc: 0.9804
Epoch 494/500
 - 0s - loss: 0.0425 - acc: 0.9804
Epoch 495/500
 - 0s - loss: 0.0425 - acc: 0.9804
Epoch 496/500
 - 0s - loss: 0.0424 - acc: 0.9804
Epoch 497/500
 - 0s - loss: 0.0426 - acc: 0.9804
Epoch 498/500
 - 0s - loss: 0.0423 - acc: 0.9804
Epoch 499/500
 - 0s - loss: 0.0421 - acc: 0.9804
Epoch 500/500
 - 0s - loss: 0.0420 - acc: 0.9804


<keras.callbacks.History at 0x23770cd6160>

In [36]:
generate_seq(model,tokenizer,max_length-1,'Nunca',100)

'Nunca me esquecerei que no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma'

### Exercício ###

Faça uma versão que receba o verso inteiro como entrada, elemento por elemento.

In [42]:
sequences = list()
encoded = tokenizer.texts_to_sequences([data2])[0]
for i in range(1, len(encoded)):
    sequence = encoded[:i+1]
    sequences.append(sequence)
print(sequences)

max_length = max([len(seq) for seq in sequences])
sequences = pad_sequences(sequences, maxlen=max_length, padding='pre')
print('Max Sequence Length: %d' % max_length)


#Usar as sequências para prever o último termo
sequences = array(sequences)
X, y = sequences[:,:-1],sequences[:,-1]
y = to_categorical(y, num_classes=vocab_size)
#print(sequences)

[[4, 5], [4, 5, 6], [4, 5, 6, 7], [4, 5, 6, 7, 1], [4, 5, 6, 7, 1, 2], [4, 5, 6, 7, 1, 2, 3], [4, 5, 6, 7, 1, 2, 3, 1], [4, 5, 6, 7, 1, 2, 3, 1, 2], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3], [4, 5, 6, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5

In [40]:
model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=max_length-1))
model.add(LSTM(50))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile network
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(X, y, epochs=500, verbose=2)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 60, 10)            210       
_________________________________________________________________
lstm_5 (LSTM)                (None, 50)                12200     
_________________________________________________________________
dense_5 (Dense)              (None, 21)                1071      
Total params: 13,481
Trainable params: 13,481
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/500
 - 1s - loss: 3.0435 - acc: 0.0333
Epoch 2/500
 - 0s - loss: 3.0395 - acc: 0.1333
Epoch 3/500
 - 0s - loss: 3.0349 - acc: 0.1500
Epoch 4/500
 - 0s - loss: 3.0304 - acc: 0.1500
Epoch 5/500
 - 0s - loss: 3.0251 - acc: 0.2000
Epoch 6/500
 - 0s - loss: 3.0189 - acc: 0.2000
Epoch 7/500
 - 0s - loss: 3.0115 - acc: 0.2000
Epoch 8/500
 - 0s - loss: 3.0029 - acc: 0.2000
Epoch 9/500
 - 0s - l

Epoch 156/500
 - 0s - loss: 1.1604 - acc: 0.6500
Epoch 157/500
 - 0s - loss: 1.1434 - acc: 0.7000
Epoch 158/500
 - 0s - loss: 1.1318 - acc: 0.6833
Epoch 159/500
 - 0s - loss: 1.1307 - acc: 0.6833
Epoch 160/500
 - 0s - loss: 1.1221 - acc: 0.7167
Epoch 161/500
 - 0s - loss: 1.1116 - acc: 0.7000
Epoch 162/500
 - 0s - loss: 1.1065 - acc: 0.6667
Epoch 163/500
 - 0s - loss: 1.0897 - acc: 0.7167
Epoch 164/500
 - 0s - loss: 1.0937 - acc: 0.7500
Epoch 165/500
 - 0s - loss: 1.0724 - acc: 0.7333
Epoch 166/500
 - 0s - loss: 1.0747 - acc: 0.7167
Epoch 167/500
 - 0s - loss: 1.0746 - acc: 0.7500
Epoch 168/500
 - 0s - loss: 1.0592 - acc: 0.7667
Epoch 169/500
 - 0s - loss: 1.0564 - acc: 0.7333
Epoch 170/500
 - 0s - loss: 1.0431 - acc: 0.7833
Epoch 171/500
 - 0s - loss: 1.0434 - acc: 0.7667
Epoch 172/500
 - 0s - loss: 1.0254 - acc: 0.7833
Epoch 173/500
 - 0s - loss: 1.0341 - acc: 0.7333
Epoch 174/500
 - 0s - loss: 1.0116 - acc: 0.8167
Epoch 175/500
 - 0s - loss: 1.0191 - acc: 0.8000
Epoch 176/500
 - 0s 

 - 0s - loss: 0.2562 - acc: 1.0000
Epoch 324/500
 - 0s - loss: 0.2534 - acc: 1.0000
Epoch 325/500
 - 0s - loss: 0.2530 - acc: 1.0000
Epoch 326/500
 - 0s - loss: 0.2477 - acc: 1.0000
Epoch 327/500
 - 0s - loss: 0.2488 - acc: 1.0000
Epoch 328/500
 - 0s - loss: 0.2438 - acc: 1.0000
Epoch 329/500
 - 0s - loss: 0.2439 - acc: 1.0000
Epoch 330/500
 - 0s - loss: 0.2394 - acc: 1.0000
Epoch 331/500
 - 0s - loss: 0.2364 - acc: 1.0000
Epoch 332/500
 - 0s - loss: 0.2360 - acc: 1.0000
Epoch 333/500
 - 0s - loss: 0.2324 - acc: 1.0000
Epoch 334/500
 - 0s - loss: 0.2317 - acc: 1.0000
Epoch 335/500
 - 0s - loss: 0.2285 - acc: 1.0000
Epoch 336/500
 - 0s - loss: 0.2266 - acc: 1.0000
Epoch 337/500
 - 0s - loss: 0.2246 - acc: 1.0000
Epoch 338/500
 - 0s - loss: 0.2223 - acc: 1.0000
Epoch 339/500
 - 0s - loss: 0.2198 - acc: 1.0000
Epoch 340/500
 - 0s - loss: 0.2189 - acc: 1.0000
Epoch 341/500
 - 0s - loss: 0.2176 - acc: 1.0000
Epoch 342/500
 - 0s - loss: 0.2140 - acc: 1.0000
Epoch 343/500
 - 0s - loss: 0.2158

Epoch 491/500
 - 0s - loss: 0.0795 - acc: 1.0000
Epoch 492/500
 - 0s - loss: 0.0773 - acc: 1.0000
Epoch 493/500
 - 0s - loss: 0.0778 - acc: 1.0000
Epoch 494/500
 - 0s - loss: 0.0763 - acc: 1.0000
Epoch 495/500
 - 0s - loss: 0.0754 - acc: 1.0000
Epoch 496/500
 - 0s - loss: 0.0754 - acc: 1.0000
Epoch 497/500
 - 0s - loss: 0.0747 - acc: 1.0000
Epoch 498/500
 - 0s - loss: 0.0738 - acc: 1.0000
Epoch 499/500
 - 0s - loss: 0.0735 - acc: 1.0000
Epoch 500/500
 - 0s - loss: 0.0727 - acc: 1.0000


<keras.callbacks.History at 0x237738f8278>

In [41]:
generate_seq(model,tokenizer,max_length-1,'No',100)

'No meio do caminho tinha uma pedra tinha uma pedra no meio do caminho tinha uma pedra no meio do caminho tinha uma pedra nunca me esquecerei desse acontecimento na vida de minhas retinas tão fatigadas nunca me esquecerei que no meio do caminho tinha uma pedra tinha uma pedra no meio do caminho no meio do caminho tinha uma pedra no nunca esquecerei me me esquecerei desse vida vida de minhas retinas retinas tão fatigadas nunca me esquecerei que no meio do caminho tinha uma pedra tinha uma pedra no meio do caminho no meio do caminho tinha uma pedra'