<a href="https://colab.research.google.com/github/Junhojuno/keras-tutorial/blob/master/08_Text_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

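## Character-level LSTM text-generation model
- Trained on Nietzsche's writings.
- The trained model will not be a general model of English. It will learn Nietzsche's writing style and the particular topics he writes about.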

#### Data preprocessing
- Download the corpus and convert it to lowercase.

In [6]:
# Download the corpus and convert it to lowercase.
import keras
import numpy as np

path = keras.utils.get_file(fname='nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with open(path, encoding='utf-8') as f:
  text = f.read().lower()
print("Corpus length:", len(text))
print(type(text))
print("Excerpt:\n", text[:500])

Corpus length: 600893
<class 'str'>
Excerpt:
 preface


supposing that truth is a woman--what then? is there not ground
for suspecting that all philosophers, in so far as they have been
dogmatists, have failed to understand women--that the terrible
seriousness and clumsy importunity with which they have usually paid
their addresses to truth, have been unskilled and unseemly methods for
winning a woman? certainly she has never allowed herself to be won; and
at present every kind of dogma stands with sad and discouraged mien--if,
indeed, it s


In [11]:
# Turn the character sequences into vectors.
# Extract overlapping sequences of length maxlen.
# One-hot encode them and pack them into a 3D tensor of shape (sequences, maxlen, unique_characters).
# The target of each sequence is the one-hot encoded character that follows it.

maxlen = 60  # extract sequences of 60 characters
step = 3     # start a new sequence every 3 characters

sentences = []   # input sequences
next_chars = []  # the character that follows each sequence (the target)

for i in range(0, len(text) - maxlen, step):
  sentences.append(text[i:i+maxlen])
  next_chars.append(text[i+maxlen])

print("Number of sequences:", len(sentences))

chars = sorted(list(set(text)))  # vocabulary of unique characters
print("Number of unique characters:", len(chars))
char_indices = {char: i for i, char in enumerate(chars)}  # character -> index in chars

print("Vectorizing...")
# Boolean one-hot tensors: x holds the inputs, y holds the targets
x = np.zeros(shape=(len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros(shape=(len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
  for t, char in enumerate(sentence):
    x[i, t, char_indices[char]] = 1
  y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Number of unique characters: 57
Vectorizing...
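To make the windowing and the tensor shapes concrete, here is a toy run of the same logic on the string "hello world" with `maxlen=4` and `step=3`. The helper `vectorize` is hypothetical: it just wraps the cell's steps in a function so they can be tried on a small input.

```python
import numpy as np

def vectorize(text, maxlen, step):
    # Hypothetical helper wrapping the notebook cell's logic.
    # Cut overlapping windows of length maxlen, advancing by step characters;
    # the character right after each window is its target.
    sentences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i:i + maxlen])
        next_chars.append(text[i + maxlen])
    chars = sorted(set(text))
    char_indices = {c: i for i, c in enumerate(chars)}
    # One-hot encode: x is (samples, maxlen, vocab), y is (samples, vocab)
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, c in enumerate(sentence):
            x[i, t, char_indices[c]] = True
        y[i, char_indices[next_chars[i]]] = True
    return sentences, next_chars, chars, x, y

sentences, next_chars, chars, x, y = vectorize("hello world", maxlen=4, step=3)
print(sentences)   # ['hell', 'lo w', 'worl']
print(next_chars)  # ['o', 'o', 'd']
print(x.shape, y.shape)  # (3, 4, 8) (3, 8)
```

The loop stops before `len(text) - maxlen` because a window starting there would have no following character to serve as its target. With the notebook's values, `x` has 200278 × 60 × 57 ≈ 685 million boolean entries, and NumPy stores each bool in one byte, so the input tensor alone takes roughly 685 MB.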


In [14]:
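from keras.layers import LSTM, Dense

# A single LSTM layer followed by a softmax over all possible characters
model = keras.models.Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

model.summary()

# Targets are one-hot encoded, so categorical crossentropy is the matching loss
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.01), loss='categorical_crossentropy')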

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 128)               95232     
_________________________________________________________________
dense_1 (Dense)              (None, 57)                7353      
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________
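The parameter counts in the summary can be checked by hand. An LSTM layer has four gates (input, forget, cell candidate, output), and each gate has an input kernel of size `input_dim × units`, a recurrent kernel of size `units × units`, and a bias of size `units`. The small functions below are illustrative helpers, not part of Keras.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units

vocab = 57  # number of unique characters in the corpus
lstm = lstm_params(128, vocab)
dense = dense_params(128, vocab)
print(lstm, dense, lstm + dense)  # 95232 7353 102585
```

These match the `lstm_1`, `dense_1`, and total rows of the summary.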


In [0]:
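# Given the model's predicted probability distribution over the next character,
# reweight it by temperature and sample the index of a new character.
def sample(preds, temperature=1.0):
  preds = np.asarray(preds).astype('float64')
  preds = np.log(preds) / temperature  # p ** (1 / temperature), computed in log space
  exp_preds = np.exp(preds)
  preds = exp_preds / np.sum(exp_preds)  # renormalize so the probabilities sum to 1
  probabilities = np.random.multinomial(1, preds, size=1)  # one draw, returned as a one-hot row
  return np.argmax(probabilities)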
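To see what the temperature does, the sketch below applies the same reweighting as `sample()` to a small, made-up four-character distribution and reports the entropy of the result. `reweight` and `entropy` are illustrative helpers, and the probabilities are invented for the example.

```python
import numpy as np

def reweight(preds, temperature):
    # same reweighting as sample(): p_i ** (1 / T), renormalized to sum to 1
    logits = np.log(np.asarray(preds, dtype='float64')) / temperature
    exp_preds = np.exp(logits)
    return exp_preds / np.sum(exp_preds)

def entropy(p):
    # Shannon entropy in nats; higher means a flatter, more random distribution
    return -np.sum(p * np.log(p))

preds = np.array([0.5, 0.3, 0.15, 0.05])  # made-up next-character distribution
for t in [0.2, 0.5, 1.0, 1.2]:
    p = reweight(preds, t)
    print(t, np.round(p, 3), round(float(entropy(p)), 3))
```

At `temperature=1.0` the distribution comes back unchanged. At 0.2 the most likely character receives roughly 0.93 of the probability mass, which is why low-temperature samples tend to repeat the same frequent words, while temperatures above 1 flatten the distribution and produce more surprising (and more misspelled) text.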

In [24]:
# Text generation loop
import random
import sys

random.seed(42)
start_index = random.randint(0, len(text) - maxlen - 1)  # the same seed text is reused every epoch

for epoch in range(1, 60):
  print("epoch : ", epoch)
  model.fit(x, y, batch_size=128, epochs=1)  # train for a single epoch per iteration
  
  seed_text = text[start_index : start_index + maxlen]
  print('--- Seed text: "' + seed_text + '"')
  
  for temperature in [0.2, 0.5, 1.0, 1.2]:
    print("----- Softmax Temperature : ", temperature)
    generated_text = seed_text
    sys.stdout.write(generated_text)
    
    for i in range(400):  # generate 400 characters
      # one-hot encode the current window of maxlen characters
      sampled = np.zeros((1, maxlen, len(chars)))
      for t, char in enumerate(generated_text):
        sampled[0, t, char_indices[char]] = 1.
        
      preds = model.predict(sampled, verbose=0)[0]
      next_index = sample(preds, temperature)
      next_char = chars[next_index]
      
      # append the new character and drop the oldest one so the window stays maxlen long
      generated_text += next_char
      generated_text = generated_text[1:]
      
      sys.stdout.write(next_char)
      sys.stdout.flush()
    print()

epoch :  1
Epoch 1/1
--- Seed text: "the slowly ascending ranks and classes, in which,
through fo"
----- Softmax Temperature :  0.2
the slowly ascending ranks and classes, in which,
through for the whole and the strengthing and self-who has an and the self--in the sense the strengthing something and soul and self--the sense and the more from the self--and such and interest in the fact and the self-such and the the self--the there is the sense of the self--the strengthing and interest and is the sense of the self--the sense the sense and the call the strengthing and an instincts and the
----- Softmax Temperature :  0.5
the slowly ascending ranks and classes, in which,
through for the strengtance and free with the sense of the than canacle perputues and the constament and are are be call severial be of prefentant and general still which his soul being. the german the soul the the stranger and such an art or which is the the understand in the hand
and the man from an extraden in an and under

the standard of the entimates and believe in the presence and subject of the standard the measured the world of the standard of the presence of the fact the standard of the spirit of the standard the most nature and the masters of with the subtlent and the made and subject of the spirit of the highest a
----- Softmax Temperature :  0.5
the slowly ascending ranks and classes, in which,
through formed the man and a sublimest
who with deception of the conduct the master in the same soul the most life is a prisonour and the distinction of himself. the worker to the measure the heart as a thing for the master and in a new docture for the marves and conscience of which we are an exercity in the presence of philosophers and profound of a sublimest how be the subtle visition the sense it was bet
----- Softmax Temperature :  1.0
the slowly ascending ranks and classes, in which,
through formaring with the sphild
would be him health, we now soul a thing for up there is laugh itself of such be
but

KeyboardInterrupt: ignored
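The inner loop of the generation cell can also be factored into a standalone function, which makes it possible to test the sliding-window logic without a trained model. This is a sketch under a few assumptions: `generate` and its `predict_fn` parameter are hypothetical names; it swaps `np.random.multinomial` for NumPy's `Generator.choice` (same sampling, but it accepts a seeded generator); and it adds a small epsilon before the logarithm, which the notebook's `sample()` does not do, so that zero probabilities do not trigger a divide-by-zero warning.

```python
import numpy as np

def generate(predict_fn, seed, chars, maxlen, n_chars, temperature=1.0, rng=None):
    """Generate n_chars characters after seed, one character at a time.

    predict_fn is an assumed interface: any callable mapping a (1, maxlen, vocab)
    one-hot array to a probability vector of length vocab.
    """
    if len(seed) < maxlen:
        raise ValueError("seed must contain at least maxlen characters")
    rng = np.random.default_rng() if rng is None else rng
    char_indices = {c: i for i, c in enumerate(chars)}
    window = seed[-maxlen:]  # the model always sees exactly maxlen characters
    out = []
    for _ in range(n_chars):
        # one-hot encode the current window
        sampled = np.zeros((1, maxlen, len(chars)))
        for t, c in enumerate(window):
            sampled[0, t, char_indices[c]] = 1.0
        preds = np.asarray(predict_fn(sampled), dtype='float64')
        # temperature reweighting; the 1e-12 epsilon is an addition that avoids log(0)
        logits = np.log(preds + 1e-12) / temperature
        probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
        probs /= probs.sum()
        next_char = chars[rng.choice(len(chars), p=probs)]
        out.append(next_char)
        window = window[1:] + next_char  # slide the window forward by one character
    return ''.join(out)
```

With the notebook's trained model, a call along the lines of `generate(lambda a: model.predict(a, verbose=0)[0], seed_text, chars, maxlen, 400, temperature=0.5)` should behave like one temperature pass of the loop above. For a quick check without a model, a stub predictor that always "predicts" the first character of the window makes the output cycle through the seed: with seed `"abc"` it yields `"abcab"` for five characters.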