<a href="https://colab.research.google.com/github/Aaronsom/poem-generation/blob/master/colab_poem_generator_word.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!git clone https://github.com/Aaronsom/poem-generation
%cd poem-generation
%mkdir models
!wget https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz
!gunzip GoogleNews-vectors-negative300.bin.gz

Cloning into 'poem-generation'...
remote: Enumerating objects: 109, done.
remote: Counting objects: 100% (109/109), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 109 (delta 66), reused 72 (delta 32), pack-reused 0
Receiving objects: 100% (109/109), 1.92 MiB | 10.21 MiB/s, done.
Resolving deltas: 100% (66/66), done.
/content/poem-generation
--2019-05-31 14:39:04--  https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.165.125
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.165.125|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1647046227 (1.5G) [application/x-gzip]
Saving to: ‘GoogleNews-vectors-negative300.bin.gz’


2019-05-31 14:39:27 (67.2 MB/s) - ‘GoogleNews-vectors-negative300.bin.gz’ saved [1647046227/1647046227]



In [0]:
from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger
from tensorflow import train as optimizer  # TF 1.x optimizers, e.g. AdamOptimizer
from poem_generator.dataGenerator import TupleDataGenerator
import poem_generator.data_prepocessing as dp
import poem_generator.embedding as embedding_loader
from poem_generator.global_constants import TRAINING_DATA, EMBEDDING_DIMENSION, EMBEDDING_BINARY, MODELS_DICT
from poem_generator.transformer import transformer
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
from tensorflow.contrib.tpu import keras_to_tpu_model, TPUDistributionStrategy
from tensorflow.contrib.cluster_resolver import TPUClusterResolver
import os

In [3]:
def bidirectional_lstm(n, embedding, vocab_len):
    model = Sequential([
        Embedding(input_dim=vocab_len, output_dim=EMBEDDING_DIMENSION, input_length=n, weights=[embedding], trainable=False),
        Bidirectional(LSTM(1024, return_sequences=True)),
        Bidirectional(LSTM(1024, return_sequences=False)),
        Dropout(0.1),
        Dense(vocab_len, activation="softmax")
    ])
    return model

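ns = [5]                 # input sequence lengths (n tokens per sample)
epochs = 20
batch_size = 1024
max_limit = 25000        # passed as `limit` to get_embedding below
validation_split = 0.9   # fraction of poems used for TRAINING; the remaining 10% is held out for validation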

poems = dp.tokenize_poems(TRAINING_DATA)
words = sorted(list(set([token for poem in poems for token in poem])))

# Save the embedding so the generator can reuse it
embedding, dictionary = embedding_loader.get_embedding(words, binary=EMBEDDING_BINARY, limit=max_limit, save=True, file="GoogleNews-vectors-negative300.bin")

#model = load_model(MODELS_DICT+"/5model.hdf5", custom_objects={"PositionalEncoding": PositionalEncoding, "Attention": Attention})
#model = transformer(100, embedding, len(dictionary), single_out=True, train_embedding=False, input_sequence_length=20)
model = bidirectional_lstm(ns[0], embedding, len(dictionary))  # keep the input length in sync with ns
model.summary()
tpu_model = keras_to_tpu_model(
    model,
    strategy=TPUDistributionStrategy(
        TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    )
)
tpu_model.compile(optimizer=optimizer.AdamOptimizer(),
            loss="categorical_crossentropy", metrics=["accuracy"])

generator = TupleDataGenerator(poems[:int(validation_split*len(poems))], ns, dictionary, 0, batch_size, single=True)
validation_generator = TupleDataGenerator(poems[int(validation_split*len(poems)):], ns, dictionary, 0, batch_size, single=True)
callbacks = [ModelCheckpoint(MODELS_DICT+"/model.hdf5", save_best_only=True),
             CSVLogger(MODELS_DICT+"/log.csv", append=True, separator=';')]
tpu_model.fit_generator(
    generator, epochs=epochs, callbacks=callbacks, validation_data=validation_generator, workers=4)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 5, 300)            2662800   
_________________________________________________________________
bidirectional (Bidirectional (None, 5, 2048)           10854400  
_________________________________________________________________
bidirectional_1 (Bidirection (None, 2048)              25174016  
_________________________________________________________________
dropout (Dropout)            (None, 2048)              0         
_________________________________________________________________
dense (Dense)                (None, 8876)              18186924  
=================================================================
Total params: 56,878,140
Trainable params: 54,215,340
Non-trainable params: 2,662,800
_________________________________________________________________

KeyboardInterrupt: ignored
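
The parameter counts in the model summary can be checked by hand. The sketch below recomputes them from the layer sizes in `bidirectional_lstm`, using the vocabulary size of 8876 shown in the summary. The helper `lstm_params` is a hypothetical name introduced here for illustration; it is not part of the repository. It assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias.

```python
def lstm_params(input_dim, units):
    # Hypothetical helper (not from the repository): one LSTM direction has
    # four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

vocab_len = 8876       # output size of the Dense layer in the summary
embedding_dim = 300    # dimension of the GoogleNews word2vec vectors
units = 1024           # LSTM units per direction

embedding = vocab_len * embedding_dim                 # frozen (trainable=False)
bi_lstm_1 = 2 * lstm_params(embedding_dim, units)     # forward + backward
bi_lstm_2 = 2 * lstm_params(2 * units, units)         # input is the 2048-wide output of layer 1
dense = 2 * units * vocab_len + vocab_len             # weights + biases

total = embedding + bi_lstm_1 + bi_lstm_2 + dense
trainable = total - embedding

print(f"embedding:       {embedding:,}")
print(f"bidirectional:   {bi_lstm_1:,}")
print(f"bidirectional_1: {bi_lstm_2:,}")
print(f"dense:           {dense:,}")
print(f"total:           {total:,}")
print(f"trainable:       {trainable:,}")
```

The Dense output layer alone accounts for roughly a third of the parameters, since its size scales with the vocabulary.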

In [0]:
!mkdir -p generated

In [5]:
from poem_generator.word_generator import generate_poems
n = 5
generate_poems(1000, n, "generated/"+str(n)+"-poems.zip", MODELS_DICT+"/model.hdf5", single=True)

Using TensorFlow backend.


1/1000
seen t seen seen d seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen o seen seen seen seen seen h seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen seen 


2/1000
n rich d rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich rich pastoral rich rich rich rich rich rich rich rich rich rich rich 


3/1000
t r long t e f e e again b he b n e he e w even he e e l v 
such i k such such such such such such such i such such such such such such such such such l such airs e r e h 
n 
from 
o t e 


4/1000
h e weigh o peace e r e peace w peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace peace 

TypeError: ignored