RNN (LSTM, GRU) for generating haiku based on UTF-16 examples


haiku_rnn

This model is based on Sherjil Ozair's TensorFlow port of Andrej Karpathy's char-rnn. It has been modified to read UTF-16 files, which better account for the full range of Japanese characters.
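As a rough illustration of what character-level preprocessing on a UTF-16 file can look like, here is a minimal sketch using only the Python standard library. The function names `build_vocab` and `demo` are hypothetical and are not taken from this repository's utils.py; the sketch only assumes that the file is decoded as UTF-16 before characters are counted and indexed.

```python
import collections
import io
import os
import tempfile


def build_vocab(path, encoding="utf-16"):
    """Read a text file and map each distinct character to an integer id.

    Hypothetical helper for illustration; not this repository's loader.
    """
    with io.open(path, "r", encoding=encoding) as f:
        text = f.read()
    # More frequent characters get smaller ids; ties are broken by the
    # character itself so the ordering is deterministic.
    counts = collections.Counter(text)
    chars = sorted(counts, key=lambda c: (-counts[c], c))
    vocab = {c: i for i, c in enumerate(chars)}
    ids = [vocab[c] for c in text]
    return chars, vocab, ids


def demo():
    # Write a one-haiku UTF-16 file (Basho's frog poem), then index it.
    haiku = u"古池や蛙飛び込む水の音\n"
    with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
        tmp.write(haiku.encode("utf-16"))
        path = tmp.name
    try:
        return haiku, build_vocab(path)
    finally:
        os.remove(path)
```

Because the file is decoded before the vocabulary is built, each id stands for a whole character such as 蛙 rather than a single byte, so the network would predict one character per step.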

The included data are the haiku of Matsuo Basho and an incomplete set of the haiku of Kobayashi Issa, both counted among the "Great Four" haiku masters of Japan. Issa was especially prolific, composing over 20,000 haiku in his lifetime.

The Basho haiku are cleaned, with one haiku per line, but the set is small: Basho composed fewer than 1,000 haiku. The Issa haiku come from the website of Ichiro Kobayashi of the Nagano Regional History Study Group. This data is still being cleaned and split so that each haiku occupies three lines. In its current state, the structure of the scraped website shows up in the model's predictions, as would be expected.

The model can be trained using train.py and sampled from using sample.py.

Please contact Henry Wolf if you have access to a larger corpus of East Asian poetry upon which this could be tested.