# nazokake

Implementation of a Japanese version of "Unsupervised joke generation from big data" (ACL 2013).

## Training

You need the Japanese WordNet sqlite3 database. Download it from http://nlpwww.nict.go.jp/wn-ja/ and put wnjpn.db in the same directory as the scripts.
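As a quick sanity check that the database is in place, you could query it with Python's built-in sqlite3 module. The table and column names below (`word`, `sense`, and their columns) reflect the commonly documented wnjpn.db schema and should be verified against your copy; the `synonyms` helper is illustrative and not part of these scripts:

```python
import sqlite3

def synonyms(db_path, lemma):
    """Look up Japanese words sharing a synset with the given lemma.

    Assumes the wnjpn.db schema with tables
    word(wordid, lang, lemma, pron, pos) and
    sense(synset, wordid, lang, rank, lexid, freq, src).
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            """
            SELECT DISTINCT w2.lemma
            FROM word w1
            JOIN sense s1 ON s1.wordid = w1.wordid
            JOIN sense s2 ON s2.synset = s1.synset
            JOIN word w2 ON w2.wordid = s2.wordid
            WHERE w1.lemma = ? AND w2.lang = 'jpn'
              AND w2.lemma != w1.lemma
            """,
            (lemma,),
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        conn.close()
```

For example, `synonyms("wnjpn.db", "犬")` should return other Japanese lemmas in the same synsets as 犬 if the database is set up correctly.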

To train the model, run:

```
$ python joke.py --corpus [your n-gram file]
```

The n-gram file should contain one n-gram per line, in the following format: [token-1] [token-2] ... [token-n] [count]

The Google N-gram corpus follows this format, so it can be used as the corpus directly.
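As a sketch of how such a file could be parsed, assuming tokens separated by whitespace and the count in the last tab-separated field (as in the Google corpus), here is a small helper. The name `load_ngrams` is illustrative and not part of joke.py:

```python
from collections import Counter

def load_ngrams(path):
    """Parse an n-gram corpus file into a Counter mapping
    token tuples to counts.

    Assumes each line has the form
    "token-1 token-2 ... token-n<TAB>count",
    as in the Google N-gram corpus.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split off the trailing count, then split the n-gram into tokens.
            ngram, _, count = line.rpartition("\t")
            counts[tuple(ngram.split())] += int(count)
    return counts
```

Using a Counter keeps duplicate n-gram lines additive, which matches how corpus shards are usually merged.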

## Generation

To generate a nazokake (a traditional Japanese pun riddle), run:

```
$ python joke.py --model [model generated by training mode]
```

## Example Output

https://twitter.com/pepsin_amylase/status/505420501112455168

## References

Saša Petrović and David Matthews. "Unsupervised joke generation from big data." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Short Papers. Sofia, Bulgaria, August 4-9, 2013.

yanbe.diff, "Frontend program to search from Japanese WordNet database." (Japanese: 日本語WordNetのデータベースを探索するフロントエンドプログラム) http://subtech.g.hatena.ne.jp/y_yanbe/20090314/p2