Intent to participate [First lines of novels] #75

Open
janelleshane opened this issue Nov 4, 2017 · 3 comments
Labels
completed For completed novels! preview There is an excerpt somewhere in the thread!

Comments

@janelleshane

janelleshane commented Nov 4, 2017

A tiny dataset produced mixed results in my first attempt to generate the first sentence of a novel: http://aiweirdness.com/post/167049313837/a-neural-network-tries-writing-the-first-sentence

Highlights:

  • There was a man and he had seventy first sight.
  • It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of my life, fire of my loins.

Lowlights:

  • Stop! I caused the Narguuse man who was new on Alabama, the screaming constipated eggs.
  • I am an angry grass, the symposium square, proved fatal to the throbbing, the howling wind tire…

The really big repositories I've found (Project Gutenberg, for example) are formatted inconsistently enough that they're difficult to scrape.
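To illustrate why scraping is awkward, here is a naive Python sketch (the helper `naive_first_sentence` is hypothetical). It assumes the book text follows a `*** START OF THIS PROJECT GUTENBERG EBOOK ... ***` style marker line and simply grabs the first sentence after it. On a file with any front matter it returns a chapter heading rather than the novel's opening line:

```python
import re
from typing import Optional

# Assumed marker format: many Project Gutenberg plain-text files put the book
# after a line like "*** START OF THIS PROJECT GUTENBERG EBOOK TITLE ***",
# but the wording varies between files ("THIS" vs "THE", spacing, case).
START_MARKER = re.compile(
    r"\*\*\*\s*START OF (?:THIS|THE) PROJECT GUTENBERG EBOOK[^*]*\*\*\*",
    re.IGNORECASE,
)

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def naive_first_sentence(raw_text: str) -> Optional[str]:
    """Hypothetical helper: first 'sentence' after the START marker, or None."""
    match = START_MARKER.search(raw_text)
    if match is None:
        return None
    # Join hard-wrapped lines into one space-separated string.
    body = " ".join(raw_text[match.end():].split())
    if not body:
        return None
    return SENTENCE_END.split(body, maxsplit=1)[0]


sample = (
    "The Project Gutenberg EBook of An Example Novel\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK AN EXAMPLE NOVEL ***\n\n"
    "CHAPTER I.\n\n"
    "It was a dark night. The wind howled.\n"
)
print(naive_first_sentence(sample))  # prints: CHAPTER I.
```

A usable scraper would also have to skip title pages, tables of contents, epigraphs and prefaces, and those vary from book to book.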

So now I'm crowdsourcing a larger dataset: https://docs.google.com/forms/d/e/1FAIpQLScod8P-kcLX98u6gT0rX6-20GwkDo_glz-okVVkrhr6KgQONQ/viewform. This has been posted for about 36 hours and already has 3,532 submissions (not all unique). People are welcome to contribute through this form, or let me know if you have a smarter way to contribute a dataset.
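Since the submissions aren't all unique, a dedup pass before training helps. A minimal Python sketch, assuming the form responses have been exported as a list of strings; `normalize_line` and `dedupe_submissions` are hypothetical names, and treating lines that differ only in case, spacing or curly vs. straight quotes as duplicates is an assumption:

```python
import unicodedata


def normalize_line(line):
    """Comparison key: NFKC, straight quotes, collapsed whitespace, casefolded."""
    text = unicodedata.normalize("NFKC", line)
    # NFKC leaves curly quotes alone, so map them to straight quotes explicitly.
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return " ".join(text.split()).casefold()


def dedupe_submissions(lines):
    """Keep the first occurrence of each normalized line, in submission order."""
    seen = set()
    unique = []
    for line in lines:
        key = normalize_line(line)
        if key and key not in seen:  # also drops blank submissions
            seen.add(key)
            unique.append(line.strip())
    return unique


submissions = [
    "Call me Ishmael.",
    "call me  Ishmael.",
    "It was a bright cold day in April, and the clocks were striking thirteen.",
    "   ",
]
print(dedupe_submissions(submissions))
# ['Call me Ishmael.', 'It was a bright cold day in April, and the clocks were striking thirteen.']
```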

At the end of the month, I'll try again with a hopefully much larger dataset, and post the results and dataset afterwards, as well as a link to whatever open-source package I end up using. It won't produce a full novel in the traditional sense, but I'll declare a moral victory if a human announces their admiration of one of the neural network's lines.

@janelleshane
Author

Marking this one complete! Big thanks to everyone who contributed to the dataset.

Writeup and highlights here: http://aiweirdness.com/post/168051907512/the-first-line-of-a-novel-by-an-improved-neural

I ended up using a syll-rnn (LSTM mode) to do the generation, which ran for about 16 hours on my MacBook. Syll-rnn seems to cope with larger datasets better than char-rnn, while still handling a larger vocabulary than word-rnn. Here's the framework I used:

https://github.com/learningtitans/torch-rnn/blob/valle-syllables/doc/flags.md#preprocessing
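To get a feel for the three granularities, here's a Python sketch that splits a line into characters, rough syllables and words. The syllable splitter (hypothetical `naive_syllables`) is a crude vowel-group heuristic, not the algorithm the valle-syllables branch actually uses, and it over-counts things like a silent final "e". Syllables sit in between: sequences are much shorter than with characters, while the set of distinct units stays far smaller than a full word vocabulary, since most new words reuse syllables already seen.

```python
import re

# One rough "syllable" = optional leading consonants + a run of vowels
# (y counts as a vowel); consonants at the very end of the word attach
# to the last chunk, so the chunks always join back into the word.
VOWEL_GROUP = re.compile(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$)?")


def naive_syllables(word):
    """Split a lowercase word into rough syllable chunks."""
    return VOWEL_GROUP.findall(word) or [word]  # vowel-less words stay whole


def tokenize_three_ways(text):
    """Return (characters, rough syllables, words) for one line of text."""
    words = re.findall(r"[a-z']+", text.lower())  # punctuation is dropped here
    syllables = [chunk for word in words for chunk in naive_syllables(word)]
    return list(text), syllables, words


line = ("It is a truth universally acknowledged that a single man in "
        "possession of a good fortune must be in want of a wife.")
chars, sylls, words = tokenize_three_ways(line)
print(len(chars), len(sylls), len(words))
print(naive_syllables("universally"))  # ['u', 'ni', 've', 'rsa', 'lly']
```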

Sequence length was 40 syllables (based roughly on the number of syllables in "It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife.").
LSTM size was 512, with 3 layers (based on what would fit on my computer; I'm running a 1064-size LSTM now, but it's taking a long time and it's not clear that the results will be any better).
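Both settings can be sanity-checked with a little arithmetic. The Python sketch below counts the sentence's syllables from standard dictionary splits (the branch's tokenizer may split differently) and estimates the recurrent weights, assuming the standard LSTM formulation with one bias vector per gate and a hypothetical 64-dimensional input embedding; the embedding and output layers are left out.

```python
# Dictionary syllable splits of the sentence the sequence length was based on.
AUSTEN_SYLLABLES = [
    "it", "is", "a", "truth", "u-ni-ver-sal-ly", "ac-knowl-edged", "that",
    "a", "sin-gle", "man", "in", "pos-ses-sion", "of", "a", "good",
    "for-tune", "must", "be", "in", "want", "of", "a", "wife",
]
syllable_count = sum(len(word.split("-")) for word in AUSTEN_SYLLABLES)


def lstm_layer_params(input_size, hidden_size):
    """Four gates, each with an (input + hidden) x hidden weight matrix and one bias vector."""
    return 4 * hidden_size * (input_size + hidden_size + 1)


def stacked_lstm_params(embedding_size, hidden_size, num_layers):
    """First layer reads the embedding; later layers read the previous layer's hidden state."""
    total = lstm_layer_params(embedding_size, hidden_size)
    total += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
    return total


EMBEDDING_SIZE = 64  # hypothetical; the thread doesn't give this value

print(syllable_count)  # 33, so a 40-syllable window leaves some headroom
print(f"{stacked_lstm_params(EMBEDDING_SIZE, 512, 3):,}")   # 5,380,096
print(f"{stacked_lstm_params(EMBEDDING_SIZE, 1064, 3):,}")  # 22,927,072
```

Under those assumptions, going from 512 to 1064 units roughly quadruples the LSTM weights, because the count grows with the square of the hidden size, which is consistent with the slower training.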

140,000 words of output available here. Unfortunately, due to a prank in the input data that I didn’t catch till after I trained the neural network, 37,000 of them are the word “sand”.

https://github.com/janelleshane/novel-first-lines-dataset/blob/master/output_checkpoint10000_temp0p6.txt
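The `temp0p6` in that filename presumably means a sampling temperature of 0.6 (an assumption based on the name alone). For anyone unfamiliar with temperature, here's a minimal Python sketch of the usual technique: divide the network's output logits by the temperature before the softmax, so values below 1 sharpen the distribution toward the most likely tokens and values above 1 flatten it. The token list and logits are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


tokens = ["sand", "the", "cat"]  # made-up vocabulary
logits = [2.0, 1.0, 0.5]         # made-up network outputs
for t in (1.0, 0.6):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(sample_token(tokens, logits, 0.6, rng=random.Random(0)))
```

Lower temperatures make the network stick to its most probable continuations, which tends to amplify whatever repetitive patterns it picked up from the data.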

Crowdsourced dataset available here: https://github.com/janelleshane/novel-first-lines-dataset
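Exactly what form the "sand" prank took isn't described here, so this is only a hedged Python sketch of a pre-training check covering two plausible forms: flag any line where one token makes up most of the words, and list the most frequent tokens across the whole dataset so an anomaly like thousands of extra "sand"s stands out. The function names and the thresholds (`max_share=0.5`, `min_tokens=5`) are hypothetical.

```python
from collections import Counter


def dominant_token_share(line):
    """Fraction of a line's whitespace-separated tokens taken by its most common token."""
    tokens = line.lower().split()
    if not tokens:
        return 0.0
    _, top_count = Counter(tokens).most_common(1)[0]
    return top_count / len(tokens)


def split_suspicious(lines, max_share=0.5, min_tokens=5):
    """Separate lines dominated by one repeated token from the rest."""
    kept, flagged = [], []
    for line in lines:
        long_enough = len(line.split()) >= min_tokens  # short lines are left alone
        if long_enough and dominant_token_share(line) > max_share:
            flagged.append(line)
        else:
            kept.append(line)
    return kept, flagged


def top_tokens(lines, n=3):
    """Most frequent lowercase tokens across the whole dataset."""
    return Counter(tok for line in lines for tok in line.lower().split()).most_common(n)


dataset = [
    "It was a dark and stormy night.",
    "Call me Ishmael.",
    "sand sand sand sand sand sand",
]
kept, flagged = split_suspicious(dataset)
print(flagged)                   # ['sand sand sand sand sand sand']
print(top_tokens(dataset, n=1))  # [('sand', 6)]
```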

@hugovk hugovk added the completed For completed novels! label Nov 30, 2017
@hugovk
Member

hugovk commented Nov 30, 2017

(We're using issues as a sort of forum, so I'll re-open this to make it easier to find.)

Good stuff!

> Unfortunately, due to a prank in the input data that I didn’t catch till after I trained the neural network, 37,000 of them are the word “sand”.

I think the eternal sand is quite appropriate for NaNoGenMo!

As a way at the ground, and the cat could have been in the town and a shock and the type on the back of the pilsage and belched and the color of the great little person who was still and the imface of the decoction of the heat between the box against the three interesting seament and the eternal sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand sand ...

@hugovk hugovk reopened this Nov 30, 2017
@hugovk hugovk added the preview There is an excerpt somewhere in the thread! label Nov 30, 2017
@janelleshane
Author

Thanks for clearing that up! And for adding the completed tag!

Yes, eternal sand. People have been making Star Wars jokes at me all day.

2 participants