Gutengrep and Gutenstory by @hugovk #116
Comments
o wow o
That works so well. The chapter titles are very code-y. They explicate. But perhaps they should be footnotes? They distract from Le Plaisir du Texte.
Yeah, perhaps the chapter titles could be relegated elsewhere. I had thought about stripping out the code-y bits from the title (e.g. I'd been tweaking the number of random sentences to pick to end up with ~50k words, but then on this particular run the penultimate sentence landed on a 13k-word sentence from Joyce's Ulysses!
BWAH HAH HAH HAH HAH! We owe much of the 20th century to Mr. Joyce. -Michael Paulukonis
A refactoring bug just resulted in a book of 19,123,315 words instead of ~50k.
Here's a 50,143-word second volume, with an added "enchant" chapter and the regexes relegated to an appendix: I like the beginning:
Here's an article about this and #50: http://www.theatlantic.com/technology/archive/2014/12/moby-dick-in-50000-meows-and-other-tales-that-computers-tell/383340/
Gutengrep and Gutenstory
Repo: https://github.com/hugovk/gutengrep
Gutengrep poetry generator
Riffing on a suggestion made in #55, I wrote a script that uses regexes to grep full sentences from the Project Gutenberg CD. It uses NLTK to find full sentences, rather than the arbitrary lines in a file that grep finds. It can also sort them shortest sentence first.
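To make the idea concrete, here is a minimal sketch of sentence-level grepping. It is not the code from the repo: to stay dependency-free it swaps NLTK's tokenizer for a naive regex splitter, and the names `split_sentences` and `grep_sentences` are made up for illustration.

```python
import re

# Naive stand-in for a real sentence tokenizer (the actual script uses NLTK).
# Splits on whitespace that follows ., ! or ? and will mis-split on
# abbreviations such as "Mr."; good enough to show the idea.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    # Collapse hard line wraps so a sentence spanning several lines stays whole.
    flat = " ".join(text.split())
    return [s for s in SENTENCE_END.split(flat) if s]


def grep_sentences(text, pattern, shortest_first=False, flags=re.IGNORECASE):
    # Keep every sentence in which the regex matches anywhere.
    regex = re.compile(pattern, flags)
    hits = [s for s in split_sentences(text) if regex.search(s)]
    if shortest_first:
        hits.sort(key=len)
    return hits


if __name__ == "__main__":
    sample = "Nothing stirred on the\nmoonlit shore that night! It rained. The moonlit sea was calm."
    for sentence in grep_sentences(sample, "moonlit", shortest_first=True):
        print(sentence)
```

Unlike plain grep, a match here returns the whole sentence even when the source file wraps it across several lines.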
But first.
The OED has a word of the day email, and the quotations for "moonlit" struck me as particularly poetic:
Moonlit
Let's try this on Project Gutenberg. There are 597 text files on the CD containing 3,583,390 sentences. Full output of these can be found in the repo.
...
Once upon a time
Let's search for "once upon a time":
...
And then!
Or "And then" at the start of each sentence (regex: [^\w]*And then):
...
...
But why?
Or "But why" at the start of each sentence (regex: [^\w]*But why):
...
...
...
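A quick sketch of why the start-of-sentence regexes begin with [^\w]*: sentences in novels often open with a quotation mark or dash, and the prefix skips those non-word characters before the phrase. This assumes the pattern is applied with `re.match`, which anchors at the start of the string; how the script itself applies it may differ.

```python
import re

# [^\w]* consumes any leading non-word characters (quotes, dashes, spaces)
# so that the phrase still counts as being at the start of the sentence.
AND_THEN = re.compile(r"[^\w]*And then")
BUT_WHY = re.compile(r"[^\w]*But why")


def starts_with(regex, sentence):
    # re.match only succeeds if the pattern matches from position 0.
    return regex.match(sentence) is not None


if __name__ == "__main__":
    for sentence in ['"And then he left."', "He paused, And then he left."]:
        print(starts_with(AND_THEN, sentence), sentence)
```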
Happily ever after
Not many, so here's the full thing:
Gutenstory, a grepped story
To generate a full book, gutenstory.py repeatedly searches the 3,583,390 sentences in the 597 text files of the Project Gutenberg CD.
First it collects all the sentences containing "once upon a time". Next it collects all the sentences containing "happily ever after" or ending with "the end." Each chapter begins with a randomly chosen sentence from the first set and ends with a randomly chosen one from the second.
After that, the rest of each chapter is built from 80 random sentences, sorted by length, drawn from a different set of matching sentences for each chapter. For example, one chapter uses sentences beginning "But why", another those beginning "Of course", others those starting "Suddenly" or "Presently", and yet more use sentences containing "year-old", "princess", "violin", "laughed", or the names of the months or days.
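The chapter recipe can be sketched roughly like this. It is an illustration under assumptions, not the code in gutenstory.py: `build_chapter` and its parameters are hypothetical names, and the body is assumed to be sorted shortest first.

```python
import random


def build_chapter(openers, closers, pool, n_sentences=80, rng=None):
    """Assemble one chapter as a list of sentences.

    openers: sentences containing "once upon a time"
    closers: sentences with "happily ever after" or ending "the end."
    pool:    sentences matching this chapter's own regex
    """
    rng = rng or random.Random()
    # Pick up to n_sentences at random without repeats, then sort by length.
    body = rng.sample(pool, min(n_sentences, len(pool)))
    body.sort(key=len)
    return [rng.choice(openers)] + body + [rng.choice(closers)]


if __name__ == "__main__":
    chapter = build_chapter(
        openers=["Once upon a time there was a king."],
        closers=["They lived happily ever after."],
        pool=["But why?", "But why should I go?", "But why not stay here a while?"],
        n_sentences=2,
        rng=random.Random(0),
    )
    print("\n".join(chapter))
```

Passing a seeded `random.Random` makes a run reproducible, which helps when tuning the sentence count toward a word target.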
Here's example output of a 65,383-worder: HTML | PDF | MD
Generated with:
Then print to PDF using Chrome. Big thanks to @moonmilk for the CSS: