RESOURCES! #11

dariusk · 2013-11-01T17:08:25Z

This is an open issue where you can comment and add resources that might come in handy for NaNoGenMo.

NOTE: at some point I will turn this into a more organized document, probably on the wiki for this repo.

dariusk · 2013-11-01T17:15:57Z

A submission from @scottmadin:

Python Markov chains: https://pypi.python.org/pypi/PyMarkovChain/
Python internet archive api: https://pypi.python.org/pypi/internetarchive/0.4.4

Also, similar things in NodeJS:

https://npmjs.org/package/archive.org
https://npmjs.org/package/markov

darkliquid · 2013-11-01T17:21:16Z

http://stackoverflow.com/questions/353274/story-telling-building-algorithms

willf · 2013-11-01T17:47:15Z

I wrote a "Samsa bot" that uses Bing's Ngram database to generate text. You might find it and the associated libraries useful (all Ruby).

https://github.com/willf/microsoft_ngram/blob/master/examples/samsabot.rb

General library:

https://github.com/willf/microsoft_ngram

dariusk · 2013-11-01T18:16:04Z

Since @willf is too humble to plug it, Wordnik is an indispensable resource for all things text-related: definitions, parts of speech, random words, rhymes, hypernyms, etc:

http://developer.wordnik.com/docs.html#!/word

vitorio · 2013-11-01T18:23:57Z

Here's a dump of my notes about generating stories:

@rfreebern researched this problem a few years back for this game project of his:

Curses! is a single-player open-ended adventure game with the basic premise that the player is a fairy tale villain bent on wrecking many potential fairy tales as completely as possible. Fairy tale plots would be generated on-the-fly based on a basic generator template that attempts to intelligently combine dozens or hundreds of very basic fairy tale elements to create situations that are both unique and familiar. The PC's goal is not to just thwart the happy ending but to do it thoroughly: not just kill the handsome prince, but cripple and disfigure him while making the princess hate him and get exiled from her kingdom, for example.

Fairy tales are really well-explored variants of the standard storytelling archetypes described by people like Joseph Campbell. There are a couple of ways that fairy tales are organized, which include their plot outlines (although not their cultural or moral implications): Aarne-Thompson, and Propp. http://en.wikipedia.org/wiki/Aarne-Thompson_classification_system

Propp's classification system has been used as the basis for a number of generators and is still the most-used mechanism in the academic literature for such things: http://en.wikipedia.org/wiki/Vladimir_Propp

Propp generators are things like: http://www.fdi.ucm.es/profesor/fpeinado/projects/kiids/apps/protopropp/
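To make the Propp idea concrete, here is a minimal Python sketch, not taken from ProtoPropp or any of the systems linked here. It assumes an abbreviated subset of Propp's functions, and the one-line templates and the `generate_tale` helper are made up for illustration. The only rule it encodes is that functions may be skipped but never reordered.

```python
import random

# Abbreviated, ordered subset of Propp's narrative functions.
# The sentence templates are invented for this sketch.
PROPP_FUNCTIONS = [
    ("absentation", "{hero} leaves home."),
    ("interdiction", "{hero} is warned not to enter the forest."),
    ("violation", "{hero} enters the forest anyway."),
    ("villainy", "{villain} steals the kingdom's treasure."),
    ("departure", "{hero} sets out to put things right."),
    ("receipt of a magical agent", "A stranger gives {hero} a magic ring."),
    ("struggle", "{hero} and {villain} meet in combat."),
    ("victory", "{villain} is defeated."),
    ("return", "{hero} travels home."),
    ("wedding", "{hero} is married and crowned."),
]

def generate_tale(hero, villain, keep=0.7, rng=None):
    """Return a list of sentences: a random, order-preserving subset of functions."""
    rng = rng or random.Random()
    lines = []
    for name, template in PROPP_FUNCTIONS:
        # Always keep the villainy and the victory so the tale has a spine;
        # every other function is kept with probability `keep`.
        if name in ("villainy", "victory") or rng.random() < keep:
            lines.append(template.format(hero=hero, villain=villain))
    return lines

if __name__ == "__main__":
    print(" ".join(generate_tale("Vasilisa", "Koschei")))
```

A fuller generator would keep several alternative templates per function and track which characters fill which roles.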

Clicking through to their later Bard system shows examples at the bottom, and that whole KIIDS thing is for interactive narrative and computational narratology, which are the academic terms for this sort of thing (I call my work in this area automated storytelling with post-hoc computational narratives, as my use and implementation aren't for interaction).

Mark Finlayson's work out of MIT is a little more recent: http://www.mit.edu/~markaf/research.html

Plugging any of that research into Google Scholar and looking at recent citations of those papers is a good way to catch up.

The massively-multiplayer video game Star Wars Galaxies tried something along these lines with their Dynamic Points of Interest, but they weren't really well executed from a design and technical implementation perspective. They had a lot of potential, but Raph Koster describes their problems here: http://www.raphkoster.com/2010/04/30/dynamic-pois/

Outside of fairy tales, there are works like Plotto, which provide narrative guides to plot generation, and the monomyth-related works by Campbell, etc.: http://www.brainpickings.org/index.php/2012/01/06/plotto/

Plotto is actually in the public domain, and can be found in the Internet Archive here: https://archive.org/details/plottonewmethodo00cook

And journalism is getting into it, too. A program at Northwestern that took sports stats and turned them into sports articles worked out so well that they didn't publish much research at all and went right into a startup. The Wired article is here: http://www.wired.com/gadgetlab/2012/04/can-an-algorithm-write-a-better-news-story-than-a-human-reporter/all/1

The one paper I found by the Northwestern group cites one major paper from 1977 about "Tale-spin." You can look for citations from the Tale-spin article, and that brings up some interesting recent work from elsewhere: http://scholar.google.com/scholar?cites=8316499405683938909&as_sdt=5,44&sciodt=0,44&hl=en

Finally, there's this failed Kickstarter: http://www.kickstarter.com/projects/storybricks/storybricks-the-mmorpg-storytelling-toolset

Even more finally, I also found this PDF in a second set of notes: https://research.cc.gatech.edu/inc/content/sequential-recommendation-approach-interactive-personalized-story-generation

darrentorpey · 2013-11-01T18:40:26Z

Thanks, @vitorio! That looks helpful.

smadin · 2013-11-01T19:02:20Z

(OK, I made a github account.)
https://pypi.python.org/pypi/wikipedia/1.0.3 is a python interface to wikipedia, which may also be helpful for the quick-and-dirty Markov-chain approach. It was very easy to hack together a script to fetch random Wikipedia tables for source text and churn out a "novel" of a given word-count.

nickheer · 2013-11-01T19:49:25Z

SC Chen's Simple HTML DOM Parser for PHP.

dariusk · 2013-11-01T19:54:09Z

While in-browser DOM manipulation is obviously ruled by jQuery, my favorite NodeJS DOM parser/manipulator is Cheerio, which uses jQuery-style selectors.

Also if you're in Ruby and need to do HTML/XML parsing, Nokogiri rules the roost.

rfreebern · 2013-11-01T19:58:27Z

I'm hanging out in #nanogenmo on FreeNode if anyone wants to join. We can toss ideas around on a casual basis there.

dariusk · 2013-11-01T20:02:01Z

For those who aren't super IRC-literate, or just don't want to install an irc client, you can go here, pick a username, and visit #nanogenmo from your web browser:

http://webchat.freenode.net/?channels=#nanogenmo

jiko · 2013-11-01T21:20:57Z

The Bard project looks awesome. Thanks @vitorio!

jiko · 2013-11-01T22:35:01Z

Some Python resources:

Natural Language Toolkit. This library is pretty huge and academic.
Pattern. Even more huge and academic.
TextBlob. Still lots of features but more approachable.

agladysh · 2013-11-02T10:52:27Z

An article about generator of Recursive Fairy Tales in Haskell (in Russian): http://habrahabr.ru/post/136007/

Google Translate: http://translate.google.com/translate?hl=en&sl=ru&tl=en&u=http%3A%2F%2Fhabrahabr.ru%2Fpost%2F136007%2F

darkliquid · 2013-11-02T19:21:47Z

Not strictly related, but there are several story-based/narrative-focused roleplaying games that could be used/formalised into a system for generating overall plot structures. I'm currently looking at Microscope, Fiasco and FATE Core as potential systems for having characters 'play' through a game and recording what they do and what actions they take to generate stories.

jiko · 2013-11-02T23:17:42Z

Here's some of my Python code for generating sentences based on supplied text. None of the Twitter-related code has been tested with v1.1 of the Twitter API, but worked fine on v1.

Jambot, my first Twitter bot. Uses a 3-gram Markov model by default.
JamLitBot, a site that generates random 'sentences' and runs on Heroku. Here is the source code, which builds on JamBot's.
@lovecraft_ebooks also builds on JamBot, but uses a 4-gram Markov model.
omnibot simplifies bot creation and management. It includes three distinct text-generation methods.
wikov makes Lorem Ipsum from Wikipedia pages using a 2-gram Markov model.
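For anyone who hasn't built one, the word-level Markov approach these bots share can be sketched in a few lines of Python. This is a generic illustration, not the code from any of the projects above. The `order` parameter is the number of words of context; how that maps onto "3-gram" or "4-gram" depends on the convention each project uses.

```python
import random
from collections import defaultdict

def build_model(words, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return model

def generate(model, length, rng=None):
    """Generate `length` words; `model` must contain at least one state."""
    rng = rng or random.Random()
    state = rng.choice(list(model))
    output = list(state)
    while len(output) < length:
        followers = model.get(state)
        if not followers:
            # Dead end (the state only occurs at the end of the source):
            # jump to a random state and keep going.
            state = rng.choice(list(model))
            output.extend(state)
            continue
        word = rng.choice(followers)
        output.append(word)
        state = state[1:] + (word,)
    return output[:length]

if __name__ == "__main__":
    source = "the cat sat on the mat and the cat ran off with the hat".split()
    print(" ".join(generate(build_model(source, order=2), 30)))
```

Feeding it a large source text and asking for 50,000 words gives the quick-and-dirty "novel". Higher orders read more coherently but copy longer runs of the source verbatim.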

jiko · 2013-11-03T01:27:02Z

The Dada Engine, which powers the infamous Postmodernism Generator, might come in handy. There's an online manual and a clone on GitHub.

erkyrath · 2013-11-03T17:37:45Z

Not a resource, but a suggestion: when you complete a novel, change the title of your issue to "$NovelTitle by $Author", so that we can easily browse them.

(Yeah, someone is now going to actually title their novel "$NovelTitle".)

If I were an over-organizational nerd, I would suggest setting up appropriate issue tags ("In Progress", "Complete", "Stupid Ideas", etc). But I leave that up to whether Darius is an over-organizational nerd.

dariusk · 2013-11-03T19:38:57Z

I agree with you @erkyrath -- I'll try and prod people to do that when they're done. Issue tags... I might start labeling things myself!

dariusk · 2013-11-03T19:46:49Z

Okay, I opened a new Issue ( #42 ) for general discussion. This thread remains the place for technical resources; the other thread is open to everything else.

vitorio · 2013-11-03T21:31:14Z

Ficly ( http://ficly.com/stories and its predecessor Ficlets http://ficlets.ficly.com/ ) is a very-short-story writing community, where you have a 1024 character limit. There are lots of tiny stories on the site, but also, you can fork any story and write prequels and sequels to it. Some stories have multiple prequels and sequels, like an unintentional choose-your-own-adventure.

All of the Ficly and Ficlets content is licensed CC-BY-SA.

In late May 2013, I scraped all of Ficly and dumped 13,144 stories, all of which had at least one prequel or sequel, into a matching number of JSON files (there should be no standalone 1k character stories). Each JSON file records the ID, URL and title of the story; the author's avatar, name and URL; the IDs and URLs of prequels and sequels; and the story content in Markdown.

The scraper (in Python) is probably a little prickly, as it's mostly uncommented, but the .zip of 13k JSON files could be dumped straight into a JSON document store and worked with directly. Perhaps someone wants to generate 50k words of choose-your-own-adventure stories or something.

https://github.com/vitorio/NaNoGenMo2013
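As a rough idea of how the dump could be walked, here is a Python sketch that follows sequel links from story to story. The key names (`"id"`, `"sequels"`, and an `"id"` inside each sequel entry) are assumptions made for illustration; check the actual JSON files in the repo and adjust them to match.

```python
import json
import random
from pathlib import Path

def load_stories(directory):
    """Load every *.json file in `directory` into a dict keyed by story ID."""
    stories = {}
    for path in Path(directory).glob("*.json"):
        with open(path, encoding="utf-8") as f:
            story = json.load(f)
        stories[story["id"]] = story  # "id" is an assumed key name
    return stories

def random_walk(stories, start_id, max_steps=50, rng=None):
    """Follow randomly chosen sequel links, stopping at a dead end or a loop."""
    rng = rng or random.Random()
    path, seen = [], set()
    current = start_id
    while current in stories and current not in seen and len(path) < max_steps:
        seen.add(current)
        path.append(current)
        # "sequels" as a list of {"id": ...} entries is an assumed layout.
        sequels = [s["id"] for s in stories[current].get("sequels", [])]
        if not sequels:
            break
        current = rng.choice(sequels)
    return path
```

Concatenating the content of each story along the returned path gives one reading of the branching story. Repeating the walk from different starting points until the total passes 50k words would be one way to fill a book.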

darkliquid · 2013-11-03T22:22:02Z

I've done some basic gathering of info from a few sources to generate a bunch of sentence structures using parts-of-speech tagging while I've been researching. Others might find this useful, so you can find them here: https://github.com/darkliquid/NaNoGenMo/tree/master/data

The data is basically one sentence to a line, each line containing a stream of space separated parts-of-speech tags. There are likely to be mistakes in the set as I've hacked this together without any real understanding of what it is I'm doing or what I yet hope to achieve from it, but have at it and good luck!
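One way to use data in that shape is to treat each line as a template and fill every tag with a random word of that type. The Python sketch below assumes Penn Treebank-style tags such as `DT` and `NN`, and the tiny `LEXICON` is invented for illustration; a real run would need a word list per tag, built from a tagged corpus or a dictionary.

```python
import random

# Toy lexicon: tag -> candidate words. Invented for this sketch.
LEXICON = {
    "DT": ["the", "a"],
    "JJ": ["crooked", "silent", "green"],
    "NN": ["tower", "fox", "lantern"],
    "VBD": ["watched", "followed", "forgot"],
}

def fill_template(line, lexicon, rng=None):
    """Turn a line of space-separated tags into a sentence."""
    rng = rng or random.Random()
    words = []
    for tag in line.split():
        choices = lexicon.get(tag)
        # Leave a visible placeholder for tags the lexicon doesn't cover.
        words.append(rng.choice(choices) if choices else "<" + tag + ">")
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."

if __name__ == "__main__":
    print(fill_template("DT JJ NN VBD DT NN", LEXICON))
```

The output is grammatical only to the extent that the tag sequence is, which is why weeding out bad structures matters.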

dariusk · 2013-11-03T22:35:12Z

To be clear, @darkliquid's output can be interpreted by looking at this list of part of speech tags.

aparrish · 2013-11-03T23:13:51Z

this might be inspiring for some folks http://en.wikipedia.org/wiki/Postmodern_literature#Common_themes_and_techniques

catseye · 2013-11-03T23:24:35Z

It would be very difficult to use it in an automated way (and I realize it may be unpopular with some participants) but if you haven't heard of it, there's this site called TVTropes. It contains a vast array of, well, tropes (from fiction in general, mostly mass-media but not exclusively television,) pre-deconstructed for your convenience. For example, Applied Phlebotinum.

lazerwalker · 2013-11-04T15:15:25Z

Speaking of parts-of-speech tagging (cc @darkliquid), if you're literate in Objective-C, Apple's NSLinguisticTagger API is fantastic. (http://nshipster.com/nslinguistictagger/)

darkliquid · 2013-11-04T16:18:18Z

Wow, that is nice. Sadly it's of no use to me in linux world but that looks like a much richer source of data for the kinds of analysis I'm looking to do.

On another note, I've started annotating the parts-of-speech tag definitions with example words and some extra rules for their use in sentences where applicable (which hopefully I can then use to scan my sentence structure list to bin structures that are grammatically incorrect). https://github.com/darkliquid/NaNoGenMo/blob/master/data/tag_types.txt

enkiv2 · 2013-11-04T16:29:21Z

WordNet can be coaxed into doing part of speech tagging (in addition to
providing synonyms, antonyms, and other related words), although part of
speech tagging requires a hack (iterate over parts of speech until the word
has a synonym in that group, then guess which part of speech the word is
actually being used as). I'd recommend using that on *nix, since it has
other (more useful) functions.

Tangentially, I have a resource to contribute.
https://github.com/enkiv2/synonym-warp will take a text document and
randomly replace some words with synonyms (which slightly warps the
semantics since the synonyms it uses aren't necessarily appropriate to the
context). It expects to run on a unix under zsh, with wordnet in the path.
I'm planning to run input texts through it before training a markov model,
to add a little noise.
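The synonym-replacement idea can be sketched in Python as follows. This is not the synonym-warp script itself (that one is zsh plus WordNet); the toy `THESAURUS` dictionary stands in for a WordNet lookup, and its entries are made up for illustration.

```python
import random
import re

# Stand-in for a WordNet lookup: word -> single-word synonyms.
THESAURUS = {
    "old": ["ancient", "aged", "stale"],
    "house": ["dwelling", "firm", "theater"],
    "dark": ["dim", "sinister", "obscure"],
}

def warp(text, thesaurus, probability=0.3, rng=None):
    """Replace each known word with a random synonym with the given probability."""
    rng = rng or random.Random()

    def replace(match):
        word = match.group(0)
        synonyms = thesaurus.get(word.lower())
        if synonyms and rng.random() < probability:
            # No attempt is made to pick a sense that fits the context,
            # which is where the semantic drift comes from.
            return rng.choice(synonyms)
        return word

    return re.sub(r"[A-Za-z']+", replace, text)

if __name__ == "__main__":
    print(warp("The old house stood dark and silent.", THESAURUS, probability=1.0))
```

Punctuation and spacing are untouched because only the matched words are substituted. Capitalization of replaced words is not preserved in this sketch.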

warnaars · 2013-11-09T09:22:18Z

You might find this an interesting take on 'automated content authorship'
http://youtu.be/SkS5PkHQphY

MichaelPaulukonis · 2013-11-09T14:23:30Z

@warnaars Philip M. Parker! I would love to see some of his novelistic output.... I'd really love to see some of his code. I've got some more links on him at http://www.xradiograph.com/WordSalad/AutomaticForThePeople

lilinx · 2013-11-09T21:39:25Z

"If the atoms have by chance formed so many sorts of figures, why did it never fall out that they made a house or a shoe? Why at the same rate should we not believe that an infinite number of Greek letters, strewed all over a certain place, might fall into the contexture of the Iliad?"
Michel de Montaigne (1533-1592), Essais

ikarth · 2013-11-12T23:13:03Z

For that matter, how about a Library of Babel generator? (Not mine) http://dicelog.com/babel

notio · 2013-11-12T23:16:25Z

Not open source, but still! The Fiction Idea Generator is interesting: http://figapps.net/fig.html

It's free this month (iTunes): https://itunes.apple.com/app/fiction-idea-generator-ef/id507536455?mt=8

lilinx · 2013-11-14T21:18:08Z

Also, you might be interested in the works of Jean-Pierre Balpe.
This man has been doing generative literature experiments for a while. He has countless bot-blogs generating the weirdest things. Unfortunately he seems to do everything in French: it's very difficult to find anything about him in English (there isn't even an English Wikipedia article). But there is this short article: http://www.digitalarti.com/blog/digitalarti_mag/portrait_jean_pierre_balpe_inventor_of_literature

catseye · 2013-11-21T19:59:26Z

In one issue here somewhere I obliquely suggested generating a graphic novel -- that is to say, a comic book. While I would love to try, I definitely won't have the time to do this in what remains of November, but here are some resources I found while researching it:

http://openclipart.org is a collection of SVG images, all in the public domain. It can also render them as PNGs for you, at the scale you choose. It has a JSON API: http://openclipart.org/developers

If you wanted to use that JSON API on your own web page (perhaps to display these images on an HTML5 canvas element) you could use this generic JSONP proxy to make a mockery of the same-origin policy: http://jsonp.jit.su/

Here is a library of onomatopoeic sound-effects: http://www.writtensound.com/index.php Not sure how easy it would be to scrape, but probably wouldn't be hard to pick a random item from a desired category, like: http://www.writtensound.com/index.php?term=movement

Here is a list of catchphrases: https://en.wikipedia.org/wiki/List_of_catchphrases

And, just for that extra dadaist touch & in no way limited to graphic novels, here is a list of various abuses of the statistical meaning of p-value, collected from various academic papers: http://mchankins.wordpress.com/2013/04/21/still-not-significant-2/

What I imagine the result of using these resources to be is something like:

a sombrero with a word balloon saying "Cowabunga" next to Tux (the Linux penguin) with a thought bubble saying "did not quite reach conventional levels of statistical significance (p=0.079)"... with the word SCHHWAFF at a slight angle and in a large-point font, in the background

MichaelPaulukonis · 2013-11-21T21:17:09Z

@catseye check out blotcomics and the graphic novel harsh noise.

I can't shake the feeling that the end result of your automation, however, will end up looking like ELER.
source

ikarth · 2013-11-21T22:15:09Z

If we're going graphical I should probably mention the billion-year archives of the webcomic mezzacotta: http://www.mezzacotta.net/

bredfern · 2015-11-18T08:54:01Z

You can take a look at the text of my Automated Lovecraft project here: https://github.com/bredfern/automated-lovecraft/blob/master/automated_lovecraft.md

bredfern · 2015-11-18T08:57:14Z

The interesting thing I learned is that more firepower doesn't produce a better result: there's a sweet spot between the size of the data set and the number of layers, so to train on all of Lovecraft's text I got the best results using torch with just 4 layers. Since I was running off char nn, most of the code I wrote was actually just bash script to run torch processes. I want to get deeper into this stuff so I can go further with it, but it's exciting to see the training result, never having done this before.

hugovk · 2015-11-18T09:21:41Z

@bredfern Wrong repo! This is the 2013 one, here's this year's: dariusk/NaNoGenMo-2015#1
