RESOURCES! #11

Open
dariusk opened this Issue Nov 1, 2013 · 48 comments

@dariusk

dariusk commented Nov 1, 2013

This is an open issue where you can comment and add resources that might come in handy for NaNoGenMo.

NOTE: at some point I will turn this into a more organized document, probably on the wiki for this repo.

@willf

willf commented Nov 1, 2013

I wrote a "Samsa bot" that uses Bing's Ngram database to generate text. You might find it and the associated libraries useful (all Ruby).

https://github.com/willf/microsoft_ngram/blob/master/examples/samsabot.rb

General library:

https://github.com/willf/microsoft_ngram
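
If you want the flavor of n-gram generation without the Bing service, here is the general technique in miniature: a from-scratch Python sketch with a hand-rolled bigram table, not willf's library.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Count word-pair frequencies in a source text."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n=50):
    """Walk the bigram table, sampling each next word in proportion to its count."""
    out = [start]
    for _ in range(n):
        nexts = counts.get(out[-1])
        if not nexts:
            break  # dead end: no observed successor
        words = list(nexts)
        weights = [nexts[w] for w in words]
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

counts = train_bigrams("one morning gregor samsa woke from troubled dreams "
                       "he found himself transformed in his bed into a horrible vermin")
print(generate(counts, "gregor"))
```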

@dariusk

dariusk commented Nov 1, 2013

Since @willf is too humble to plug it, Wordnik is an indispensable resource for all things text-related: definitions, parts of speech, random words, rhymes, hypernyms, etc:

http://developer.wordnik.com/docs.html#!/word
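
A quick sketch of calling the REST API from Python. The v4 endpoint path and response fields here are from memory, so double-check them against the docs linked above, and you'll need your own API key.

```python
import json
import urllib.request

API_KEY = "YOUR_WORDNIK_API_KEY"  # placeholder: get a real key from developer.wordnik.com

def definitions(word):
    # v4-style definitions endpoint; verify the exact path against the Wordnik docs
    url = ("http://api.wordnik.com/v4/word.json/{}/definitions"
           "?limit=5&api_key={}".format(word, API_KEY))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

for d in definitions("novel"):
    print(d.get("partOfSpeech"), "-", d.get("text"))
```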

@vitorio

vitorio commented Nov 1, 2013

Here's a dump of my notes about generating stories:

@rfreebern researched this problem a few years back for this game project of his:

Curses! is a single-player open-ended adventure game with the basic premise that the player is a fairy tale villain bent on wrecking many potential fairy tales as completely as possible. Fairy tale plots would be generated on-the-fly based on a basic generator template that attempts to intelligently combine dozens or hundreds of very basic fairy tale elements to create situations that are both unique and familiar. The PC's goal is not to just thwart the happy ending but to do it thoroughly: not just kill the handsome prince, but cripple and disfigure him while making the princess hate him and get exiled from her kingdom, for example.

Fairy tales are really well-explored variants of the standard storytelling archetypes described by people like Joseph Campbell. There are a couple of ways that fairy tales are organized, which include their plot outlines (although not their cultural or moral implications): Aarne-Thompson, and Propp. http://en.wikipedia.org/wiki/Aarne-Thompson_classification_system

Propp's classification system has been used as the basis for a number of generators and is still the most-used mechanism in the academic literature for such things: http://en.wikipedia.org/wiki/Vladimir_Propp

Propp generators are things like: http://www.fdi.ucm.es/profesor/fpeinado/projects/kiids/apps/protopropp/

Clicking through to their later Bard system shows examples at the bottom, and that whole KIIDS thing is about interactive narrative and computational narratology, which are the academic terms for this sort of thing (I call my work in this area automated storytelling with post-hoc computational narratives, as my use and implementation aren't for interaction).

Mark Finlayson's work out of MIT is a little more recent: http://www.mit.edu/~markaf/research.html

Plugging any of that research into Google Scholar and looking at recent citations of those papers is a good way to catch up.

The massively-multiplayer video game Star Wars Galaxies tried something along these lines with their Dynamic Points of Interest, but they weren't really well executed from a design and technical implementation perspective. They had a lot of potential, but Raph Koster describes their problems here: http://www.raphkoster.com/2010/04/30/dynamic-pois/

Outside of fairy tales, there are works like Plotto, which provide narrative guides to plot generation, and the monomyth-related works by Campbell, etc.: http://www.brainpickings.org/index.php/2012/01/06/plotto/

Plotto is actually in the public domain, and can be found in the Internet Archive here: https://archive.org/details/plottonewmethodo00cook

And journalism is getting into it, too. A program at Northwestern worked out so well, taking sports stats and turning them into sports articles, they didn't publish much research at all and went right into a startup. The Wired article is here: http://www.wired.com/gadgetlab/2012/04/can-an-algorithm-write-a-better-news-story-than-a-human-reporter/all/1

The one paper I found by the Northwestern group cites one major paper from 1977 about "Tale-spin." You can look for citations from the Tale-spin article, and that brings up some interesting recent work from elsewhere: http://scholar.google.com/scholar?cites=8316499405683938909&as_sdt=5,44&sciodt=0,44&hl=en

Finally, there's this failed Kickstarter: http://www.kickstarter.com/projects/storybricks/storybricks-the-mmorpg-storytelling-toolset

Even more finally, I also found this PDF in a second set of notes: https://research.cc.gatech.edu/inc/content/sequential-recommendation-approach-interactive-personalized-story-generation
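
To make the Propp approach concrete, here is a toy sketch that treats a heavily abbreviated subset of Propp's functions as a fixed skeleton and fills in the cast at random. Real Propp generators like ProtoPropp model the function sequencing far more carefully; this only shows the shape of the idea.

```python
import random

# A heavily abbreviated subset of Propp's 31 functions, in canonical order.
FUNCTIONS = [
    "{hero} leaves home",
    "{villain} causes harm to {victim}",
    "{hero} acquires {agent}",
    "{hero} and {villain} join in direct combat",
    "{villain} is defeated",
    "{hero} is married and ascends the throne",
]

CAST = {
    "hero": ["a miller's son", "a soldier", "the youngest princess"],
    "villain": ["a dragon", "a wicked stepmother", "Koschei the Deathless"],
    "victim": ["the king's daughter", "the village", "an old beggar"],
    "agent": ["a magic sword", "a talking horse", "a cloak of invisibility"],
}

def fairy_tale():
    # Pick one concrete filler per role, then realize each function in order.
    roles = {role: random.choice(options) for role, options in CAST.items()}
    sentences = [fn.format(**roles) for fn in FUNCTIONS]
    return " ".join(s[0].upper() + s[1:] + "." for s in sentences)

print(fairy_tale())
```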

@darrentorpey

darrentorpey commented Nov 1, 2013

Thanks, @vitorio! That looks helpful.

@smadin

smadin commented Nov 1, 2013

(OK, I made a github account.)
https://pypi.python.org/pypi/wikipedia/1.0.3 is a python interface to wikipedia, which may also be helpful for the quick-and-dirty Markov-chain approach. It was very easy to hack together a script to fetch random Wikipedia tables for source text and churn out a "novel" of a given word-count.
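
For anyone who wants to try the same quick-and-dirty approach, here's a sketch along those lines: the wikipedia package's random() and page() calls are per its 1.x docs, and the chain is order 2. Treat it as a starting point, not smadin's actual script.

```python
import random
from collections import defaultdict

import wikipedia  # pip install wikipedia

def fetch_corpus(n_pages=20):
    """Pull the plain text of n random Wikipedia articles."""
    text = []
    for title in wikipedia.random(pages=n_pages):
        try:
            text.append(wikipedia.page(title).content)
        except wikipedia.exceptions.WikipediaException:
            continue  # skip disambiguation and missing pages
    return " ".join(text).split()

def markov(words, length=50000, order=2):
    """Order-2 Markov chain: map each word pair to the words that follow it."""
    table = defaultdict(list)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])].append(words[i + order])
    state = random.choice(list(table))
    out = list(state)
    while len(out) < length:
        followers = table.get(state)
        if not followers:  # dead end: reseed from a random state
            state = random.choice(list(table))
            out.extend(state)
            continue
        nxt = random.choice(followers)
        out.append(nxt)
        state = (*state[1:], nxt)
    return " ".join(out)

print(markov(fetch_corpus(), length=50000))
```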

@dariusk

dariusk commented Nov 1, 2013

While in-browser DOM manipulation is obviously ruled by jQuery, my favorite NodeJS DOM parser/manipulator is Cheerio, which uses jQuery-style selectors.

Also if you're in Ruby and need to do HTML/XML parsing, Nokogiri rules the roost.

@rfreebern

rfreebern commented Nov 1, 2013

I'm hanging out in #nanogenmo on FreeNode if anyone wants to join. We can toss ideas around on a casual basis there.

@dariusk

dariusk commented Nov 1, 2013

For those who aren't super IRC-literate, or just don't want to install an irc client, you can go here, pick a username, and visit #nanogenmo from your web browser:

http://webchat.freenode.net/?channels=#nanogenmo

@jiko

jiko commented Nov 1, 2013

The Bard project looks awesome. Thanks @vitorio!

@jiko

jiko commented Nov 1, 2013

Some Python resources:

@agladysh

agladysh commented Nov 2, 2013

An article about a recursive fairy-tale generator in Haskell (in Russian): http://habrahabr.ru/post/136007/

Google Translate: http://translate.google.com/translate?hl=en&sl=ru&tl=en&u=http%3A%2F%2Fhabrahabr.ru%2Fpost%2F136007%2F

@darkliquid

darkliquid commented Nov 2, 2013

Not strictly related, but there are several story-based/narrative-focused roleplaying games that could be used/formalised into a system for generating overall plot structures. I'm currently looking at Microscope, Fiasco and FATE Core as potential systems for having characters 'play' through a game and recording what they do and what actions they take to generate stories.

@jiko

jiko commented Nov 2, 2013

Here's some of my Python code for generating sentences based on supplied text. None of the Twitter-related code has been tested with v1.1 of the Twitter API, but worked fine on v1.

  • Jambot, my first Twitter bot. Uses a 3-gram Markov model by default.
  • JamLitBot, a site that generates random 'sentences' and runs on Heroku. Here is the source code, which builds on JamBot's.
  • @lovecraft_ebooks also builds on JamBot, but uses a 4-gram Markov model.
  • omnibot simplifies bot creation and management. It includes three distinct text-generation methods.
  • wikov makes Lorem Ipsum from Wikipedia pages using a 2-gram Markov model.

@jiko

jiko commented Nov 3, 2013

The Dada Engine, which powers the infamous Postmodernism Generator, might come in handy. There's an online manual and a clone on GitHub.
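
The core trick of the Dada Engine is recursive rule expansion, which fits in a few lines. A minimal sketch of just that idea (nothing like the engine's full feature set):

```python
import random

# Each rule maps to a list of alternatives; UPPERCASE tokens recurse.
GRAMMAR = {
    "SENTENCE": ["the NOUN VERB the NOUN", "the NOUN of the NOUN is ADJ"],
    "NOUN": ["discourse", "narrative", "text", "reader"],
    "VERB": ["deconstructs", "problematizes", "rewrites"],
    "ADJ": ["postmodern", "unattainable", "recursive"],
}

def expand(symbol):
    if symbol not in GRAMMAR:
        return symbol  # terminal: a literal word
    production = random.choice(GRAMMAR[symbol])
    return " ".join(expand(token) for token in production.split())

print(expand("SENTENCE"))
```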

@erkyrath

erkyrath commented Nov 3, 2013

Not a resource, but a suggestion: when you complete a novel, change the title of your issue to "$NovelTitle by $Author", so that we can easily browse them.

(Yeah, someone is now going to actually title their novel "$NovelTitle".)

If I were an over-organizational nerd, I would suggest setting up appropriate issue tags ("In Progress", "Complete", "Stupid Ideas", etc). But I leave that up to whether Darius is an over-organizational nerd.

@dariusk

dariusk commented Nov 3, 2013

I agree with you @erkyrath -- I'll try and prod people to do that when they're done. Issue tags... I might start labeling things myself!

@dariusk

dariusk commented Nov 3, 2013

Okay, I opened a new Issue ( #42 ) for general discussion. This thread remains the place for technical resources; the other thread is open to everything else.

@vitorio

vitorio commented Nov 3, 2013

Ficly ( http://ficly.com/stories and its predecessor Ficlets http://ficlets.ficly.com/ ) is a very-short-story writing community, where you have a 1024 character limit. There are lots of tiny stories on the site, but also, you can fork any story and write prequels and sequels to it. Some stories have multiple prequels and sequels, like an unintentional choose-your-own-adventure.

All of the Ficly and Ficlets content is licensed CC-BY-SA.

In late May 2013, I scraped all of Ficly and dumped 13,144 stories, all of which had at least one prequel or sequel, into a matching number of JSON files (there should be no standalone 1k-character stories). Each JSON file records the ID, URL and title of the story; the author's avatar, name and URL; the IDs and URLs of prequels and sequels; and the story content in Markdown.

The scraper (in Python) is probably a little prickly, as it's mostly uncommented, but the .zip of 13k JSON files could be dumped straight into a JSON document store and worked with directly. Perhaps someone wants to generate 50k words of choose-your-own-adventure stories or something.

https://github.com/vitorio/NaNoGenMo2013
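
One obvious use is to index the dump by story ID and random-walk the sequel links into a longer narrative. A sketch follows; the field names ("id", "sequels", "content") are guesses at the schema described above, so check them against the actual JSON.

```python
import json
import random
from pathlib import Path

# Load every story JSON into an index keyed by story ID.
# Field names are guesses; adjust to match the dump.
stories = {}
for path in Path("ficly_dump").glob("*.json"):  # placeholder directory
    story = json.loads(path.read_text())
    stories[story["id"]] = story

def story_chain(start_id):
    """Follow sequel links from a starting story, concatenating the text."""
    parts, current = [], stories.get(start_id)
    while current:
        parts.append(current["content"])
        sequels = [s for s in current.get("sequels", []) if s in stories]
        current = stories[random.choice(sequels)] if sequels else None
    return "\n\n".join(parts)

print(story_chain(random.choice(list(stories))))
```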

@darkliquid

darkliquid commented Nov 3, 2013

While researching, I've gathered info from a few sources to generate a bunch of sentence structures using parts-of-speech tagging. Others might find this useful, so you can find them here: https://github.com/darkliquid/NaNoGenMo/tree/master/data

The data is basically one sentence to a line, each line containing a stream of space-separated parts-of-speech tags. There are likely to be mistakes in the set, as I've hacked this together without any real understanding of what I'm doing or what I hope to achieve from it, but have at it and good luck!
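
One straightforward way to use structures like these is to keep a small lexicon per tag and fill a randomly chosen template left to right. A sketch with a stub lexicon keyed by Penn Treebank tags (a real lexicon would be far larger):

```python
import random

# Stub lexicon keyed by Penn Treebank tags.
LEXICON = {
    "DT": ["the", "a", "that"],
    "JJ": ["quiet", "metallic", "forgotten"],
    "NN": ["garden", "engine", "letter"],
    "VBD": ["hummed", "waited", "collapsed"],
    "IN": ["under", "beside", "through"],
}

# One sentence structure per line, tags separated by spaces,
# in the same format as the data files linked above.
STRUCTURES = [
    "DT JJ NN VBD IN DT NN",
    "DT NN VBD",
]

def sentence():
    tags = random.choice(STRUCTURES).split()
    words = [random.choice(LEXICON[t]) for t in tags]
    return " ".join(words).capitalize() + "."

print(sentence())
```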

@dariusk

dariusk commented Nov 3, 2013

To be clear, @darkliquid's output can be interpreted by looking at this list of part of speech tags.

@ghost

ghost commented Nov 3, 2013

It would be very difficult to use in an automated way (and I realize it may be unpopular with some participants), but if you haven't heard of it, there's this site called TVTropes. It contains a vast array of, well, tropes (from fiction in general, mostly mass media but not exclusively television), pre-deconstructed for your convenience. For example, Applied Phlebotinum.

@lazerwalker

lazerwalker commented Nov 4, 2013

Speaking of parts-of-speech tagging (cc @darkliquid), if you're literate in Objective-C, Apple's NSLinguisticTagger API is fantastic. (http://nshipster.com/nslinguistictagger/)

@darkliquid

darkliquid commented Nov 4, 2013

Wow, that is nice. Sadly it's of no use to me in the Linux world, but it looks like a much richer source of data for the kinds of analysis I'm looking to do.

On another note, I've started annotating the parts-of-speech tag definitions with example words and some extra rules for their use in sentences where applicable (which hopefully I can then use to scan my sentence structure list to bin structures that are grammatically incorrect). https://github.com/darkliquid/NaNoGenMo/blob/master/data/tag_types.txt

@enkiv2

enkiv2 commented Nov 4, 2013

WordNet can be coaxed into doing part of speech tagging (in addition to providing synonyms, antonyms, and other related words), although part of speech tagging requires a hack (iterate over parts of speech until the word has a synonym in that group, then guess which part of speech the word is actually being used as). I'd recommend using that on *nix, since it has other (more useful) functions.

Tangentially, I have a resource to contribute. https://github.com/enkiv2/synonym-warp will take a text document and randomly replace some words with synonyms (which slightly warps the semantics since the synonyms it uses aren't necessarily appropriate to the context). It expects to run on a unix under zsh, with wordnet in the path. I'm planning to run input texts through it before training a markov model, to add a little noise.
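
Both halves of this are easy to reproduce with NLTK's WordNet bindings. A sketch of the iterate-over-parts-of-speech hack and a crude synonym warp; this is not the zsh original, just the same idea in Python.

```python
import random

from nltk.corpus import wordnet as wn  # needs: nltk.download("wordnet")

def possible_pos(word):
    """The hack: a word 'is' a part of speech if WordNet has synsets for it there."""
    return [pos for pos in (wn.NOUN, wn.VERB, wn.ADJ, wn.ADV)
            if wn.synsets(word, pos=pos)]

def warp(text, rate=0.3):
    """Randomly swap words for WordNet synonyms, context be damned."""
    out = []
    for word in text.split():
        lemmas = {l.name() for s in wn.synsets(word) for l in s.lemmas()}
        lemmas.discard(word)
        if lemmas and random.random() < rate:
            out.append(random.choice(sorted(lemmas)).replace("_", " "))
        else:
            out.append(word)
    return " ".join(out)

print(possible_pos("dream"))  # e.g. ['n', 'v']
print(warp("the quick brown fox jumps over the lazy dog"))
```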

@jiko

jiko commented Nov 4, 2013

@darkliquid Nice work! Part of speech tagging seems like a fruitful avenue.

I've played with this Javascript PoS tagger in the last few days. I found it through The node.js Natural Language Story blog post by the maintainer of a package of general natural language facilities for node. I found another interesting Node package to generate random sentences from BNF grammars, along the lines of the Dada Engine mentioned above.

@enkiv2

enkiv2 commented Nov 6, 2013

For anybody rolling their own grammars, I found a constraint solver in python: https://github.com/switham/constrainer

On Wed, Nov 6, 2013 at 4:37 AM, Andrew Montgomery-Hurrell wrote:

Some lists of names, places, occupations, etc for generating character details.

Names http://stackoverflow.com/questions/1803628/raw-list-of-person-names

Titles http://www.gutenberg.org/dirs/GUTINDEX.ALL

US Cities http://wiki.skullsecurity.org/images/5/54/US_Cities.txt

Job Titles http://www.bls.gov/soc/soc_2010_direct_match_title_file.xls

Adjectives http://www.enchantedlearning.com/wordlist/adjectives.shtml

Nouns http://www.momswhothink.com/reading/list-of-nouns.html
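
Once lists like those are saved locally (one entry per line; the filenames below are placeholders), generating character details is a couple of random choices per field:

```python
import random

def load(path):
    """Read a one-entry-per-line word list."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Placeholder filenames: save the lists linked above to these paths first.
names = load("names.txt")
cities = load("us_cities.txt")
jobs = load("job_titles.txt")
adjectives = load("adjectives.txt")

def character():
    return "{}, a {} {} from {}".format(
        random.choice(names), random.choice(adjectives),
        random.choice(jobs), random.choice(cities))

print(character())
```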

@elib

elib commented Nov 6, 2013

I don't know if anyone has referenced this crucial resource.
https://www.youtube.com/watch?v=FUa7oBsSDk8

@darkliquid

darkliquid commented Nov 6, 2013

I've been running a term extraction for the last couple of days, and it just finished. It has various 'terms', i.e. the key noun or noun phrase/topic that a sentence is about, extracted from around half a million sentences across a wide range of sources (Gutenberg novels, news articles, etc.). I'm not sure I'll even use it now, but it might be of use for people looking to seed their stories with random topics.

https://github.com/darkliquid/NaNoGenMo/blob/master/data/terms_cleaned.txt.gz

@enkiv2

enkiv2 commented Nov 8, 2013

I was inspired by somebody's example of dialogue generation, and so I wrote some code to parse an ontology and create some question/answer pairs based on categories: https://github.com/enkiv2/NaNoGenMo2013

At some point, I'll need to hack it to generate other kinds of dialogue.

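The category-to-question trick is simple to reproduce: given any is-a ontology, each member/category pair yields a question and an answer. A toy sketch of just the generation step, not enkiv2's actual parser:

```python
import random

# A tiny is-a ontology: category -> members.
ONTOLOGY = {
    "bird": ["raven", "wren", "albatross"],
    "metal": ["copper", "tin", "mercury"],
    "fear": ["the dark", "open water", "being forgotten"],
}

def qa_pair():
    category = random.choice(list(ONTOLOGY))
    member = random.choice(ONTOLOGY[category])
    question = '"What is {}?"'.format(member)
    answer = '"{} is a kind of {}."'.format(member.capitalize(), category)
    return question, answer

for _ in range(3):
    q, a = qa_pair()
    print(q, a)
```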

@warnaars

warnaars commented Nov 9, 2013

You might find this an interesting take on 'automated content authorship'
http://youtu.be/SkS5PkHQphY

@MichaelPaulukonis

MichaelPaulukonis commented Nov 9, 2013

@warnaars Philip M. Parker! I would love to see some of his novelistic output.... I'd really love to see some of his code. I've got some more links on him at http://www.xradiograph.com/WordSalad/AutomaticForThePeople

@lilinx

lilinx commented Nov 9, 2013

"If the atoms have by chance formed so many sorts of figures, why did it never fall out that they made a house or a shoe? Why at the same rate should we not believe that an infinite number of Greek letters, strewed all over a certain place, might fall into the contexture of the Iliad?"
Michel de Montaigne (1533-1592), Essais

@ikarth

ikarth commented Nov 12, 2013

For that matter, how about a Library of Babel generator? (Not mine) http://dicelog.com/babel
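
Borges specifies the format fairly precisely, which makes this one of the easier generators to attempt: 25 orthographic symbols (22 letters plus space, comma, and period), roughly 80 characters per line, 40 lines per page, 410 pages per book. A sketch of a single page; which 22 letters is anyone's guess, so four are dropped arbitrarily here.

```python
import random
import string

# 26 letters minus four arbitrary ones, plus space, comma, period = 25 symbols.
ALPHABET = (string.ascii_lowercase
            .replace("k", "").replace("q", "")
            .replace("w", "").replace("x", "")) + " ,."

def babel_page(lines=40, width=80):
    """One uniformly random page from the Library."""
    return "\n".join(
        "".join(random.choice(ALPHABET) for _ in range(width))
        for _ in range(lines))

print(babel_page())
```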

@notio

notio commented Nov 12, 2013

Not open source, but still! The Fiction Idea Generator is interesting: http://figapps.net/fig.html

It's free this month (iTunes): https://itunes.apple.com/app/fiction-idea-generator-ef/id507536455?mt=8

@lilinx

lilinx commented Nov 14, 2013

Also, you might be interested in the works of Jean-Pierre Balpe.
This man has been doing generative literature experiments for a while. He has countless bot-blogs generating the weirdest things. Unfortunately he seems to do everything in French: it's very difficult to find anything about him in English (there isn't even an English Wikipedia article). But there is this short article: http://www.digitalarti.com/blog/digitalarti_mag/portrait_jean_pierre_balpe_inventor_of_literature

@ghost

ghost commented Nov 21, 2013

In one issue here somewhere I obliquely suggested generating a graphic novel -- that is to say, a comic book. While I would love to try, I definitely won't have the time to do this in what remains of November, but here are some resources I found while researching it:

http://openclipart.org is a collection of SVG images, all in the public domain. It can also render them as PNGs for you, at the scale you choose. It has a JSON API: http://openclipart.org/developers

If you wanted to use that JSON API on your own web page (perhaps to display these images on an HTML5 canvas element) you could use this generic JSONP proxy to make a mockery of the same-origin policy: http://jsonp.jit.su/

Here is a library of onomatopoeic sound-effects: http://www.writtensound.com/index.php Not sure how easy it would be to scrape, but probably wouldn't be hard to pick a random item from a desired category, like: http://www.writtensound.com/index.php?term=movement

Here is a list of catchphrases: https://en.wikipedia.org/wiki/List_of_catchphrases

And, just for that extra dadaist touch & in no way limited to graphic novels, here is a list of various abuses of the statistical meaning of p-value, collected from various academic papers: http://mchankins.wordpress.com/2013/04/21/still-not-significant-2/

I imagine the result of using these resources would be something like this:

a sombrero with a word balloon saying "Cowabunga" next to Tux (the Linux penguin) with a thought bubble saying "did not quite reach conventional levels of statistical significance (p=0.079)"... with the word SCHHWAFF at a slight angle and in a large-point font, in the background

@MichaelPaulukonis

MichaelPaulukonis commented Nov 21, 2013

@catseye check out blotcomics and the graphic novel harsh noise.

I can't shake the feeling that the end result of your automation, however, will end up looking like ELER.
ep064 source

@ikarth

ikarth commented Nov 21, 2013

If we're going graphical I should probably mention the billion-year archives of the webcomic mezzacotta: http://www.mezzacotta.net/

@bredfern

bredfern commented Nov 18, 2015

You can take a look at the text of my Automated Lovecraft project here: https://github.com/bredfern/automated-lovecraft/blob/master/automated_lovecraft.md

@bredfern

bredfern commented Nov 18, 2015

The interesting thing I learned is that more firepower doesn't produce a better result: there's a sweet spot between the size of the data set and the number of layers, so to train on all of Lovecraft's text I got the best results using Torch with just 4 layers. Since I was running off char-rnn, most of the code I wrote was actually just bash scripts to run Torch processes. I want to get deeper into this stuff so I can go further with it, but it's exciting to see the training results, never having done this before.

@hugovk

hugovk commented Nov 18, 2015

@bredfern Wrong repo! This is the 2013 one, here's this year's: dariusk/NaNoGenMo-2015#1
