"Where I'm From" poem & novel generator #49
You can download CDs and DVDs of Project Gutenberg books here:
I didn't know there was a Google Books API; I'll have to check it out.
In my teaching years, this poem was everywhere:
Where I'm From
I am from clothespins,
I'm from fudge and eyeglasses,
I'm from Artemus and Billie's Branch,
Under my bed was a dress box
For my first trick, I'll be working on a poem generator (I know I know, we're building a novel, stay tuned ok) to identify the parts of speech at work here and generate new "I'm From" poems that mimic parts of speech and important sound patterns. This should be good practice in working with natural language processors in order to generate poem-length memoir-esque bits of text -- which I can then use as the base for further novel expansions.
Not a bad start! I got RiTa loaded and working, so that's a huge step in the right direction. Next I think I need to find some word banks / corpora for specific parts of the poem (example: nature words). RiTa's proper nouns are kind of cringe-y, but I'll run it more times and see if I need to substitute something else there. FYI for anyone getting started with RiTa, here's a list of the part-of-speech abbreviations:
I spent a few hours this evening working on linking up random choices from custom word lists. I forked Darius's corpora repo linked in the NaNoGenMo resources and also found some good word lists on the internet for what I am looking for. Fun fact: as a middle school English teacher, I loved word lists, or "word pools" we would sometimes call them. The walls of my classroom were plastered with posters of color words, verbs, adjectives, sensory words, etc. (until mandatory testing took over the entire Spring and they had to be covered up).
Like this: Fear the Repo
Shush, You, I'll DRY it up later.
I'm pretty happy with how it's shaping up, and I love using RiTa to control syllable length.
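For anyone following along, the word-list-plus-syllable idea might look roughly like the sketch below. It is written in Python rather than with RiTa, the word pools are made up for illustration, and `count_syllables` is a crude vowel-group heuristic standing in for RiTa's syllable data, not how RiTa works internally.

```python
import random
import re

# Hypothetical word pools; the real ones came from corpora and word lists.
WORD_POOLS = {
    "food": ["fudge", "celery", "parsnip", "bread", "apricot"],
    "thing": ["eyeglasses", "clothespins", "statistics", "boxes"],
}

def count_syllables(word):
    """Rough estimate: count runs of vowels, ignoring a trailing silent 'e'."""
    word = word.lower()
    if word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(groups))

def pick(pool, syllables=None, rng=random):
    """Pick a random word from a pool, optionally with a set syllable count."""
    candidates = WORD_POOLS[pool]
    if syllables is not None:
        matching = [w for w in candidates if count_syllables(w) == syllables]
        # Fall back to the whole pool if nothing fits the constraint.
        candidates = matching or candidates
    return rng.choice(candidates)

def from_line(rng=random):
    """Fill one template line, mirroring the rhythm of 'fudge and eyeglasses'."""
    return f"I'm from {pick('food', 1, rng)} and {pick('thing', 3, rng)},"

print(from_line())
```

The heuristic miscounts some words (it gives "clothespins" three syllables), which is a good reason to lean on a real pronunciation lexicon for anything beyond a quick experiment.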
As a reminder, the source poem is here.
I'm hoping to finish assembling the poem tomorrow, then I can figure out where I want to take it from there.
DAY THIRD -- oh it is very late make that DAY FOURTH
Just checking in with some sample output. I wasn't happy with the trees and bushes lists available to me, so I'm just inventing some instead. :D Done through the second stanza, two to go!
I am from nightclubs,
I'm from parsnip and statistics,
DAY FOUR (FOR REAL)
We have a completed poem!
Where I'm From
I am from birthdays,
I'm from celery and byproducts,
I'm from South Gate and Beaverton,
Above my tea cart was a aft box
For the next step I can go one of (at least) two ways:
DAY ... TEN?
Ok, after taking some time off to learn all the data structures and algorithms (or not learn, as the case may be), I needed a quick win so I came back to this and was able to publish a version of the poem generator!
It's not very fancy, and probably breaks all the Node/Express rules (I am a very proficient Ruby on Rails developer seriously you should hire me), but it meets the prime objective of generating a new poem on demand.
I like this so much I am not sure how to translate it into a novel... but let's not call it "done" yet, because I'm going to sleep on that.
I found a couple of open-source texts that work well for "memoir" style (Anne of Green Gables is the frontrunner), so I played with using RiTa to Markov it up. My idea was to start with the base text, and then see if there's any way to prioritize the keywords generated in the 'Where I'm From' poem (so it would be a poem followed by a short vignette featuring terms mentioned in that poem, and then more in that pattern).
It's interesting, but it isn't very readable in paragraph form. So I think I need to consider another method for text generation. Which puts me back at the starting line. :)
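As a rough illustration of the Markov-with-keywords idea above, here is a minimal Python sketch. It is not RiTa's Markov implementation; `build_chain`, `generate`, and the tiny sample text are all hypothetical stand-ins, and "prioritizing" a keyword here simply means choosing it whenever it is one of the valid next words.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length, keywords=(), rng=random):
    """Walk the chain, preferring a keyword whenever one is a valid next word."""
    state = rng.choice(list(chain))  # assumes the chain is not empty
    output = list(state)
    while len(output) < length:
        options = chain.get(tuple(output[-len(state):]))
        if not options:
            # Dead end (end of the source text): restart from a random state.
            state = rng.choice(list(chain))
            output.extend(state)
            continue
        preferred = [w for w in options if w.strip(".,;!?").lower() in keywords]
        output.append(rng.choice(preferred or options))
    return " ".join(output[:length])

# Tiny stand-in for a base text; a real run would load a whole book.
SOURCE = ("I am from clothespins, I am from fudge and eyeglasses, "
          "I am from the dress box under my bed")
print(generate(build_chain(SOURCE, order=1), 20, keywords={"fudge"}))
```

With a small source the output mostly replays the original, and with a large one it drifts quickly, which matches the readability problem described above.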
Maybe I'll just write more poems...? #NaPoGenMo! I'm not 100% invested in the novel form, at least not for my first experiment this year, but I'm shooting to adhere to the 50,000 word count...
Some quick text to share: I'm playing with RiTa's RiLexicon to find near-replacement words for a classic poem (again with the poems!!! she just won't stop...). My goal here is to generate output that is clearly recognizable, but sounds bananas.
You might be wondering: what is the difference between RiTa's RiLexicon methods similarBySound(), similarByLetter(), similarBySoundAndLetter(), and rhymes()? So glad you asked... let's take a look at each of these at play! Each method returns an array of matches, so the computer is choosing a random match (or the original word) each time.
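The "random match or the original word" step could be sketched like this in Python. The `similar` argument is a placeholder for whichever lexicon lookup is being tried, and `FAKE_MATCHES` is a hand-made table for illustration only, not real RiLexicon output.

```python
import random
import re

def bananas(line, similar, rng=random):
    """Replace each word with a random pick from its matches or itself."""
    def swap(match):
        word = match.group(0)
        # The original word stays in the pool, so some words survive unchanged.
        choices = list(similar(word.lower())) + [word]
        return rng.choice(choices)
    return re.sub(r"[A-Za-z']+", swap, line)

# Hypothetical stand-in for a lexicon lookup such as similarBySound().
FAKE_MATCHES = {
    "roads": ["reeds", "rods"],
    "yellow": ["yell", "bellow"],
    "wood": ["good"],
}

def fake_similar(word):
    return FAKE_MATCHES.get(word, [])

print(bananas("Two roads diverged in a yellow wood,", fake_similar))
```

Because punctuation is left outside the regex match, the trailing comma and line shape are preserved no matter which words get swapped.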
Similar by Sound
Compares the phonemes of the input word (using a version of the min-edit distance algorithm) to each word in the lexicon, returning the set of closest matches.
Two reeds divert in a yell good,
Similar by Letter
Compares the characters of the input string (using a version of the min-edit distance algorithm) to each word in the lexicon, returning the set of closest matches.
Two loads diverged in a fellow wood,
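For reference, a plain min-edit (Levenshtein) distance looks like the Python sketch below. RiTa describes using "a version of" this algorithm, so its exact scoring and cutoffs may differ; `closest` and the small `LEXICON` are assumptions for illustration. The same function works on strings (by letter) or on lists of phonemes (by sound).

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or phoneme lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]

def closest(word, lexicon):
    """Return the lexicon words at the smallest distance from `word`."""
    # Assumes the lexicon holds at least one word other than `word`.
    scored = [(edit_distance(word, w), w) for w in lexicon if w != word]
    best = min(d for d, _ in scored)
    return sorted(w for d, w in scored if d == best)

# Hypothetical mini lexicon.
LEXICON = ["roads", "loads", "toads", "reeds", "wood", "good", "fellow", "yellow"]
print(closest("roads", LEXICON))   # ['loads', 'toads']
print(closest("yellow", LEXICON))  # ['fellow']
```

A one-letter swap often keeps the vowel pattern and word length intact, which may be why the by-letter output holds on to so much of the original rhythm.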
Similar by Sound and Letter
First calls similarBySound(), then filters the result set by the algorithm used in similarByLetter();
Two rods diverge in a bellow good,
Rhymes
Two words are considered as rhyming if their final stressed vowel and all following phonemes are identical.
Two episodes diverged in a mellow likelihood,
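That rhyme rule can be sketched in a few lines of Python. The pronunciation table is hand-written for illustration in an ARPAbet-style notation (a trailing `1` marks a stressed vowel, `0` an unstressed one); it is an assumption, not RiTa's lexicon data.

```python
# Hypothetical mini pronunciation table, ARPAbet-style.
PRONUNCIATIONS = {
    "yellow": ["Y", "EH1", "L", "OW0"],
    "mellow": ["M", "EH1", "L", "OW0"],
    "fellow": ["F", "EH1", "L", "OW0"],
    "wood": ["W", "UH1", "D"],
    "good": ["G", "UH1", "D"],
    "road": ["R", "OW1", "D"],
}

def rhyme_part(phonemes):
    """Return the final stressed vowel and every phoneme after it."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith("1"):
            return phonemes[i:]
    return phonemes  # no stress marked: fall back to the whole word

def is_rhyme(a, b):
    """True if two different known words share the same rhyme part."""
    if a == b:
        return False
    return rhyme_part(PRONUNCIATIONS[a]) == rhyme_part(PRONUNCIATIONS[b])

def rhymes(word):
    """All words in the table that rhyme with `word`."""
    return sorted(w for w in PRONUNCIATIONS if is_rhyme(word, w))

print(rhymes("yellow"))  # ['fellow', 'mellow']
print(rhymes("wood"))    # ['good']
```

Only the tail of the word has to match, so the front can be anything. That explains how a long word can stand in for a short one and throw off the meter.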
I hadn't tried by letter before this little exercise (thinking the sound would be more important), but I actually like that output the best here. It also seems to keep the sound and rhythm of the word. Linguistical coincidence? Edit-distance magick?
Rhyme clearly varies the most from the source text -- this could be fun to play with for replacing end words (or generating new rhyme words) but I won't use it in this "replace nearly every word" exercise.
Just for fun: Alliteration
Finds alliterations by comparing the phonemes of the input string to those of each word in the lexicon
Two razor diverged in a abuse wings,
^^Yikes, that's dark, RiTA! I won't be using this but watch this:
Two [roads organizational] diverged in a [yellow impugning] [wood whittle],
These are the word pairs it's claiming for alliteration. Some are truly weird. I feel like this would need some human editing if you were to use it in text generation, or else I might just throw out anything that doesn't start with the same letter as the base word (those all seem to work well!).
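The "throw out anything that doesn't start with the same letter" filter is simple enough to sketch in Python. The candidate lists below are just the pairings from the sample lines above, and `same_first_letter` is a hypothetical helper name.

```python
def same_first_letter(base, candidates):
    """Keep only candidates that start with the same letter as `base`."""
    first = base[0].lower()  # assumes `base` is a non-empty word
    return [w for w in candidates if w and w[0].lower() == first]

# Candidate alliterations taken from the sample output above.
CLAIMED = {
    "roads": ["razor", "organizational"],
    "yellow": ["abuse", "impugning"],
    "wood": ["wings", "whittle"],
}

for base, candidates in CLAIMED.items():
    print(base, "->", same_first_letter(base, candidates))
```

A word like "yellow" ends up with no survivors here, so a generator using this filter would want to fall back to the original word when the list comes back empty.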
Signing off for now, I'm going to keep working on Bob Frost then see what else I can do in RiTA.
DAY THE LAST
After debating what to do with my poor poem-that-is-not-a-novel, I decided to go ahead and use RiTa's Markov functionality, but with the poem as source material. The result is an epic memoir poem that doesn't have much plot but generates some interesting language. Not bad for a first attempt!
And here is the source code
How I made it:
I was going to serve up the results through Express and Node just like with my poem generator, but as soon as I got close, I ran into a 'Maximum call stack size exceeded' error. So, eff that. Markdown it is! An interesting aspect of Markdown is that it doesn't preserve all the line breaks. I played with this and ultimately decided that I liked the paragraphs/prose poem format for such a long text document, so I left it alone (for a formatted version, see my earlier attempt which does preserve line breaks). I did discover that RiTa will occasionally generate language I wouldn't want to use in an app, so I'm curious if anyone (Darius) has already made a filter for this.
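In the meantime, a very simple word filter might look like the Python sketch below. This is not an existing filter from anyone in the thread; `is_clean`, `generate_clean`, and the placeholder `BLOCKLIST` are all hypothetical, and a real list would come from a maintained source.

```python
import re

# Placeholder entries only; swap in a maintained word list.
BLOCKLIST = {"darn", "heck"}

def is_clean(text, blocklist=BLOCKLIST):
    """True if no whole word in `text` appears in the blocklist."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words.isdisjoint(blocklist)

def generate_clean(generate, blocklist=BLOCKLIST, max_tries=20):
    """Call `generate()` until it returns clean text, or give up with None."""
    for _ in range(max_tries):
        candidate = generate()
        if is_clean(candidate, blocklist):
            return candidate
    return None

# Usage with a stand-in generator that yields canned lines.
canned = iter(["oh heck no", "I am from birthdays,"])
print(generate_clean(lambda: next(canned)))  # I am from birthdays,
```

Whole-word matching misses misspellings, plurals, and phrases that are only a problem in combination, so it works best as a first pass with a human read-through afterward.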
Questions or comments? I will answer what I can... if I do it again, I'll be purposeful about chapter headings or something that can break up the 50,000 words to help the flow. At this point, though, I can tinker no more.
Thanks for the opportunity and see you next year!
Have a completed label!
These are mainly aimed at bots, but should still be generally useful.
Here's a headline filter:
Tips on transphobic joke detection:
Some lists of bad words:
[Inactive] muted Twitter topics:
Some general etiquette things: