Give it a word and it'll build you an acrostic poem from n-grams built from a vocabulary it scraped off of the Twitter public stream. The Twitter bot, @acrostician is managed by this code. This README is a rundown of the bot's typical behavior, BUT, there are some easter eggs inside (and more to come), so don't read the code if you'd like the bot to surprise you!
To execute this Python script, from an environment which has access to the tweepy library installed, you'll need to provide three additional arguments:
$ python acrosticBot.py [config file] [acrostic word] ['read', 'write', or 'both']
If you provide fewer than four arguments (including the actual script name,
acrosticBot.py), you'll raise a usage
message reiterating the above, and the program will exit. Similarly, if your fourth argument is anything other than
the literal keywords
both, you'll get the usage message and the program will exit. From here on,
the README is going to describe the third argument,
acrostic word, as the target word for the program.
The configuration file is expecting key-value pairs to appear line by line, separated by an
There are five mandatory keys:
CONSUMER_KEY: The consumer key token for the app with which you'll be reading tweets and posting acrostics
CONSUMER_SECRET: The corresponding secret for the above key
ACCESS_KEY: The access key corresponding to the account that's going to use the above app for reading and writing
ACCESS_SECRET: The corresponding secret for the above key
NGRAM_DB_PATH: Path to the directory in which you'd like the n-gram SQLite3 database file to live.
Note that it's expecting these five key strings, verbatim. Regarding the keys and secrets: my cryptography background is awful weak, so I'm pretty much just rote parroting what I've learned about setting up these kinds of authorization handshakes from blogposts like this one from Jeff Miller (thanks, Jeff!).
If you include
both in the fourth argument position, the program will stream in 10,000 tweets from the
public stream of English-language tweets. For each tweet, the text of the tweet will be cleaned, converted into a
collection of n-grams. The n-grams whose initials appear as consecutive letters in the target word will be counted in
the target word's n-gram database. (It considers the "initial" of a hashtag, like
#worldcup, to be the first letter
For example, if the target word is TARGET, a tweet with the text
GREAT! Tegan and #Rob are here!!! would increment
Given a target word (let's say of length 6, as with TARGET), the program will try and greedily build an acrostic from the beginning of the target.
Given it's starting at position 0, the start of the target word, it will draw a pool of 200 random n-grams that have
It picks them in a biased fashion: it will specifically ask for 200 random n-grams of length 6 (hexagrams matching
target). Should it only receive 41 back from the database, since it's uncommon to actually read tweets with natural
TARGET acrostics in them premade, it will ask for 159 random n-grams of length five (quintagrams matching
keeps asking the next-lower table in such a fashion until it has at least 200 candidate n-grams of any length.
There's then a scoring procedure where each candidate n-gram gets mapped to a positive real-number. I plan on frequently playing around with exactly how this scoring works, but the score's always going to go up for an n-gram that:
- Has been seen more often in the public twitter stream, or
- has more terms in it, or
- has more characters/letters, or
- is/includes a hashtage, or
- has been rarely included in previous acrostic poems.
From this pool of 200 scored n-grams, I draw a winning candidate randomly from the pool proportional to its score as a fraction of the sum of all candidate scores.
If in this case it picked
tegan and #rob, it repeats this whole procedure trying now to match GET: it asks for
get trigrams, topping off the pool with
ge bigrams and
This keeps going until a full acrostic poem is made. The constituent n-grams all see their "usage counts" incremented in the database, to try and provide a signal for avoiding picking the same terms over and over. The poem is formatted nicely and tweeted out via the API keys provided in the configuration file.
Work in progress
In all honesty, the above is more how I WANT the bot to behave than how this code currently instructs it to. For instance, the usage count mechanism isn't all there. I'll try and keep putting work in to patch things here and there, and of course to add further kooky surprise behaviors. Issues and questions encouraged!