Description: A Twitter archiver, making tweets safe for historians.
Clone URL: git://github.com/mja/aviary.git
mja (author)
Sat Apr 19 10:12:25 -0700 2008
commit  d3de31e2db65d7317d821a4db36a8049eb383acb
tree    febffeedcfb6c030f56dc1378147b6d0719aa5df
parent  1542f2a1780693cd5a5c8e4fdbe8b16cd7092283
name age message
file .gitignore Sat Mar 08 14:47:53 -0800 2008 Ignore XML tweets and Darwin clutter [mja]
file MIT-LICENSE Fri Mar 07 15:26:26 -0800 2008 Release under MIT License [Mark James Adams]
file MOTIVATION Fri Mar 07 14:54:12 -0800 2008 Add background on Project motivation [Mark Adams]
file README Sat Apr 19 10:12:25 -0700 2008 Update README text for new config options. [mja]
file TODO Sun Mar 09 14:10:49 -0700 2008 Added TODO [Geoff Cheshire]
directory assets/ Sat Mar 08 16:53:04 -0800 2008 Timeline event titles summarize tweet content. [mja]
file aviary.rb Sat Apr 19 10:09:14 -0700 2008 Catch exception is there is no config.yml file ... [mja]
file cat_statuses.sh Sat Mar 08 16:53:04 -0800 2008 Timeline event titles summarize tweet content. [mja]
file config-example.yml Sat Apr 19 09:59:16 -0700 2008 Username/pass in config.yml file. [mja]
file timeline.rb Sat Mar 08 16:53:04 -0800 2008 Timeline event titles summarize tweet content. [mja]
README
== Aviary, a Twitter Archiver

Aviary is a simple Ruby script to retrieve and archive your tweets. Twitter
"claim[s] no intellectual property rights over the material you provide to the
Twitter service," so you should have a way to store and repurpose your data.

== Requirements

Aviary requires the Hpricot and builder gems.

== Running Aviary

First, copy config-example.yml to config.yml, then fill in your Twitter
username and password. Run Aviary with:

  $ ruby aviary.rb --updates [new|all] [--page XXX]
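
The configuration and command-line handling described above might look
something like the following minimal Ruby sketch. It is not the code in
aviary.rb: the helper names load_config and parse_options, the config keys
username and password, and the defaults (new mode, page 1) are all assumptions
made for illustration. It uses only Ruby's standard yaml and optparse libraries.

```ruby
require 'optparse'
require 'yaml'

# Load settings from config.yml. The keys this sketch expects ('username' and
# 'password') are assumptions; check config-example.yml for the real names.
def load_config(path = 'config.yml')
  YAML.load_file(path)
rescue Errno::ENOENT
  abort "Missing #{path}: copy config-example.yml to #{path} and fill it in."
end

# Parse "--updates [new|all] [--page N]" into a hash.
# The defaults below are assumptions, not Aviary's documented behaviour.
def parse_options(argv)
  options = { updates: 'new', page: 1 }
  OptionParser.new do |opts|
    opts.banner = 'Usage: ruby aviary.rb --updates [new|all] [--page N]'
    # Only "new" and "all" are accepted as values for --updates.
    opts.on('--updates MODE', %w[new all],
            'new: stop at the first saved tweet; all: save every tweet') do |mode|
      options[:updates] = mode
    end
    # --page is converted to an Integer by OptionParser.
    opts.on('--page N', Integer, 'archive page to start from') do |n|
      options[:page] = n
    end
  end.parse!(argv)
  options
end
```

For example, parse_options(['--updates', 'all', '--page', '3']) returns a hash
with :updates set to "all" and :page set to 3.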

Aviary will create a directory called USERNAME and begin parsing your online
Twitter archive, downloading the raw XML representation of each tweet. With
"--updates all", it parses and saves every tweet; with "--updates new", it
stops as soon as it reaches a tweet that has already been downloaded.
Optionally, use "--page" to start retrieving tweets from a given page of the
archive, which is useful when restarting the script after a timeout.
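
The difference between "--updates all" and "--updates new" comes down to when
the download loop stops. The sketch below is a hypothetical illustration, not
Aviary's implementation: fetch_page stands in for whatever retrieves one page
of your online archive (assumed to return [status_id, raw_xml] pairs, newest
first, and an empty array once the pages run out), and saving each tweet as
ID.xml is an assumed naming scheme.

```ruby
require 'fileutils'

# Hypothetical sketch of the --updates loop, not Aviary's own code.
# fetch_page: callable taking a page number and returning [status_id, raw_xml]
# pairs, newest first, or an empty array when there are no more pages.
# Returns the number of tweets written.
def archive(dir, mode, start_page, fetch_page)
  FileUtils.mkdir_p(dir)
  saved = 0
  page = start_page
  loop do
    tweets = fetch_page.call(page)
    break if tweets.empty?
    tweets.each do |id, xml|
      path = File.join(dir, "#{id}.xml")
      # In "new" mode, reaching a tweet saved by an earlier run means
      # everything older is already on disk, so stop here.
      return saved if mode == 'new' && File.exist?(path)
      # In "all" mode every tweet is (re)written.
      File.write(path, xml)
      saved += 1
    end
    page += 1
  end
  saved
end
```

Passing a start_page greater than 1 plays the role of the "--page" option:
after a timeout, a run can resume from the page where the previous one stopped.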

Aviary currently does not use your password to authenticate requests, so your
timeline must be public. This is a workaround for the API request limits.

Use the cat_statuses.sh script to combine all downloaded tweets for a user
into one statuses XML file.

  $ SCREEN_NAME=username ./cat_statuses.sh > username.xml

This reads every tweet in the username directory and writes them, concatenated,
to a single file called username.xml.
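
If you would rather stay in Ruby, a rough equivalent of cat_statuses.sh might
look like this. The shell script's contents are not reproduced here, so the
output format is an assumption: a single document with one XML declaration and
a <statuses type="array"> root (modelled on Twitter's statuses XML, also an
assumption) wrapping each saved <status>. Files are assumed to be named by
numeric status ID so they can be sorted oldest first; cat_statuses is a
hypothetical helper name.

```ruby
# Hypothetical Ruby counterpart to cat_statuses.sh: wraps every saved tweet
# in dir (assumed to be named ID.xml) in one <statuses> root element.
def cat_statuses(dir)
  files = Dir.glob(File.join(dir, '*.xml'))
             .sort_by { |f| File.basename(f, '.xml').to_i } # oldest ID first
  bodies = files.map do |f|
    # Drop each file's own XML declaration so the result has only one.
    File.read(f).sub(/\A\s*<\?xml[^>]*\?>\s*/, '').strip
  end
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<statuses type="array">', *bodies, '</statuses>']
  lines.join("\n") + "\n"
end
```

Calling File.write('username.xml', cat_statuses('username')) would then do the
same job as the shell command above.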

== Creating an events timeline

Assuming you have already combined all of your status files into a single XML
file with cat_statuses.sh, you can transform it into a SIMILE Timeline
(http://simile.mit.edu/timeline/) using the assets/simili.xsl XSLT stylesheet.
To view the data, edit assets/events_template.html and replace
"screenname_events.xml" (line 66) with the filename of your transformed
statuses XML document.
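
The template edit can be scripted too. The hypothetical render_template helper
below copies events_template.html to a new file with the
"screenname_events.xml" placeholder replaced by your events file name, and
raises an error if the placeholder is missing (for example, if the template
has changed). It does not perform the XSL transformation itself. Because a
relative file name is resolved against the HTML page, writing the new page
into assets/ next to the template is probably the simplest choice.

```ruby
# Hypothetical helper: copy the timeline template, pointing it at events_file
# instead of the "screenname_events.xml" placeholder.
def render_template(template_path, events_file, output_path)
  html = File.read(template_path)
  placeholder = 'screenname_events.xml'
  unless html.include?(placeholder)
    raise ArgumentError, "no #{placeholder} placeholder in #{template_path}"
  end
  # Block form so backslashes in events_file are not treated specially.
  File.write(output_path, html.gsub(placeholder) { events_file })
  output_path
end
```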

This can also be done automatically using the timeline.rb script:

    $ ruby timeline.rb -u USERNAME

which reads the individual tweets in the USERNAME directory and builds an
events timeline (USERNAME_events.xml) in the timelines directory.
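
To give a sense of what such an events file contains, here is a minimal sketch
that builds SIMILE-style event XML (a <data> root holding one <event> per
tweet, with start and title attributes and the full text as content) from
tweets that have already been parsed. It is not timeline.rb's code: the input
shape (hashes with :created_at and :text), the 40-character truncation used to
summarize each tweet as a title, and the date format in start are assumptions.
Text is escaped by hand so the sketch needs no gems.

```ruby
# Escape text for XML element content and double-quoted attributes.
def xml_escape(s)
  s.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Hypothetical title rule: collapse whitespace, truncate to max characters.
def summarize(text, max = 40)
  flat = text.gsub(/\s+/, ' ').strip
  flat.length > max ? flat[0, max - 3].rstrip + '...' : flat
end

# tweets: array of hashes with :created_at (Time) and :text (String).
# Returns an events document with the oldest tweet first.
def events_xml(tweets)
  lines = ['<data>']
  tweets.sort_by { |t| t[:created_at] }.each do |t|
    # getutc returns a UTC copy without modifying the caller's Time.
    start = t[:created_at].getutc.strftime('%b %d %Y %H:%M:%S GMT')
    title = xml_escape(summarize(t[:text]))
    lines << %(  <event start="#{start}" title="#{title}">#{xml_escape(t[:text])}</event>)
  end
  lines << '</data>'
  lines.join("\n") + "\n"
end
```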

== What next?

Now your data is yours! Start thinking about your own ways to use it. Think
about your children's children's children, too: will they be visiting a
possibly defunct http://twitter.com, or would they prefer a paper record of
"what you were doing"?