Description: Infinite Monkeywrench - A framework for collecting, peeling, and sharing delicious bananas of data.
Homepage: http://infinitemonkeywrench.org
Clone URL: git://github.com/infochimps/imw.git
imw /
name age message
file .gitignore Wed Dec 02 12:46:02 -0800 2009 Moved the code for the packager into IMW. [dhruvbansal]
file CHANGELOG Tue May 13 12:10:52 -0700 2008 Initial Commit -- trashing all our BZR history [Dhruv Bansal]
file LICENSE Tue May 13 12:10:52 -0700 2008 Initial Commit -- trashing all our BZR history [Dhruv Bansal]
file README.rdoc Wed Dec 02 12:53:49 -0800 2009 Changed README to rdoc instead of textile....fu... [dhruvbansal]
file Rakefile Wed Dec 02 12:46:02 -0800 2009 Moved the code for the packager into IMW. [dhruvbansal]
file VERSION Tue Dec 01 18:49:45 -0800 2009 Version bump to 0.1.0 [Dhruv Bansal]
directory etc/ Thu Dec 03 18:33:01 -0800 2009 Added a packager to move to S3. [dhruvbansal]
directory lib/ Thu Dec 03 20:39:20 -0800 2009 seriously... [dhruvbansal]
directory old/ Tue Dec 01 19:12:11 -0800 2009 Restored config file. IMW boots. [dhruvbansal]
directory spec/ Thu Dec 03 20:37:09 -0800 2009 Seriously, very stupid. [dhruvbansal]
README.rdoc

Overview

The Infinite Monkeywrench (IMW) is a Ruby framework that simplifies the tasks of acquiring, extracting, transforming, loading, and packaging data. It has the following goals:

  • Minimize programmer time even at the expense of increasing run time.
  • Take data through a full transformation from raw source to packaged purity in as few lines of code as possible.
  • Treat data records as objects as much as possible.
  • Use, rather than reimplement, better code that already exists in other libraries (FasterCSV, I’m talkin’ to you).
  • Make what’s common easy without making what’s uncommon impossible.
  • Work with messy data as well as clean data.
  • Let you incorporate your own tools wherever you choose to.

The Infinite Monkeywrench is a powerful tool but it is not always the right one to use. IMW is *not* designed for

Setup

IMW is hosted on Gemcutter so it’s easy to install.

You’ll have to set up Gemcutter

  $ sudo gem install gemcutter
  $ gem tumble

and then install IMW

  $ sudo gem install imw

Using IMW

The central goal of IMW is to make the workflow of processing a dataset, from raw source to finished product, as simple as possible.

Suppose I want to combine two datasets. The first details the historical price of bananas over the past century and the second

Working with paths and files

  require 'rubygems'
  require 'imw'

IMW holds a registry of paths that you can define on the fly or store in a configuration file.

  IMW.add_path :dropbox, "/var/www/public/dropbox"
  IMW.add_path :raw,     "/mnt/data/raw"
  IMW.add_path :

This makes it easier to build full paths:

  IMW.path_to :raw, "one/particular/dataset"
  #=> "/mnt/data/raw/one/particular/dataset"
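To make the idea of a path registry concrete, here is a minimal plain-Ruby sketch of the same pattern. The PathRegistry class is a hypothetical stand-in written for this illustration. It is not IMW's implementation, and it assumes only that a registered name maps to a base directory onto which further segments are joined.

```ruby
# Hypothetical stand-in for a path registry; NOT IMW's actual implementation.
class PathRegistry
  def initialize
    @paths = {}
  end

  # Register a symbolic name for a base directory.
  def add_path(name, path)
    @paths[name.to_sym] = path
  end

  # Join a registered base directory with any further path segments.
  def path_to(name, *segments)
    base = @paths.fetch(name.to_sym) do
      raise ArgumentError, "unknown path: #{name.inspect}"
    end
    File.join(base, *segments)
  end
end

registry = PathRegistry.new
registry.add_path :raw, "/mnt/data/raw"
puts registry.path_to(:raw, "one/particular/dataset")
```

Raising on an unknown name is a design choice of this sketch. It surfaces a mistyped symbol immediately instead of silently producing a wrong path.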

IMW makes it easy to manipulate compressed files and archives.

  # Move a collection of files from a public dropbox to a processing directory

  Dir["/public/*"].each do |path|
    file = IMW.open(path)
    case
    when file.compressed?
      file.decompress.mv_to_dir "/raw"
    when file.archive?
      FileUtils.cd("/raw") do
        file.extract
      end
    else
      file.mv_to_dir("/raw")
    end
  end
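
The three-way branch above (decompress, extract, or move) can be sketched without IMW as a decision based on the file extension alone. The handling_for helper and the two extension lists are assumptions made for this illustration. How IMW itself decides whether a file is compressed or an archive is not shown in this README.

```ruby
# Hypothetical helper: guess how to handle a file from its extension alone.
# An illustration only; this is not how IMW is known to decide.
COMPRESSED_EXTENSIONS = %w[.gz .bz2].freeze
ARCHIVE_EXTENSIONS    = %w[.tar .zip].freeze

def handling_for(path)
  ext = File.extname(path).downcase
  if COMPRESSED_EXTENSIONS.include?(ext)
    :decompress   # checked first, mirroring the order of the case statement above
  elsif ARCHIVE_EXTENSIONS.include?(ext)
    :extract
  else
    :move
  end
end

%w[prices.csv.gz survey.tar notes.txt].each do |name|
  puts "#{name}: #{handling_for(name)}"
end
```

Because only the final extension is inspected, a name such as data.tar.gz is classified as compressed in this sketch. The inner tar archive would need a second pass after decompression.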