Infinite Monkeywrench - A framework for collecting, peeling, and sharing delicious bananas of data.

Overview

The Infinite Monkeywrench (IMW) is a Ruby framework that simplifies the tasks of acquiring, extracting, transforming, loading, and packaging data. It has the following goals:

  • Minimize programmer time even at the expense of increasing run time.

  • Take data through a full transformation from raw source to packaged purity in as few lines of code as possible.

  • Treat data records as objects as much as possible.

  • Reuse, rather than reimplement, better code that already exists in other libraries (FasterCSV, I'm talkin' to you).

  • Make what's common easy without making what's uncommon impossible.

  • Work with messy data as well as clean data.

  • Let you incorporate your own tools wherever you choose to.

The Infinite Monkeywrench is a powerful tool but it is not always the right one to use. IMW is *not* designed for

Setup

IMW is hosted on Gemcutter so it's easy to install.

You'll have to set up Gemcutter

$ sudo gem install gemcutter
$ gem tumble

and then install IMW

$ sudo gem install imw

Using IMW

The central goal of IMW is to make the workflow involved in processing a dataset, from raw source to finished product, as simple as possible.

So consider that there exist two datasets that I want to combine. The first details the historical price of bananas over the past century and the second

Working with paths and files

require 'rubygems'
require 'imw'

IMW holds a registry of paths that you can define on the fly or store in a configuration file.

IMW.add_path :dropbox, "/var/www/public/dropbox"
IMW.add_path :raw,     "/mnt/data/raw"
IMW.add_path :

This makes it easier

IMW.path_to :raw, "one/particular/dataset"
#=> "/mnt/data/raw/one/particular/dataset"
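To illustrate the idea of a path registry, here is a minimal sketch in plain Ruby. The PathRegistry class is a hypothetical stand-in written for this example and is not IMW's actual implementation; it only assumes that a registered name maps to a base directory and that further segments are joined onto it.

```ruby
# Hypothetical stand-in for a path registry; NOT IMW's actual code.
class PathRegistry
  def initialize
    @paths = {}
  end

  # Register a base directory under a symbolic name.
  def add_path(name, path)
    @paths[name.to_sym] = path
  end

  # Join the registered base directory with any further path segments.
  def path_to(name, *segments)
    base = @paths.fetch(name.to_sym) do
      raise ArgumentError, "unknown path: #{name.inspect}"
    end
    File.join(base, *segments)
  end
end

registry = PathRegistry.new
registry.add_path :raw, "/mnt/data/raw"
puts registry.path_to(:raw, "one/particular/dataset")
```

Raising on an unknown name, rather than returning nil, is a choice made for this sketch so that a mistyped symbol fails early.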

IMW makes it easy to manipulate compressed files and archives.

# Move a collection of files from a public dropbox to a processing directory

Dir["/public/*"].each do |path|
  file = IMW.open(path)
  case
  when file.compressed?
    file.decompress.mv_to_dir "/raw"
  when file.archive?
    FileUtils.cd("/raw") do
      file.extract
    end
  else
    file.mv_to_dir("/raw")
  end
end
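The loop above branches on whether a file is compressed, an archive, or neither. As a rough sketch of how such a decision could be made from the filename alone, the following plain-Ruby example classifies paths by their final extension. The classify method and both extension lists are assumptions made for illustration; how IMW itself detects these cases is not described here.

```ruby
# Hypothetical extension lists; IMW's real detection rules may differ.
COMPRESSED_EXTENSIONS = %w[.gz .bz2].freeze
ARCHIVE_EXTENSIONS    = %w[.tar .zip .rar].freeze

# Classify a path by its final extension only.
def classify(path)
  ext = File.extname(path).downcase
  if COMPRESSED_EXTENSIONS.include?(ext)
    :compressed
  elsif ARCHIVE_EXTENSIONS.include?(ext)
    :archive
  else
    :plain
  end
end

%w[prices.csv.gz bundle.tar notes.txt].each do |name|
  puts "#{name}: #{classify(name)}"
end
```

Because only the final extension is inspected, a name such as backup.tar.gz counts as compressed in this sketch, so it would be decompressed first and handled as an archive afterwards.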
