Overview
The Infinite Monkeywrench (IMW) is a Ruby framework to simplify the tasks of acquiring, extracting, transforming, loading, and packaging data. It has the following goals:
- Minimize programmer time even at the expense of increasing run time.
- Take data through a full transformation from raw source to packaged purity in as few lines of code as possible.
- Treat data records as objects as much as possible.
- Use, rather than reimplement, better code that already exists in other libraries (FasterCSV, I’m talkin’ to you).
- Make what’s common easy without making what’s uncommon impossible.
- Work with messy data as well as clean data.
- Let you incorporate your own tools wherever you choose to.
The Infinite Monkeywrench is a powerful tool but it is not always the right one to use. IMW is *not* designed for:
- Scraping vast amounts of data (use Wuclan, Monkeyshines, and Edamame)
- Really, really big datasets (use Wukong and Hadoop)
- Data mining
- Data visualization
Setup
IMW is hosted on Gemcutter so it’s easy to install.
You’ll have to set up Gemcutter

  $ sudo gem install gemcutter
  $ gem tumble

and then install IMW

  $ sudo gem install imw
Using IMW
The central goal of IMW is to make the workflow involved in processing a dataset, from a raw source to a finished product, as simple as possible.
So consider that there exist two datasets that I want to combine. The first details the historical price of bananas over the past century and the second
Working with paths and files
  require 'rubygems'
  require 'imw'
IMW holds a registry of paths that you can define on the fly or store in a configuration file.
  IMW.add_path :dropbox, "/var/www/public/dropbox"
  IMW.add_path :raw, "/mnt/data/raw"
  IMW.add_path :
This makes it easier to refer to locations beneath a registered path:

  IMW.path_to :raw, "one/particular/dataset" #=> "/mnt/data/raw/one/particular/dataset"
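To make the idea of a path registry concrete, here is a minimal sketch in plain Ruby. It does not use IMW and is not IMW's implementation: the `PathRegistry` class is a hypothetical stand-in, and only the method names `add_path` and `path_to` are borrowed from the examples above.

```ruby
# Hypothetical stand-in for a path registry; NOT IMW's actual implementation.
class PathRegistry
  def initialize
    @paths = {}
  end

  # Register a base directory under a symbolic name.
  def add_path(name, path)
    @paths[name.to_sym] = path
  end

  # Join a registered base directory with any further path segments.
  def path_to(name, *segments)
    base = @paths.fetch(name.to_sym) {
      raise ArgumentError, "no path registered for #{name.inspect}"
    }
    File.join(base, *segments)
  end
end

registry = PathRegistry.new
registry.add_path :raw, "/mnt/data/raw"
puts registry.path_to(:raw, "one/particular/dataset")
# prints /mnt/data/raw/one/particular/dataset
```

Keeping base directories behind symbolic names means a script can be moved to another machine by changing the registered paths rather than every file reference.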
IMW makes it easy to manipulate compressed files and archives.
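  require 'fileutils'

  # Move a collection of files from a public dropbox to the raw
  # processing directory.
  Dir["/public/*"].each do |path|
    file = IMW.open(path)
    case
    when file.compressed?
      # Compressed single file: decompress it, then move the result.
      file.decompress.mv_to_dir "/raw"
    when file.archive?
      # Archive: extract its contents from inside the target directory.
      FileUtils.cd("/raw") do
        file.extract
      end
    else
      # Plain file: move it as it is.
      file.mv_to_dir("/raw")
    end
  end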
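The loop above branches on whether a file is compressed, an archive, or neither. As a rough illustration of that three-way decision, the sketch below classifies paths by file extension in plain Ruby. The `classify` helper and both extension tables are assumptions made for this example; IMW's own detection rules may well differ.

```ruby
# Hypothetical extension tables; IMW's own detection rules may differ.
COMPRESSED_EXTENSIONS = %w[.gz .bz2].freeze
ARCHIVE_EXTENSIONS    = %w[.tar .zip .rar].freeze

# Classify a path by its final extension only, so "data.tar.gz" counts as
# compressed (decompressing it would yield the "data.tar" archive).
def classify(path)
  ext = File.extname(path).downcase
  if COMPRESSED_EXTENSIONS.include?(ext)
    :compressed
  elsif ARCHIVE_EXTENSIONS.include?(ext)
    :archive
  else
    :plain
  end
end

%w[/public/prices.csv.gz /public/bundle.tar /public/notes.txt].each do |path|
  puts "#{path}: #{classify(path)}"
end
```

Checking for compression before archives mirrors the order of the `when` branches in the loop above.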







