infochimps / machetec2

Hacking Data through Amazon's EC2 - An AMI configuration for exploring Big Data

http://machetec2.org

The uname line is automatically inserted by bootmisc.sh so it's not needed 
in motd. 
dhruvbansal (author)
Fri Feb 06 13:00:59 -0800 2009
commit  2e3cbf26979f9dda74b43049f6f401bb1fafd220
tree    f113f451ed53324f644d17001c349d52fe8342e2
parent  bb6cbcd113c906b92287d0fde724d778d1436642
machetec2 /
file Makefile
file README
file Rakefile
directory components/
directory config/
directory files/
directory lib/
directory sources/
README
machetEC2 is an Amazon Machine Image (AMI) designed at
http://infochimps.org to make working with data on an EC2 instance
easy.  Instances of machetEC2 come with software and libraries for
data processing, analysis, and visualization.  Read more about what's
available (and make suggestions!) at http://machetec2.org.

* Building an image of machetEC2

machetEC2 is based on a barebones AMI of Ubuntu 8.10 Intrepid Ibex by
Eric Hammond (http://alestic.com).  The build scripts for machetEC2
add tools for working with data on top of this AMI.  The build scripts
live in this directory but can also be found at
http://github.com/infochimps/machetec2/tree/master.

The build scripts are Rakefiles written in Ruby.  If Rake isn't
already installed, install it by running the Makefile:

    root@ec2-instance:/usr/local/share/machetec2/build$ make

and then run the Rakefile to actually build this instance:

    root@ec2-instance:/usr/local/share/machetec2/build$ rake

The final step in the build is to remove all SSH keys from the
filesystem.  This includes the public SSH key in the file
~/.ssh/authorized_keys which you probably used to log into the
instance you're building!  Save those keys somewhere and run the task
to "seal" the instance by deleting its SSH keys:

    root@ec2-instance:/usr/local/share/machetec2/build$ rake seal

Now bundle the image in the usual way (consult the documentation
available at http://aws.amazon.com/ec2/ for details) and put the SSH
keys back where they came from (probably ~/.ssh/authorized_keys).
You can then publish the image files and still be able to SSH into
your instance.
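
The save-and-restore steps around "rake seal" can be sketched in Ruby
as below.  This is only an illustration: stash_keys and restore_keys
are hypothetical helpers, not part of the machetEC2 build scripts, and
the backup location is yours to choose (it should be somewhere that
will not end up inside the bundled image).

```ruby
require 'fileutils'

# Hypothetical helpers; the machetEC2 build does not ship these.

# Copy the authorized_keys file to a backup location before sealing.
def stash_keys(keys_path, backup_path)
  FileUtils.mkdir_p(File.dirname(backup_path))
  FileUtils.cp(keys_path, backup_path, preserve: true)
  backup_path
end

# Put the saved keys back once the image has been bundled.
def restore_keys(backup_path, keys_path)
  FileUtils.mkdir_p(File.dirname(keys_path), mode: 0o700)
  FileUtils.cp(backup_path, keys_path)
  File.chmod(0o600, keys_path) # sshd expects restrictive permissions
  keys_path
end
```

Call stash_keys before "rake seal" and restore_keys after bundling,
passing the same two paths in the opposite order.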

* Adding/removing packages

If you want to add/remove packages before building, there is no need
to write any code: simply edit config/packages.yaml.  Every package in
this list is installed by Rake.
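
To make the idea concrete, here is a minimal sketch of how a package
list could be turned into an install command.  It assumes that
config/packages.yaml is a flat YAML list of apt package names; the
real file layout, the sample package names, and the
apt_install_command helper are all assumptions for illustration, not
taken from the build scripts.

```ruby
require 'yaml'

# Assumed layout: a flat YAML list of apt package names (illustrative only).
SAMPLE_PACKAGES_YAML = <<~YAML
  - r-base
  - gnuplot
  - python-numpy
YAML

# Hypothetical helper: build one apt-get command (as an argv array)
# from the YAML text.  Returns nil when the list is empty.
def apt_install_command(yaml_text)
  packages = Array(YAML.safe_load(yaml_text))
             .map { |name| name.to_s.strip }
             .reject(&:empty?)
             .uniq
  return nil if packages.empty?

  ['apt-get', 'install', '--yes', *packages]
end

puts apt_install_command(SAMPLE_PACKAGES_YAML).join(' ')
```

Returning an argv array rather than a single string means the command
can be handed to Kernel#system without any shell quoting.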

* More complicated build instructions

Actual Rake code only needs to be written for special cases.  All
files ending with '.rb' in the 'components' or 'sources' directory
will be included by the master Rakefile.  To hook a new task into the
main build, add a dependency on that task to the
"machetec2:components" or "machetec2:sources" task.

As an example, updating Ruby's 'gem' installer and installing the gems
in 'config/gems.yaml' is handled by the tasks defined in
'components/ruby.rb', which are hooked into the main build by having
'machetec2:components' depend on 'ruby:install_gems' in
'components/ruby.rb'.

As a further example, installing Python's NLTK module is handled by
the tasks defined in 'sources/nltk.rb', which are hooked into the main
build by having 'machetec2:sources' depend on 'nltk:install' in
'sources/nltk.rb'.
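
The hooking pattern described above can be sketched as follows.  Only
the task names 'ruby:install_gems' and 'machetec2:components' come
from this README; the 'ruby:update_rubygems' task, the BUILD_LOG
constant, and the task bodies are placeholders, so treat this as an
illustration of the Rake mechanism rather than the contents of
'components/ruby.rb'.

```ruby
require 'rake'
extend Rake::DSL # makes `task` and `namespace` usable outside a Rakefile

BUILD_LOG = [] # records which tasks ran, in place of real installers

# Stand-in for components/ruby.rb (task bodies are placeholders).
namespace :ruby do
  task :update_rubygems do # hypothetical task name
    BUILD_LOG << 'ruby:update_rubygems'
  end

  task :install_gems => :update_rubygems do
    BUILD_LOG << 'ruby:install_gems'
  end
end

# The hook: reopening the namespace adds a prerequisite to the main task.
namespace :machetec2 do
  task :components => 'ruby:install_gems'
end

# Stand-in for the master Rakefile's own definition of the task.
namespace :machetec2 do
  task :components do
    BUILD_LOG << 'machetec2:components'
  end
end

Rake::Task['machetec2:components'].invoke
puts BUILD_LOG.join(' -> ')
```

Rake merges repeated definitions of the same task, so a component file
can add its prerequisite without touching the master Rakefile, and
prerequisites always run before the task that depends on them.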

When writing Ruby code for the build, try to use the helper functions
in "lib/helpers.rb".  Configuration settings are stored in
"config/config.rb".

================================================================
See http://machetec2.org or http://infochimps.org for more
information.  Enjoy hacking!

