infochimps / machetec2
README
machetEC2 is an Amazon Machine Image (AMI) designed at http://infochimps.org to make working with data on an EC2 instance easy. Instances of machetEC2 come with software and libraries for data processing, analysis, and visualization. Read more about what's available (and make suggestions!) at http://machetec2.org.

* Building an image of machetEC2

machetEC2 is based on a barebones AMI of Ubuntu 8.10 Intrepid Ibex by Eric Hammond (http://alestic.com). The build scripts for machetEC2 add tools for working with data on top of this AMI. The build scripts live in this directory but can also be found at http://github.com/infochimps/machetec2/tree/master.

These build scripts are written in Ruby Rakefiles; install Rake if it isn't already installed by running the Makefile:

  root@ec2-instance:/usr/local/share/machetec2/build$ make

and then run the Rakefile to actually build this instance:

  root@ec2-instance:/usr/local/share/machetec2/build$ rake

The final step in the build is to remove all SSH keys from the filesystem. This includes the public SSH key in the file ~/.ssh/authorized_keys which you probably used to log into the instance you're building! Save those keys somewhere and run the task to "seal" the instance by deleting its SSH keys:

  root@ec2-instance:/usr/local/share/machetec2/build$ rake seal

Now bundle the image in the usual way (consult the documentation available at http://aws.amazon.com/ec2/ for more) and put the SSH keys back where you got them from (probably ~/.ssh/authorized_keys). You can then publish the image files and still be able to SSH into your instance.

* Adding/removing packages

If you want to add/remove packages before building, there is no need to write any code: simply edit config/packages.yaml. Every package in this list is installed by Rake.

* More complicated build instructions

Actual Rake code only needs to be written for special cases.
All files ending with '.rb' in the 'components' or 'sources' directory will be included by the master Rakefile. To hook a new task into the main build, add a dependency on that task to the "machetec2:components" or "machetec2:sources" tasks.

As an example, updating Ruby's 'gem' installer and installing the gems in 'config/gems.yaml' is handled by the tasks defined in 'components/ruby.rb', which are hooked into the main build by having 'machetec2:components' depend on 'ruby:install_gems' in 'components/ruby.rb'.

As a further example, installing Python's NLTK module is handled by the tasks defined in 'sources/nltk.rb', which are hooked into the main build by having 'machetec2:sources' depend on 'nltk:install' in 'sources/nltk.rb'.

When writing Ruby code for the build, try to use the helper functions in "lib/helpers.rb". Configuration settings are stored in "config/config.rb".

================================================================

See http://machetec2.org or http://infochimps.org for more information.

Enjoy hacking!
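
* Sketch: hooking a task into the build

The hooking mechanism described under "More complicated build instructions" can be illustrated with a small standalone script. This is only a sketch, not code from the repository: the 'example' namespace, its 'install' task, and the BUILD_LOG constant are hypothetical, and the empty 'machetec2:components' task merely stands in for the one defined by the master Rakefile. It relies on the standard Rake behaviour that declaring an existing task again with a new prerequisite adds to that task's dependencies.

```ruby
require 'rake'
extend Rake::DSL # make `task`, `namespace`, and `desc` available outside a Rakefile

# Records which task bodies ran (hypothetical; for illustration only).
BUILD_LOG = []

# Stand-in for the aggregate task that the master Rakefile would define.
namespace :machetec2 do
  task :components
end

# What a hypothetical components/example.rb might contain.
namespace :example do
  desc 'Hypothetical install step for an example component'
  task :install do
    BUILD_LOG << 'example:install'
  end
end

# The hook: re-declaring the aggregate task adds a prerequisite to it,
# so the main build runs example:install before machetec2:components.
task 'machetec2:components' => 'example:install'

# Invoking the aggregate task runs its prerequisites first.
Rake::Task['machetec2:components'].invoke
```

In a real component file only the 'example' namespace and the hook line would be needed, since the master Rakefile already loads Rake and defines the aggregate tasks.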

