This directory contains files relevant to the presentation Automating Cloud Cluster Deployment: Beyond the Book, given at Strata Data NYC in September 2017. They serve as examples of automating the deployment of Hadoop clusters in AWS, using ideas found in the book Moving Hadoop to the Cloud.

If you are looking for the video recording of the presentation, it's available on Safari.

If you are looking for the presentation slides, they are posted in the conference proceedings.

Prerequisites

  • An AWS account with permissions to work with EC2 instances
  • EC2 networking established, with a VPC, subnet, and optionally a security group
  • The AWS CLI installed and configured locally
  • Packer (a quick check for both tools is sketched after this list)
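
A quick way to verify these prerequisites from your local machine, assuming the AWS CLI and Packer are on your PATH (the subnet ID shown is a placeholder):

    # Confirm the AWS CLI has working credentials
    aws sts get-caller-identity

    # Confirm Packer is installed
    packer version

    # Confirm the subnet you plan to use is visible to your account
    aws ec2 describe-subnets --subnet-ids subnet-0123456789abcdef0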

Step 1. Create an AMI

Create an Amazon Machine Image (AMI) using Packer that contains most of what you need for your cluster. Check out the README for the Packer template to get started.
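
As a rough sketch, a Packer build generally runs like this; the template file name and variable shown here are placeholders, and the packer directory's README documents the actual template and its inputs:

    cd packer

    # Validate the template before building (file name is a placeholder)
    packer validate hadoop-ami.json

    # Build the AMI (the region variable is illustrative only)
    packer build -var 'region=us-east-1' hadoop-ami.json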

Step 2. Launch Instances

Use the allocate_aws_cluster.sh script to spin up the instances for your cluster, each based on your custom AMI. Run the script with the -h option for documentation. The script requires the AWS CLI to be installed and configured locally.

The script reports the instance IDs and private and public IP addresses for the instances it allocates.
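
For reference, a rough hand-rolled sketch of what an allocation like this involves in the AWS CLI is shown below; the AMI ID, subnet ID, key pair name, instance type, and count are all placeholders, and the script's -h output documents its real options:

    # Launch instances from the custom AMI (all IDs and names are placeholders)
    aws ec2 run-instances \
      --image-id ami-0123456789abcdef0 \
      --count 4 \
      --instance-type m4.xlarge \
      --key-name my-keypair \
      --subnet-id subnet-0123456789abcdef0

    # Report instance IDs and private/public IP addresses
    aws ec2 describe-instances \
      --filters "Name=image-id,Values=ami-0123456789abcdef0" \
      --query "Reservations[].Instances[].[InstanceId,PrivateIpAddress,PublicIpAddress]" \
      --output table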

Step 3. Configure New SSH Keys

Use the coordinate_ssh_keys.sh script to configure fresh SSH keypairs for each cluster instance. Run the script with the -h option for documentation. The script can be run from the same machine you ran the allocation script on, or from any machine that can reach the cluster's manager instance. Feed this script the IP addresses reported by the allocation script.
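
The underlying idea, sketched generically below (usernames, addresses, and key paths are placeholders, and this is not necessarily how the script itself works), is to generate a fresh keypair and push the public key into each instance's authorized_keys:

    # Generate a fresh keypair with no passphrase (path is illustrative)
    ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/cluster_key

    # Append the public key to an instance's authorized_keys, repeating
    # for each address reported by the allocation script (IP is a placeholder)
    cat ~/.ssh/cluster_key.pub | \
      ssh -i original-ec2-key.pem ec2-user@203.0.113.10 \
      'cat >> ~/.ssh/authorized_keys'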

Step 4. Configure Hadoop

Use the config_hadoop.sh script to configure Hadoop on the instances as a single cluster. Run the script with the -h option for documentation. Unlike the other scripts, this one must be run on the cluster's manager instance. Feed this script the private IP addresses reported by the allocation script.
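
In broad terms, wiring the instances into a single cluster means pointing every node's Hadoop configuration at the manager and listing the worker nodes; a generic illustration (the configuration path, manager address, and worker IPs are placeholders, not what config_hadoop.sh literally does) is:

    # In core-site.xml on every node, point HDFS at the manager
    # (the address is a placeholder):
    #
    #   <property>
    #     <name>fs.defaultFS</name>
    #     <value>hdfs://10.0.0.10:8020</value>
    #   </property>

    # In the workers file (slaves in Hadoop 2), list the worker private IPs
    printf '10.0.0.11\n10.0.0.12\n10.0.0.13\n' > /etc/hadoop/conf/workers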

Ready to Go

Your Hadoop cluster in the cloud is ready for use. Perform the usual initialization steps, such as formatting the HDFS namenode, and then start the cluster services.
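
With a stock Apache Hadoop installation, those initialization steps typically look like the following, run on the manager instance (paths depend on where Hadoop lives on your AMI):

    # Format HDFS (run once, before the first start)
    hdfs namenode -format

    # Start the HDFS and YARN daemons
    $HADOOP_HOME/sbin/start-dfs.sh
    $HADOOP_HOME/sbin/start-yarn.sh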

License

Source code in this project is licensed under the Apache 2.0 license. See NOTICE.md for third-party licenses.
