This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

Starting an HDFS cluster on EC2 with Provisionr and Rundeck

cimi edited this page Feb 18, 2013 · 5 revisions

We've made a short video illustrating this capability. You can watch it on YouTube by clicking the image below:

Watch on YouTube

A few background steps are needed to make the integration work. First, before creating an AWS pool through Provisionr, you need to set your credentials. You can do this in two ways. The first is through the console, as illustrated in the project's readme:

$ ./bin/provisionr
provisionr [0.0.1-SNAPSHOT] $ config:edit com.axemblr.provisionr.amazon
provisionr [0.0.1-SNAPSHOT] $ config:proplist
    service.pid = com.axemblr.provisionr.amazon
    secretKey = secret
    felix.fileinstall.filename = file:[...]/etc/com.axemblr.provisionr.amazon.cfg
    region = us-east-1
    accessKey = access
provisionr [0.0.1-SNAPSHOT] $ config:propset accessKey "XXXXXXX"
provisionr [0.0.1-SNAPSHOT] $ config:propset secretKey "XXXXXXX"
provisionr [0.0.1-SNAPSHOT] $ config:update
provisionr [0.0.1-SNAPSHOT] $ config:list "(service.pid=com.axemblr.provisionr.amazon)"

The second is to edit etc/com.axemblr.provisionr.amazon.cfg directly in the unzipped distribution directory after starting the service.
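If you prefer to script the file edit, here is a minimal Python sketch. It assumes the .cfg file uses plain `key = value` lines (the property names accessKey, secretKey and region are the ones shown in the config:proplist output above). The function name `set_properties` is hypothetical and not part of Provisionr.

```python
def set_properties(cfg_text, updates):
    """Return cfg_text with the given keys set, leaving other lines as they are.

    Assumption: the file consists of simple `key = value` lines, with
    optional blank lines and `#` comments.
    """
    remaining = dict(updates)
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        # Only lines that look like `key = value` are candidates for replacement.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                out.append(f"{key} = {remaining.pop(key)}")
                continue
        out.append(line)
    # Keys that were not already present are appended at the end.
    for key, value in remaining.items():
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"
```

To apply it, read etc/com.axemblr.provisionr.amazon.cfg, pass its text through `set_properties` with your real accessKey and secretKey, and write the result back to the same path.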

The machines in the pool can be automatically provisioned according to a template of your choice. For this demo, we used CDH4 packages defined in this template.

The create command must specify the template by its ID, cdh4. The command would look like:

provisionr [0.4.0-SNAPSHOT] $ provisionr:create --id amazon --key demo --size 10 --hardware-type m1.small --template cdh4 --timeout 1200

In Rundeck, you need a project whose endpoint is set to the dedicated URL exposed by Provisionr: http://localhost:8181/rundeck/machines.xml. The configuration job takes a parameter named namenodePrivateHostname. Set it to the hostname of any machine in the pool, and that machine will be set up as the namenode. There's only one workflow step: a bash script that runs the actual configuration on each node. You can see the source for the script in this gist.
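To check what the endpoint exposes before wiring it into Rundeck, a short Python sketch like the following may help. It assumes the URL returns Rundeck's resource XML format, that is, a project root element containing node elements with name and hostname attributes. Whether the hostname attribute holds the private hostname expected by namenodePrivateHostname is also an assumption. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Endpoint quoted in the text above.
MACHINES_URL = "http://localhost:8181/rundeck/machines.xml"


def parse_nodes(xml_data):
    """Return (name, hostname) pairs for every <node> element.

    Assumption: the document follows Rundeck's resource XML layout.
    """
    root = ET.fromstring(xml_data)
    return [(node.get("name"), node.get("hostname")) for node in root.iter("node")]


def fetch_nodes(url=MACHINES_URL):
    """Download the machine list from a running Provisionr instance."""
    with urlopen(url) as response:
        return parse_nodes(response.read())


def pick_namenode(nodes):
    """Pick the first machine's hostname as a namenode candidate."""
    if not nodes:
        raise ValueError("the pool contains no machines")
    return nodes[0][1]
```

Calling `pick_namenode(fetch_nodes())` against a running pool gives one candidate value for the job parameter. Any other hostname in the list works equally well, since the text above allows any machine in the pool.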

If you want, you can download a prebuilt Rundeck project that does all this from here.