Starting an HDFS cluster on EC2 with Provisionr and Rundeck
We've made a short video illustrating this capability. You can watch it on YouTube by clicking the image below.
A few setup steps are needed to make the integration work. First, before creating an AWS pool through Provisionr, you need to set your AWS credentials. You can do this in two ways. The first is through the Provisionr console, as illustrated in the project's README:
```
$ ./bin/provisionr
provisionr [0.4.0-SNAPSHOT] $ config:edit com.axemblr.provisionr.amazon
provisionr [0.4.0-SNAPSHOT] $ config:proplist
  service.pid = com.axemblr.provisionr.amazon
  secretKey = secret
  felix.fileinstall.filename = file:[...]/etc/com.axemblr.provisionr.amazon.cfg
  region = us-east-1
  accessKey = access
provisionr [0.4.0-SNAPSHOT] $ config:propset accessKey "XXXXXXX"
provisionr [0.4.0-SNAPSHOT] $ config:propset secretKey "XXXXXXX"
provisionr [0.4.0-SNAPSHOT] $ config:update
provisionr [0.4.0-SNAPSHOT] $ config:list "(service.pid=com.axemblr.provisionr.amazon)"
```
Alternatively, you can edit `etc/com.axemblr.provisionr.amazon.cfg` directly in the unzipped distribution directory after starting the service.
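For the file-based route, the file only needs the same credential keys that `config:proplist` lists in the console session (`accessKey`, `secretKey` and `region`). The other two entries in that listing, `service.pid` and `felix.fileinstall.filename`, are filled in at runtime by OSGi Config Admin and Felix File Install, so you shouldn't need to write them yourself. A minimal sketch, assuming the usual `key = value` properties syntax that File Install reads; the values are placeholders:

```properties
# etc/com.axemblr.provisionr.amazon.cfg (placeholder values, replace with your own)
accessKey = XXXXXXX
secretKey = XXXXXXX
region = us-east-1
```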
The machines in the pool can be automatically provisioned according to a template of your choice. For this demo, we used CDH4 packages defined in this template.
The `create` command must specify the template by its ID, `cdh4`. The command looks like this:
```
provisionr [0.4.0-SNAPSHOT] $ provisionr:create --id amazon --key demo --size 10 --hardware-type m1.small --template cdh4 --timeout 1200
```
In Rundeck, you need a project whose resource endpoint is set to the dedicated URL exposed by Provisionr: `http://localhost:8181/rundeck/machines.xml`.

The configuration job needs to receive a parameter named `namenodePrivateHostname`, set to the hostname of any machine in the pool; that machine will be set up as the namenode. The job has only one workflow step: a bash script that runs the actual configuration on each node. You can see the source for the script in this gist.
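To pick a value for `namenodePrivateHostname`, you can list the machines published at the endpoint. The following is a minimal Python sketch, not part of Provisionr or Rundeck. It assumes `machines.xml` follows Rundeck's standard resource XML layout (a `<project>` root with one `<node>` element per machine) and that the node's `hostname` attribute is the one you want. If Provisionr publishes the private hostname under a different attribute, pass that attribute's name instead. The helper names (`fetch_machines_xml`, `node_hostnames`, `pick_namenode`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Endpoint exposed by Provisionr for Rundeck.
MACHINES_URL = "http://localhost:8181/rundeck/machines.xml"


def fetch_machines_xml(url=MACHINES_URL, timeout=10):
    """Download the resource XML document from the Provisionr endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def node_hostnames(xml_text, attribute="hostname"):
    """Return (node name, attribute value) pairs, in document order.

    Assumption: standard Rundeck resource XML, i.e. <node> elements under a
    <project> root. Which attribute holds the *private* hostname in
    Provisionr's output is an assumption; override `attribute` if needed.
    """
    root = ET.fromstring(xml_text)
    return [(node.get("name"), node.get(attribute)) for node in root.iter("node")]


def pick_namenode(xml_text, attribute="hostname"):
    """Return the first non-empty hostname; any machine in the pool will do."""
    for _name, host in node_hostnames(xml_text, attribute):
        if host:
            return host
    raise ValueError(f"no <node> with a '{attribute}' attribute found")
```

For example, `pick_namenode(fetch_machines_xml())` returns the hostname of the first machine listed, which you can then pass as the job's `namenodePrivateHostname` option. Taking the first node is an arbitrary but deterministic choice, since any host in the pool can act as the namenode.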
If you prefer, you can download a prebuilt Rundeck project that does all of this from here.
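If you'd rather configure the project by hand, the endpoint corresponds to a URL resource model source in the project's `project.properties` file. A minimal sketch, assuming a Rundeck version that loads nodes through resource model sources; the source index `1` is arbitrary, and the rest of the project configuration is omitted:

```properties
# Excerpt of the project's project.properties (other settings omitted)
resources.source.1.type=url
resources.source.1.config.url=http://localhost:8181/rundeck/machines.xml
```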
The demo was recorded with a nightly build of Provisionr (version 0.4.0-SNAPSHOT), so some features, such as the `cdh4` template or the `--timeout` option, may not be available in the latest release.