Puppet provisioning project to add Storm and Kettle-Storm to Hortonworks Sandbox 2.0 VM
Hortonworks has a long, detailed document on how to add Storm to their Sandbox VM (http://hortonworks.com/blog/storm-technical-preview-available-now/).
This project automates that process plus the process of installing the Pentaho Kettle-Storm project.
The Hortonworks Sandbox 2.0 (VirtualBox version): http://hortonworks.com/products/hortonworks-sandbox/
The Sandbox VM should be running and configured with internet access.
To provision the VM:
Log in to the VM via SSH (e.g. with PuTTY on Windows) as instructed on the VM's splash screen.
Run these commands:
# git clone firstname.lastname@example.org:deinspanjer/hw-sandbox-storm-provision.git
# cd hw-sandbox-storm-provision
# ./provision.sh
When the provisioning is done, you should be able to tail the storm logs:
# tail -F /var/log/storm/*
The last step is to set up port forwarding for the two new HTTP ports.
- Open the VirtualBox application
- Select the Hortonworks Sandbox 2.0 VM from the list on the left
- Click the Settings button on the toolbar
- Click the Network icon on the Settings dialog toolbar
- Under the Advanced section for Adapter 1, click Port Forwarding
- Click the + button to add a new port forward
- Name: storm-ui; Host Port: 8880; Guest Port: 8880
- Click the + button again to add a new port forward
- Name: storm-logviewer; Host Port: 8881; Guest Port: 8881
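The same two forwarding rules can also be added from the host's command line with VBoxManage, which ships with VirtualBox. This is a minimal sketch, assuming the VM is registered under the name "Hortonworks Sandbox 2.0" (check with `VBoxManage list vms`) and is powered off, since `modifyvm` only works on a VM that is not running. The `VM_NAME` and `DRY_RUN` variables and the `natpf_rule` and `add_forward` helpers are illustrative names, not part of this project.

```shell
#!/bin/sh
# Assumption: the VM name as shown by `VBoxManage list vms`; adjust to match yours.
VM_NAME="${VM_NAME:-Hortonworks Sandbox 2.0}"
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Build a NAT rule string: name,protocol,host-ip,host-port,guest-ip,guest-port.
# The empty host-ip and guest-ip fields let VirtualBox use its defaults.
natpf_rule() {
    printf '%s,tcp,,%s,,%s' "$1" "$2" "$2"
}

# Add (or just print) one port forward on the VM's first network adapter.
add_forward() {
    rule=$(natpf_rule "$1" "$2")
    if [ "$DRY_RUN" = "1" ]; then
        printf 'VBoxManage modifyvm "%s" --natpf1 "%s"\n' "$VM_NAME" "$rule"
    else
        # modifyvm requires the VM to be powered off.
        VBoxManage modifyvm "$VM_NAME" --natpf1 "$rule"
    fi
}

add_forward storm-ui 8880
add_forward storm-logviewer 8881
```

Run it once as-is to review the printed commands, then again with `DRY_RUN=0` to apply them.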
Visit the Storm Nimbus UI: http://localhost:8880/
To try out the Storm Starter sample topologies (jobs):
To try out the Kettle-Storm sample transformation:
Since this transformation uses twitter4j, you must set up credentials for the demo.
Create app and access token
- Log in to the Twitter Dev site: https://apps.twitter.com/
- Click the "Create New App" button.
- Fill out the "Application details" form.
- The contents of this form are not important for the demo.
- You can leave the "Callback URL" field blank.
- Click the "Create your Twitter application" button.
- On the "Application Management" page for your newly created app, click the "manage API keys" link under "Application settings" / "API key".
- On the "API Keys" page, click the "Create my access token" button at the bottom.
- Wait a few moments, then refresh the page to see your access token.
Edit the twitter4j.properties file
- In the hw-sandbox-storm-provision directory of your VM, edit the twitter4j.properties file.
- Copy each of the four values from the Twitter App API Keys page into this file:
- API key == "oauth.consumerKey"
- API secret == "oauth.consumerSecret"
- Access token == "oauth.accessToken"
- Access token secret == "oauth.accessTokenSecret"
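For reference, the four entries might look like the sketch below, which writes them with a shell here-document. The property names are twitter4j's standard keys, which use the `oauth.` prefix. The `YOUR_...` values are placeholders for the values from the API Keys page. `PROPS_FILE` is an illustrative variable that defaults to a separate sample file, so the project's own twitter4j.properties is not overwritten in case it carries other settings.

```shell
#!/bin/sh
# Illustrative output path (an assumption); it avoids clobbering the project's own file.
PROPS_FILE="${PROPS_FILE:-twitter4j.properties.sample}"

# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
cat > "$PROPS_FILE" <<'EOF'
oauth.consumerKey=YOUR_API_KEY
oauth.consumerSecret=YOUR_API_SECRET
oauth.accessToken=YOUR_ACCESS_TOKEN
oauth.accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET
EOF
```

Compare the sample with the project's twitter4j.properties and copy the four lines across once the placeholders have been replaced.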
Add the modified twitter4j.properties file to the jar so it can be found in the classpath
In the hw-sandbox-storm-provision directory of your VM, run this command:
# jar uf kettle-engine-storm-0.0.2-SNAPSHOT-for-remote-topology.jar twitter4j.properties
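To confirm that the update took effect, the jar's table of contents can be listed with `jar tf` and searched for the file. This is a small sketch; `has_entry` is a hypothetical helper name, and the check is skipped where the jar or the JDK's `jar` tool is not available.

```shell
#!/bin/sh
# has_entry NAME: succeed if a jar listing read from stdin contains the exact entry NAME.
has_entry() {
    grep -qxF "$1"
}

JAR="kettle-engine-storm-0.0.2-SNAPSHOT-for-remote-topology.jar"

# Only run the check where both the jar file and the jar tool are present.
if [ -f "$JAR" ] && command -v jar >/dev/null 2>&1; then
    if jar tf "$JAR" | has_entry twitter4j.properties; then
        echo "twitter4j.properties is in the jar"
    else
        echo "twitter4j.properties is missing from the jar" >&2
    fi
fi
```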
Submit the jar to the Storm cluster, passing in the Kettle transformation to run
# storm jar kettle-engine-storm-0.0.2-SNAPSHOT-for-remote-topology.jar org.pentaho.kettle.engines.storm.KettleStorm demo-twitter4j.ktr
By default, the transformation runs for 15 seconds and then shuts down automatically.
View the results in the output text file
# cat /home/storm/tweets.txt
Optionally, modify the duration or change the filter keywords by editing the demo-twitter4j.ktr file.