Hive Worker is a Scala library to run Hive job flows on the Amazon Elastic MapReduce platform using the AWS Java SDK.
Hive Worker uses Google Guice and twitter-server as its server stack. twitter-server is built on top of Finagle; see Finagle's User's Guide for more information.
NOTE: this library is a work in progress.
git clone git://github.com/cacoco/hiveworker.git
Hive Worker is built with Maven and requires Scala 2.10.1.
To build, just run:
cd hiveworker
mvn clean install
When parsing job configuration steps, Hive Worker supports basic date/time formatting for values of type io.angstrom.hiveworker.util.StepArgument. The default timezone is UTC and is not configurable. Supported formats include:
Hour, LastHour, Today, Yesterday, TwoDaysAgo, LastMonth
Hive Worker uses the joda-time library for date/time manipulation and formatting.
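To make the supported tokens concrete, here is a minimal sketch of how such named values might resolve against a reference time in UTC. It is hypothetical: `StepArgumentSketch` and `resolve` are illustrative names, not the `StepArgument` API; the output patterns (`yyyy-MM-dd-HH`, `yyyy-MM-dd`, `yyyy-MM`) are assumptions; and it uses `java.time` in place of joda-time so it runs without extra dependencies.

```scala
import java.time.{ZoneOffset, ZonedDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical sketch: NOT the io.angstrom.hiveworker.util.StepArgument API.
// java.time is swapped in for joda-time so the example is dependency-free.
object StepArgumentSketch {
  // Assumed output patterns; Hive Worker's actual formats may differ.
  private val hourFmt  = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH")
  private val dayFmt   = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  private val monthFmt = DateTimeFormatter.ofPattern("yyyy-MM")

  /** Resolves a named date/time token relative to `now`, after converting `now` to UTC. */
  def resolve(token: String, now: ZonedDateTime): Option[String] = {
    // UTC is fixed, mirroring the "not configurable" note above.
    val utc = now.withZoneSameInstant(ZoneOffset.UTC)
    token match {
      case "Hour"       => Some(utc.format(hourFmt))
      case "LastHour"   => Some(utc.minusHours(1).format(hourFmt))
      case "Today"      => Some(utc.format(dayFmt))
      case "Yesterday"  => Some(utc.minusDays(1).format(dayFmt))
      case "TwoDaysAgo" => Some(utc.minusDays(2).format(dayFmt))
      case "LastMonth"  => Some(utc.minusMonths(1).format(monthFmt))
      case _            => None // not a recognized date/time token
    }
  }
}
```

For example, with `now` at 2013-03-01T00:30Z, `resolve("LastHour", now)` returns `Some("2013-02-28-23")` and `resolve("LastMonth", now)` returns `Some("2013-02")`, so month and day boundaries roll over as expected.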
To run locally, replacing the placeholder values with your own:
mvn exec:java -Dexec.args="-aws.access.key=ACCESS_KEY \
  -aws.access.secret.key=SECRET_KEY \
  -hadoop.bucket=s3://hadoop.angstrom.io \
  -hadoop.log.uri=s3://hadoop.angstrom.io/logs \
  -aws.sns.topic.arn.job.errors=arn:aws:sns:us-east-1:111111111111:job-errors \
  -aws.sqs.queue.url.default=https://queue.amazonaws.com/111111111111/HIVE_JOB_FLOW"
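The run command passes its configuration as `-name=value` flags inside `exec.args`. As a rough, stdlib-only illustration of that convention, this sketch splits such tokens into a map and reports which of the flags from the run command are absent. `RunFlagsSketch`, `parse`, and `missing` are hypothetical names; twitter-server does its own flag parsing, and this is not that implementation.

```scala
// Hypothetical helper, not part of Hive Worker or twitter-server.
object RunFlagsSketch {
  // Flag names copied from the `mvn exec:java` run command.
  val Required: Seq[String] = Seq(
    "aws.access.key",
    "aws.access.secret.key",
    "hadoop.bucket",
    "hadoop.log.uri",
    "aws.sns.topic.arn.job.errors",
    "aws.sqs.queue.url.default"
  )

  /** Parses `-name=value` tokens; tokens lacking a leading '-' or an '=' are ignored. */
  def parse(args: Seq[String]): Map[String, String] =
    args.collect {
      case a if a.startsWith("-") && a.contains("=") =>
        // Split on the first '=' only, so values such as URLs stay intact.
        val Array(k, v) = a.drop(1).split("=", 2)
        k -> v
    }.toMap

  /** Returns the required flags that are absent from `flags`. */
  def missing(flags: Map[String, String]): Seq[String] =
    Required.filterNot(flags.contains)
}
```

Splitting on the first `=` matters here because values like the SQS queue URL could in principle contain further `=` characters.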