This repository has been archived by the owner on May 11, 2022. It is now read-only.




apigee/lembos


Lembos is a Java-based library that provides an ecosystem for writing your MapReduce jobs in Node.js and running them natively within Hadoop, as if they had been written in Java. The project is packaged as a JAR file that is intended to be run via hadoop jar, just as you would run a MapReduce job written in Java.

(Note: There is an option to build a standalone executable JAR, but it would require packaging the Lembos JAR together with the necessary Hadoop JARs into a shaded JAR. Since such a build would be specific to one Hadoop version, we're not building those right now. For more details on how to do this, read the Getting Started (Developer) section below.)

Release Notes

Lembos uses GitHub Releases for its release notes and binary downloads; both are available on the project's Releases page.

Getting Started (User)

First things first: you need a build of Lembos. You can either download the latest release or clone the project and build it yourself. Once you have a Lembos build and have written your Node.js MapReduce job, use hadoop jar to run the Lembos runner. (For those not familiar with hadoop jar, its usage is: hadoop jar <jar> [generic options] [application args/options]...) Below is an example of how to run the word count example shipped with Lembos:

hadoop jar target/lembos-1.1-SNAPSHOT.jar \
  -D io.apigee.lembos.mapreduce.moduleName=wordcount \
  -D io.apigee.lembos.mapreduce.modulePath=examples/wordcount

It's that simple, but before moving on, let's make sure we understand what is going on here. The Lembos runner and runtime use the two Hadoop configuration properties above, and both are required. io.apigee.lembos.mapreduce.moduleName specifies the name of your module, which the Lembos runtime uses to load it, and io.apigee.lembos.mapreduce.modulePath specifies the path to your module. (Note: The path can be an HDFS URL or a local filesystem path.) The Lembos runner then loads your module, configures the job based on your job definition, and submits it to the Hadoop cluster.
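To make the role of those two properties concrete, here is a minimal, hypothetical sketch of the kind of argument handling described above: it collects -D key=value generic options, checks that both required Lembos properties are present, and tells an HDFS URL apart from a local path. The class LembosArgsSketch and its methods are illustrative assumptions, not the Lembos runner's actual code; real Hadoop programs normally let GenericOptionsParser (usually via ToolRunner) parse -D options into a Configuration, and treating only the "hdfs:" scheme as remote is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: NOT the Lembos runner's implementation.
// The two property names come from the README; everything else is illustrative.
public class LembosArgsSketch {

    static final String MODULE_NAME = "io.apigee.lembos.mapreduce.moduleName";
    static final String MODULE_PATH = "io.apigee.lembos.mapreduce.modulePath";

    /** Collects "-D key=value" and "-Dkey=value" generic options into a map. */
    static Map<String, String> parseDefines(String[] args) {
        Map<String, String> props = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String pair = null;
            if (args[i].equals("-D") && i + 1 < args.length) {
                pair = args[++i];            // key=value is the next token
            } else if (args[i].startsWith("-D") && args[i].length() > 2) {
                pair = args[i].substring(2); // key=value is attached to -D
            }
            if (pair != null) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    props.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return props;
    }

    /** Returns the required property names that are missing or blank. */
    static List<String> missingRequired(Map<String, String> props) {
        List<String> missing = new ArrayList<>();
        for (String key : new String[] {MODULE_NAME, MODULE_PATH}) {
            String value = props.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    /** Simplification: an "hdfs:" prefix means HDFS; anything else is treated as local. */
    static boolean isHdfsPath(String path) {
        return path.regionMatches(true, 0, "hdfs:", 0, 5);
    }

    public static void main(String[] args) {
        // Same options as the word count command above.
        String[] example = {
            "-D", "io.apigee.lembos.mapreduce.moduleName=wordcount",
            "-D", "io.apigee.lembos.mapreduce.modulePath=examples/wordcount"
        };
        Map<String, String> props = parseDefines(example);
        System.out.println("missing: " + missingRequired(props)); // missing: []
        System.out.println("hdfs? " + isHdfsPath(props.get(MODULE_PATH))); // hdfs? false
    }
}
```

Failing fast when either property is absent mirrors the README's point that both are required: without a module name and a module path there is nothing to load, so no job can be configured or submitted.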

Getting Started (Developer)

If you are interested in contributing to Lembos or need to build against it, the process is straightforward. As with any GitHub project, just clone the project. Here is the full Git command to save you some trouble: git clone https://github.com/apigee/lembos.git. We use Maven for our build process, and as with most Maven projects, the commands are standard:

  • mvn test: Run the unit tests (Does not require a running Hadoop instance)
  • mvn integration-test: Run the integration tests (Requires a running Hadoop instance to connect to)
  • mvn site: Build the Maven project documentation (most useful when run after mvn test or mvn integration-test, since the code coverage reports will then be built)
  • mvn package: Build a JAR file of Lembos

If you're building against Lembos, the project is published to Maven Central, so you just need to add the following dependency to your pom.xml:

<!-- ... -->
<dependency>
  <groupId>io.apigee.lembos</groupId>
  <artifactId>lembos</artifactId>
  <version>1.1-SNAPSHOT</version>
</dependency>
<!-- ... -->

Further Reading

While running the Lembos runner is quite simple, a lot happens behind the scenes that is not explained above. To better understand how Lembos works and what else it brings to the table, see the following resources:
