Buildoop: Hadoop Ecosystem Builder
The Hadoop Ecosystem Builder -Buildoop- provides interoperable tools, metadata, and processes that enable the rapid, repeatable development of a Linux Hadoop based system.
With Buildoop you can build a complete set of Hadoop ecosystem components based on RPM or DEB packages, make integration tests for this tools on a RedHat/CentOS, or Debian/Ubuntu virtual system, and maintain a set of configuration files for baremetal deployment.
The Buildoop is splitted in the following fundations:
- A main command line program for metadata operations: buildoop.
- A set of recipes: The metadata for package and tools building.
- A set of system integration tests: SIT framework.
- A central repository for baremetal deployment configuration.
From the technology point of view Buildoop is based on:
- Command line "buildoop" based on Groovy.
- Packaging recipes based on JSON.
- SIT Framework: based on Groovy test scripts, and Vagrant for virtual development enviroment.
- Set of Puppet files for baremetal deployment. Note: this feature has been delegated to the project Deploop 
buildoop: Main folder for Buildoop main controler.
conf: Buildoop configuration folder, BOM definitions, targets definitions.
deploy: Folder for deploy in VM and Baremetal systems. Based on Puppet and Chef.
sit: System Integration Testing tests for VM pseudo-cluster system.
recipes: Download, build and packaging recipes.
toolchain: Tools for cross-compiling for diferent targets.
- Download Groovy binary:
wget http://dl.bintray.com/groovy/maven/groovy-binary-2.3.3.zip 2. Clone the project:
git clone https://github.com/buildoop/buildoop.git
- Set the enviroment:
cd buildoop && source set-buildoop-env
- In order to build some packages you need install some dependecies:
- Usage examples:
Build the whole ecosystem for the distribution openbus-0.0.1:
buildoop openbus-0.0.1 -build
Build the zookeeper package for the distribuion openbus-0.0.1:
buildoop openbus-0.0.1 zookeeper -build
List the available distributions:
- For more commands:
GitHub projects forked in Buildoop
The https://github.com/buildoop project contains a set of GitHub projects forked from other authors. This forks are used by Buildoop in order to make relevant packages in the ecosystem.
The list of forked porjects are:
- Camus: Kafka Camus is LinkedIn's Kafka HDFS pipeline
- Marcelo Valle (Redoop) https://github.com/mvalleavila
- flume-ng-kafka-sink: Flume to Kafka Sink
- Marcelo Valle (Redoop) https://github.com/mvalleavila/flume-ng-kafka-sink
- storm-kafka: Storm Spout for Kafka
- Marcelo Valle (Redoop) https://github.com/mvalleavila/storm-kafka-0.8-plus
- Storm-0.9.1-Kafka-0.8-Test: Storm Topology for Kafka Spout example for testing
- Marcelo Valle (Redoop) https://github.com/mvalleavila/Storm-0.9.1-Kafka-0.8-Test
- storm-hbase: Storm to HBase connector
- P. Taylor Goetz (Hortonworks) https://github.com/ptgoetz/storm-hbase
- kafka-hue: Hue application for Apache Kafka
- Daniel Tardon (Redoop) https://github.com/danieltardon/kafka-hue
- AvroRepoKafkaProducerTest: kafka producer to send Avro Messages with an Avro schema repository
- Marcelo Valle (Redoop) https://github.com/mvalleavila/AvroRepoKafkaProducerTest
- avro-1.7.4-schema-repo: Avro Schema Repository Server
- Marcelo Valle (Redoop) https://github.com/mvalleavila/avro-1.7.4-schema-repo
- flume-ng-kafka-avro-sink: Apache Flume sink to produce Avro messages to Apache Kafka linked with Avro Schema Respository Server from Camus.
- Daniel Tardon (Redoop): https://github.com/danieltardon/flume-ng-kafka-avro-sink
- siddhi: Siddhi CEP is a lightweight, easy-to-use Open Source Complex Event Processing Engine (CEP).
Pull request flow
Clone the repository from your project fork:
$ git clone https://github.com/buildoop/buildoop.git
The clone has as active branch the "development branch"
$ git branch
Yo have to make your changes in the "development branch".
$ git add .
$ git commit -m "...."
$ git push origin
When you are ready to purpose a change to the original repository, you have to use the "Pull Request" button from GitHub interface.
The point is the pull request have to go to the "development branch" so the pull request revisor can check the change, pull to original "development branch", and the last step is to push this "development pull request" to the "master branch".
So the project has two branches:
- The "master branch": The deployable branch, only hard tested code and ready to use.
- The "development branch": Where the work is made and where the pull request has to make.
|Core Engine||Core building engine||Done|
|POM versioning||Simple BOM multi-versioning||Done|
|Git repsotory||Download sources from GIT||Done|
|Svn repsotory||Download sources from Subversion||Pending|
|Code refactoring||More elegant code||Forever Pending|
|Cross-Architecture||Cross build from different distributions||Pending|
|DEB Support||Debian/Ubuntu Support||Pending|
|Layers||Add/Modify features without modify the core folders||Pending|
|SIT||System Integration Tests||Pending|
-- Javi Roman firstname.lastname@example.org