Sahara aims to provide users with a simple means of provisioning a data-intensive cluster (Hadoop, Spark) by specifying a few parameters such as software versions, cluster topology, and node hardware details.
Disk image elements for Savanna
Implementation of a new ROLLUP operator for Apache Pig that results in optimal execution plans
Simple OpenStack Python bindings
Python logging handler for Logstash.
Decision trees library and more
Mirror of Apache Pig
This is the PIG ROLLUP repo
A possible implementation of a decision tree for Spark
OpenStack Measurement Framework
Hadoop implementation of KNN graph building algorithms (Brute force, NNDescent, NNCtph, ...)
The Hadoop Fair Sojourn Protocol Scheduler
A set of tools to analyse Hadoop logs
This project deals with the implementation of k-means for multi-dimensional clustering.
Statistical Workload Injector for MapReduce - Project at UC Berkeley AMP Lab