Hustle

A column oriented, embarrassingly distributed, relational event database.

Features

  • column oriented - super fast queries
  • events - write only semantics
  • distributed insert - designed for petabyte scale distributed datasets with massive write loads
  • compressed - bitmap indexes, lz4, and prefix trie compression
  • relational - join gigantic data sets
  • partitioned - smart shards
  • embarrassingly distributed (based on Disco)
  • embarrassingly fast (uses LMDB)
  • NoSQL - Python DSL
  • bulk append only semantics
  • highly available, horizontally scalable
  • REPL/CLI query interface
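
To make the "column oriented" and "bitmap indexes" features above more concrete, here is a toy sketch of the general technique. It is not Hustle's storage code: the helper names (`build_bitmap_index`, `rows_of`) and the sample data are hypothetical, and plain Python integers stand in for compressed bitmaps.

```python
# Illustrative only: a toy bitmap index over columns stored as lists.
# This is NOT Hustle's implementation; all names and data are made up.
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose set bits mark matching rows."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Expand a bitmap back into a sorted list of row numbers."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical data, stored column-wise (one list per column).
ad_id = [30010, 30011, 30010, 30012, 30010]
site_id = ['a.com', 'b.com', 'a.com', 'a.com', 'b.com']

ad_index = build_bitmap_index(ad_id)
site_index = build_bitmap_index(site_id)

# "ad_id == 30010 AND site_id == 'a.com'" becomes a single bitwise AND.
matches = rows_of(ad_index[30010] & site_index['a.com'])
```

Because each predicate is answered from its own column's index, a query only touches the columns it names, which is the usual argument for column-oriented layouts.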

Example Query

select(impressions.ad_id, impressions.date, h_sum(pix.amount), h_count(),
       where=((impressions.date < '2014-01-13') & (impressions.ad_id == 30010),
               pix.date < '2014-01-13'),
       join=(impressions.site_id, pix.site_id),
       order_by=impressions.date)
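
For readers new to the DSL, the following plain-Python sketch shows one reading of what the query above computes. It rests on assumptions not stated in this README: that each element of the `where` tuple filters one table, and that `h_sum` and `h_count` are grouped by the non-aggregated columns. The function `example_query` and all sample rows are hypothetical, and nothing here uses Hustle itself.

```python
# Plain-Python sketch of the example query's meaning; rows are made up.
from collections import defaultdict

impressions = [
    {'ad_id': 30010, 'date': '2014-01-11', 'site_id': 'a.com'},
    {'ad_id': 30010, 'date': '2014-01-12', 'site_id': 'b.com'},
    {'ad_id': 30010, 'date': '2014-01-13', 'site_id': 'a.com'},  # fails date filter
    {'ad_id': 30099, 'date': '2014-01-11', 'site_id': 'a.com'},  # fails ad_id filter
]
pix = [
    {'date': '2014-01-10', 'site_id': 'a.com', 'amount': 5},
    {'date': '2014-01-12', 'site_id': 'a.com', 'amount': 7},
    {'date': '2014-01-12', 'site_id': 'b.com', 'amount': 2},
    {'date': '2014-01-14', 'site_id': 'b.com', 'amount': 9},  # fails date filter
]


def example_query(impressions, pix, cutoff='2014-01-13', ad_id=30010):
    # where: one filter per table (ISO dates compare correctly as strings)
    imps = [r for r in impressions
            if r['date'] < cutoff and r['ad_id'] == ad_id]
    pixels = [r for r in pix if r['date'] < cutoff]

    # join on site_id, grouping by the non-aggregated columns (assumption)
    totals = defaultdict(lambda: [0, 0])  # (ad_id, date) -> [sum, count]
    for i in imps:
        for p in pixels:
            if i['site_id'] == p['site_id']:
                group = totals[(i['ad_id'], i['date'])]
                group[0] += p['amount']  # stands in for h_sum(pix.amount)
                group[1] += 1            # stands in for h_count()

    # order_by: sort the result rows by impressions.date
    rows = [(k[0], k[1], v[0], v[1]) for k, v in totals.items()]
    return sorted(rows, key=lambda row: row[1])


result = example_query(impressions, pix)
```

With this sample data, `result` holds one row per `(ad_id, date)` pair that survives both filters and has at least one matching `pix` row.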

Installation

After cloning this repo, here are some considerations:

  • you will need Python 2.7 or higher - note that it probably won't work on 2.6 (has to do with pickling lambdas...)
  • you need to install Disco 0.5 and its dependencies - get that working first
  • you need to install Hustle and its 'deps' thusly:
cd hustle
sudo ./bootstrap.sh

Please refer to the Installation Guide for more details.

Documentation

Hustle User Guide

Hustle Mailing List

Credits

Special thanks to the following open-source projects:

Build Status: Travis-CI

About

Fork of Hustle - Originally developed at Chango - A column oriented, embarrassingly distributed relational event database.
