Latest version: 0.1.0 (12th Sep 2014)

What is it good for?

As a data store for low-latency 'Big Data' apps
Fast analysis over 'Big Data' on a low budget
Store, index and query huge amounts of data
Make your Hadoop outputs accessible to every application (e.g. aggregated statistics)
Provide billions of datasets in a very short time
Store terabytes of data on a single instance without any performance impact!
Only immutable data is supported; you cannot insert or update single datasets
Works well on AWS infrastructure, even on provisioned EBS volumes
Data delivery management and versioning

Features

Index your JSON data
Query over indexed and non-indexed data
Geospatial indexes
Range queries (between, greater than, less than and so on)
Data replication (to another database)
Sharding and replication (planned, not yet implemented)
Very fast imports (the limitation is the ethernet interface or disk)
Multithreaded search
High compression
No downtime during imports (the current data stays available until the next import has finished)
Fast rollbacks
Java Driver and R Connector
Data delivery management and versioning
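
The "no downtime during imports" and "fast rollbacks" features can be pictured with a minimal sketch. This is not jumboDB's implementation or API: the class `DeliveryStore` and its methods are hypothetical names, and the sketch only assumes that a delivery is an immutable batch that becomes visible in one step while older deliveries are kept for rollback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration; none of these names come from jumboDB.
final class DeliveryStore {
    // The active, immutable delivery. Readers access it without locking.
    private volatile List<String> active = Collections.emptyList();
    // Previously active deliveries, most recent first.
    private final Deque<List<String>> history = new ArrayDeque<>();

    // Readers always see one complete delivery, never a half-imported one.
    List<String> read() {
        return active;
    }

    // Import finished: switch to the new delivery in a single step.
    synchronized void activate(List<String> delivery) {
        history.push(active);
        active = Collections.unmodifiableList(new ArrayList<>(delivery));
    }

    // Rollback only re-points the reference; no data is rewritten.
    synchronized boolean rollback() {
        if (history.isEmpty()) {
            return false;
        }
        active = history.pop();
        return true;
    }
}
```

Because the old delivery stays readable until `activate` swaps the reference, readers are never blocked by an import, and a rollback costs only a pointer change.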

Core ideas of jumboDB

Process and index the data in a parallelized environment like Hadoop (you can also run it locally)
All data is immutable, because data usually gets replaced or extended by further data deliveries from Hadoop
Immutable data makes it easy to parallelize searches
Pre-organized, sorted data is easier to search and results in faster responses
Sorted data allows grouped read actions
Sort your data by the major use case to speed up queries
Compression helps to increase disk speed
Don't keep all indexes in memory, because the data is too big!
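
A minimal sketch of why sorted, immutable data helps, assuming the data is split into pre-sorted chunks of numeric keys. The class `SortedChunkSearch` and its methods are hypothetical and do not reflect jumboDB's internal format. A range query becomes two binary searches plus one contiguous read per chunk, and because the chunks never change, they can be searched in parallel without locks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; none of these names come from jumboDB.
final class SortedChunkSearch {

    // Index of the first element >= key in a sorted array.
    static int lowerBound(long[] sorted, long key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // All keys in [from, to] of one chunk, read as one contiguous slice
    // (the "grouped read" that sorting makes possible).
    static long[] rangeInChunk(long[] sorted, long from, long to) {
        if (from > to) {
            return new long[0];
        }
        int start = lowerBound(sorted, from);
        int end = (to == Long.MAX_VALUE) ? sorted.length : lowerBound(sorted, to + 1);
        return Arrays.copyOfRange(sorted, start, end);
    }

    // Chunks are immutable, so they can be searched concurrently.
    // Results keep chunk order; within a chunk they stay sorted.
    static List<Long> rangeQuery(List<long[]> chunks, long from, long to) {
        return chunks.parallelStream()
                .flatMap(chunk -> Arrays.stream(rangeInChunk(chunk, from, to)).boxed())
                .collect(Collectors.toList());
    }
}
```

For example, `rangeQuery` over the chunks `{1, 3, 5, 7}` and `{2, 4, 6, 8}` with the range 3 to 6 touches only the middle two entries of each chunk.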

Big Data for the masses!

Balancing performance and cost efficiency

Affordable Big Data: low IO requirements, efficient usage of disk space, low memory footprint
Fast disk access through compression: Snappy achieves compression ratios of up to 5x, increasing disk IO efficiency and saving storage cost
Batch processing, delivery-driven approach: "Write once, read many". One batch of data is an atomic write that can be rolled back
Supports JSON documents: schema flexibility for rapid application development
Power and scalability of Apache Hadoop: for batch processing, aggregation and indexing of your data (e.g. writes of up to 500,000 JSON documents per second into the data store)
Low read latency for end-user apps: optimized querying even for large result sets through multithreading and efficient data streaming (e.g. 100,000 JSON documents returned in less than a second)
Hadoop Connector, Java Driver and R connector are available
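
The link between compression and disk throughput can be illustrated with a small sketch. Note the assumptions: Snappy is not part of the Java standard library, so this example substitutes `java.util.zip.Deflater` at its fastest level as a stand-in, and the sample documents are invented. The class `CompressionRatioSketch` is hypothetical and is not part of jumboDB. The point is only that repetitive JSON shrinks considerably, so fewer bytes have to be read from disk per document.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration; Deflater stands in for Snappy here.
final class CompressionRatioSketch {

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input; stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Invented sample data: many small JSON documents with repeating keys.
    static byte[] sampleJson(int docs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < docs; i++) {
            sb.append("{\"user\":\"u").append(i % 100)
              .append("\",\"country\":\"DE\",\"clicks\":").append(i % 7)
              .append("}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleJson(10_000);
        byte[] packed = compress(raw);
        System.out.printf("raw=%d bytes, compressed=%d bytes, ratio=%.1fx%n",
                raw.length, packed.length, (double) raw.length / packed.length);
    }
}
```

The ratio printed by `main` depends entirely on the data and the codec, so it should not be read as a measurement of jumboDB or Snappy.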

Setup JumboDB

Please see the JumboDB Wiki https://github.com/comsysto/jumbodb/wiki

Licenses

The connectors are licensed under Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0.html

The database is licensed under Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0.html
