
Overview

Randy Guck edited this page Aug 10, 2015 · 2 revisions

#Overview

Doradus is a REST service that extends a Cassandra NoSQL database with a graph-based data model, advanced indexing and search features, and a REST API. The Doradus query language (DQL) extends Lucene full-text queries with graph navigation features such as link paths, quantifiers, and transitive searches.

#Architecture

Doradus is a pure Java application that can run as a daemon, Windows service, or console application. The REST API is provided by an embedded Jetty server. Each instance is a pure "peer"; multiple instances can access the same Cassandra cluster. A common practice is to run one Doradus and one Cassandra instance on each node. Each Doradus instance can be configured to rotate requests through multiple Cassandra instances. Doradus currently accesses Cassandra nodes using either the Thrift API or CQL.
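
As an illustration of rotating requests through multiple Cassandra instances, here is a minimal round-robin sketch. It is not Doradus's actual implementation or configuration: the class name `RoundRobinHosts` and the host addresses are assumptions made up for this example.

```python
from itertools import cycle


class RoundRobinHosts:
    """Hand out host addresses in a fixed, repeating order (hypothetical helper)."""

    def __init__(self, hosts):
        hosts = list(hosts)
        if not hosts:
            raise ValueError("at least one host is required")
        # cycle() repeats the list endlessly, which gives round-robin order.
        self._hosts = cycle(hosts)

    def next_host(self):
        """Return the host that should receive the next request."""
        return next(self._hosts)


# Example addresses are placeholders, not real cluster nodes.
rotation = RoundRobinHosts(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
picked = [rotation.next_host() for _ in range(4)]
print(picked)
```

The fourth request wraps around to the first host, so load is spread evenly as long as every host stays reachable. A real client would also need to skip hosts that are down.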

#Storage Services

A Doradus database cluster can host multiple schemas, called applications. Each application chooses a storage service to manage its data. Doradus provides several storage services, each with different storage and performance characteristics suited to different types of applications.

##OLAP Service

The Doradus OLAP storage service is best suited for structured, immutable or semi-mutable data such as time-oriented data: log records, events, messages, etc. It offers very dense storage and high-performance analytical queries. Application data is partitioned into cubes called shards, which are typically time-based (e.g., one day's data per shard). Field values are stored as compressed arrays that are loaded and scanned in memory and cached on an LRU basis. Queries are fast, typically scanning millions of objects per second.
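
The two ideas in this paragraph, time-based shards and LRU caching of field arrays, can be sketched as follows. This is only an illustration under stated assumptions: the shard naming scheme, the `LRUCache` class, and the `(shard, field)` cache key are all hypothetical and are not taken from the Doradus code.

```python
from collections import OrderedDict
from datetime import datetime


def shard_name(timestamp: datetime) -> str:
    """Map an event timestamp to a one-day shard name (hypothetical naming)."""
    return timestamp.strftime("%Y-%m-%d")


class LRUCache:
    """Keep at most `capacity` entries, evicting the least recently used one."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


# Cache field arrays keyed by (shard, field); the values are toy data.
cache = LRUCache(capacity=2)
cache.put(("2015-08-09", "Size"), [10, 20, 30])
cache.put(("2015-08-10", "Size"), [40, 50])
cache.get(("2015-08-09", "Size"))             # touch the older entry
cache.put(("2015-08-11", "Size"), [60])       # evicts the 2015-08-10 entry
print(shard_name(datetime(2015, 8, 10, 14, 30)))
```

Because the 2015-08-09 array was read just before the third insert, the untouched 2015-08-10 array is the one evicted.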

##Spider Service

The Doradus Spider service is best suited for unstructured/semi-structured data or data that is highly mutable: document stores, message graphs, directories, etc. It uses indexing techniques such as tries to support fully-inverted tables. Spider offers fine-grained updates, immediate indexing, table-level sharding, and other features that benefit full-text applications.
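
To make "fully-inverted" concrete, the sketch below indexes every term of every field so that a term lookup returns the matching object IDs directly. It deliberately uses a plain dictionary rather than a trie, and the function name, field names, and sample objects are assumptions for illustration only, not Spider's storage layout.

```python
from collections import defaultdict


def build_inverted_index(objects):
    """Map (field name, term) to the set of object IDs containing that term."""
    index = defaultdict(set)
    for object_id, fields in objects.items():
        for field_name, text in fields.items():
            # Naive tokenizer: lowercase and split on whitespace.
            for term in text.lower().split():
                index[(field_name, term)].add(object_id)
    return index


# Toy objects with hypothetical field names.
objects = {
    "m1": {"Subject": "Quarterly report"},
    "m2": {"Subject": "Weekly report", "Body": "quarterly numbers"},
}
index = build_inverted_index(objects)
print(sorted(index[("Subject", "report")]))
```

Keying on the field name as well as the term keeps a match in `Body` separate from a match in `Subject`, which is what lets a query target a single field.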

##Logging Service

The Doradus Logging service is optimized for fast loading, dense storage, and efficient query processing of log-based data. Tests show that the Logging service can store up to 500K events/second on a single node, and a 115 million event database requires only 1.5GB of disk space. Logging is ideal for large data sets of immutable, variable-structure, time-based log data that require powerful aggregate queries with uniform performance.
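
The storage figures above imply a per-event footprint that is easy to work out. The short calculation below uses only the two numbers quoted in the text; whether "GB" means 10^9 or 2^30 bytes is not stated, so both readings are shown.

```python
events = 115_000_000

# Reading "1.5GB" as decimal gigabytes (10**9 bytes).
bytes_per_event_decimal = 1.5 * 10**9 / events

# Reading "1.5GB" as binary gigabytes (2**30 bytes).
bytes_per_event_binary = 1.5 * 2**30 / events

print(round(bytes_per_event_decimal, 1), round(bytes_per_event_binary, 1))
```

Either way the result is roughly 13 to 14 bytes of disk per event.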

#Data Model and Query Language

Doradus uses an object/graph model. Each application has its own schema, which defines its tables. The accessible unit of a table is an object, which has a unique ID. Objects can have scalar fields, such as text and integers, and link fields, which form bidirectional relationships between objects. Doradus ensures referential integrity of relationships. Doradus OLAP requires all tables and fields to be predefined in the schema. Doradus Spider allows dynamically-added tables and dynamically-defined fields per object.
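
A small in-memory sketch can show what bidirectional links with referential integrity mean in practice: setting a link also sets its inverse, and deleting an object removes every reference to it. The `LinkStore` class and the `Manager`/`DirectReports` field pair are hypothetical names chosen for this example; they do not describe how Doradus stores links.

```python
from collections import defaultdict


class LinkStore:
    """Toy store in which every link field has a declared inverse field."""

    def __init__(self, inverses):
        # Register each pair in both directions, e.g. Manager <-> DirectReports.
        self.inverse_of = {}
        for field, inverse in inverses.items():
            self.inverse_of[field] = inverse
            self.inverse_of[inverse] = field
        self.links = defaultdict(set)  # (object ID, field) -> linked object IDs

    def add_link(self, source_id, field, target_id):
        """Add the link and its inverse together so both sides stay in sync."""
        self.links[(source_id, field)].add(target_id)
        self.links[(target_id, self.inverse_of[field])].add(source_id)

    def delete_object(self, object_id):
        """Remove the object's links and every inverse link pointing back at it."""
        for owner_id, field in list(self.links):
            if owner_id != object_id:
                continue
            for target_id in list(self.links[(owner_id, field)]):
                self.links[(target_id, self.inverse_of[field])].discard(object_id)
            del self.links[(owner_id, field)]


store = LinkStore({"Manager": "DirectReports"})
store.add_link("ann", "Manager", "bob")
reports_before = set(store.links[("bob", "DirectReports")])
store.delete_object("ann")
reports_after = set(store.links[("bob", "DirectReports")])
print(reports_before, reports_after)
```

Only one side of the relationship is written by the caller; the inverse is maintained automatically, so no dangling reference to `ann` survives the delete.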

The Doradus query language (DQL) supports Lucene-like full-text clauses such as terms, phrases, and ranges. Using link fields, DQL also supports path expressions, which can use quantifiers, filters, and transitive searches. DQL can be used in object queries, which return matching objects, and aggregate queries, which perform metric calculations on selected objects, optionally grouped at an arbitrary number of levels.
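
The difference between selecting objects and aggregating over them can be sketched in plain Python. This is not DQL syntax and not the Doradus API: the `count_by` helper, the field names, and the sample data are assumptions used only to show a count metric grouped at one and two levels.

```python
from collections import defaultdict


def count_by(objects, group_fields):
    """Count objects per combination of group_fields (one level per field)."""
    counts = defaultdict(int)
    for obj in objects:
        key = tuple(obj.get(field) for field in group_fields)
        counts[key] += 1
    return dict(counts)


# Toy objects with hypothetical fields.
messages = [
    {"Sender": "ann", "Year": 2014, "Size": 100},
    {"Sender": "ann", "Year": 2015, "Size": 250},
    {"Sender": "bob", "Year": 2015, "Size": 75},
    {"Sender": "ann", "Year": 2015, "Size": 40},
]

# Selection step: analogous to an object query returning matching objects.
selected = [m for m in messages if m["Size"] < 200]

# Aggregation step: a count metric grouped at one level, then at two levels.
by_sender = count_by(selected, ["Sender"])
by_sender_year = count_by(selected, ["Sender", "Year"])
print(by_sender, by_sender_year)
```

Adding another field name to `group_fields` adds another grouping level without changing the metric calculation.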

All Doradus storage services offer automatic data aging.

Doradus is designed to leverage the advantages of NoSQL technology. It supports idempotent (repeatable) updates, dynamic expansion, automatic load balancing, automatic failover, and more.
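
Idempotent updates matter because a client that is unsure whether a request succeeded can simply send it again. The sketch below shows the property with a toy in-memory store; `apply_update`, the object ID, and the field names are hypothetical and stand in for whatever update mechanism an application actually uses.

```python
def apply_update(store, object_id, field_values):
    """Set field values on an object; repeating the same call changes nothing."""
    current = store.setdefault(object_id, {})
    current.update(field_values)  # assignment, not increment, so it is repeatable
    return store


store = {}
update = ("msg-1", {"Subject": "Hello", "Size": 42})

apply_update(store, *update)
after_first = {object_id: dict(fields) for object_id, fields in store.items()}

apply_update(store, *update)  # simulated retry of the same request
print(store == after_first)
```

The update assigns values rather than applying a delta such as "add 1 to Size", which is what makes the retry safe.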
