Architecture (Spider)

JoeWinter edited this page Mar 23, 2015 · 2 revisions

Doradus Spider Databases: Architecture


Doradus is a Java server application that leverages and extends the Cassandra NoSQL database. At a high level, it is a REST service that sits between applications and a Cassandra cluster, adding powerful features to—and hiding complexities in—the underlying database. This allows applications to leverage the benefits of NoSQL such as horizontal scalability, replication, and failover while enjoying rich features such as full text searching, bi-directional relationships, and powerful analytic queries.

An overview of Doradus architecture is depicted below:

Key components of this architecture are summarized below:

  • Apps: One or more applications access a Doradus server instance using a simple REST API. A JMX API is available to monitor Doradus and perform administrative functions.

  • DoradusServer: This core component controls server startup, shutdown, and services. Entry points are provided to run the server as a stand-alone application, as a Windows service (via procrun), or embedded within another application.

  • Services: Doradus’ architecture encapsulates functions within service modules. Services are initialized based on the server’s doradus.yaml configuration. Services provide functions such as the REST API (an embedded Jetty server), schema processing, and physical DB access. A special class of services, called storage services, provides storage and access features for specific application types. Doradus currently provides two storage services:

    • OLAP Service: A Doradus database configured to use the OLAP storage service is termed a Doradus OLAP Database. OLAP uses online analytical processing techniques to provide dense storage and very fast processing of analytical queries. This service is ideal for applications that use immutable or semi-mutable time-series data.

    • Spider Service: A Doradus database configured to use the Spider storage service is termed a Doradus Spider Database. The Spider service supports schemaless applications, fully inverted indexing, fine-grained updates, table-level sharding, and other features that support applications that use highly mutable and/or variable data.

Doradus can be configured to use both storage services in a single instance.

  • Cassandra Cluster: Doradus currently uses the Apache Cassandra NoSQL database for persistence. Future releases are intended to use other data stores. Cassandra performs the "heavy lifting" in terms of persistence, replication, load balancing, and more.
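
As a rough illustration of the Apps component above, the following sketch shows how a client might prepare a request to a Doradus instance over REST. The host name, the port 1123, the `/_applications` path, and the use of an `Accept: application/json` header are assumptions made for this example; consult the Doradus REST API documentation for the actual endpoints and formats. The helper names `rest_url`, `build_get`, and `fetch_json` are hypothetical.

```python
import json
import urllib.request


def rest_url(host, port, path):
    """Build the URL for a request to a Doradus REST endpoint."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def build_get(host, port, path):
    """Create (but do not send) a GET request that asks for a JSON response."""
    return urllib.request.Request(
        rest_url(host, port, path),
        headers={"Accept": "application/json"},
        method="GET",
    )


def fetch_json(request, timeout=5.0):
    """Send the request and decode the JSON body. Needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed values: host, port, and path are placeholders for illustration.
request = build_get("localhost", 1123, "/_applications")
print(request.full_url)  # prints http://localhost:1123/_applications
```

Building the request is kept separate from sending it so that the sketch runs without a server; calling `fetch_json(request)` would perform the actual round trip against a live instance.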

By default, Doradus operates in single-tenant mode, in which all applications are stored in a single Cassandra keyspace. In multi-tenant mode, each named tenant owns one or more applications, and each tenant's applications are stored in a separate keyspace. Multi-tenant mode allows multiple applications to share a common Doradus cluster while providing data isolation and security. Full details on configuring and operating multi-tenant mode are described in the Doradus Administration documentation.
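
The difference between the two modes can be pictured with a small toy model. This is only a sketch of the idea, not Doradus code: the function `keyspace_for`, the `owners` registry, the shared keyspace name `"Doradus"`, and the choice to name each tenant's keyspace after the tenant are all assumptions made for illustration.

```python
# Assumed name for the single keyspace used in single-tenant mode.
DEFAULT_KEYSPACE = "Doradus"


def keyspace_for(application, owners, multi_tenant):
    """Return the keyspace that would hold an application's data.

    `owners` is a hypothetical registry mapping application name -> tenant name.
    """
    if not multi_tenant:
        # Single-tenant mode: every application shares one keyspace.
        return DEFAULT_KEYSPACE
    # Multi-tenant mode: applications of the same tenant share that tenant's
    # keyspace (assumed here to be named after the tenant).
    return owners[application]


# Hypothetical tenants and applications.
owners = {"Email": "acme", "Logs": "acme", "Sales": "globex"}
for app in owners:
    print(app, keyspace_for(app, owners, multi_tenant=True))
```

In this model, two applications owned by the same tenant land in the same keyspace, while applications of different tenants never share one, which is the isolation property the paragraph above describes.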

The minimal deployment configuration is a single Doradus instance and a single Cassandra instance running on the same machine. On Windows, these instances can be installed as services. The Doradus server can also be embedded in the same JVM as an application.

Multiple Doradus and Cassandra instances can be deployed to scale a cluster horizontally. An example of a Doradus/Cassandra multi-node cluster is shown below:

This example demonstrates several deployment features:

  • One Doradus instance and one Cassandra instance are typically deployed on each node.

  • Doradus instances are peers, hence an application can submit requests to any Doradus instance in the cluster.

  • Each Doradus instance is typically configured to use all network-near Cassandra instances. This allows it to distribute requests to local Cassandra instances and provides automatic failover should a Cassandra instance fail.

  • Cassandra can be configured to know which nodes are in the same rack and which racks are in the same data center. With this knowledge, Cassandra uses replication strategies to balance network bandwidth and recoverability from node-, rack-, and data center-level failures.
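
Because Doradus instances are peers, an application is free to send a request to a different instance when one is unreachable. The source does not prescribe a client-side strategy, so the following is only one possible sketch: it tries a list of instances in order and returns the first reply. The function name `send_with_failover`, the node addresses, and the simulated transport are hypothetical.

```python
def send_with_failover(instances, send):
    """Try each Doradus instance in turn; return (instance, reply) from the
    first one that answers.

    `send` is any callable that takes an instance address and either returns
    a reply or raises OSError (which covers refused connections and timeouts).
    """
    failures = []
    for instance in instances:
        try:
            return instance, send(instance)
        except OSError as exc:
            failures.append(f"{instance}: {exc}")
    raise ConnectionError("no Doradus instance reachable: " + "; ".join(failures))


# Simulated transport so the sketch runs without a cluster: node1 is "down".
def simulated_send(instance):
    if instance == "node1:1123":
        raise ConnectionRefusedError("connection refused")
    return f"reply from {instance}"


used, reply = send_with_failover(
    ["node1:1123", "node2:1123", "node3:1123"], simulated_send
)
print(used, reply)
```

Passing the transport in as a callable keeps the retry logic independent of any particular HTTP library; a real client would supply a function that issues the REST request.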

Details on installing and configuring Doradus/Cassandra clusters are provided in the Doradus Administration documentation.
