Spider Overview

JoeWinter edited this page Feb 17, 2015 · 2 revisions

[Table of Contents](https://github.com/dell-oss/Doradus/wiki/Spider Databases: Table-of-Contents) | [Previous](https://github.com/dell-oss/Doradus/wiki/Starting a Spider-Database) | [Next](https://github.com/dell-oss/Doradus/wiki/When Spider Works Best)
Spider Database Overview: Spider Overview


Doradus is a server application that runs on top of a NoSQL database, providing high-value features that make the database easier to use and more valuable to applications. Doradus currently runs on top of the Cassandra database, though its architecture allows it to use other persistence engines. The architecture also allows different *storage services* to be used, each offering indexing, storage, and query techniques that benefit specific application types.

Spider is one of the two available storage services; the other is the OLAP service. Spider supports the core Doradus data model and query language (DQL) and extends these with the following unique features:

  • Fully-inverted indexing: Spider offers field analyzers that can index all scalar fields of all objects. Dictionaries and term vectors are used for text fields, and tries are used for numeric and timestamp fields. These techniques provide efficient searching for full and partial values as well as range searching.

  • Stored-only fields: Indexing can selectively be disabled for scalar fields so they can be stored and retrieved without consuming indexing space.

  • Dynamic fields: Fields do not have to be predefined in the schema and can be added dynamically on a per-object basis. Dynamically-added fields are fully indexed. Objects in the same table can have different sets of fields. An object can even have millions of fields.

  • Query extensions: Spider fully supports DQL object and aggregate queries and provides extensions that leverage fully-inverted indexing. For example, in addition to field-specific term, phrase, and equality clauses, Spider allows "any field" clauses that perform these searches on all fields within a table, including dynamically-added fields.

  • Extended aggregate grouping: Spider extends Doradus aggregate queries with special compound and composite grouping features. Compound grouping allows one or more metric functions to be computed with multiple grouping sets using a single pass through the data. Composite grouping performs group-level "roll-up" metric computations for non-leaf groups.

  • Table-level sharding: Traditional inverted indexing techniques experience performance issues when the number of indexed objects grows into the millions and beyond. Spider offers table-level, time-based sharding, which automatically splits indexing records into shards by time period. Because a time-oriented query only needs to read the shards that overlap its time range, search performance stays efficient as the table grows. Both scalar and link fields can leverage time-based sharding.

  • Table-level aging: Spider allows each table to define a timestamp aging field, which is used to automatically delete expired objects. Data aging is performed in background tasks with definable schedules.
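
To make a few of the features above more concrete, the following is a minimal in-memory sketch of an inverted index that is partitioned into time-based shards and supports "any field" term searches and aging. It is not Spider's implementation or its Cassandra storage layout. The class `ToyShardedIndex`, the monthly shard labels, and the whitespace tokenizer (a stand-in for Spider's field analyzers) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def shard_key(ts: datetime) -> str:
    """Hypothetical shard label: one shard per calendar month, e.g. '2015-02'."""
    return ts.strftime("%Y-%m")


class ToyShardedIndex:
    """Illustrative only: shard -> term -> field -> set of object IDs."""

    def __init__(self):
        self.shards = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
        self.objects = {}  # object ID -> (timestamp, fields)

    @staticmethod
    def _terms(text):
        # Assumption: lowercase whitespace tokens stand in for a real analyzer.
        return text.lower().split()

    def add(self, obj_id, ts, fields):
        """Index every field of the object; fields need no predefined schema."""
        self.objects[obj_id] = (ts, fields)
        postings = self.shards[shard_key(ts)]
        for field, text in fields.items():
            for term in self._terms(text):
                postings[term][field].add(obj_id)

    def search(self, term, field=None, shards=None):
        """Return matching IDs. field=None searches any field;
        shards=None searches every shard, otherwise only the listed ones."""
        term = term.lower()
        labels = list(self.shards) if shards is None else shards
        hits = set()
        for label in labels:
            by_field = self.shards.get(label, {}).get(term, {})
            for name, ids in by_field.items():
                if field is None or name == field:
                    hits |= ids
        return hits

    def age_out(self, now, max_age):
        """Delete objects whose timestamp is older than max_age; return their IDs."""
        expired = [oid for oid, (ts, _) in self.objects.items()
                   if now - ts > max_age]
        for oid in expired:
            ts, fields = self.objects.pop(oid)
            postings = self.shards[shard_key(ts)]
            for field, text in fields.items():
                for term in self._terms(text):
                    postings[term][field].discard(oid)
        return sorted(expired)


idx = ToyShardedIndex()
idx.add("a", datetime(2015, 1, 5), {"Subject": "hello world", "Body": "spider index"})
idx.add("b", datetime(2015, 2, 10), {"Note": "Hello again"})  # different field set
any_field = idx.search("hello")                       # searches all fields
one_shard = idx.search("hello", shards=["2015-02"])   # reads a single shard
removed = idx.age_out(datetime(2015, 2, 17), timedelta(days=30))
```

In this sketch, a query restricted to one month touches only that month's postings, which is the intuition behind time-based sharding. Objects "a" and "b" carry different field names, mirroring dynamic fields, and `age_out` plays the role that a scheduled background aging task would play in a real deployment.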
