diff --git a/3.7/indexing-persistent.md b/3.7/indexing-persistent.md
index 50d34fc4ae..5dcb5e567f 100644
--- a/3.7/indexing-persistent.md
+++ b/3.7/indexing-persistent.md
@@ -184,5 +184,3 @@ the sort order of those which are returned can be wrong (whenever the persistent index is consulted). To fix persistent indexes after a language change, delete and re-create them.
-Skiplist indexes are not affected, because they are not persisted and
-automatically rebuilt on every server start.
diff --git a/3.8/indexing-persistent.md b/3.8/indexing-persistent.md
index 50d34fc4ae..5dcb5e567f 100644
--- a/3.8/indexing-persistent.md
+++ b/3.8/indexing-persistent.md
@@ -184,5 +184,3 @@ the sort order of those which are returned can be wrong (whenever the persistent index is consulted). To fix persistent indexes after a language change, delete and re-create them.
-Skiplist indexes are not affected, because they are not persisted and
-automatically rebuilt on every server start.
diff --git a/3.9/http/indexes-multi-dim.md b/3.9/http/indexes-multi-dim.md
new file mode 100644
index 0000000000..f59448d737
--- /dev/null
+++ b/3.9/http/indexes-multi-dim.md
@@ -0,0 +1,7 @@
+---
+layout: default
+---
+Working with multi-dimensional Indexes
+=======================================
+
+{% docublock post_api_index_zkd %}
diff --git a/3.9/indexing-multi-dim.md b/3.9/indexing-multi-dim.md
new file mode 100644
index 0000000000..cec81311f4
--- /dev/null
+++ b/3.9/indexing-multi-dim.md
@@ -0,0 +1,135 @@
+---
+layout: default
+description: A multi-dimensional index allows you to efficiently intersect multiple range queries
+---
+# Multi-dimensional indexes
+
+The multi-dimensional index type (also called ZKD) provided by ArangoDB can be
+used to efficiently intersect multiple range queries.
+
+A multi-dimensional index is set up by setting the index type to `"zkd"`.
+The `fields` attribute describes which fields are used as dimensions.
+The value of each dimension has to be a numeric (double) value.
+
+## Querying documents within a 3D box
+
+Assume we have documents in a collection `points` of the form
+
+```json
+{"x": 12.9, "y": -284.0, "z": 0.02}
+```
+
+and we want to query all documents that are contained within a box defined by
+`[x0, x1] * [y0, y1] * [z0, z1]`.
+
+To do so, create a multi-dimensional index on the attributes `x`, `y` and
+`z`, e.g. in _arangosh_:
+
+```js
+db.points.ensureIndex({
+  type: "zkd",
+  fields: ["x", "y", "z"],
+  fieldValueTypes: "double"
+});
+```
+
+Unlike for other indexes, the order of the fields does not matter.
+
+`fieldValueTypes` is required and the only allowed value is `"double"`.
+Future extensions of the index will allow other types.
+
+Now we can use the index in a query:
+
+```js
+FOR p IN points
+  FILTER x0 <= p.x && p.x <= x1
+  FILTER y0 <= p.y && p.y <= y1
+  FILTER z0 <= p.z && p.z <= z1
+  RETURN p
+```
+
+## Possible range queries
+
+Having an index on a set of fields does not require you to specify a full range
+for every field. For each field you can decide if you want to bound
+it from both sides, from one side only (i.e. only an upper or lower bound)
+or not bound it at all.
+
+Furthermore, you can use any comparison operator. The index supports `<=` and `>=`
+natively, and `==` is translated to the bound `[c, c]`. Strict comparisons (`<`, `>`)
+are translated to their non-strict counterparts and a post-filter is inserted.
+
+```js
+FOR p IN points
+  FILTER 2 <= p.x && p.x < 9
+  FILTER p.y >= 80
+  FILTER p.z == 4
+  RETURN p
+```
+
+## Example Use Case
+
+If you build a calendar using ArangoDB you could create a collection for each user
+that contains their appointments. The documents would roughly look as follows:
+
+```json
+{
+  "from": 345365,
+  "to": 678934,
+  "what": "Dentist"
+}
+```
+
+`from`/`to` are the timestamps when an appointment starts/ends. Having a
+multi-dimensional index on the fields `["from", "to"]` allows you to query
+for all appointments within a given time range efficiently.
+
+### Finding all appointments within a time range
+
+Given a time range `[f, t]` we want to find all appointments `[from, to]` that
+are completely contained in `[f, t]`. Those appointments clearly satisfy the
+condition
+
+```
+f <= from and to <= t
+```
+
+Thus our query would be:
+
+```js
+FOR app IN appointments
+  FILTER f <= app.from
+  FILTER app.to <= t
+  RETURN app
+```
+
+### Finding all appointments that intersect a time range
+
+Given a time range `[f, t]` we want to find all appointments `[from, to]` that
+intersect `[f, t]`. Two intervals `[f, t]` and `[from, to]` intersect if
+and only if
+
+```
+f <= to and from <= t
+```
+
+Thus our query would be:
+
+```js
+FOR app IN appointments
+  FILTER f <= app.to
+  FILTER app.from <= t
+  RETURN app
+```
+
+## Limitations
+
+Currently there are a few limitations:
+
+- Using array expansions for attributes is not possible (e.g. `array[*].attr`).
+- The `sparse` property is not supported.
+- You can only index numeric values that are representable as IEEE-754 double.
+- A high number of dimensions (more than 5) can impact the performance considerably.
+- The performance can vary depending on the dataset. Densely packed points can
+  lead to a high number of seeks. This behavior is typical for indexing using
+  space-filling curves.
diff --git a/3.9/indexing-persistent.md b/3.9/indexing-persistent.md
index 50d34fc4ae..5dcb5e567f 100644
--- a/3.9/indexing-persistent.md
+++ b/3.9/indexing-persistent.md
@@ -184,5 +184,3 @@ the sort order of those which are returned can be wrong (whenever the persistent index is consulted). To fix persistent indexes after a language change, delete and re-create them.
-Skiplist indexes are not affected, because they are not persisted and
-automatically rebuilt on every server start.
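The containment and intersection conditions used in `3.9/indexing-multi-dim.md` can be sanity-checked outside of ArangoDB. The following is a minimal sketch in plain JavaScript and is not part of the patch; the helper names `isContained` and `intersects` and the sample appointments are assumptions made up for illustration, and the code filters an in-memory array rather than using a ZKD index.

```js
// Hypothetical helpers (not an ArangoDB API): they mirror the two FILTER
// conditions from the appointment examples on closed intervals.

// True if the appointment [from, to] lies completely inside [f, t].
function isContained(app, f, t) {
  return f <= app.from && app.to <= t;
}

// True if the appointment [from, to] shares at least one point with [f, t].
function intersects(app, f, t) {
  return f <= app.to && app.from <= t;
}

// Made-up sample data in the shape of the documented appointment documents.
const appointments = [
  { from: 10, to: 20, what: "Dentist" },
  { from: 15, to: 40, what: "Meeting" },
  { from: 50, to: 60, what: "Lunch" },
];

// Query range [5, 30].
const inside = appointments.filter((a) => isContained(a, 5, 30)).map((a) => a.what);
const overlapping = appointments.filter((a) => intersects(a, 5, 30)).map((a) => a.what);

console.log(inside);      // [ 'Dentist' ]
console.log(overlapping); // [ 'Dentist', 'Meeting' ]
```

Because the intervals are closed, an appointment that merely touches the query range at an endpoint still counts as intersecting.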
diff --git a/3.9/indexing-which-index.md b/3.9/indexing-which-index.md
index 1f8ee3ae92..02d1877228 100644
--- a/3.9/indexing-which-index.md
+++ b/3.9/indexing-which-index.md
@@ -94,6 +94,24 @@ different usage scenarios: result TTL indexes will likely not be used for filtering and sort operations in user-land AQL queries.
+- **multi-dimensional index** (ZKD): a multi-dimensional index allows you to
+  efficiently intersect multiple range queries. Typical use cases are querying
+  intervals that intersect a given point or interval. For example, if intervals
+  are stored in documents like
+
+  ```json
+  { "from": 12, "to": 45 }
+  ```
+
+  then you can create an index over `from, to` and utilize it with this query:
+
+  ```js
+  FOR i IN intervals FILTER i.from <= t && t <= i.to RETURN i
+  ```
+
+  Currently only floating-point numbers (doubles) are supported as the underlying
+  type for each dimension.
+
 - **Geo index**: the geo index provided by ArangoDB allows searching for documents within a radius around a two-dimensional earth coordinate (point), or to find documents with are closest to a point. Document coordinates can either
@@ -110,7 +128,7 @@ different usage scenarios: and a SORT or FILTER statement is used in conjunction with the distance function.
-- **fulltext index**: a fulltext index can be used to index all words contained in
+- **fulltext index**: a fulltext index can be used to index all words contained in
 a specific attribute of all documents in a collection. Only words with a (specifiable) minimum length are indexed.
Word tokenization is done using the word boundary analysis provided by libicu, which is taking into account diff --git a/3.9/indexing.md b/3.9/indexing.md index 2c50f23727..7144e28c0c 100644 --- a/3.9/indexing.md +++ b/3.9/indexing.md @@ -16,5 +16,6 @@ There are special sections for - [Persistent Indexes](indexing-persistent.html) - [TTL Indexes](indexing-ttl.html) - [Fulltext Indexes](indexing-fulltext.html) + - [Multi-dimensional Indexes](indexing-multi-dim.html) - [Geo-spatial Indexes](indexing-geo.html) - [Vertex-centric Indexes](indexing-vertex-centric.html) diff --git a/_data/3.9-http.yml b/_data/3.9-http.yml index 3d0aa9f272..337487bacf 100644 --- a/_data/3.9-http.yml +++ b/_data/3.9-http.yml @@ -84,6 +84,8 @@ href: indexes-persistent.html - text: TTL href: indexes-ttl.html + - text: Multi-dimensional + href: indexes-multi-dim.html - text: Geo-Spatial href: indexes-geo.html - text: Fulltext diff --git a/_data/3.9-manual.yml b/_data/3.9-manual.yml index 483201c871..e6b6668a62 100644 --- a/_data/3.9-manual.yml +++ b/_data/3.9-manual.yml @@ -293,6 +293,8 @@ href: indexing-ttl.html - text: Fulltext Indexes href: indexing-fulltext.html + - text: Multi-dimensional Indexes + href: indexing-multi-dim.html - text: Geo-spatial Indexes href: indexing-geo.html - text: Vertex Centric Indexes