From a1248a5d53b4b12842369b90ebb1dc9fc44b2b89 Mon Sep 17 00:00:00 2001 From: Simran Spiller Date: Mon, 13 Nov 2023 15:44:16 +0100 Subject: [PATCH 1/3] SEARCH parallelism --- .../3.12/aql/high-level-operations/for.md | 2 +- .../3.12/aql/high-level-operations/search.md | 101 +++++++++++++----- .../arangosearch/performance.md | 10 ++ .../version-3.12/whats-new-in-3-12.md | 12 +++ 4 files changed, 95 insertions(+), 30 deletions(-) diff --git a/site/content/3.12/aql/high-level-operations/for.md b/site/content/3.12/aql/high-level-operations/for.md index 6bead8b68e..089e058ce2 100644 --- a/site/content/3.12/aql/high-level-operations/for.md +++ b/site/content/3.12/aql/high-level-operations/for.md @@ -93,7 +93,7 @@ Also see [Combining queries with subqueries](../fundamentals/subqueries.md). ## Options For collections and Views, the `FOR` construct supports an optional `OPTIONS` -clause to modify behavior. The general syntax is: +clause to modify the behavior. The general syntax is as follows:
FOR variableName IN expression OPTIONS { option: value, ... }
diff --git a/site/content/3.12/aql/high-level-operations/search.md b/site/content/3.12/aql/high-level-operations/search.md index e3897dabb0..b92ecbd804 100644 --- a/site/content/3.12/aql/high-level-operations/search.md +++ b/site/content/3.12/aql/high-level-operations/search.md @@ -237,7 +237,7 @@ You can use the special `includeAllFields` [`arangosearch` View property](../../index-and-search/arangosearch/arangosearch-views-reference.md#link-properties) to index all (sub-)attributes of the source documents if desired. -## SEARCH with SORT +## `SEARCH` with `SORT` The documents emitted from a View can be sorted by attribute values with the standard [SORT() operation](sort.md), using one or multiple @@ -283,32 +283,19 @@ a score of `0` will be returned for all documents. ## Search Options -The `SEARCH` operation accepts an options object with the following attributes: - -- `collections` (array, _optional_): array of strings with collection names to - restrict the search to certain source collections -- `conditionOptimization` (string, _optional_): controls how search criteria - get optimized. Possible values: - - `"auto"` (default): convert conditions to disjunctive normal form (DNF) and - apply optimizations. Removes redundant or overlapping conditions, but can - take quite some time even for a low number of nested conditions. - - `"none"`: search the index without optimizing the conditions. - -- `countApproximate` (string, _optional_): controls how the total count of rows - is calculated if the `fullCount` option is enabled for a query or when - a `COLLECT WITH COUNT` clause is executed - - `"exact"` (default): rows are actually enumerated for a precise count. - - `"cost"`: a cost-based approximation is used. Does not enumerate rows and - returns an approximate result with O(1) complexity. Gives a precise result - if the `SEARCH` condition is empty or if it contains a single term query - only (e.g. 
`SEARCH doc.field == "value"`), the usual eventual consistency - of Views aside. - -**Examples** - -Given a View with three linked collections `coll1`, `coll2` and `coll3` it is -possible to return documents from the first two collections only and ignore the -third using the `collections` option: +The `SEARCH` operation supports an optional `OPTIONS` clause to modify the +behavior. The general syntax is as follows: + +
SEARCH expression OPTIONS { option: value, ... }
+ +### `collections` + +You can specify an array of strings with collection names to restrict the search +to certain source collections. + +Given a View with three linked collections `coll1`, `coll2`, and `coll3`, you +can return documents from the first two collections only and ignore the third +collection by setting the `collections` option to `["coll1", "coll2"]`: ```aql FOR doc IN viewName @@ -316,5 +303,61 @@ FOR doc IN viewName RETURN doc ``` -The search expression `true` matches all View documents. You can use any valid -expression here while limiting the scope to the chosen source collections. +The search expression `true` in the above example matches all View documents. +You can use any valid expression here while limiting the scope to the chosen +source collections. + +### `conditionOptimization` + +You can specify one of the following values for this option to control how +search criteria get optimized: + +- `"auto"` (default): convert conditions to disjunctive normal form (DNF) and + apply optimizations. Removes redundant or overlapping conditions, but can + take quite some time even for a low number of nested conditions. +- `"none"`: search the index without optimizing the conditions. + + +See [Optimizing View and inverted index query performance](../../index-and-search/arangosearch/performance.md#condition-optimization-options) +for an example. + +### `countApproximate` + +This option controls how the total count of rows is calculated if the `fullCount` +option is enabled for a query or when a `COLLECT WITH COUNT` clause is executed. +You can set it to one of the following values: + +- `"exact"` (default): rows are actually enumerated for a precise count. +- `"cost"`: a cost-based approximation is used. Does not enumerate rows and + returns an approximate result with O(1) complexity. Gives a precise result + if the `SEARCH` condition is empty or if it contains a single term query + only (e.g. 
`SEARCH doc.field == "value"`), the usual eventual consistency + of Views aside. + +See [Optimizing View and inverted index query performance](../../index-and-search/arangosearch/performance.md#count-approximation) +for an example. + +### `parallelism` + +A `SEARCH` operation can optionally process index segments in parallel using +multiple threads. This can speed up search queries but increases CPU and memory +utilization. + +If you omit the `parallelism` option or set it to a value of `1`, the search +execution is not parallelized. If the value is greater than `1`, then up to that +many worker threads can be used for concurrently processing index segments. +The maximum number of total parallel execution threads is defined by the +[`--arangosearch.execution-threads-limit` startup option](../../components/arangodb-server/options.md#--arangosearchexecution-threads-limit) +that defaults to twice the number of CPU cores. + +The `parallelism` option should be considered a hint. Not all search queries are +eligible. Queries also don't wait for the specified number of threads to be +available. They start immediately even if only single-threaded and may acquire +more threads later. 
+ +```aql +FOR doc IN restaurantsView + SEARCH ANALYZER(GEO_INTERSECTS(rect, doc.geometry), "geojson") + OPTIONS { parallelism: 16 } + RETURN doc.geometry +``` diff --git a/site/content/3.12/index-and-search/arangosearch/performance.md b/site/content/3.12/index-and-search/arangosearch/performance.md index 7858925cdc..f5edc4120f 100644 --- a/site/content/3.12/index-and-search/arangosearch/performance.md +++ b/site/content/3.12/index-and-search/arangosearch/performance.md @@ -675,3 +675,13 @@ db._createView("articlesView", "search-alias", { indexes: [ { collection: "articles", index: "inv-idx" } ] }); ``` + +## Parallel index segment processing + +Introduced in: v3.12.0 + +You can speed up `SEARCH` queries against Views using the `parallelism` option +to process index segments using multiple threads. + +See [`SEARCH` operation in AQL](../../aql/high-level-operations/search.md#parallelism) +for details. diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md index baa2d5586e..81cfba41b4 100644 --- a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md +++ b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md @@ -31,6 +31,18 @@ for examples. This feature is only available in the Enterprise Edition. +### `SEARCH` parallelization + +In search queries against Views, you can set the new `parallelism` option for +`SEARCH` operations to optionally process index segments in parallel using +multiple threads. This can speed up search queries. + +The new `--arangosearch.execution-threads-limit` startup option controls how +many threads can be used in total for search queries. + +See [`SEARCH` operation in AQL](../../aql/high-level-operations/search.md#parallelism) +for details. 
+ ## Analyzers From 0c90d59367dc18ff765a587be120a0cc05973b67 Mon Sep 17 00:00:00 2001 From: Simran Spiller Date: Mon, 13 Nov 2023 15:58:08 +0100 Subject: [PATCH 2/3] Mention new default parallelism startup option --- site/content/3.12/aql/high-level-operations/search.md | 10 ++++++---- .../release-notes/version-3.12/whats-new-in-3-12.md | 2 ++ 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/site/content/3.12/aql/high-level-operations/search.md b/site/content/3.12/aql/high-level-operations/search.md index b92ecbd804..cd1408d54b 100644 --- a/site/content/3.12/aql/high-level-operations/search.md +++ b/site/content/3.12/aql/high-level-operations/search.md @@ -343,10 +343,12 @@ A `SEARCH` operation can optionally process index segments in parallel using multiple threads. This can speed up search queries but increases CPU and memory utilization. -If you omit the `parallelism` option or set it to a value of `1`, the search -execution is not parallelized. If the value is greater than `1`, then up to that -many worker threads can be used for concurrently processing index segments. -The maximum number of total parallel execution threads is defined by the +If you omit the `parallelism` option, then the default parallelism as defined by +the [`--arangosearch.default-parallelism` startup option](../../components/arangodb-server/options.md#--arangosearchdefault-parallelism) +is used. If you set it to a value of `1`, the search execution is not +parallelized. If the value is greater than `1`, then up to that many worker +threads can be used for concurrently processing index segments. The maximum +number of total parallel execution threads is defined by the [`--arangosearch.execution-threads-limit` startup option](../../components/arangodb-server/options.md#--arangosearchexecution-threads-limit) that defaults to twice the number of CPU cores. 
diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md index 81cfba41b4..69872d3d21 100644 --- a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md +++ b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md @@ -37,6 +37,8 @@ In search queries against Views, you can set the new `parallelism` option for `SEARCH` operations to optionally process index segments in parallel using multiple threads. This can speed up search queries. +The default value for the `parallelism` option is defined by the new +`--arangosearch.default-parallelism` startup option that defaults to `1`. The new `--arangosearch.execution-threads-limit` startup option controls how many threads can be used in total for search queries. From 394f8659f329bc8922691773173e0dca59b0ebff Mon Sep 17 00:00:00 2001 From: Simran Spiller Date: Tue, 14 Nov 2023 15:52:18 +0100 Subject: [PATCH 3/3] Add metric --- .../release-notes/version-3.12/api-changes-in-3-12.md | 7 +++---- .../3.12/release-notes/version-3.12/whats-new-in-3-12.md | 8 +++++++- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/site/content/3.12/release-notes/version-3.12/api-changes-in-3-12.md b/site/content/3.12/release-notes/version-3.12/api-changes-in-3-12.md index 12a459954a..4ba7bed735 100644 --- a/site/content/3.12/release-notes/version-3.12/api-changes-in-3-12.md +++ b/site/content/3.12/release-notes/version-3.12/api-changes-in-3-12.md @@ -115,11 +115,10 @@ produced no warnings. #### Metrics API -The metrics endpoint includes the following new metric: +The metrics endpoint includes the following new metrics: -| Label | Description | -|:------|:------------| -| `arangodb_aql_cursors_active` | Current number of active AQL query cursors. 
| +- `arangodb_aql_cursors_active` +- `arangodb_search_execution_threads_demand` ### Endpoints moved diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md index 69872d3d21..3beec56ef1 100644 --- a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md +++ b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md @@ -39,8 +39,14 @@ multiple threads. This can speed up search queries. The default value for the `parallelism` option is defined by the new `--arangosearch.default-parallelism` startup option that defaults to `1`. + The new `--arangosearch.execution-threads-limit` startup option controls how -many threads can be used in total for search queries. +many threads can be used in total for search queries. The new +`arangodb_search_execution_threads_demand` metric reports the number of threads +that queries request. If it is below the configured thread limit, it coincides +with the number of active threads. If it exceeds the limit, some queries cannot +currently get the threads as requested and may have to use a single thread until +more become available. See [`SEARCH` operation in AQL](../../aql/high-level-operations/search.md#parallelism) for details.
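
Reviewer note (not part of the patch): the relationship between the `arangodb_search_execution_threads_demand` metric and the `--arangosearch.execution-threads-limit` startup option described in the last hunk can be sketched as a toy model. This is not ArangoDB code. The function names `active_threads` and `is_over_limit` are hypothetical, and treating the number of active threads as exactly `min(demand, limit)` is a simplifying assumption read from the wording above.

```python
def active_threads(demand: int, limit: int) -> int:
    """Toy model (assumption): threads that can run at once.

    demand -- value reported by arangodb_search_execution_threads_demand
    limit  -- value of --arangosearch.execution-threads-limit
    """
    # At or below the limit, every requested thread is assumed to be active.
    # Above the limit, the limit caps how many can run concurrently.
    return min(demand, limit)


def is_over_limit(demand: int, limit: int) -> bool:
    """True if some queries cannot currently get all requested threads."""
    return demand > limit


if __name__ == "__main__":
    limit = 16  # hypothetical value, e.g. twice the cores of an 8-core host
    for demand in (6, 16, 40):
        print(demand, active_threads(demand, limit), is_over_limit(demand, limit))
```

Read this way, a demand value that stays above the limit for long periods suggests that queries often fall back to fewer threads than their `parallelism` setting requests.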