From 07f81c772b34f8b0c7dcec1c3590a4a900971a10 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Tue, 6 May 2025 11:52:23 +0530 Subject: [PATCH 01/19] AUS rough draft --- .../auto-update-statistics.adoc | 390 ++++++++++++++++++ modules/n1ql/partials/nav.adoc | 1 + 2 files changed, 391 insertions(+) create mode 100644 modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc new file mode 100644 index 000000000..1fed58918 --- /dev/null +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -0,0 +1,390 @@ += Auto Update Statistics +:description: Auto Update Statistics is a solution to keep optimizer statistics up to date. + +[abstract] +{description} + +== Overview + +Optimizer statistics are generated when the UPDATE STATISTICS statement is executed and in 7.6.0 onwards, when an index is built. These statistics are used by the optimizer to generate optimal query plans. However, data can change over time and as a consequence, these statistics can become out of date which can cause sub-optimal plan generation. It is beneficial to routinely update optimizer statistics. + +Auto Update Statistics is a solution to keep optimizer statistics up to date. The statistics are automatically updated during a configured schedule. The system uses expiration policies to identify statistics that are stale and refreshes them by executing the required UPDATE STATISTICS statements. + +NOTE: The AUS system does not create optimizer statistics for expressions that do not already have statistics. It only maintains and updates statistics that have already been collected. +AUS only maintains statistics for expressions on index keys. It does not maintain statistics for expressions on non-indexed fields. + +== Prerequisites + +* The AUS task can only be performed on 8.0 Query nodes. 
+ +* AUS can only be enabled in a cluster that has been fully migrated to 8.0. Or in a cluster that has 7.6.x nodes and 8.0 Query nodes. However, in such a cluster, the 7.6.x Query nodes will not perform any AUS tasks. + +* For Couchbase clusters that are migrating from pre-7.6.x versions to a cluster configuration like that above, the AUS task can only be enabled once the automatic migration of optimizer statistics has been migrated to the _query collection in the _system scope of the buckets. + +== How AUS Works + +* AUS is an opt-in feature that users can choose to enable. Users must explicitly opt-in and configure a schedule for when AUS should run. This allows users to schedule this “maintenance window” at a time that best suits their workloads. + +* The AUS system uses expiration policies to determine when statistics are out of date and require an update. + +* Every Query node in the cluster participates in the AUS task according to the same schedule. + +* The AUS task comprises two phases, the Evaluation phase and the Update Phase. + +* The Evaluation phase evaluates which statistics are stale based on the expiration policies. + +* The Update phase executes the appropriate UPDATE STATISTICS statements to refresh the statistics deemed as stale by the Evaluation phase. The AUS system maintains the original resolution at which the statistics were collected. + +-- +NOTE: In a Query node, the following occurs during the AUS scheduled window: +* The Query node chooses the collections to perform AUS operations on. No other Query node will perform AUS on these collections during this window. +* For each collection the Evaluation and Update phases are performed on the statistics gathered on expressions on fields in the collection. +-- + +=== Expiration Policy + +The AUS system uses expiration policies to determine when statistics are out of date and require an update. + +“Change Percentage" is the expiration policy used. 
It determines the threshold for how much data in an index must change before the statistics are determined as out of date and requiring an update. + +It is the percentage of changes to the items in an index that must be exceeded to trigger a statistic to be refreshed. + +If the percentage of change to the items in an index since the last time the statistic was collected exceeds the defined threshold, the statistic is flagged as stale by the AUS system. The AUS operation then will update those stale statistics. + +=== Evaluation Phase + +The Evaluation phase evaluates whether the existing statistics are stale based on the expiration policy. + +During the Evaluation phase, for every index, AUS calculates how much the items/data in the index have changed since the last time the optimizer statistics of the index key expressions were updated. If the percentage of change exceeds the configured threshold ( the "change_percentage"), the statistics for the index key expressions for that index, are marked as stale and requiring an update. + +=== Update Phase + +Once the Evaluation of a collection is completed, the Update phase will be executed. + +This phase will update the statistics flagged as stale in the previous phase by executing the appropriate UPDATE STATISTICS statements. + +The statistics update performed by AUS maintains the original resolution at which the statistic was collected. + +Once the AUS operation has been performed on all the statistics in all the buckets, the query node should schedule the next AUS run. + +If the scheduled window has ended but the AUS task is still not over, the task is aborted and the next AUS run is scheduled. + + +IMPORTANT: When AUS is enabled, the first scheduled task run can result in all the existing optimizer statistics being updated, regardless of the expiration policy evaluation. This is because the index change information might not have been recorded till the first scheduled run. 
+ +== Managing AUS + +To enable AUS for the cluster, the user must explicitly opt-in and set a schedule. + +The AUS system provides the system:aus catalog that stores the global configurations. Users can opt-in and set a schedule for AUS by appropriately modifying the configurations set in the system:aus catalog. + +=== system:aus + +The system:aus catalog stores a single document that contains the global configurations of AUS. +Users can update this document to change these configurations. + +Only SELECT and UPDATE DMLs are allowed on this keyspace. + +NOTE: To execute SELECT on system:aus, the query_system_catalog is required. +To execute UPDATE on system:aus, the query_manage_system_catalog is required. + +In the document, each attribute represents a particular global configuration. +The attribute names and the corresponding configuration that they represent are as follows: + +[cols="1a,3a,1a"] +|=== +| Name | Description | Schema + +| **enable** + +__required__ +| Whether AUS is enabled for the cluster or not. +Set the value to `true` to enable AUS. +If the attribute is set to `true`, the schedule attribute must be set. + +*Default:* `false` + +| Boolean + +| **schedule** + +__optional__ + +|The schedule according to which AUS will be performed. + +If AUS is enabled, that is, the `enable` attribute is set to `true`, then the schedule attribute is required. +Otherwise, it is not. + +| <<aus_schedule>> object + +| **change_percentage** + +__optional__ + +| The percentage of changes to the items in an index that must be exceeded to trigger a statistic to be refreshed. +Integer between 0 and 100. + +*Default:* `10` + +*Example:* `30` + +| Integer + +| **all_buckets** + +__required__ + + +| Whether AUS should be performed on all buckets or only those buckets whose metadata information is “loaded” on the Query node. 
+ +*Default:* `false` + +| Boolean + +|=== + + +[[aus_schedule]] +==== Schedule +[cols="1a,4a,1a"] +|=== +| Name | Description | Schema + +| **start_time** + +__required__ + +| The start time of the AUS schedule. String representing the time in “HH:MM” format. +The schedule’s `start_time` attribute must be earlier than the schedule’s `end_time` attribute by at least 30 minutes. + +*Example:* `"01:30"` + +| String + +| **end_time** + +__required__ + +| The end time of the AUS schedule. String representing the time in “HH:MM” format. + +The schedule’s `end_time` attribute must be later than the schedule’s `start_time` attribute by at least 30 minutes. + +*Example:* `"04:30"` +| String + +| **days** + +__required__ + +| The days of the week on which the AUS schedule should run. +Valid values in the array: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday + +*Example:* `["Saturday", "Sunday"]` + +| String array + +| **timezone** + +__optional__ + +| The timezone in which the start and end times are interpreted. String representing the timezone as an IANA timezone name. + +*Default:* `"UTC"` + +*Example:* `"Asia/Calcutta"` + +| String + +|=== + +When changing these global configurations, it is important to note the following: + +* *Enabling AUS*: If AUS was previously not enabled and is now enabled, the next AUS task is scheduled. +* *Re-scheduling AUS*: Any scheduled AUS task is cancelled and a new AUS task is scheduled according to the new schedule. Running AUS tasks are not cancelled. +* *Other Setting Changes*: If any other global setting, that is, all_buckets or change_percentage, is changed, the new values are used from the next scheduled AUS run. 
+ +==== Example +==== +Using UPDATE Statement to enable AUS and set a schedule with some customizations: + +.Query +[source,sqlpp] +---- +UPDATE system:aus SET enable = true, change_percentage = 20, +schedule = { "start_time": "01:30", + "end_time": "04:30", + "timezone": "Asia/Calcutta", + "days": ["Monday", "Friday"] + }; +---- +==== + +=== system:aus_settings + +The system also provides mechanisms to customize certain AUS configurations at the bucket, scope and collection level. The system:aus_settings maintains these more granular configurations. + +The document id of a document in this keyspace must be the full path of the bucket/ scope/ collection name. + +All SQL++ DMLs are allowed on this keyspace. + +NOTE: To execute SELECT on system:aus_settings, the query_system_catalog is required. +To execute UPDATE, DELETE, INSERT, UPSERT on system:aus_settings, the query_manage_system_catalog is required. + +By default, this keyspace has no documents. The configurations of AUS for all keyspaces by default are what is set at the global level. + +A settings document must be explicitly inserted to customize AUS for the keyspace. + +In the settings document, each attribute represents a particular granular configuration. The attribute names and the corresponding configuration that they represent are as follows: + +[cols="1a,3a,1a"] +|=== +| Name | Description | Schema + +| **enable** + +__optional__ +| Whether AUS is enabled for the bucket/scope/collection or not. +Set to “false” to explicitly disable. + +If AUS is disabled at the global/ cluster level, then it cannot be enabled at the bucket/scope/collection level. +If AUS is disabled at a higher level, it cannot be overridden at a more granular level. +But, if AUS is enabled at a higher level, it can be overridden at a more granular level. + +Example to illustrate this: + +If AUS is disabled for a bucket - it is disabled for all scopes and collections within it. It cannot be overridden at the scope or collection level. 
+If AUS is enabled for a bucket - it can be overridden at the scope and collection level. + +| Boolean + +| **change_percentage** + +__optional__ + +| The percentage of changes to the items in an index that must be exceeded to trigger a statistic to be refreshed. +Integer between 0 and 100. + +If set at a bucket-level, this is the change percentage value considered for all scopes and collections within the bucket, unless overridden at a lower level. +If set at a scope-level, this is the change percentage value considered for all collections within the scope unless overridden at a lower level. + +*Example:* `30` + +| Integer + +| **update_statistics_timeout** + +__optional__ + +| A number representing a duration in seconds. +The command times out when this timeout period is reached. If omitted, a default timeout value is calculated based on the number of samples used. + +If set for a keyspace, this would be set as the timeout for every UPDATE STATISTICS statement executed by AUS for that keyspace. + +If set at a bucket-level, this is the update_statistics_timeout value considered for all scopes and collections within the bucket, unless overridden at a lower level. + +If set at a scope-level, this is the update_statistics_timeout value considered for all collections within the scope unless overridden at a lower level. + +| Number + +|=== + +==== Example +==== +The following sample query adds a scope-level setting. The setting applies to all collections within the scope unless overridden at the collection level: + +.Query +[source,sqlpp] +---- +INSERT INTO system:aus_settings ( KEY, VALUE ) + VALUES ( "default:bucket1.scope1", { "change_percentage": 20 } ); +---- +==== + +== Scheduling AUS Tasks +When AUS is enabled and a schedule is set, every Query node in the cluster participates in AUS. As a result, each Query node will have its own AUS task assigned, and this task will perform AUS operations on that specific node. 
+ +The system:tasks_cache catalog keeps a record of recent tasks, including the AUS tasks across all Query nodes. + +== Viewing AUS Tasks +The system:tasks_cache catalog maintains the list of recent tasks. For every AUS task, each Query node will have an entry in the system:tasks_cache. View all recent AUS tasks by querying the system:tasks_cache keyspace. + +The AUS task entries have the “class” field set to “auto_update_statistics”. + +To view all recent AUS tasks: +[source,sqlpp] +---- +SELECT * FROM system:tasks_cache WHERE class = "auto_update_statistics"; +---- + +To find the next scheduled AUS tasks: +[source,sqlpp] +---- +SELECT * FROM system:tasks_cache WHERE class = "auto_update_statistics" AND state = "scheduled"; +---- + +To view the recent AUS tasks on a particular node, filter by the “node” attribute: +[source,sqlpp] +---- +SELECT * FROM system:tasks_cache WHERE class = "auto_update_statistics" AND node = "127.0.0.1:8091"; +---- + +The entries for completed AUS tasks include information about the task, such as which keyspaces had their statistics updated and whether any errors occurred. 
+ +=== Example +==== +Sample task entry for a successful AUS task on a query node: + +.Result +[source,json] +---- +{ + "tasks_cache": { + "class": "auto_update_statistics", + "delay": "21.707164s", + "id": "4b90bb39-ca1b-55f1-84f0-4d3137c88bf8", + "name": "bc1ab6e9-9f33-4a8f-86ad-40d74c50af5f", + "queryContext": "", + "results": { + "configuration": { + "all_buckets": true, + "change_percentage": 20, + "end_time": "2024-11-19 20:00:00 +0530 IST", + "internal_version": 1, + "start_time": "2024-11-19 19:16:00 +0530 IST" + }, + + "keyspaces_updated": [ + "default:bucket1.scope1.customers" + ] + }, + + "startTime": "2024-11-19T19:16:00.001+05:30", + "state": "completed", + "stopTime": "2024-11-19T19:16:03.154+05:30", + "subClass": "", + "submitTime": "2024-11-19T19:15:38.292+05:30" + } +} +---- +==== + +Attributes: --> Should this be added to the system:tasks_cache doc? +* keyspaces_updated: A list of the keyspaces whose statistics were updated. +* configuration: The task configuration with which this task was executed. + +For more information about system:tasks_cache and its attributes, see xref:n1ql:n1ql-intro/sysinfo.adoc#sys-tasks-cache[]. + +== Aborting a Running AUS Task +To cancel the execution of a running AUS task, execute an appropriate DELETE statement against the system:tasks_cache. + +To cancel all running AUS tasks: + +[source,sqlpp] +---- +DELETE FROM system:tasks_cache WHERE class = "auto_update_statistics" AND state = "running"; +---- + +Use appropriate filters in the WHERE clause to select which tasks to cancel or delete. To cancel the running AUS task only on a particular node: + +[source,sqlpp] +---- +DELETE FROM system:tasks_cache WHERE class = "auto_update_statistics" AND state = "running" AND node = "127.0.0.1:8091"; +---- + +When the DELETE statement affects an AUS task entry in the “scheduled” or “running” state, the scheduled/ running task is cancelled and the next AUS task is automatically scheduled. 
+ +Remember to add appropriate filters in the DELETE statement so that the statement affects the intended task entry. Otherwise inadvertently other tasks can get cancelled or the task history gets deleted. + +== AUS Load +When the AUS task is executing, it can cause an increased load on the Query node. This is because of all the activity of evaluating the statistics and executing statistics refresh. That is why it is important that the schedule for AUS is set to best suit the workloads of the cluster. + +Additionally, an attempt is made to ensure that during the scheduled window, the AUS task is not started when the load of the Query node is too high. If the load is too high, the task is not started and the next AUS task is scheduled. \ No newline at end of file diff --git a/modules/n1ql/partials/nav.adoc b/modules/n1ql/partials/nav.adoc index adc7e548e..799032379 100644 --- a/modules/n1ql/partials/nav.adoc +++ b/modules/n1ql/partials/nav.adoc @@ -28,6 +28,7 @@ ** xref:n1ql:advanced.adoc[] *** xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[] *** xref:guides:cbo.adoc[] + *** xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[] *** xref:n1ql:n1ql-language-reference/transactions.adoc[] *** xref:guides:transactions.adoc[] *** xref:n1ql:n1ql-language-reference/flex-indexes.adoc[] From 07b482cb870f5fbe203d1edb12c539562ba08ae9 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Tue, 6 May 2025 12:01:32 +0530 Subject: [PATCH 02/19] Update system info table to add aus catalogs --- modules/n1ql/pages/n1ql-intro/sysinfo.adoc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/modules/n1ql/pages/n1ql-intro/sysinfo.adoc b/modules/n1ql/pages/n1ql-intro/sysinfo.adoc index 55d137636..6c076dfd5 100644 --- a/modules/n1ql/pages/n1ql-intro/sysinfo.adoc +++ b/modules/n1ql/pages/n1ql-intro/sysinfo.adoc @@ -73,6 +73,8 @@ a| [%hardbreaks] <> <> <> +xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc#systemaus[system:aus] 
+xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc#systemaus_settings[system:aus_settings] |=== == Authentication and Client Privileges From 90a148aa94c690570d6ecec76b1ce3cb6e4b4fe5 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Thu, 8 May 2025 15:14:27 +0530 Subject: [PATCH 03/19] Rough draft --- modules/n1ql/pages/advanced.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/modules/n1ql/pages/advanced.adoc b/modules/n1ql/pages/advanced.adoc index 8ceae0f26..1500723bc 100644 --- a/modules/n1ql/pages/advanced.adoc +++ b/modules/n1ql/pages/advanced.adoc @@ -24,6 +24,7 @@ The cost-based optimizer takes into account the cost of memory, CPU, network tra * xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[] * xref:guides:cbo.adoc[] +* xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[] == {sqlpp} Support for Couchbase Transactions From 861f574c5291c6e8185305f92f83989164cc40c5 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Fri, 9 May 2025 15:19:58 +0530 Subject: [PATCH 04/19] Rewrite intro --- .../auto-update-statistics.adoc | 20 +++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 1fed58918..997d3077c 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -6,6 +6,26 @@ == Overview +Auto Update Statistics (AUS) is a feature that enables you to keep the optimizer statistics up to date by automatically updating them at scheduled intervals. + +Optimizer statistics are used by the Cost Based Optimizer to generate optimal query plans. +They are created when you run the UPDATE STATISTICS statement or when an index is built (available from 7.6.0 onwards). +However, over time, as your data changes, these statistics can become outdated. 
+When that happens, the Cost Based Optimizer may produce suboptimal query plans, which can negatively impact the query performance. + +AUS ensures that statistics remain current and checks for outdated statistics regularly. +The statistics are automatically updated during a configured schedule. + +AUS uses expiration policies to identify the outdated statistics and refreshes them by using UPDATE STATISTICS. + +NOTE: ++ +* AUS only updates statistics that already exist. It does not create new ones. +* Also, it focuses on maintaining statistics for expressions based on index keys. +It does not maintain statistics for expressions on non-indexed fields. + + + Optimizer statistics are generated when the UPDATE STATISTICS statement is executed and in 7.6.0 onwards, when an index is built. These statistics are used by the optimizer to generate optimal query plans. However, data can change over time and as a consequence, these statistics can become out of date which can cause sub-optimal plan generation. It is beneficial to routinely update optimizer statistics. Auto Update Statistics is a solution to keep optimizer statistics up to date. The statistics are automatically updated during a configured schedule. The system uses expiration policies to identify statistics that are stale and refreshes them by executing the required UPDATE STATISTICS statements. 
From 03b6ca29b98fcaaeb123cd33164f5865255d54a0 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Tue, 20 May 2025 15:31:20 +0530 Subject: [PATCH 05/19] Update content --- .../auto-update-statistics.adoc | 106 +++++++++++------- 1 file changed, 64 insertions(+), 42 deletions(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 997d3077c..44e489faa 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -6,66 +6,62 @@ == Overview -Auto Update Statistics (AUS) is a feature that enables you to keep the optimizer statistics up to date by automatically updating them at scheduled intervals. +Auto Update Statistics (AUS) is a feature that helps keep the optimizer statistics up to date by automatically identifying and refreshing outdated statistics. -Optimizer statistics are used by the Cost Based Optimizer to generate optimal query plans. -They are created when you run the UPDATE STATISTICS statement or when an index is built (available from 7.6.0 onwards). -However, over time, as your data changes, these statistics can become outdated. -When that happens, the Cost Based Optimizer may produce suboptimal query plans, which can negatively impact the query performance. +NOTE: AUS is only available in Couchbase Server 8.0 and later. For more information, see <>. -AUS ensures that statistics remain current and checks for outdated statistics regularly. -The statistics are automatically updated during a configured schedule. +Optimizer statistics are used by the xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[Cost Based Optimizer] to generate optimal query plans. +These statistics are created when you run the xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statement or build an index (available from 7.6.0 onwards). 
+However, as data changes over time, the statistics can become stale, leading to sub-optimal query plans and reduced query performance. -AUS uses expiration policies to identify the outdated statistics and refreshes them by using UPDATE STATISTICS. +AUS addresses this challenge by running a scheduled task on the Query nodes. +The task regularly checks for outdated statistics using expiration policies and refreshes them by executing UPDATE STATISTICS. +This way, AUS ensures that the optimizer statistics are always up to date and the CBO has the current information. -NOTE: -+ -* AUS only updates statistics that already exist. It does not create new ones. -* Also, it focuses on maintaining statistics for expressions based on index keys. -It does not maintain statistics for expressions on non-indexed fields. +Some important points to note about AUS: +* **Updates, not creates**: AUS only updates existing statistics; it does not create new ones. +* **Index-key focussed**: AUS primarily maintains statistics for expressions based on index keys, and not for expressions on non-indexed fields. +[#availability] +== Availability -Optimizer statistics are generated when the UPDATE STATISTICS statement is executed and in 7.6.0 onwards, when an index is built. These statistics are used by the optimizer to generate optimal query plans. However, data can change over time and as a consequence, these statistics can become out of date which can cause sub-optimal plan generation. It is beneficial to routinely update optimizer statistics. +AUS is only available on Query nodes running version 8.0 and later. -Auto Update Statistics is a solution to keep optimizer statistics up to date. The statistics are automatically updated during a configured schedule. The system uses expiration policies to identify statistics that are stale and refreshes them by executing the required UPDATE STATISTICS statements. 
+* You can enable AUS in a cluster that has been fully migrated to 8.0, or in a cluster that includes both 7.6.x and 8.0 Query nodes. +In such mixed clusters, the 7.6.x Query nodes will not perform any AUS tasks. -NOTE: The AUS system does not create optimizer statistics for expressions that do not already have statistics. It only maintains and updates statistics that have already been collected. -AUS only maintains statistics for expressions on index keys. It does not maintain statistics for expressions on non-indexed fields. - -== Prerequisites - -* The AUS task can only be performed on 8.0 Query nodes. - -* AUS can only be enabled in a cluster that has been fully migrated to 8.0. Or in a cluster that has 7.6.x nodes and 8.0 Query nodes. However, in such a cluster, the 7.6.x Query nodes will not perform any AUS tasks. - -* For Couchbase clusters that are migrating from pre-7.6.x versions to a cluster configuration like that above, the AUS task can only be enabled once the automatic migration of optimizer statistics has been migrated to the _query collection in the _system scope of the buckets. +* For clusters migrating from pre-7.6.x versions (to a cluster configuration described above), the AUS task can only be enabled once the automatic migration of optimizer statistics to the `_query` collection in the `_system` scope of the buckets has been completed. == How AUS Works -* AUS is an opt-in feature that users can choose to enable. Users must explicitly opt-in and configure a schedule for when AUS should run. This allows users to schedule this “maintenance window” at a time that best suits their workloads. +AUS is an opt-in feature. +You must explicitly enable it and configure a schedule for when it should run. +This allows you to align the AUS task with your cluster's workload patterns and run it at a time that minimizes the impact on performance. -* The AUS system uses expiration policies to determine when statistics are out of date and require an update. 
+The AUS task consists of two phases: -* Every Query node in the cluster participates in the AUS task according to the same schedule. +. **Evaluation Phase**: +In this phase, AUS evaluates which statistics are out of date based on the expiration policies. -* The AUS task comprises two phases, the Evaluation phase and the Update Phase. +. **Update phase**: +After the evaluation, this phase executes the necessary UPDATE STATISTICS statements to refresh the statistics identified as stale. +The AUS system ensures that the refreshed statistics maintains the same level of details as the original statistics. -* The Evaluation phase evaluates which statistics are stale based on the expiration policies. +Every Query node in the cluster participates in the AUS task following the same configured schedule. +During the scheduled AUS window, each Query node performs the following: -* The Update phase executes the appropriate UPDATE STATISTICS statements to refresh the statistics deemed as stale by the Evaluation phase. The AUS system maintains the original resolution at which the statistics were collected. +* The Query node selects the specific collections to perform AUS operations on. +This ensures that no other Query node processes the same collection during this window. --- -NOTE: In a Query node, the following occurs during the AUS scheduled window: -* The Query node chooses the collections to perform AUS operations on. No other Query node will perform AUS on these collections during this window. -* For each collection the Evaluation and Update phases are performed on the statistics gathered on expressions on fields in the collection. --- +* For each selected collection, the Evaluation and Update phases are executed on the statistics gathered on expressions based on fields within that collection. === Expiration Policy -The AUS system uses expiration policies to determine when statistics are out of date and require an update. 
+AUS uses expiration policies to determine when statistics are out of date and require an update. + +“Change Percentage" is the expiration policy used. It determines the threshold for how much data in an index must change before the statistics are determined as out of date and requiring an update. -“Change Percentage" is the expiration policy used. It determines the threshold for how much data in an index must change before the statistics are determined as out of date and requiring an update. It is the percentage of changes to the items in an index that must be exceeded to trigger a statistic to be refreshed. @@ -92,7 +88,7 @@ If the scheduled window has ended but the AUS task is still not over, the task i IMPORTANT: When AUS is enabled, the first scheduled task run can result in all the existing optimizer statistics being updated, regardless of the expiration policy evaluation. This is because the index change information might not have been recorded till the first scheduled run. -== Managing AUS +== Enabling and Scheduling AUS To enable AUS for the cluster, the user must explicitly opt-in and set a schedule. @@ -404,7 +400,33 @@ When the DELETE statement affects an AUS task entry in the “scheduled” or Remember to add appropriate filters in the DELETE statement so that the statement affects the intended task entry. Otherwise inadvertently other tasks can get cancelled or the task history gets deleted. -== AUS Load -When the AUS task is executing, it can cause an increased load on the Query node. This is because of all the activity of evaluating the statistics and executing statistics refresh. That is why it is important that the schedule for AUS is set to best suit the workloads of the cluster. +== Manage AUS Load + +When the AUS task is executing, it can cause an increased load on the Query node. +This is because of all the activity of evaluating the statistics and executing statistics refresh. 
+That is why it is important that the schedule for AUS is set to best suit the workloads of the cluster. + +Additionally, an attempt is made to ensure that during the scheduled window, the AUS task is not started when the load of the Query node is too high. +If the load is too high, the task is not started and the next AUS task is scheduled. + + + + + + + +== Understanding How AUS Works + +AUS is an opt-in feature that you can choose to enable or disable. +You must explicitly enable it and schedule it to run at a time that best suits the workloads of the cluster. + +Once enabled, AUS operates as follows: +* During the scheduled window, each Query node in the cluster evaluates the statistics for the collections assigned to it. +All Query nodes in the cluster participate in the AUS task according to the same schedule. +* Each AUS task comprises two phases: +** Evaluation phase +*** Evaluates which statistics are stale based on the expiration policies. +** Update phase +*** Executes the appropriate UPDATE STATISTICS statements to refresh the statistics deemed as stale by the Evaluation phase. +The AUS system maintains the original resolution at which the statistics were collected. -Additionally, an attempt is made to ensure that during the scheduled window, the AUS task is not started when the load of the Query node is too high. If the load is too high, the task is not started and the next AUS task is scheduled. 
\ No newline at end of file From 87be093853c62b73d9b418c4877f7abf20c7d11d Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Thu, 22 May 2025 10:23:59 +0530 Subject: [PATCH 06/19] Update content --- .../auto-update-statistics.adoc | 178 ++++++++---------- 1 file changed, 79 insertions(+), 99 deletions(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 44e489faa..0f397fb69 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -1,5 +1,5 @@ = Auto Update Statistics -:description: Auto Update Statistics is a solution to keep optimizer statistics up to date. +:description: Auto Update Statistics (AUS) is a solution designed to keep the optimizer statistics up to date. [abstract] {description} @@ -8,104 +8,101 @@ Auto Update Statistics (AUS) is a feature that helps keep the optimizer statistics up to date by automatically identifying and refreshing outdated statistics. -NOTE: AUS is only available in Couchbase Server 8.0 and later. For more information, see <>. - Optimizer statistics are used by the xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[Cost Based Optimizer] to generate optimal query plans. These statistics are created when you run the xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statement or build an index (available from 7.6.0 onwards). However, as data changes over time, the statistics can become stale, leading to sub-optimal query plans and reduced query performance. -AUS addresses this challenge by running a scheduled task on the Query nodes. -The task regularly checks for outdated statistics using expiration policies and refreshes them by executing UPDATE STATISTICS. -This way, AUS ensures that the optimizer statistics are always up to date and the CBO has the current information. 
- -Some important points to note about AUS: +AUS addresses this challenge by running a scheduled task on the query nodes that regularly checks for outdated statistics using expiration policies and refreshes them by executing xref:n1ql:n1ql-language-reference/updatestatistics.adoc[]. +// This way, AUS ensures that the optimizer statistics are always up to date and the CBO has the current information. -* **Updates, not creates**: AUS only updates existing statistics; it does not create new ones. -* **Index-key focussed**: AUS primarily maintains statistics for expressions based on index keys, and not for expressions on non-indexed fields. +[NOTE] +==== +* AUS is only available in Couchbase Server 8.0 and later. For more information, see <>. +* AUS only updates existing statistics; it does not create new ones. +* AUS primarily maintains statistics for expressions based on index keys, and not for expressions on non-indexed fields. +==== [#availability] == Availability -AUS is only available on Query nodes running version 8.0 and later. +AUS is only available on query nodes running version 8.0 and later. -* You can enable AUS in a cluster that has been fully migrated to 8.0, or in a cluster that includes both 7.6.x and 8.0 Query nodes. -In such mixed clusters, the 7.6.x Query nodes will not perform any AUS tasks. +* You can enable AUS in a cluster that has been fully migrated to 8.0, or in a cluster that includes both 7.6.x and 8.0 query nodes. +In such mixed clusters, the 7.6.x query nodes will not perform any AUS tasks. -* For clusters migrating from pre-7.6.x versions (to a cluster configuration described above), the AUS task can only be enabled once the automatic migration of optimizer statistics to the `_query` collection in the `_system` scope of the buckets has been completed. 
+* For clusters migrating from pre-7.6.x versions (to a configuration described above), the AUS task can only be enabled once the automatic migration of optimizer statistics to the `_query` collection in the `_system` scope of the buckets has been completed. == How AUS Works AUS is an opt-in feature. You must explicitly enable it and configure a schedule for when it should run. -This allows you to align the AUS task with your cluster's workload patterns and run it at a time that minimizes the impact on performance. +This allows you to align the AUS task with your cluster's workloads and run it at a time that minimizes the impact on performance. -The AUS task consists of two phases: +An AUS task consists of two phases: . **Evaluation Phase**: -In this phase, AUS evaluates which statistics are out of date based on the expiration policies. +In this phase, AUS evaluates whether the existing statistics are stale based on the <>. +For every index, AUS calculates the percentage of change to measure how much the data in the index has changed since the last update of the optimizer statistics for the index key expressions. +If this percentage exceeds the configured `change_percentage` threshold, the statistics for the index's key expressions are marked as stale and require an update. -. **Update phase**: -After the evaluation, this phase executes the necessary UPDATE STATISTICS statements to refresh the statistics identified as stale. +. **Update Phase**: +After the evaluation, this phase executes the necessary xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statements to refresh the statistics identified as stale. The AUS system ensures that the refreshed statistics maintain the same level of detail as the original statistics. -Every Query node in the cluster participates in the AUS task following the same configured schedule. 
-During the scheduled AUS window, each Query node performs the following: - -* The Query node selects the specific collections to perform AUS operations on. -This ensures that no other Query node processes the same collection during this window. - -* For each selected collection, the Evaluation and Update phases are executed on the statistics gathered on expressions based on fields within that collection. - -=== Expiration Policy +//Once the Evaluation of a collection is completed, the Update phase will be executed. -AUS uses expiration policies to determine when statistics are out of date and require an update. +//This phase will update the statistics flagged as stale in the previous phase by executing the appropriate UPDATE STATISTICS statements. -“Change Percentage" is the expiration policy used. It determines the threshold for how much data in an index must change before the statistics are determined as out of date and requiring an update. +//The statistics update performed by AUS maintains the original resolution at which the statistic was collected. +Every query node in the cluster participates in the AUS task following the same configured schedule. +During the scheduled AUS window, each query node performs the following: -It is the percentage of changes to the items in an index that must be exceeded to trigger a statistic to be refreshed. +* The query node selects the specific collections to perform AUS operations on. +This ensures that no other query node processes the same collection during this window. -If the percentage of change to the items in an index since the last time the statistic was collected exceeds the defined threshold, the statistic is flagged as stale by the AUS system. The AUS operation then will update those stale statistics. - -=== Evaluation Phase - -The Evaluation phase evaluates whether the existing statistics are stale based on the expiration policy. 
- -During the Evaluation phase, for every index, AUS calculates how much the items/data in the index have changed since the last time the optimizer statistics of the index key expressions were updated. If the percentage of change exceeds the configured threshold ( the "change_percentage"), the statistics for the index key expressions for that index, are marked as stale and requiring an update. - -=== Update Phase +* For each selected collection, the Evaluation and Update phases are executed on the statistics gathered on expressions based on fields within that collection. -Once the Evaluation of a collection is completed, the Update phase will be executed. +* Once the AUS operation is complete for all statistics in all buckets, the query node schedules the next AUS run. -This phase will update the statistics flagged as stale in the previous phase by executing the appropriate UPDATE STATISTICS statements. +* If the scheduled window ends before the AUS task is complete, the task is aborted and the next AUS run is scheduled. -The statistics update performed by AUS maintains the original resolution at which the statistic was collected. +IMPORTANT: When AUS is enabled, the very first scheduled task run might update all existing optimizer statistics, regardless of the expiration policy evaluation. +This is because the index change information might not have been recorded prior to this initial run. -Once the AUS operation has been performed on all the statistics in all the buckets, the query node should schedule the next AUS run. +__Add a flowchart here to illustrate the AUS process__ -If the scheduled window has ended but the AUS task is still not over, the task is aborted and the next AUS run is scheduled. +[#expiration_policy] +=== Expiration Policy +AUS uses expiration policies to determine when statistics are outdated and require an update. +The policy is based on the percentage of changes to data within an index. 
+It defines the threshold for how much data in an index must change before the statistics are considered outdated. -IMPORTANT: When AUS is enabled, the first scheduled task run can result in all the existing optimizer statistics being updated, regardless of the expiration policy evaluation. This is because the index change information might not have been recorded till the first scheduled run. +If the percentage of changed data (since the last time the statistic was collected) exceeds the defined threshold, AUS flags the statistic as stale. +The subsequent AUS operation then updates this statistic. -== Enabling and Scheduling AUS +== Enable and Schedule AUS -To enable AUS for the cluster, the user must explicitly opt-in and set a schedule. +To use AUS for your cluster, you must first enable it and configure a schedule. -The AUS system provides the system:aus catalog that stores the global configurations. Users can opt-in and set a schedule for AUS by appropriately modifying the configurations set in the system:aus catalog. +AUS stores its global configurations in the `system:aus` catalog. +You can enable AUS and set its schedule by modifying the appropriate configurations within this catalog. === system:aus -The system:aus catalog stores a single document that contains the global configurations of AUS. -Users can update this document to change these configurations. +The `system:aus` catalog contains a single document that holds all the global configurations of AUS. +You can update this document to modify the settings. -Only SELECT and UPDATE DMLs are allowed on this keyspace. - -NOTE: To execute SELECT on system:aus, the query_system_catalog is required. -To execute UPDATE on system:aus, the query_manage_system_catalog is required +[NOTE] +==== +* Only SELECT and UPDATE DMLs are allowed on this keyspace. +* To execute SELECT on `system:aus`, you need the `query_system_catalog` role. 
+* To execute UPDATE on `system:aus`, you need the `query_manage_system_catalog` role. +==== -In the document, each attribute represents a particular global configuration. -The attribute names and the corresponding configuration that they represent are as follows: +Each attribute in the document represents a particular global configuration. +The following are the attribute names and the configurations they represent: [cols="1a,3a,1a"] |=== @@ -113,9 +110,10 @@ The attribute names and the corresponding configuration that they represent are | **enable** + __required__ -| Whether AUS is enabled for the cluster or not. -Set the value to “true” to enable AUS. -If the attribute is set to true the schedule attribute must be set. +| Indicates whether AUS is enabled for the cluster or not. + +Set this attribute to `true` to enable AUS. +If set to `true`, then the `schedule` attribute must be set. *Default:* `false` @@ -124,30 +122,30 @@ If the attribute is set to true the schedule attribute must be set. | **schedule** + __optional__ -|The schedule according to which AUS will be performed. +| Defines the schedule for AUS operations. -If AUS is enabled i.e. the “enable” attribute set to “true”, then the schedule attribute is required. -Otherwise, it is not. +This attribute is required only if `enable` is set to `true`. | <> object | **change_percentage** + __optional__ -| The percentage of changes to the items in an index that must be exceeded to trigger a statistic to be refreshed. -Integer between 0 and 100. +| The percentage of change to items within an index that must be exceeded for the statistics to be refreshed. +This is the threshold for determining whether the statistics are stale or not. -*Default:* `10` +The value must be an integer between `0` and `100`. -*Example:* `30` +For example, a value of `30` means that if more than 30% of the items in an index have changed, the statistics for that index will be refreshed. 
+ +*Default:* `10` | Integer | **all_buckets** + __required__ - -| Whether AUS should be performed on all buckets or only those buckets whose metadata information is “loaded” on the Query node. +| Indicates whether AUS should be performed on all buckets or only those buckets whose metadata information is loaded on the query node. *Default:* `false` @@ -165,8 +163,8 @@ __required__ | **start_time** + __required__ -| The start time of the AUS schedule. String representing the time in “HH:MM” format. -The schedule’s `start_time` attribute must be earlier than the schedule’s `end_time` attribute by at least 30 minutes. +| The start time of the AUS schedule in “HH:MM” format. +The `start_time` must be at least 30 minutes earlier than the `end_time`. *Example:* `“01:30”` @@ -175,18 +173,18 @@ The schedule’s `start_time` attribute must be earlier than the schedule’s `e | **end_time** + __required__ -| The end time of the AUS schedule. String representing the time in “HH:MM” format. +| The end time of the AUS schedule in “HH:MM” format. -The schedule’s `end_time` attribute must be later than the schedule’s `start_time` attribute by at least 30 minutes. +The `end_time` must be at least 30 minutes later than the `start_time`. -*Example:* `“01:30”` +*Example:* `“05:30”` | String | **days** + __required__ -| The days of the week the AUS schedule should run. -Valid Values in the array: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday +| An array of strings specifying the days on which the AUS schedule runs. +Valid values are: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. *Example:* `[“Saturday”, “Sunday”]` @@ -195,17 +193,18 @@ Valid Values in the array: Monday, Tuesday, Wednesday, Thursday, Friday, Saturda | **timezone** + __optional__ -| The timezone to consider the start and end times. String representing of the timezone in an IANA timezone representation. +| The timezone that applies to the schedule's start and end times. 
+The value must be a valid IANA timezone string. *Default:* `“UTC”` -*Example:* `“Asia/Calcutta”` +*Example:* `“US/Pacific”` | String |=== -When changing these global configurations, it is important to note the following: +When changing the global configurations, consider the following: * *Enabling AUS*: If AUS was previously not enabled, and now enabled, then the next AUS task is scheduled. * *Re-scheduling AUS*: Cancel any scheduled AUS task. Schedule a new AUS task accordingly. No running AUS tasks will be cancelled. @@ -400,7 +399,7 @@ When the DELETE statement affects an AUS task entry in the “scheduled” or Remember to add appropriate filters in the DELETE statement so that the statement affects the intended task entry. Otherwise inadvertently other tasks can get cancelled or the task history gets deleted. -== Manage AUS Load +== Managing AUS Load When the AUS task is executing, it can cause an increased load on the Query node. This is because of all the activity of evaluating the statistics and executing statistics refresh. @@ -411,22 +410,3 @@ If the load is too high, the task is not started and the next AUS task is schedu - - - - -== Understanding How AUS Works - -AUS is an opt-in feature that you can choose to enable or disable. -You must explicitly enable it and schedule it to run at a time that best suits the workloads of the cluster. - -Once enabled, AUS operates as follows: -* During the scheduled window, each Query node in the cluster evaluates the statistics for the collections assigned to it. -All Query nodes in the cluster participate in the AUS task according to the same schedule. -* Each AUS task comprises two phases: -** Evaluation phase -*** Evaluates which statistics are stale based on the expiration policies. -** Update phase -*** Executes the appropriate UPDATE STATISTICS statements to refresh the statistics deemed as stale by the Evaluation phase. -The AUS system maintains the original resolution at which the statistics were collected. 
- From 7c5dfb3c9ac6b4a5eda10c4712b1aaac2081b7c5 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Mon, 26 May 2025 15:10:47 +0530 Subject: [PATCH 07/19] Update content --- .../auto-update-statistics.adoc | 174 +++++++++++------- 1 file changed, 110 insertions(+), 64 deletions(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 0f397fb69..e24b1cb1e 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -84,11 +84,19 @@ The subsequent AUS operation then updates this statistic. == Enable and Schedule AUS -To use AUS for your cluster, you must first enable it and configure a schedule. +To start using AUS for your cluster, you will need to enable it and configure a schedule. -AUS stores its global configurations in the `system:aus` catalog. -You can enable AUS and set its schedule by modifying the appropriate configurations within this catalog. +AUS stores its global configurations in the <> catalog. +You can enable AUS and set its schedule by modifying the relevant configurations within this catalog. +If you need more granular control, you can use the <> catalog to customize certain AUS configurations at the bucket, scope, and collection levels. + +Once AUS is enabled and a schedule is set, every query node in the cluster participates in AUS. +Each query node gets its own AUS task, which performs AUS operations on that specific node. +You can find a record of recent AUS tasks across all query nodes in the `system:tasks_cache` catalog. +For more information, see <>. + +[#system_aus] === system:aus The `system:aus` catalog contains a single document that holds all the global configurations of AUS. @@ -206,13 +214,14 @@ The value must be a valid IANA timezone string. 
When changing the global configurations, consider the following: -* *Enabling AUS*: If AUS was previously not enabled, and now enabled, then the next AUS task is scheduled. -* *Re-scheduling AUS*: Cancel any scheduled AUS task. Schedule a new AUS task accordingly. No running AUS tasks will be cancelled. -* *Other Setting Changes*: If the other global settings i.e. all_buckets, change_percentage is changed, From the next scheduled AUS run, the new values of the settings will be used. +* *Enabling AUS*: If AUS was previously disabled and is now enabled, the next AUS task will be scheduled immediately. +* *Rescheduling AUS*: Currently scheduled AUS task will be canceled, and a new AUS task will be scheduled according to the updated schedule. +Running AUS tasks will not be canceled. +* *Other Settings*: If other global settings such as `all_buckets` or `change_percentage` are modified, the new values will be applied during the next scheduled AUS run. ==== Example ==== -Using UPDATE Statement to enable AUS and set a schedule with some customizations: +Using the UPDATE statement to enable AUS and set a schedule with some customizations: .Query [source,sqlpp] @@ -226,51 +235,53 @@ schedule = { "start_time": "01:30", ---- ==== +[#system_aus_settings] === system:aus_settings -The system also provides mechanisms to customize certain AUS configurations at the bucket, scope and collection level. The system:aus_settings maintains these more granular configurations. - -The document id of a document in this keyspace must be the full path of the bucket/ scope/ collection name. - -All SQL++ DMLs are allowed on this keyspace. - -NOTE: To execute SELECT on system:aus_settings, the query_system_catalog is required. -To execute UPDATE, DELETE, INSERT, UPSERT on system:aus_settings, the query_manage_system_catalog is required. +The `system:aus_settings` catalog stores granular configuration settings for AUS. +These settings can be applied at the bucket, scope, and collection levels. 
-By default, this keyspace has no documents. The configurations of AUS for all keyspaces by default are what is set at the global level. +By default, this catalog has no documents, and the AUS settings for all keyspaces inherit the configurations defined at the global level. +In other words, unless you explicitly configure AUS for a specific keyspace, it will use the global AUS settings. -A settings document must be explicitly inserted to customize AUS for the keyspace. +To customize AUS for a specific keyspace, you must insert a settings document into the `system:aus_settings` catalog. +The document ID of a document in this keyspace must be the full path of the bucket, scope, or collection. -In the settings document, each attribute represents a particular granular configuration. The attribute names and the corresponding configuration that they represent are as follows: +Each attribute in the document represents a particular granular configuration. +The following are the attribute names and the configurations they represent: -[cols="1a,3a,1a"] +[cols="1a,4a,1a"] |=== | Name | Description | Schema | **enable** + __optional__ -| Whether AUS is enabled for the bucket/scope/collection or not. -Set to “false” to explicitly disable. +| Indicates whether AUS is enabled for the bucket, scope, or collection. + +Set it to `false` to explicitly disable AUS. -If AUS is disabled at the global/ cluster level, then it cannot be enabled at the bucket/scope/collection level. -If AUS is disabled at a higher level, it cannot be overridden at a more granular level. -But, if AUS is enabled at a higher level, it can be overridden at a more granular level. +AUS settings are hierarchical and follow the order: cluster > bucket > scope > collection. + +If AUS is disabled at a higher level, it cannot be enabled at a more granular level. +However, if AUS is enabled at a higher level, it can be disabled at a more granular level. 
-Example to illustrate this: - -If AUS is disabled for a bucket - it is disabled for all scopes and collections within it. It cannot be overridden at the scope or collection level. -If AUS is enabled for a bucket - it can be overridden at the scope and collection level. +For example, +-- +* If AUS is disabled for a bucket, it is automatically disabled for all scopes and collections within it. +The setting cannot be overridden at the scope or collection level. +* If AUS is enabled for a bucket, it can be overridden at the scope and collection level. +-- | Boolean | **change_percentage** + __optional__ -| The percentage of changes to the items in an index that must be exceeded to trigger a statistic to be refreshed. -Integer between 0 and 100. +| The percentage of change to items within an index that must be exceeded for the statistics to be refreshed. -If set at a bucket-level, this is the change percentage value considered for all scopes and collections within the bucket, unless overridden at a lower level. -If set at a scope-level, this is the change percentage value considered for all collections within the scope unless overridden at a lower level. +The value must be an integer between `0` and `100`. + +If set at a bucket level, this value applies to all scopes and collections within the bucket, unless overridden at a lower level. + +If set at a scope level, this value applies to all collections within the scope, unless overridden at a lower level. *Example:* `30` @@ -279,22 +290,32 @@ If set at a scope-level, this is the change percentage value considered for all | **update_statistics_timeout** + __optional__ -| A number representing a duration in seconds. -The command times out when this timeout period is reached. If omitted, a default timeout value is calculated based on the number of samples used. +| The timeout period for the UPDATE STATISTICS command. +It is a number representing a duration in seconds. 
-If set for a keyspace, this would be set as the timeout for every UPDATE STATISTICS statement executed by AUS for that keyspace. +If the command does not complete within this duration, it times out. +If omitted, a default timeout value is calculated based on the number of samples used. -If set at a bucket-level, this is the update_statistics_timeout value considered for all scopes and collections within the bucket, unless overridden at a lower level. +If set for a keyspace, this timeout applies to every UPDATE STATISTICS statement that AUS executes for that keyspace. -If set at a scope-level, this is the update_statistics_timeout value considered for all collections within the scope unless overridden at a lower level. +If set at a bucket level, this timeout applies to all scopes and collections within the bucket, unless a different value is set at a lower level. + +If set at a scope level, this timeout applies to all collections within the scope, unless a different value is set at a collection level. | Number |=== +[NOTE] +==== +* All SQL++ DMLs are allowed on this keyspace. +* To execute SELECT on `system:aus_settings`, you need the `query_system_catalog` role. +* To execute UPDATE, DELETE, INSERT, or UPSERT on `system:aus_settings`, you need the `query_manage_system_catalog` role. +==== + ==== Example ==== -A sample query to add a scope-level setting. These settings would apply to all collections within the scope unless overridden at a collection level. particular collection would look like this: +Query to add a scope level setting that applies to all collections within the scope. .Query [source,sqlpp] @@ -304,41 +325,55 @@ INSERT INTO system:aus_settings ( KEY, VALUE ) ---- ==== -== Scheduling AUS Tasks -When AUS is enabled and a schedule set, every Query node in the cluster participates in AUS. As a result, each Query node will have its own AUS task assigned, and this task will perform AUS operations on that specific node. 
+== Monitor AUS Tasks -The system:tasks_cache catalog keeps a record of recent tasks including the AUS tasks across all Query nodes. +The `system:tasks_cache` catalog stores information about all recent tasks executed in a cluster, including the AUS tasks. +For each AUS task, every involved query node maintains an entry within this catalog. +// You can view the recent AUS tasks, their status, and other details by querying this catalog. +AUS task entries can be specifically identified by the `class` field, which is set to `auto_update_statistics`. -== Viewing AUS Tasks -The system:tasks_cache catalog maintains the list of recent tasks. For every AUS task, each Query node will have an entry in the system:tasks_cache. View all recent AUS tasks by querying the system:tasks_cache keyspace. +=== View Recent AUS Tasks -The AUS task entries have the “class” field set to “auto_update_statistics”. +To view all recent AUS tasks, use the following query: -View all recent AUS tasks: [source,sqlpp] ---- SELECT * FROM system:tasks_cache WHERE class = "auto_update_statistics"; ---- - -To find the next scheduled AUS tasks: + +This query returns all AUS entries regardless of their state (scheduled, running, completed, etc.). +To get the details of completed tasks, see <>. + +=== Find Scheduled AUS Tasks + +To identify AUS tasks that are scheduled to run, you can filter the entries using the `state` attribute. + [source,sqlpp] ---- SELECT * FROM system:tasks_cache WHERE class = "auto_update_statistics" AND state = "scheduled"; ---- + +=== View AUS Tasks on a Particular Node -To view the recent AUS tasks on a particular node, filter by the ”node” attribute: +To view recent AUS tasks on a particular node, filter by the `node` attribute. 
+ [source,sqlpp] ---- -SELECT * FROM system:tasks_cache WHERE class = "auto_update_statistics" AND state = "scheduled" and node = "127.0.0.1:8091"; +SELECT * FROM system:tasks_cache WHERE class = "auto_update_statistics" + AND state = "scheduled" + AND node = "127.0.0.1:8091"; /* Replace with the actual node address */ ---- -The entries for completed AUS tasks will have information about the AUS task, which keyspaces had their statistics updated, if any errors occurred, etc. +[#view_completed_aus_tasks] +=== View Completed AUS Tasks + +The entries for completed AUS tasks have information specifically about tasks that have finished execution. +These entries include details such as the task ID, start time, end time, which keyspaces had their statistics updated, and whether any errors occurred during the task execution. -=== Example +==== Example ==== Sample task entry for a successful AUS task on a query node: -.Query [source,sqlpp] ---- { @@ -372,34 +407,45 @@ Sample task entry for a successful AUS task on a query node: ---- ==== -Attributes: --> Should this be added to the system:tasks_cache doc? -* keyspaces_updated: A list of keyspaces that had statistics that had to be updated. -* configuration: the task configuration with which this task was executed. +For more information about `system:tasks_cache` and its attributes, see xref:n1ql:n1ql-intro/sysinfo.adoc#sys-tasks-cache[Monitor Cached Tasks]. +In addition to the attributes listed there, the AUS task entries also include the following attributes: + +* `keyspaces_updated`: A list of keyspaces that had their statistics updated during the AUS task execution. +* `configuration`: The configuration with which the AUS task was executed. -For more information about system:tasks_cache and its attributes, see xref:n1ql:n1ql-intro/sysinfo.adoc#sys-tasks-cache[]. 
+== Cancel AUS Tasks -== Aborting a Running AUS Task -To cancel the execution of a running AUS task, execute an appropriate DELETE statement against the system:tasks_cache. +You can cancel a running AUS task by deleting its entry from the `system:tasks_cache` catalog. -To cancel all running AUS tasks: +To cancel all running AUS tasks, use the following DELETE statement: [source,sqlpp] ---- DELETE FROM system:tasks_cache WHERE class = "auto_update_statistics" AND state = "running"; ---- - -Use appropriate filters in the WHERE clause to be selective of which tasks to cancel or delete. To cancel the running AUS task only only on a particular node: + +CAUTION: Use appropriate filters in the WHERE clause to be selective of which tasks to cancel or delete. +Remember to add appropriate filters in the DELETE statement so that the statement affects the intended task entry. Otherwise inadvertently other tasks can get cancelled or the task history gets deleted. + + +To cancel a running AUS task on a specific node, include the node's address in the WHERE clause: [source,sqlpp] ---- -DELETE FROM system:tasks_cache WHERE class = "auto_update_statistics" AND state = "running" AND node = "127.0.0.1:8091"; +DELETE FROM system:tasks_cache WHERE class = "auto_update_statistics" + AND state = "running" + AND node = "127.0.0.1:8091"; // Replace with the actual node address ---- +When you delete an AUS task that is in a `scheduled` or `running` state, the task is cancelled. + + + and the next AUS task is automatically scheduled. When the DELETE statement affects an AUS task entry in the “scheduled” or “running” state, the scheduled/ running task is cancelled and the next AUS task is automatically scheduled. -Remember to add appropriate filters in the DELETE statement so that the statement affects the intended task entry. Otherwise inadvertently other tasks can get cancelled or the task history gets deleted. 
-== Managing AUS Load + +== Manage AUS Load When the AUS task is executing, it can cause an increased load on the Query node. This is because of all the activity of evaluating the statistics and executing statistics refresh. From 1039bd285752983259b42404454b7ee0578cb8fc Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Tue, 27 May 2025 10:11:09 +0530 Subject: [PATCH 08/19] Update content --- .../auto-update-statistics.adoc | 102 +++++++++++++++--- 1 file changed, 86 insertions(+), 16 deletions(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index e24b1cb1e..53606adec 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -70,6 +70,34 @@ This ensures that no other query node processes the same collection during this IMPORTANT: When AUS is enabled, the very first scheduled task run might update all existing optimizer statistics, regardless of the expiration policy evaluation. This is because the index change information might not have been recorded prior to this initial run. +[plantuml#fig-aus-process,aus-process,svg] +.... +@startuml + +skinparam defaultTextAlignment center + +() " " as start +() " " as end + +rectangle "Query\nNode" { + rectangle "Scheduled\nTask" as ScheduledTask + rectangle "Evaluation\nPhase" as EvaluationPhase + rectangle "Update\nPhase" as UpdatePhase +} + +database "Statistics\nCatalog" as StatisticsCatalog + +start ..r..> ScheduledTask +ScheduledTask -> EvaluationPhase +EvaluationPhase -> UpdatePhase +EvaluationPhase --> StatisticsCatalog : "Check for\nstale statistics\n(based on expiration policy)" +UpdatePhase --> StatisticsCatalog : "Update stale\nstatistics" +UpdatePhase ..r..> end + +@enduml +.... 
+ + __Add a flowchart here to illustrate the AUS process__ [#expiration_policy] @@ -424,35 +452,77 @@ To cancel all running AUS tasks, use the following DELETE statement: DELETE FROM system:tasks_cache WHERE class = "auto_update_statistics" AND state = "running"; ---- -CAUTION: Use appropriate filters in the WHERE clause to be selective of which tasks to cancel or delete. -Remember to add appropriate filters in the DELETE statement so that the statement affects the intended task entry. Otherwise inadvertently other tasks can get cancelled or the task history gets deleted. - - To cancel a running AUS task on a specific node, include the node's address in the WHERE clause: [source,sqlpp] ---- -DELETE FROM system:tasks_cache WHERE class = "auto_update_statistics" - AND state = "running" - AND node = "127.0.0.1:8091"; // Replace with the actual node address +DELETE FROM system:tasks_cache + WHERE class = "auto_update_statistics" + AND state = "running" + AND node = "127.0.0.1:8091"; // Replace with the actual node address ---- -When you delete an AUS task that is in a `scheduled` or `running` state, the task is cancelled. +When you delete an AUS task that is in the `scheduled` or `running` state, AUS cancels the task and schedules the next one automatically. +CAUTION: It is important to include appropriate WHERE clauses to specify exactly which tasks you want to cancel. +Make sure your filters target only the intended tasks, otherwise they might inadvertently cancel other tasks or delete task history. - and the next AUS task is automatically scheduled. -When the DELETE statement affects an AUS task entry in the “scheduled” or “running” state, the scheduled/ running task is cancelled and the next AUS task is automatically scheduled. +== Manage AUS Load +When an AUS task runs, it can increase the load on the query node due to the evaluation and refresh of statistics. +Therefore, it is important to schedule AUS to best suit the workloads of the cluster. 
+To prevent excessive load, the AUS task will not start if the query node's load is too high during the scheduled window. +In such cases, the task is skipped, and the next AUS task is scheduled. + +== Related Links + +* xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[Cost Based Optimizer] +* xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] +* xref:n1ql:n1ql-intro/sysinfo.adoc[System Catalogs and Information] -== Manage AUS Load -When the AUS task is executing, it can cause an increased load on the Query node. -This is because of all the activity of evaluating the statistics and executing statistics refresh. -That is why it is important that the schedule for AUS is set to best suit the workloads of the cluster. +//// -Additionally, an attempt is made to ensure that during the scheduled window, the AUS task is not started when the load of the Query node is too high. -If the load is too high, the task is not started and the next AUS task is scheduled. +.Default filters, order, and pagination +[plantuml#fig-uncovered,uncovered-query,svg] +.... +@startuml +skinparam defaultTextAlignment center +() " " as start +() " " as end + +rectangle "Index\nService"{ + rectangle "Index Scan" as IndexScan +} + +rectangle "Query\nService"{ + rectangle Fetch + rectangle Filter + rectangle Order + rectangle Offset + rectangle Limit +} +note bottom of Filter + Filters, order, and pagination + performed after fetch +end note + +database "Data\nService" as Data + +start ..r..> IndexScan +IndexScan -> Fetch +Fetch -> Filter +Filter -> Order +Order -> Offset +Offset -> Limit +Fetch --> Data +Fetch <-- Data +Limit ..r..> end + +@enduml +.... 
+//// \ No newline at end of file From d682cc5b97ca894b443de9c9f8afaac739107491 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Tue, 27 May 2025 12:52:52 +0530 Subject: [PATCH 09/19] Add PlantUML diagram --- modules/n1ql/pages/n1ql-intro/sysinfo.adoc | 4 +- .../auto-update-statistics.adoc | 72 +++++-------------- 2 files changed, 19 insertions(+), 57 deletions(-) diff --git a/modules/n1ql/pages/n1ql-intro/sysinfo.adoc b/modules/n1ql/pages/n1ql-intro/sysinfo.adoc index 6c076dfd5..ee51044f0 100644 --- a/modules/n1ql/pages/n1ql-intro/sysinfo.adoc +++ b/modules/n1ql/pages/n1ql-intro/sysinfo.adoc @@ -73,8 +73,8 @@ a| [%hardbreaks] <> <> <> -xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc#systemaus[system:aus] -xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc#systemaus_settings[system:aus_settings] +xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc#system_aus[system:aus] +xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc#system_aus_settings[system:aus_settings] |=== == Authentication and Client Privileges diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 53606adec..6d6275643 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -70,6 +70,7 @@ This ensures that no other query node processes the same collection during this IMPORTANT: When AUS is enabled, the very first scheduled task run might update all existing optimizer statistics, regardless of the expiration policy evaluation. This is because the index change information might not have been recorded prior to this initial run. +.AUS process flow showing the evaluation and update phases [plantuml#fig-aus-process,aus-process,svg] .... 
@startuml @@ -79,27 +80,34 @@ skinparam defaultTextAlignment center () " " as start () " " as end +rectangle "AUS\nEnabled?" as AusEnabled + rectangle "Query\nNode" { - rectangle "Scheduled\nTask" as ScheduledTask + rectangle "Scheduled\nAUS Task" as ScheduledTask + rectangle "Collection\nSelection" as CollectionSelection rectangle "Evaluation\nPhase" as EvaluationPhase rectangle "Update\nPhase" as UpdatePhase } -database "Statistics\nCatalog" as StatisticsCatalog +database "Optimizer\nStatistics" as OptimizerStatistics + +note bottom of CollectionSelection + Selects collections for AUS +end note + -start ..r..> ScheduledTask -ScheduledTask -> EvaluationPhase +start ..r..> AusEnabled : "Start \nAUS Process" +AusEnabled -> ScheduledTask : "Yes" +ScheduledTask -> CollectionSelection +CollectionSelection -> EvaluationPhase EvaluationPhase -> UpdatePhase -EvaluationPhase --> StatisticsCatalog : "Check for\nstale statistics\n(based on expiration policy)" -UpdatePhase --> StatisticsCatalog : "Update stale\nstatistics" +EvaluationPhase --> OptimizerStatistics : "Check for\nstale statistics\nbased on expiration policy" +UpdatePhase --> OptimizerStatistics : "Update stale\nstatistics" UpdatePhase ..r..> end @enduml .... - -__Add a flowchart here to illustrate the AUS process__ - [#expiration_policy] === Expiration Policy @@ -480,49 +488,3 @@ In such cases, the task is skipped, and the next AUS task is scheduled. * xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[Cost Based Optimizer] * xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] * xref:n1ql:n1ql-intro/sysinfo.adoc[System Catalogs and Information] - - -//// - -.Default filters, order, and pagination -[plantuml#fig-uncovered,uncovered-query,svg] -.... 
-@startuml - -skinparam defaultTextAlignment center - -() " " as start -() " " as end - -rectangle "Index\nService"{ - rectangle "Index Scan" as IndexScan -} - -rectangle "Query\nService"{ - rectangle Fetch - rectangle Filter - rectangle Order - rectangle Offset - rectangle Limit -} - -note bottom of Filter - Filters, order, and pagination - performed after fetch -end note - -database "Data\nService" as Data - -start ..r..> IndexScan -IndexScan -> Fetch -Fetch -> Filter -Filter -> Order -Order -> Offset -Offset -> Limit -Fetch --> Data -Fetch <-- Data -Limit ..r..> end - -@enduml -.... -//// \ No newline at end of file From 805c2d55117e682fc154b879feca95462a2f21cb Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Thu, 29 May 2025 11:30:35 +0530 Subject: [PATCH 10/19] Update CBO pages --- modules/guides/pages/cbo.adoc | 3 +++ .../n1ql-language-reference/auto-update-statistics.adoc | 4 ++-- .../pages/n1ql-language-reference/cost-based-optimizer.adoc | 5 +++++ 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/modules/guides/pages/cbo.adoc b/modules/guides/pages/cbo.adoc index eb1c7633d..dcf016fe5 100644 --- a/modules/guides/pages/cbo.adoc +++ b/modules/guides/pages/cbo.adoc @@ -92,6 +92,8 @@ For more information and examples, refer to xref:n1ql:n1ql-manage/query-settings Before you can use the Cost-Based Optimizer with a query, you must first gather the statistics that it needs. The Query service automatically gathers statistics whenever an index is created or built, and you can update statistics at any time. +You can also configure a scheduled task to automatically check and update statistics using xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[Auto Updates Statistics (AUS)]. +During the scheduled window, AUS evaluates the existing statistics and updates them if they are outdated. 
include::n1ql:page$n1ql-language-reference/statistics-expressions.adoc[tags=overview] @@ -220,6 +222,7 @@ Explanation: Reference: * xref:n1ql:n1ql-language-reference/updatestatistics.adoc[UPDATE STATISTICS] +* xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[Auto Updates Statistics] Administrator guides: diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 6d6275643..6325d211a 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -101,8 +101,8 @@ AusEnabled -> ScheduledTask : "Yes" ScheduledTask -> CollectionSelection CollectionSelection -> EvaluationPhase EvaluationPhase -> UpdatePhase -EvaluationPhase --> OptimizerStatistics : "Check for\nstale statistics\nbased on expiration policy" -UpdatePhase --> OptimizerStatistics : "Update stale\nstatistics" +EvaluationPhase --> OptimizerStatistics : "Checks for\nstale statistics\nbased on expiration policy" +UpdatePhase --> OptimizerStatistics : "Updates stale\nstatistics" UpdatePhase ..r..> end @enduml diff --git a/modules/n1ql/pages/n1ql-language-reference/cost-based-optimizer.adoc b/modules/n1ql/pages/n1ql-language-reference/cost-based-optimizer.adoc index 8fa756ad7..576aac694 100644 --- a/modules/n1ql/pages/n1ql-language-reference/cost-based-optimizer.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/cost-based-optimizer.adoc @@ -29,6 +29,7 @@ :query-service-architecture: {query-service}#query-service-architecture :query-execution: {query-service}#query-execution :query-settings: xref:manage:manage-settings/general-settings.adoc#query-settings +:auto-update-statistics: xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc [abstract] {description} @@ -156,6 +157,9 @@ Before you can use the cost-based optimizer with a query, you must first gather In Couchbase Server 7.6 and 
later, the Query service automatically gathers statistics whenever an index is created or built. You can use the {updatestatistics}[UPDATE STATISTICS] statement to gather statistics at any time. +To keep optimizer statistics up to date, an opt-in feature called {auto-update-statistics}[Auto Update Statistics (AUS)] is available starting with Couchbase Server 8.0. +When enabled, AUS automatically identifies and refreshes outdated statistics, ensuring that the cost-based optimizer always has the latest information for generating query plans. + If the cost-based optimizer cannot properly calculate cost information for any step of a query plan, e.g. due to lack of the necessary optimizer statistics, the Query service falls back on the {query-service-architecture}[rules-based {sqlpp} optimizer] to generate a query plan. The cost-based optimizer uses the following statistics. @@ -393,4 +397,5 @@ Refer to the documentation for the {updatestatistics}[UPDATE STATISTICS] stateme * {updatestatistics}[UPDATE STATISTICS] statement * {optimizer-hints}[] overview +* {auto-update-statistics}[Auto Update Statistics] * Blog post: https://blog.couchbase.com/?p=7384&preview=true[Cost Based Optimizer for Couchbase N1QL^] \ No newline at end of file From 80e0258f4012e236e4624807eda1313443c3bde4 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Thu, 29 May 2025 11:38:20 +0530 Subject: [PATCH 11/19] Update HEAD.yml for preview --- preview/HEAD.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/preview/HEAD.yml b/preview/HEAD.yml index 510379449..770ab0187 100644 --- a/preview/HEAD.yml +++ b/preview/HEAD.yml @@ -1,5 +1,5 @@ sources: docs-server: - branches: [release/7.6] + branches: [release/8.0] override: startPage: server:introduction:intro.adoc From dd9013bf39de7a68d62a8594850e97b866cce753 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Fri, 30 May 2025 15:17:41 +0530 Subject: [PATCH 12/19] Add a few more changes --- modules/guides/pages/cbo.adoc | 2 + 
modules/n1ql/pages/n1ql-intro/sysinfo.adoc | 14 ++- .../auto-update-statistics.adoc | 115 ++++++++---------- 3 files changed, 68 insertions(+), 63 deletions(-) diff --git a/modules/guides/pages/cbo.adoc b/modules/guides/pages/cbo.adoc index dcf016fe5..b4a97bee9 100644 --- a/modules/guides/pages/cbo.adoc +++ b/modules/guides/pages/cbo.adoc @@ -92,8 +92,10 @@ For more information and examples, refer to xref:n1ql:n1ql-manage/query-settings Before you can use the Cost-Based Optimizer with a query, you must first gather the statistics that it needs. The Query service automatically gathers statistics whenever an index is created or built, and you can update statistics at any time. + You can also configure a scheduled task to automatically check and update statistics using xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[Auto Updates Statistics (AUS)]. During the scheduled window, AUS evaluates the existing statistics and updates them if they are outdated. +For more information on how to enable this feature and set the schedule, see xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc#enable-and-schedule-aus[Enable and Schedule AUS]. include::n1ql:page$n1ql-language-reference/statistics-expressions.adoc[tags=overview] diff --git a/modules/n1ql/pages/n1ql-intro/sysinfo.adoc b/modules/n1ql/pages/n1ql-intro/sysinfo.adoc index ee51044f0..2b0678b80 100644 --- a/modules/n1ql/pages/n1ql-intro/sysinfo.adoc +++ b/modules/n1ql/pages/n1ql-intro/sysinfo.adoc @@ -1358,6 +1358,8 @@ This catalog contains the following attributes: __required__ |The class of the task. +For tasks related to xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[Auto Update Statistics (AUS)], the class is `auto_update_statistics`. + *Example*: ``advisor`` |string @@ -1385,7 +1387,17 @@ __required__ __required__ |The state of the task. -*Values*: `scheduled`, `cancelled`, `completed` +Possible values are: +-- +* `scheduled`: The task is scheduled and yet to run. 
+* `deleting`: The scheduled task is in the process of being cancelled. +* `cancelled`: The task was cancelled before it began executing. +* `running`: The task is currently executing. +* `aborting`: The running task is in the process of being aborted. +* `aborted`: The task was aborted while it was running. +* `completed`: The task completed successfully without being cancelled or aborted. +-- + |string |**subClass** + diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 6325d211a..76a1f0a18 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -1,19 +1,19 @@ = Auto Update Statistics -:description: Auto Update Statistics (AUS) is a solution designed to keep the optimizer statistics up to date. +:description: Auto Update Statistics (AUS) automatically refreshes optimizer statistics, ensuring accurate and cost-effective query plans. [abstract] {description} == Overview -Auto Update Statistics (AUS) is a feature that helps keep the optimizer statistics up to date by automatically identifying and refreshing outdated statistics. +Auto Update Statistics (AUS) is a feature that keeps the optimizer statistics up to date by automatically identifying and refreshing outdated statistics. -Optimizer statistics are used by the xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[Cost Based Optimizer] to generate optimal query plans. -These statistics are created when you run the xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statement or build an index (available from 7.6.0 onwards). +Optimizer statistics are crucial as they help the xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[Cost Based Optimizer] generate optimal query plans. 
+These statistics are initially created when you run the xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statement or build an index (available from 7.6.0 onwards). However, as data changes over time, the statistics can become stale, leading to sub-optimal query plans and reduced query performance. -AUS addresses this challenge by running a scheduled task on the query nodes that regularly checks for outdated statistics using expiration policies and refreshes them by executing xref:n1ql:n1ql-language-reference/updatestatistics.adoc[]. -// This way, AUS ensures that the optimizer statistics are always up to date and the CBO has the current information. +To handle this, AUS executes a scheduled task on each query node in the cluster. +This task evaluates statistics based on expiration policies to identify outdated ones and then refreshes them by running the xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statement. [NOTE] ==== @@ -34,41 +34,32 @@ In such mixed clusters, the 7.6.x query nodes will not perform any AUS tasks. == How AUS Works -AUS is an opt-in feature. -You must explicitly enable it and configure a schedule for when it should run. -This allows you to align the AUS task with your cluster's workloads and run it at a time that minimizes the impact on performance. +AUS is an opt-in feature that you must explicitly enable and schedule. -An AUS task consists of two phases: +Once AUS is enabled and a schedule is set, every query node in the cluster participates in AUS. +Each query node gets its own AUS task, which consists of two phases: . **Evaluation Phase**: -In this phase, AUS evaluates whether the exisiting statistics are stale based on the <>. -For every index, AUS calculates the percentage of change to measure how much the data in the index has changed since the last update of the optimizer statistics for the index key expressions. 
-If this percentage exceeds the configured `change_percentage` threshold, the statistics for the index's key expressions are marked as stale and require an update. +In this phase, AUS evaluates whether existing statistics are stale based on the <>. +For each index, AUS assesses how much data has changed since the last update of the optimizer statistics for the index's key expressions. +If the percentage of change exceeds the defined threshold in the <>, the statistics are marked as stale. . **Update Phase**: -After the evaluation, this phase executes the necessary xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statements to refresh the statistics identified as stale. -The AUS system ensures that the refreshed statistics maintains the same level of details as the original statistics. - -//Once the Evaluation of a collection is completed, the Update phase will be executed. - -//This phase will update the statistics flagged as stale in the previous phase by executing the appropriate UPDATE STATISTICS statements. +After the evaluation, AUS executes xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statements to refresh the statistics identified as stale. +It ensures that the refreshed statistics provide the same level of detail as the original statistics. -//The statistics update performed by AUS maintains the original resolution at which the statistic was collected. +IMPORTANT: When AUS is first enabled, the initial task run might update all existing optimizer statistics, regardless of the expiration policy evaluation. +This is because the index change information might not have been recorded prior to this first run. -Every query node in the cluster participates in the AUS task following the same configured schedule. During the scheduled AUS window, each query node performs the following: -* The query node selects the specific collections to perform AUS operations on. 
-This ensures that no other query node processes the same collection during this window. +* It selects the specific collections for AUS processing, ensuring that no other query node updates the same collection during this window. -* For each selected collection, the Evaluation and Update phases are executed on the statistics gathered on expressions based on fields within that collection. +* For each selected collection, the evaluation and update phases are executed on the statistics gathered from expressions based on fields within that collection. -* Once the AUS operation is complete for all statistics in all buckets, the query node schedules the next AUS run. +* Once AUS completes processing all statistics in all buckets, the query node schedules the next AUS run. -* If the scheduled window ends before the AUS task is complete, the task is aborted and the next AUS run is scheduled. - -IMPORTANT: When AUS is enabled, the very first scheduled task run might update all existing optimizer statistics, regardless of the expiration policy evaluation. -This is because the index change information might not have been recorded prior to this initial run. +* If the scheduled window ends before the AUS task finishes, the task is aborted and the next AUS run is scheduled. .AUS process flow showing the evaluation and update phases [plantuml#fig-aus-process,aus-process,svg] @@ -91,11 +82,6 @@ rectangle "Query\nNode" { database "Optimizer\nStatistics" as OptimizerStatistics -note bottom of CollectionSelection - Selects collections for AUS -end note - - start ..r..> AusEnabled : "Start \nAUS Process" AusEnabled -> ScheduledTask : "Yes" ScheduledTask -> CollectionSelection @@ -113,24 +99,24 @@ UpdatePhase ..r..> end AUS uses expiration policies to determine when statistics are outdated and require an update. The policy is based on the percentage of changes to data within an index. 
-It defines the threshold for how much data in an index must change before the statistics are considered outdated. +You can configure this value using the `change_percentage` attribute in the <> or <> catalogs. +It defines how much data in an index must change before the statistics are considered outdated. -If the percentage of changed data (since the last time the statistic was collected) exceeds the defined threshold, AUS flags the statistic as stale. -The subsequent AUS operation then updates this statistic. +If the percentage of changed data since the last statistics collection exceeds the defined threshold, AUS flags the statistics as stale. +The subsequent AUS operation then updates these statistics. == Enable and Schedule AUS -To start using AUS for your cluster, you will need to enable it and configure a schedule. +To start using AUS for your cluster, you need to enable it and configure a schedule. +You can configure AUS to run during off-peak hours or at specific times that align with your workload patterns. -AUS stores its global configurations in the <> catalog. +AUS maintains its global configurations in the <> catalog. You can enable AUS and set its schedule by modifying the relevant configurations within this catalog. -If you need more granluar control, you can use the <> catalog to customize certain AUS configurations at the bucket, scope, and collection levels. +If you need more granular control, use the <> catalog to customize certain AUS configurations at the bucket, scope, and collection levels. -Once AUS is enabled and a schedule is set, every query node in the cluster participates in AUS. -Each query node gets its own AUS task, which performs AUS operations on that specific node. -You can find a record of recent AUS tasks across all query nodes in the `system:tasks_cache` catalog. -For more information, see <>. 
+For a historical record of recent AUS tasks across all query nodes, use the xref:n1ql:n1ql-intro/sysinfo.adoc#sys-tasks-cache[system:tasks_cache] catalog. +For more information, see <>. [#system_aus] === system:aus @@ -148,7 +134,7 @@ You can update this document to modify the settings. Each attribute in the document represents a particular global configuration. The following are the attribute names and the configurations they represent: -[cols="1a,3a,1a"] +[cols="1a,4a,1a"] |=== | Name | Description | Schema @@ -156,7 +142,7 @@ The following are the attribute names and the configurations they represent: __required__ | Indicates whether AUS is enabled for the cluster or not. -Set this attribute to `true` to enable AUS. +Set this attribute to `true` to enable AUS. + If set to `true`, then the `schedule` attribute must be set. *Default:* `false` @@ -180,7 +166,7 @@ This is the threshold for determining whether the statistics are stale or not. The value must be an integer between `0` and `100`. -For example, a value of `30` means that if 30% or more of the items in an index have changed, the statistics for that index will be refreshed. +For example, a value of `30` means that if 30% or more of the items in an index have changed, the statistics for that index are considered stale and will be refreshed. *Default:* `10` @@ -228,7 +214,8 @@ The `end_time` must be at least 30 minutes later than the `start_time`. __required__ | An array of strings specifying the days on which the AUS schedule runs. -Valid values include: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. + +Valid values include: `Monday`, `Tuesday`, `Wednesday`, `Thursday`, `Friday`, `Saturday`, `Sunday`. *Example:* `[“Saturday”, “Sunday”]` @@ -248,16 +235,16 @@ The value must be a valid IANA timezone string. 
|=== -When changing the global configurations, consider the following: +When changing the global configurations, it is important to consider the following: * *Enabling AUS*: If AUS was previously disabled and is now enabled, the next AUS task will be scheduled immediately. -* *Rescheduling AUS*: Currently scheduled AUS task will be canceled, and a new AUS task will be scheduled according to the updated schedule. -Running AUS tasks will not be canceled. +* *Rescheduling AUS*: Currently scheduled AUS task will be cancelled, and a new AUS task will be scheduled according to the updated schedule. +Running AUS tasks will not be cancelled. * *Other Settings*: If other global settings such as `all_buckets` or `change_percentage` are modified, the new values will be applied during the next scheduled AUS run. ==== Example ==== -Using the UPDATE statement to enable AUS and set a schedule with some customizations: +A sample UPDATE statement to enable AUS and set a schedule with some customizations: .Query [source,sqlpp] @@ -278,7 +265,7 @@ The `system:aus_settings` catalog stores granular configuration settings for AUS These settings can be applied at the bucket, scope, and collection levels. By default, this catalog has no documents, and the AUS settings for all keyspaces inherit the configurations defined at the global level. -In other words, unless you explicitly configure AUS for a specific keyspace, it will use the global AUS settings. +In other words, unless you explicitly configure AUS for a specific keyspace, it will use the global AUS settings defined in <>. To customize AUS for a specific keyspace, you must insert a settings document into the `system:aus_settings` catalog. The document ID of a document in this keyspace must be the full path of the bucket, scope, and collection. 
@@ -293,7 +280,7 @@ The following are the attribute names and the configurations they represent: | **enable** + __optional__ | Indicates whether AUS is enabled for the bucket, scope, collection. + -Set it to `false` to explicitly disable AUS. +Set it to `true` to enable AUS for the keyspace. AUS settings are hierarchical and follow the order: cluster > bucket > scope > collection. + If AUS is disabled at higher level, it cannot be enabled at a more granular level. @@ -326,15 +313,15 @@ If set at a scope level, this value applies to all collections within the scope, | **update_statistics_timeout** + __optional__ -| The timeout period for the UPDATE STATISTICS command. +| The timeout period for the xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] command. It is a number representing a duration in seconds. If the command does not complete within this duration, it times out. If omitted, a default timeout value is calculated based on the number of samples used. -If set for a keyspace, this timeout applies to every UPDATE STATISTICS statement that AUS executes for that keyspace. +If set for a keyspace, this timeout applies to every xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statement that AUS executes for that keyspace. -If set at a bucket level, this timeout applies to all scopes and collections withing the bucket, unless a different value is set at a lower level. +If set at a bucket level, this timeout applies to all scopes and collections within the bucket, unless a different value is set at a lower level. If set at a scope level, this timeout applies to all collections within the scope, unless a different value is set at a collection level. @@ -351,7 +338,7 @@ If set at a scope level, this timeout applies to all collections within the scop ==== Example ==== -Query to add a scope level setting that applies to all collections within the scope. +A sample query to add a scope level setting that applies to all collections within the scope. 
.Query [source,sqlpp] @@ -361,13 +348,14 @@ INSERT INTO system:aus_settings ( KEY, VALUE ) ---- ==== +[#monitor_aus_tasks] == Monitor AUS Tasks The `system:tasks_cache` catalog stores information about all recent tasks executed in a cluster, including the AUS tasks. For each AUS task, every involved query node maintains an entry within this catalog. -// You can view the recent AUS tasks, their status, and other details by querying this catalog. AUS task entries can be specifically identified by the `class` field, which is set to `auto_update_statistics`. +[#view_aus_tasks] === View Recent AUS Tasks To view all recent AUS tasks, use the following query: @@ -408,7 +396,7 @@ These entries include details such as the task ID, start time, end time, which k ==== Example ==== -Sample task entry for a successful AUS task on a query node: +A sample task entry for a successful AUS task on a query node: [source,sqlpp] ---- @@ -444,11 +432,14 @@ Sample task entry for a successful AUS task on a query node: ==== For more information about `system:tasks_cache` and its attributes, see xref:n1ql:n1ql-intro/sysinfo.adoc#sys-tasks-cache[Monitor Cached Tasks]. + In addition to the attributes listed there, the AUS task entries also include the following attributes: * `keyspaces_updated`: A list of keyspaces that had their statistics updated during the AUS task execution. * `configuration`: The configuration with which the AUS task was executed. +NOTE: You can also retrieve the AUS task history from the `query.log`. + == Cancel AUS Tasks You can cancel a running AUS task by deleting its entry from the `system:tasks_cache` catalog. @@ -477,8 +468,8 @@ Make sure your filters target only the intended tasks, otherwise they might inad == Manage AUS Load -When an AUS task runs, it can increase the load on the query node due to the evaluation and refresh of statistics. -Therefore, it is important to schedule AUS to best suit the workloads of the cluster. 
+When an AUS task runs, it can increase the load on the query node due to the evaluation and updation of statistics. +Therefore to minimize performance impact, it is important to schedule AUS to best suit the workloads of your cluster. To prevent excessive load, the AUS task will not start if the query node's load is too high during the scheduled window. In such cases, the task is skipped, and the next AUS task is scheduled. From 90de111691aabec22d849fd3f8d72b4b8f3e778b Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Fri, 30 May 2025 15:20:19 +0530 Subject: [PATCH 13/19] Add a few more changes --- .../pages/n1ql-language-reference/auto-update-statistics.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 76a1f0a18..1447f63fc 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -478,4 +478,4 @@ In such cases, the task is skipped, and the next AUS task is scheduled. 
* xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc[Cost Based Optimizer] * xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] -* xref:n1ql:n1ql-intro/sysinfo.adoc[System Catalogs and Information] +* xref:n1ql:n1ql-intro/sysinfo.adoc[System Catalogs] From f9c1202e303951ff9fcf0cfe1210a699b65aa81f Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Thu, 12 Jun 2025 10:38:55 +0530 Subject: [PATCH 14/19] Add a new section on canceling next scheduled AUS tasks --- .../auto-update-statistics.adoc | 54 +++++++++++++++++-- 1 file changed, 50 insertions(+), 4 deletions(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 1447f63fc..ac5c7015e 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -442,7 +442,19 @@ NOTE: You can also retrieve the AUS task history from the `query.log`. == Cancel AUS Tasks -You can cancel a running AUS task by deleting its entry from the `system:tasks_cache` catalog. +You can cancel AUS tasks that are currently running or scheduled to run. + +* <> +* <> + +CAUTION: When cancelling AUS tasks, it is important to include appropriate WHERE clauses to specify exactly which tasks you want to cancel. +Make sure your filters target only the intended tasks, otherwise they might inadvertently cancel other tasks or delete task history. + +[#cancel_running_aus_tasks] +=== Cancel Running AUS Tasks + +To cancel a running AUS task, delete its entry from the `system:tasks_cache` catalog. +When you delete a task that is in the `scheduled` or `running` state, AUS cancels the task and schedules the next one automatically. 
To cancel all running AUS tasks, use the following DELETE statement: @@ -461,10 +473,44 @@ DELETE FROM system:tasks_cache AND node = "127.0.0.1:8091"; // Replace with the actual node address ---- -When you delete an AUS task that is in the `scheduled` or `running` state, AUS cancels the task and schedules the next one automatically. +=== Cancel Next Scheduled AUS Tasks + +To cancel the next scheduled AUS task, temporarily modify the schedule in the `system:aus` catalog using an UPDATE statement. +This way you can skip a specific AUS run without changing the overall configuration. + +==== Temporarily Update the Schedule + +First, identify the scheduled AUS task you want to skip or cancel. +Then, update the schedule to temporarily exclude the day or time. + +For example, if your AUS tasks run on Monday, Wednesday, and Friday, and you want to cancel the upcoming Monday run: + +[source,sqlpp] +---- +UPDATE system:aus SET schedule.days = ["Wednesday", "Friday"]; +---- + +==== Revert the Schedule + +After the day and time that the cancelled task would have run has passed, you can revert the schedule to its original settings. +This allows your AUS tasks resume their regular schedule for all subsequent runs. + +For example, to restore the Monday, Wednesday, and Friday schedule after the skipped Monday run: + +[source,sqlpp] +---- +UPDATE system:aus SET schedule.days = ["Monday", "Wednesday", "Friday"]; +---- + + + + + + + + + -CAUTION: It is important to include appropriate WHERE clauses to specify exactly which tasks you want to cancel. -Make sure your filters target only the intended tasks, otherwise they might inadvertently cancel other tasks or delete task history. 
== Manage AUS Load From 88aeeca40c4358f057f47f413e99315543a3ddb7 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Thu, 12 Jun 2025 13:31:19 +0530 Subject: [PATCH 15/19] Add a new global setting, create_missing_statistics --- .../auto-update-statistics.adoc | 41 +++++++++++++------ 1 file changed, 29 insertions(+), 12 deletions(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index ac5c7015e..081779599 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -14,13 +14,11 @@ However, as data changes over time, the statistics can become stale, leading to To handle this, AUS executes a scheduled task on each query node in the cluster. This task evaluates statistics based on expiration policies to identify outdated ones and then refreshes them by running the xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statement. +It primarily maintains statistics for expressions based on index keys, and not for expressions on non-indexed fields. -[NOTE] -==== -* AUS is only available in Couchbase Server 8.0 and later. For more information, see <>. -* AUS only updates existing statistics; it does not create new ones. -* AUS primarily maintains statistics for expressions based on index keys, and not for expressions on non-indexed fields. -==== +Optionally, AUS can also create optimizer statistics for indexed expressions that do not already have them. + +NOTE: AUS is only available in Couchbase Server 8.0 and later. For more information, see <>. [#availability] == Availability @@ -42,11 +40,18 @@ Each query node gets its own AUS task, which consists of two phases: . **Evaluation Phase**: In this phase, AUS evaluates whether exisiting statistics are stale based on the <>. 
For each index, AUS assess how much data has changed since the last update of the optimizer statistics for the index's key expressions. -If the percentage of change exceeds the defined threshold in the <>, the statistics are marked as stale. +If the percentage of change exceeds the defined threshold in the <>, the statistics are marked as stale. + ++ +Additionally, if configured to do so, this phase also identifies any indexed expressions that currently lack statistics and flags them for creation. +You can control this setting using the `create_missing_statistics` attribute in the <> catalog. . **Update Phase**: After the evaluation, AUS executes xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statements to refresh the statistics identified as stale. -It ensures that the refreshed statistics provide the same level of detail as the original statistics. +When updating the exisitng statistics, AUS ensures that the refreshed statistics maintain the original xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution] at which they were collected. ++ +Also, if the `create_missing_statistics` option is set to `true`, AUS creates new optimizer statistics for indexed expressions that were flagged as missing during the evaluation phase. +The new statistics are created with the default xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution]. IMPORTANT: When AUS is first enabled, the initial task run might update all existing optimizer statistics, regardless of the expiration policy evaluation. This is because the index change information might not have been recorded prior to this first run. 
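As an illustration of the optional behavior described above, the following sketch turns on the creation of missing statistics. It assumes that `create_missing_statistics` can be changed with an UPDATE statement on `system:aus`, in the same way as the other global settings.

.Query
[source,sqlpp]
----
/* Assumption: AUS is already enabled and has a schedule in system:aus. */
UPDATE system:aus SET create_missing_statistics = true;
----

With this setting, statistics created for indexed expressions that previously had none would use the default resolution, while existing statistics keep the resolution at which they were originally collected.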
@@ -87,8 +92,8 @@ AusEnabled -> ScheduledTask : "Yes" ScheduledTask -> CollectionSelection CollectionSelection -> EvaluationPhase EvaluationPhase -> UpdatePhase -EvaluationPhase --> OptimizerStatistics : "Checks for\nstale statistics\nbased on expiration policy" -UpdatePhase --> OptimizerStatistics : "Updates stale\nstatistics" +EvaluationPhase --> OptimizerStatistics +UpdatePhase --> OptimizerStatistics UpdatePhase ..r..> end @enduml @@ -142,8 +147,8 @@ The following are the attribute names and the configurations they represent: __required__ | Indicates whether AUS is enabled for the cluster or not. -Set this attribute to `true` to enable AUS. + -If set to `true`, then the `schedule` attribute must be set. +To enable AUS, set this attribute to `true`. + +If set to `true`, then the `schedule` attribute must also be set. *Default:* `false` @@ -181,6 +186,18 @@ __required__ | Boolean +| **create_missing_statistics** + +__required__ + +| Indicates whether AUS should create statistics that are missing. + +If set to `true`, AUS creates statistics for indexed expressions that do not have any existing statistics. +The statistics will be created using the default value for the xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution] property. 
+ +*Default:* `false` + +| Boolean + |=== From 4c8f951060e9dd817ca01925f00f0d936256d3fe Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Thu, 12 Jun 2025 15:11:50 +0530 Subject: [PATCH 16/19] Some formatting and rearrangment --- .../auto-update-statistics.adoc | 52 ++++++++++--------- 1 file changed, 27 insertions(+), 25 deletions(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 081779599..f006ba896 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -33,36 +33,18 @@ In such mixed clusters, the 7.6.x query nodes will not perform any AUS tasks. == How AUS Works AUS is an opt-in feature that you must explicitly enable and schedule. +Once it is enabled and a schedule is set, all query nodes in the cluster participate in AUS, according to the same schedule. -Once AUS is enabled and a schedule is set, every query node in the cluster participates in AUS. -Each query node gets its own AUS task, which consists of two phases: +=== AUS Task Execution -. **Evaluation Phase**: -In this phase, AUS evaluates whether exisiting statistics are stale based on the <>. -For each index, AUS assess how much data has changed since the last update of the optimizer statistics for the index's key expressions. -If the percentage of change exceeds the defined threshold in the <>, the statistics are marked as stale. +Each node receives its own AUS task, which performs the following actions during its scheduled window: -+ -Additionally, if configured to do so, this phase also identifies any indexed expressions that currently lack statistics and flags them for creation. -You can control this setting using the `create_missing_statistics` attribute in the <> catalog. 
+* The query node first selects specific collections for AUS processing, ensuring that no other query node updates the same collection during this period. -. **Update Phase**: -After the evaluation, AUS executes xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statements to refresh the statistics identified as stale. -When updating the exisitng statistics, AUS ensures that the refreshed statistics maintain the original xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution] at which they were collected. -+ -Also, if the `create_missing_statistics` option is set to `true`, AUS creates new optimizer statistics for indexed expressions that were flagged as missing during the evaluation phase. -The new statistics are created with the default xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution]. +* Each selected collection then goes through two phases: <> and <>. +These phases process statistics gathered from expressions based on fields within that collection. -IMPORTANT: When AUS is first enabled, the initial task run might update all existing optimizer statistics, regardless of the expiration policy evaluation. -This is because the index change information might not have been recorded prior to this first run. - -During the scheduled AUS window, each query node performs the following: - -* It selects the specific collections for AUS processing, ensuring that no other query node updates the same collection during this window. - -* For each selected collection, the evaluation and update phases are executed on the statistics gathered from expressions based on fields within that collection. - -* Once AUS completes processing all statistics in all buckets, the query node schedules the next AUS run. +* After AUS completes processing all statistics in all buckets, the query node schedules the next AUS run. 
* If the scheduled window ends before the AUS task finishes, the task is aborted and the next AUS run is scheduled. @@ -99,6 +81,26 @@ UpdatePhase ..r..> end @enduml .... +=== Evaluation Phase + +In this phase, AUS evaluates whether exisiting statistics are stale based on the <>. +For each index, AUS assess how much data has changed since the last update of the optimizer statistics for the index's key expressions. +If the percentage of change exceeds the defined threshold in the <>, the statistics are marked as stale. + +Additionally, if configured to do so, this phase also identifies any indexed expressions that currently lack statistics and flags them for creation. +You can control this setting using the `create_missing_statistics` attribute in the <> catalog. + +=== Update Phase + +After the evaluation, AUS executes xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statements to refresh the statistics identified as stale. +When updating the exisitng statistics, AUS ensures that the refreshed statistics maintain the original xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution] at which they were collected. + +Also, if the `create_missing_statistics` option is set to `true`, AUS creates new optimizer statistics for indexed expressions that were flagged as missing during the evaluation phase. +The new statistics are created with the default xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution]. + +IMPORTANT: When AUS is first enabled, the initial task run might update all existing optimizer statistics, regardless of the expiration policy evaluation. +This is because the index change information might not have been recorded prior to this first run. 
+ [#expiration_policy] === Expiration Policy From 91257c2928cb57fd937b03a42f9fc89f7d8489ea Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Mon, 30 Jun 2025 17:11:31 +0530 Subject: [PATCH 17/19] engg feedback + language and typo fixes --- modules/guides/pages/cbo.adoc | 4 +- .../auto-update-statistics.adoc | 69 ++++++++----------- 2 files changed, 32 insertions(+), 41 deletions(-) diff --git a/modules/guides/pages/cbo.adoc b/modules/guides/pages/cbo.adoc index b4a97bee9..f2ea115fc 100644 --- a/modules/guides/pages/cbo.adoc +++ b/modules/guides/pages/cbo.adoc @@ -93,7 +93,7 @@ For more information and examples, refer to xref:n1ql:n1ql-manage/query-settings Before you can use the Cost-Based Optimizer with a query, you must first gather the statistics that it needs. The Query service automatically gathers statistics whenever an index is created or built, and you can update statistics at any time. -You can also configure a scheduled task to automatically check and update statistics using xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[Auto Updates Statistics (AUS)]. +You can also configure a scheduled task to automatically check and update statistics using xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[Auto Update Statistics (AUS)]. During the scheduled window, AUS evaluates the existing statistics and updates them if they are outdated. For more information on how to enable this feature and set the schedule, see xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc#enable-and-schedule-aus[Enable and Schedule AUS]. 
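A minimal sketch of enabling AUS and setting a schedule might look like the following. The attribute names are those of the `system:aus` catalog; the times, days, timezone, and threshold are illustrative values only and should be adapted to your own maintenance window.

.Query
[source,sqlpp]
----
/* Illustrative values: a weekend window from 01:30 to 05:30 UTC. */
UPDATE system:aus
SET enable = true,
    schedule = {
      "start_time": "01:30",
      "end_time": "05:30",
      "days": ["Saturday", "Sunday"],
      "timezone": "UTC"
    },
    change_percentage = 20;
----

Setting `enable` and `schedule` in the same statement reflects the rule that a schedule is required whenever `enable` is `true`.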
@@ -224,7 +224,7 @@ Explanation: Reference: * xref:n1ql:n1ql-language-reference/updatestatistics.adoc[UPDATE STATISTICS] -* xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[Auto Updates Statistics] +* xref:n1ql:n1ql-language-reference/auto-update-statistics.adoc[Auto Update Statistics] Administrator guides: diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index f006ba896..74dbfd041 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -14,11 +14,11 @@ However, as data changes over time, the statistics can become stale, leading to To handle this, AUS executes a scheduled task on each query node in the cluster. This task evaluates statistics based on expiration policies to identify outdated ones and then refreshes them by running the xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statement. -It primarily maintains statistics for expressions based on index keys, and not for expressions on non-indexed fields. +AUS can also optionally generate statistics for indexed expressions that do not already have them. -Optionally, AUS can also create optimizer statistics for indexed expressions that do not already have them. - -NOTE: AUS is only available in Couchbase Server 8.0 and later. For more information, see <>. +NOTE: AUS maintains statistics only for expressions on index keys, and only for those indexed using the Plasma storage engine. +It does not support Memory-Optimized indexes. +For more information about these index storage types, see xref:indexes:storage-modes.adoc[]. [#availability] == Availability @@ -83,7 +83,7 @@ UpdatePhase ..r..> end === Evaluation Phase -In this phase, AUS evaluates whether exisiting statistics are stale based on the <>. +In this phase, AUS evaluates whether existing statistics are stale based on the <>. 
For each index, AUS assess how much data has changed since the last update of the optimizer statistics for the index's key expressions. If the percentage of change exceeds the defined threshold in the <>, the statistics are marked as stale. @@ -93,7 +93,7 @@ You can control this setting using the `create_missing_statistics` attribute in === Update Phase After the evaluation, AUS executes xref:n1ql:n1ql-language-reference/updatestatistics.adoc[] statements to refresh the statistics identified as stale. -When updating the exisitng statistics, AUS ensures that the refreshed statistics maintain the original xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution] at which they were collected. +When updating the existing statistics, AUS ensures that the refreshed statistics maintain the original xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution] at which they were collected. Also, if the `create_missing_statistics` option is set to `true`, AUS creates new optimizer statistics for indexed expressions that were flagged as missing during the evaluation phase. The new statistics are created with the default xref:n1ql:n1ql-language-reference/cost-based-optimizer.adoc#resolution[resolution]. @@ -120,7 +120,7 @@ You can configure AUS to run during off-peak hours or at specific times that ali AUS maintains its global configurations in the <> catalog. You can enable AUS and set its schedule by modifying the relevant configurations within this catalog. -If you need more granluar control, use the <> catalog to customize certain AUS configurations at the bucket, scope, and collection levels. +If you need more granular control, use the <> catalog to customize certain AUS configurations at the bucket, scope, and collection levels. For a historical record of recent AUS tasks across all query nodes, use the xref:n1ql:n1ql-intro/sysinfo.adoc#sys-tasks-cache[system:tasks_cache] catalog. For more information, see <>. 
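For example, a simple way to list the recent AUS task entries is to filter `system:tasks_cache` on the `class` field. This is a sketch that returns every attribute of each entry; in practice you would probably narrow the projection and add further filters.

.Query
[source,sqlpp]
----
/* AUS task entries are identified by class = "auto_update_statistics". */
SELECT *
FROM system:tasks_cache
WHERE class = "auto_update_statistics";
----

Because each participating query node keeps its own entry, a single AUS run is expected to appear as several rows, one per node.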
@@ -166,9 +166,9 @@ This attribute is required only if `enable` is set to `true`. | <> object | **change_percentage** + -__optional__ +__required__ -| The percentage of change to items within an index that must be exceeded for the statistics to be refereshed. +| The percentage of change to items within an index that must be exceeded for the statistics to be refreshed. This is the threshold for determining whether the statistics are stale or not. The value must be an integer between `0` and `100`. @@ -177,7 +177,7 @@ For example, a value of `30` means that if 30% or more of the items in an index *Default:* `10` -| Interger +| Integer | **all_buckets** + __required__ @@ -212,21 +212,21 @@ The statistics will be created using the default value for the xref:n1ql:n1ql-la | **start_time** + __required__ -| The start time of the AUS schedule in “HH:MM” format. +| The start time of the AUS schedule in "HH:MM" format. The `start_time` must be at least 30 minutes earlier than the `end_time`. -*Example:* `“01:30”` +*Example:* `"01:30"` | String | **end_time** + __required__ -| The end time of the AUS schedule in “HH:MM” format. +| The end time of the AUS schedule in "HH:MM" format. The `end_time` must be at least 30 minutes later than the `start_time`. -*Example:* `“05:30”` +*Example:* `"05:30"` | String | **days** + @@ -236,7 +236,7 @@ __required__ Valid values include: `Monday`, `Tuesday`, `Wednesday`, `Thursday`, `Friday`, `Saturday`, `Sunday`. -*Example:* `[“Saturday”, “Sunday”]` +*Example:* `["Saturday", "Sunday"]` | String array @@ -246,9 +246,9 @@ __optional__ | The timezone that applies to the schedule's start and end times. The value must be a valid IANA timezone string. -*Default:* `“UTC”` +*Default:* `"UTC"` -*Example:* `US/Pacific”` +*Example:* `"US/Pacific"` | String @@ -317,7 +317,7 @@ The setting cannot be overridden at the scope or collection level. 
| **change_percentage** + __optional__ -| The percentage of change to items within an index that must be exceeded for the statistics to be refereshed. +| The percentage of change to items within an index that must be exceeded for the statistics to be refreshed. The value must be an integer between `0` and `100`. @@ -362,8 +362,8 @@ A sample query to add a scope level setting that applies to all collections with .Query [source,sqlpp] ---- -INSERT INTO system:aus_settings ( KEY, VALUE ) - VALUES ({ "default:bucket1.scope1", {"change_percentage": 20 } ) +INSERT INTO system:aus_settings ( KEY, VALUE ) + VALUES ( "default:bucket1.scope1", {"change_percentage": 20} ); ---- ==== @@ -492,15 +492,16 @@ DELETE FROM system:tasks_cache AND node = "127.0.0.1:8091"; // Replace with the actual node address ---- +[#cancel_next_scheduled_aus_tasks] === Cancel Next Scheduled AUS Tasks -To cancel the next scheduled AUS task, temporarily modify the schedule in the `system:aus` catalog using an UPDATE statement. -This way you can skip a specific AUS run without changing the overall configuration. +To cancel an upcoming scheduled AUS task, you need to temporarily modify its schedule in the `system:aus` catalog. +After the scheduled time has passed, you can revert it to its original schedule. ==== Temporarily Update the Schedule -First, identify the scheduled AUS task you want to skip or cancel. -Then, update the schedule to temporarily exclude the day or time. +First, identify the specific AUS task you want to skip or cancel. +Then, use an UPDATE statement to exclude the day or time from its schedule. For example, if your AUS tasks run on Monday, Wednesday, and Friday, and you want to cancel the upcoming Monday run: @@ -511,30 +512,20 @@ UPDATE system:aus SET schedule.days = ["Wednesday", "Friday"]; ==== Revert the Schedule -After the day and time that the cancelled task would have run has passed, you can revert the schedule to its original settings. 
-This allows your AUS tasks resume their regular schedule for all subsequent runs. +After the day and time for the cancelled task have passed, you can revert the schedule to its original settings. +This allows your AUS tasks to resume their regular schedule for all subsequent runs. -For example, to restore the Monday, Wednesday, and Friday schedule after the skipped Monday run: +For example, to restore the Monday, Wednesday, and Friday schedule after skipping the Monday run: [source,sqlpp] ---- UPDATE system:aus SET schedule.days = ["Monday", "Wednesday", "Friday"]; ---- - - - - - - - - - - == Manage AUS Load -When an AUS task runs, it can increase the load on the query node due to the evaluation and updation of statistics. -Therefore to minimize performance impact, it is important to schedule AUS to best suit the workloads of your cluster. +When an AUS task runs, it can increase the load on the query node as it evaluates and updates statistics. +Therefore, to minimize performance impact, it is important to schedule AUS to best suit the workloads of your cluster. To prevent excessive load, the AUS task will not start if the query node's load is too high during the scheduled window. In such cases, the task is skipped, and the next AUS task is scheduled. 
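If AUS tasks are often skipped because of load, one option is to move the window to a quieter period. The following sketch changes only the start and end times of the existing schedule. The times are placeholders; the one constraint taken from the schedule rules is that `start_time` must be at least 30 minutes earlier than `end_time`.

.Query
[source,sqlpp]
----
/* Placeholder times: choose a window that matches your off-peak hours. */
UPDATE system:aus
SET schedule.start_time = "01:30",
    schedule.end_time = "05:30";
----

As with any rescheduling, a task that is currently scheduled would be cancelled and a new one scheduled for the updated window, while a task that is already running is not cancelled.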
From 7a9b63caea3abeea398bfc0b691c1de48aedb356 Mon Sep 17 00:00:00 2001 From: rakhi-prathap Date: Fri, 18 Jul 2025 09:37:48 +0530 Subject: [PATCH 18/19] Apply suggestions from code review Co-authored-by: Simon Dew <39966290+simon-dew@users.noreply.github.com> --- .../n1ql-language-reference/auto-update-statistics.adoc | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index 74dbfd041..f0fc2c529 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -1,4 +1,7 @@ = Auto Update Statistics +:page-status: Couchbase Server 8.0 +:page-edition: Enterprise Edition +:page-toclevels: 2 :description: Auto Update Statistics (AUS) automatically refreshes optimizer statistics, ensuring accurate and cost-effective query plans. [abstract] @@ -417,7 +420,7 @@ These entries include details such as the task ID, start time, end time, which k ==== A sample task entry for a successful AUS task on a query node: -[source,sqlpp] +[source,json] ---- { "tasks_cache": { From b50ac07979628f7a21e2d7519042da6618b3d7f8 Mon Sep 17 00:00:00 2001 From: Rakhi Prathap Date: Fri, 18 Jul 2025 09:56:53 +0530 Subject: [PATCH 19/19] Updated the availability section --- .../pages/n1ql-language-reference/auto-update-statistics.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc index f0fc2c529..ee718b729 100644 --- a/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc +++ b/modules/n1ql/pages/n1ql-language-reference/auto-update-statistics.adoc @@ -26,7 +26,7 @@ For more information about these index storage types, see xref:indexes:storage-m [#availability] == 
Availability -AUS is only available on query nodes running version 8.0 and later. +AUS is available only in the Couchbase Enterprise Edition and on query nodes running version 8.0 or later. * You can enable AUS in a cluster that has been fully migrated to 8.0, or in a cluster that includes both 7.6.x and 8.0 query nodes. In such mixed clusters, the 7.6.x query nodes will not perform any AUS tasks.