Commit 4aedaa4

DOC-13695: Add partition by hash clause in CREATE PRIMARY INDEX (#467)
* Update syntax
* Update syntax diagram
* Add PARTITION BY HASH to CREATE PRIMARY INDEX
* Index partitioning: apply Vale style rules
* Add primary indexes and vector indexes to Index partitioning
* Reformat index partitioning
* Titles as actions
* Increase ToC depth

---------

Co-authored-by: rakhi-prathap <rakhi.prathap@couchbase.com>
1 parent 10f3107 commit 4aedaa4

4 files changed

+126
-91
lines changed

modules/n1ql/pages/n1ql-language-reference/createprimaryindex.adoc

Lines changed: 14 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -13,6 +13,7 @@ Primary indexes contain a full set of keys in a given keyspace.
1313
:logical-hierarchy: xref:n1ql-intro/queriesandresults.adoc#logical-hierarchy
1414
:querying-indexes: xref:n1ql-intro/sysinfo.adoc#querying-indexes
1515
:index-replication: xref:indexes:index-replication.adoc#index-replication
16+
:index-partitioning: xref:n1ql-language-reference/index-partitioning.adoc
1617
:query-settings: xref:n1ql:n1ql-manage/query-settings.adoc
1718

1819
// TEMP
@@ -71,6 +72,9 @@ When querying, if the index name contains a `&num;` or `&lowbar;` character, you
7172
keyspace-ref:: [Required] Specifies the keyspace where the index is created.
7273
See <<keyspace-ref>>.
7374

75+
index-partition:: (Optional) Specifies index partitions.
76+
See <<index-partition>>.
77+
7478
index-using:: (Optional) Specifies the index type.
7579
See <<index-using>>.
7680

@@ -160,6 +164,13 @@ collection::
160164
For example, `airline` indicates the `airline` collection, assuming the query context is set.
161165
====
162166

167+
[[index-partition]]
168+
=== PARTITION BY HASH Clause
169+
170+
Used to partition the index.
171+
Index partitioning helps increase query performance by dividing and spreading a large index of documents across multiple nodes, horizontally scaling out an index as needed.
172+
For more information, see {index-partitioning}[Index Partitioning].
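
As a minimal sketch of the clause described above, the following statement creates a partitioned primary index, with the clause placed after the keyspace reference as in the syntax listing.
The index name `idx_route_primary` is hypothetical, and the example assumes a `route` keyspace with the query context already set.
Using the document key as the partition key is one possible choice, not a requirement stated by this page.

```sql
/* Hypothetical index name; assumes the query context resolves `route`. */
CREATE PRIMARY INDEX idx_route_primary ON route
PARTITION BY HASH (META().id);
```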
173+
163174
[[index-using]]
164175
=== USING Clause
165176

@@ -242,6 +253,9 @@ If the value of this property is not less than the number of index nodes in the
242253
|Integer
243254
|===
244255

256+
Partitioned indexes support further options.
257+
See {index-partitioning}[].
258+
245259
== Usage
246260

247261
// Nothing

modules/n1ql/pages/n1ql-language-reference/index-partitioning.adoc

Lines changed: 111 additions & 90 deletions
Original file line number | Diff line number | Diff line change
@@ -1,15 +1,20 @@
11
= Index Partitioning
22
:imagesdir: ../../assets/images
3-
:description: Index Partitioning enables you to increase aggregate query performance by dividing and spreading a large index of documents across multiple nodes, horizontally scaling out an index as needed.
3+
:description: Index partitioning enables you to increase aggregate query performance by dividing and spreading a large index of documents across multiple nodes, horizontally scaling out an index as needed.
44
:page-topic-type: reference
5+
:page-toclevels: 2
56

67
:createindex: xref:n1ql-language-reference/createindex.adoc
8+
:createprimaryindex: xref:n1ql-language-reference/createprimaryindex.adoc
9+
:createvectorindex: xref:n1ql-language-reference/createvectorindex.adoc
710
:gbap: xref:n1ql-language-reference/groupby-aggregate-performance.adoc
811
:index-with: {createindex}#index-with
12+
:primary-index-with: {createprimaryindex}#primary-index-with
13+
:vector-index-with: {createvectorindex}#vector-index-with
914
:rebalancing-the-index-service: xref:clusters:scale-database.adoc#rebalance
1015

1116
{description}
12-
The system partitions the index across a number of index nodes using a hash partitioning strategy in a way that is transparent to queries.
17+
The system partitions the index across a number of index nodes using a hash partitioning strategy in a way that's transparent to queries.
1318

1419
[#idx-partition-intro]
1520
--
@@ -23,18 +28,17 @@ Benefits of a partitioned index include:
2328
////
2429
Partitioned indexes are displayed in the Capella UI with a `partitioned` indicator:
2530
26-
image::manage:manage-indexes/index-indicators.png[]
31+
image::manage:manage-indexes/index-indicators.png["The Couchbase Web Console, with partitioned indexes marked."]
2732
////
2833

29-
For further details, refer to xref:clusters:index-service/manage-indexes.adoc[].
34+
For more information, see xref:clusters:index-service/manage-indexes.adoc[].
3035

3136
== Syntax
3237

33-
To create a partitioned index, the overall syntax is the same as for a global secondary index.
38+
To create a partitioned index, the overall syntax is the same as for a primary index or global secondary index.
3439
The distinguishing feature is the use of the PARTITION BY HASH clause to specify the partitions.
3540

36-
Refer to the {createindex}[CREATE INDEX] statement for details of the syntax.
37-
41+
For more information, see {createprimaryindex}[], {createindex}[], or {createvectorindex}[].
3842

3943
[[index-partition,index-partition]]
4044
=== PARTITION BY HASH Clause
@@ -44,11 +48,12 @@ Refer to the {createindex}[CREATE INDEX] statement for details of the syntax.
4448
include::partial$grammar/ddl.ebnf[tag=index-partition]
4549
----
4650

47-
image::n1ql-language-reference/index-partition.png["Syntax diagram: refer to source code listing", align=left]
51+
image::n1ql-language-reference/index-partition.png["Syntax diagram: see source code listing", align=left]
4852

49-
_partition-key-expr_::
53+
[horizontal]
54+
partition-key-expr::
5055
A field or an expression over a field representing a partition key.
51-
For details and examples, refer to <<partition-keys>>.
56+
For details and examples, see <<partition-keys>>.
5257

5358
[[index-with,index-with]]
5459
=== WITH Clause
@@ -58,53 +63,96 @@ For details and examples, refer to <<partition-keys>>.
5863
include::partial$grammar/ddl.ebnf[tag=index-with]
5964
----
6065

61-
image::n1ql-language-reference/index-with.png["Syntax diagram: refer to source code listing", align=left]
66+
image::n1ql-language-reference/index-with.png["Syntax diagram: see source code listing", align=left]
6267

6368
When creating a partitioned index, you can use the WITH clause to specify additional options for the partitions.
6469

65-
_expr_::
66-
An object with the following properties:
70+
[horizontal#index-with-args]
71+
expr::
72+
An object with the following properties.
73+
74+
[options="header", cols="1a,4a,1a"]
75+
|===
76+
| Name | Description | Schema
77+
78+
| **num_partition** +
79+
__optional__
80+
| The number of partitions to divide the index into.
81+
For more information, see <<Number of Partitions>>.
6782

68-
num_partition;;
69-
[Optional] An integer that defines the number of partitions to divide into.
70-
The default value is 8.
71-
For more details and examples, refer to <<Number of Partitions>>.
83+
**Default:** `8`
84+
| Integer
7285

73-
nodes;;
74-
[Optional] An array of strings, specifying a list of nodes.
75-
The node list to restrict the set of nodes available for placement.
76-
Refer to the {index-with}[CREATE INDEX] statement for details of the syntax.
77-
For more details and examples, refer to <<Partition Placement>>.
86+
| **nodes** +
87+
__optional__
88+
| A list of nodes to restrict the set of nodes available for placement.
89+
For more information, see <<Partition Placement>>.
7890

79-
defer_build;;
80-
[Optional] Boolean.
81-
When set to true, the index creation operation queues the task for building the index, but immediately pauses the building of the index.
82-
Refer to the {index-with}[CREATE INDEX] statement for more details.
91+
For details of the syntax, see {primary-index-with}[CREATE PRIMARY INDEX], {index-with}[CREATE INDEX], or {vector-index-with}[CREATE VECTOR INDEX].
92+
| String array
8393

84-
num_replica;;
85-
[Optional] An integer specifying the number of replicas of the partitioned index to create.
94+
| **defer_build** +
95+
__optional__
96+
| When set to true, the index creation operation queues the task for building the index, but immediately pauses the building of the index.
97+
98+
For more information, see {primary-index-with}[CREATE PRIMARY INDEX], {index-with}[CREATE INDEX], or {vector-index-with}[CREATE VECTOR INDEX].
99+
| Boolean
100+
101+
| **num_replica** +
102+
__optional__
103+
| The number of replicas of the partitioned index to create.
86104
If this integer is greater than or equal to the number of index nodes in the cluster, then the index creation will fail.
87-
Refer to the {index-with}[CREATE INDEX] statement for more details.
88105

89-
secKeySize;;
90-
[Optional] An integer, specifying the average length of the combined index keys.
91-
For more details and examples, refer to <<sizing-hints>>.
106+
For more information, see {primary-index-with}[CREATE PRIMARY INDEX], {index-with}[CREATE INDEX], or {vector-index-with}[CREATE VECTOR INDEX].
107+
| Integer
108+
109+
| **secKeySize** +
110+
__optional__
111+
| A sizing hint, specifying the average length of the combined index keys.
112+
For more information, see <<sizing-hints>>.
113+
114+
**Example:** `20`
115+
| Integer
116+
117+
| **docKeySize** +
118+
__optional__
119+
| A sizing hint, specifying the average length of the document key `meta().id`.
120+
For more information, see <<sizing-hints>>.
121+
122+
**Example:** `20`
123+
| Integer
124+
125+
| **arrSize** +
126+
__optional__
127+
| A sizing hint, specifying the average length of the array fields.
128+
Non-array fields will be ignored.
129+
For more information, see <<sizing-hints>>.
130+
131+
**Example:** `10`
132+
| Integer
133+
134+
| **numDoc** +
135+
__optional__
136+
| A sizing hint, specifying the number of documents in the index.
137+
For more information, see <<sizing-hints>>.
92138

93-
docKeySize;;
94-
[Optional] An integer, specifying the average length of the document key.
95-
For more details and examples, refer to <<sizing-hints>>.
139+
**Example:** `7303`
140+
| Integer
96141

97-
arrSize;;
98-
[Optional] An integer, specifying the average length of the array fields.
99-
For more details and examples, refer to <<sizing-hints>>.
142+
| **residentRatio** +
143+
__optional__
144+
| A sizing hint, specifying the resident ratio of the index.
145+
The resident ratio is the memory usage of the index, as a percentage of its estimated data size.
146+
For more information, see <<sizing-hints>>.
100147

101-
numDoc;;
102-
[Optional] An integer, specifying the number of documents in the index.
103-
For more details and examples, refer to <<sizing-hints>>.
148+
Couchbase recommends setting this property to `10` or higher, to avoid index build failures and other issues.
104149

105-
residentRatio;;
106-
[Optional] An integer, specifying the resident ratio of the index.
107-
For more details and examples, refer to <<sizing-hints>>.
150+
**Example:** `50`
151+
| Integer
152+
|===
153+
154+
Composite Vector indexes and Hyperscale Vector indexes support further options.
155+
See {index-with}[CREATE INDEX] or {vector-index-with}[CREATE VECTOR INDEX].
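
The options in the table above can be combined in a single WITH clause.
The following is a sketch only: the index name `idx_route_src` is hypothetical, the `route` keyspace is assumed to be reachable through the query context, and the value `16` is an arbitrary illustration rather than a recommendation.

```sql
/* Hypothetical index: 16 partitions instead of the default 8, build deferred. */
CREATE INDEX idx_route_src ON route (sourceairport)
PARTITION BY HASH (sourceairport)
WITH {"num_partition": 16, "defer_build": true};

/* A deferred index must be built explicitly before queries can use it. */
BUILD INDEX ON route (idx_route_src);
```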
108156

109157
[[partition-keys]]
110158
== Partition Keys
@@ -113,13 +161,13 @@ Partition keys are made up of one or more terms, with each term being the docume
113161
The partition keys are hashed to generate a partition ID for each document.
114162
The partition ID is then used to identify the partition in which the document's index keys would reside.
115163

116-
The partition keys should be immutable, that is, its values shouldn't change once the document is created.
164+
The partition keys should be immutable: their values should not change once the document is created.
117165
For example, in the `landmark` keyspace, the field named `activity` almost never changes, and is therefore a good candidate for a partition key.
118166
If the partition keys must change, then the corresponding document should be deleted and recreated with the new partition keys.
119167

120168
Each term in the partition keys can be any JSON data type: number, string, boolean, array, object, or NULL.
121169
If a term in the partition keys is missing in the document, the term will have a {sqlpp} MISSING value.
122-
Partition keys do not support {sqlpp} array expressions, e.g. `ARRAY` \... `FOR` \... `IN`.
170+
Partition keys do not support {sqlpp} array expressions, such as `ARRAY` \... `FOR` \... `IN`.
123171

124172
The following table lists some examples of partition keys.
125173

@@ -198,7 +246,7 @@ CREATE INDEX idx ON route
198246
// * NULL value
199247

200248
[#doc-keys-as-partition-key]
201-
== Using Document Keys as Partition Key
249+
== Use Document Keys as Partition Key
202250

203251
The simplest way to create a partitioned index is to use the document key as the partition key.
204252

@@ -223,7 +271,7 @@ With [.cmd]`meta().id` as the partition key, the index keys are evenly distribut
223271
Every query will gather the qualifying index keys from all the partitions.
224272

225273
[#partition-keys-range-query]
226-
== Choosing Partition Keys for Range Query
274+
== Choose Partition Keys for Range Query
227275

228276
An application has the option to choose the partition key that can minimize latency on a range query for a partitioned index.
229277
For example, suppose a query has equality predicates on the fields `sourceairport` and `destinationairport`.
@@ -298,7 +346,7 @@ ORDER BY airline
298346
====
299347

300348
As with the equality predicates in the previous examples, the query engine can select qualifying partitions using an IN clause with matching partition keys.
301-
The following example scans at most three partitions with `sourceairport "SFO"`, `"SJC"`, or `"OAK"`.
349+
The following example scans at most 3 partitions with `sourceairport "SFO"`, `"SJC"`, or `"OAK"`.
302350

303351
.Create a partitioned index with partition keys matching query IN clause
304352
====
@@ -398,12 +446,12 @@ CREATE INDEX idx ON route
398446
During index rebalancing, the rebalancer takes into account the data skew among the partitions using runtime statistics.
399447
It tries to even out resource utilization across the index service nodes by moving the partitions across the nodes when possible.
400448

401-
== Choosing Partition Keys for Aggregate Query
449+
== Choose Partition Keys for Aggregate Query
402450

403451
As with a range query, when an index is partitioned by document key, an aggregate query can gather the qualifying index keys from all the partitions before performing aggregation in the query engine.
404-
Whenever aggregate pushdown optimization is allowed, the query engine will push down "partial aggregate" calculation to each partition.
452+
Whenever aggregate pushdown optimization is allowed, the query engine will push down partial aggregate calculation to each partition.
405453
The query engine then computes the final aggregate result from the partial aggregates across all the partitions.
406-
For more details on aggregate query optimization, refer to {gbap}[Group By and Aggregate Performance].
454+
For more information on aggregate query optimization, see {gbap}[Group By and Aggregate Performance].
407455

408456
[.server]
409457
include::ROOT:partial$query-context.adoc[tag=section]
@@ -425,7 +473,7 @@ GROUP BY sourceairport, destinationairport;
425473
----
426474
====
427475

428-
The choice of partition keys can also improve aggregate query performance when the query engine can push down the "full aggregate" calculation to the index node.
476+
The choice of partition keys can also improve aggregate query performance by enabling the query engine to push down the full aggregate calculation to the index node.
429477
In this case, the query engine does not have to recompute the final aggregate result from the index nodes.
430478
In addition, certain pushdown optimizations can only be enabled when a full aggregate result is expected from the index node.
431479
To enable a full aggregate computation, the index must be created with the following requirements:
@@ -501,40 +549,11 @@ NOTE: To avoid any downtime, before removing the partitioned index, first create
501549
[[sizing-hints]]
502550
=== Sizing Hints
503551

504-
You can optionally provide sizing hints too.
552+
You can optionally provide sizing hints to help place the partitions.
505553
Given the sizing hints, the planner uses a formula to estimate the memory and CPU usage of the index.
506554
Based on the estimated memory and CPU usage, the planner tries to place the partitions according to the free resources available to each index node.
507555

508-
.Sizing Hints
509-
[cols="2,5,2"]
510-
|===
511-
| Optional Sizing Hint | Description | Example
512-
513-
| *secKeySize*
514-
| The average length of the combined index keys.
515-
| `20`
516-
517-
| *docKeySize*
518-
| The average length of the document key `meta().id`.
519-
| `20`
520-
521-
| *arrSize*
522-
| The average length of the array field.
523-
Non-array fields will be ignored.
524-
| `10`
525-
526-
| *numDoc*
527-
| The number of documents in the index.
528-
| `7303`
529-
530-
| *residentRatio*
531-
| The memory usage of the index, as a percentage of its estimated data size.
532-
| `50`
533-
|===
534-
535-
NOTE: Couchbase recommends setting the residentRatio property value over 10 to avoid issues, for example, index build failures.
536-
537-
To provide sizing estimation, you can use a command similar to the following examples.
556+
For a list of sizing hints and example values, see <<index-with,WITH Clause>>.
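
As an illustration, sizing hints are passed in the WITH clause alongside the other partition options.
This sketch reuses the example values from the WITH clause table; the index name `idx_route_sized` is hypothetical, and the hints should be replaced with estimates for your own data.

```sql
/* Hypothetical index; hint values are the examples from the WITH clause table. */
CREATE INDEX idx_route_sized ON route (sourceairport, destinationairport)
PARTITION BY HASH (META().id)
WITH {"secKeySize": 20, "docKeySize": 20, "numDoc": 7303, "residentRatio": 50};
```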
538557

539558
[.server]
540559
include::ROOT:partial$query-context.adoc[tag=section]
@@ -584,26 +603,27 @@ When an index node fails, any in-flight query requests (serviced by the failed n
584603
Any new query requests requiring the lost partition are then serviced by the partitions in the replica.
585604

586605
[[rebalancing]]
587-
== Rebalancing
606+
== Rebalance
588607

589608
When index nodes are added to or removed from the cluster, the rebalance operation attempts to move the index partitions across the available index nodes to balance resource consumption.
590609
At the time of rebalancing, the rebalance operation gathers statistics from each index.
591610
These statistics are fed to an optimization algorithm to determine the possible placement of each partition in order to minimize the variation of resource consumption across index nodes.
592611

593612
The rebalancer only attempts to balance resource consumption on a best-effort basis.
594-
There are situations where the resource consumption cannot be fully balanced.
613+
In some situations, the resource consumption cannot be fully balanced.
595614
For example:
596615

597616
* The index service will not try to move the index if the cost to move an index across nodes is too high.
598617
* A cluster has a mix of non-partitioned indexes and partitioned indexes.
599-
* There is data skew in the partitions.
618+
* The partitions contain skewed data.
600619

601620
ifdef::flag-devex-rest-api[]
602621
The index redistribution setting enables you to specify how Couchbase Capella redistributes indexes automatically on rebalance.
603622
endif::flag-devex-rest-api[]
604623
For more information, see {rebalancing-the-index-service}[Rebalance].
605624

606-
== Repairing Failed Partitions
625+
[[repairing-failed-partitions]]
626+
== Repair Failed Partitions
607627

608628
When an index node fails, the index partitions on that node will be lost.
609629
The lost partitions can be recovered or repaired when:
@@ -615,13 +635,14 @@ The lost partitions cannot be repaired when the number of remaining nodes is les
615635

616636
== Performance Considerations
617637

638+
// Nothing
639+
618640
=== Max Parallelism
619641

620642
Along with aggregate pushdown optimization, an application can further enhance the aggregate query performance by computing aggregation in parallel for each partition in the index service.
621643
This can be controlled by specifying the parameter `max_parallelism` when issuing a query.
622644
In Couchbase Capella, `max_parallelism` is set by default to match the number of partitions of the index.
623-
Note that when `max_parallelism` is set to the default value, the index service uses more CPU and memory since the query traffic is increased.
624-
645+
When `max_parallelism` is set to the default value, the index service uses more CPU and memory because query traffic increases.
625646

626647
=== OFFSET Pushdown
627648
