[Backport] backport ux bug fixes to 25 (#13533)
* Web console: add arrayOfDoublesSketch and other small fixes (#13486)
* add padding and keywords
* add arrayOfDoubles
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* partition int
* fix docs
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Web console: improve compaction status display (#13523)
* improve compaction status display
* even more accurate
* fix snapshot
* MSQ: Improve TooManyBuckets error message, improve error docs. (#13525)
1) Edited the TooManyBuckets error message to mention PARTITIONED BY
   instead of segmentGranularity.
2) Added error-code-specific anchors in the docs.
3) Add information to various error codes in the docs about common
   causes and solutions.
* update error anchors (#13527)
* update snapshot
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
vogievetsky committed Dec 10, 2022
1 parent 977792d commit 348c9f6
Showing 29 changed files with 372 additions and 120 deletions.
43 changes: 38 additions & 5 deletions docs/development/extensions-core/datasketches-tuple.md
@@ -39,19 +39,52 @@ druid.extensions.loadList=["druid-datasketches"]
"name" : <output_name>,
"fieldName" : <metric_name>,
"nominalEntries": <number>,
"numberOfValues" : <number>,
"metricColumns" : <array of strings>
"metricColumns" : <array of strings>,
"numberOfValues" : <number>
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "arrayOfDoublesSketch"|yes|
|name|String representing the output column to store sketch values.|yes|
|fieldName|A String for the name of the input field.|yes|
|nominalEntries|Parameter that determines the accuracy and size of the sketch. A higher number means higher accuracy but more space to store sketches. Must be a power of 2. See [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for details.|no, defaults to 16384|
|metricColumns|When building sketches from raw data, an array of input columns that contain numeric values to associate with each distinct key. If not provided, `fieldName` is assumed to be an `arrayOfDoublesSketch`.|no|
|numberOfValues|Number of values associated with each distinct key.|no, defaults to the length of `metricColumns` if provided, and 1 otherwise|

You can use the `arrayOfDoublesSketch` aggregator to:

- Build a sketch from raw data. In this case, set `metricColumns` to an array of input column names.
- Build a sketch from an existing `ArrayOfDoubles` sketch. In this case, leave `metricColumns` unset and set `fieldName` to an `ArrayOfDoubles` sketch with `numberOfValues` doubles. You must base64 encode `ArrayOfDoubles` sketches at ingestion time.

#### Example on top of raw data

Compute a sketch of unique users. For each user, store the `added` and `deleted` scores. The new sketch column will be called `users_theta`.

```json
{
"type": "arrayOfDoublesSketch",
"name": "users_theta",
"fieldName": "user",
"nominalEntries": 16384,
"metricColumns": ["added", "deleted"],
}
```
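For context, the raw-data example above would apply to input rows shaped like the following. The timestamps and values are illustrative, not from the source:

```json
{"timestamp": "2022-12-10T00:00:00Z", "user": "alice", "added": 102.0, "deleted": 12.0}
{"timestamp": "2022-12-10T00:01:00Z", "user": "bob", "added": 5.0, "deleted": 0.0}
```

Each distinct `user` becomes a key in the sketch, with its `added` and `deleted` values attached as the two doubles.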

#### Example ingesting a precomputed sketch column

Ingest a sketch column called `user_sketches` containing base64 encoded sketches with two doubles per entry, and store it in a column called `users_theta`.

```json
{
"type": "arrayOfDoublesSketch",
"name": "users_theta",
"fieldName": "user_sketches",
"nominalEntries": 16384,
"numberOfValues": 2,
}
```
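At query time, you can merge the ingested sketches with the same aggregator and extract a distinct-count estimate with a post aggregator. A minimal sketch of a native `timeseries` query, assuming a hypothetical `wikipedia` datasource and interval, and the `arrayOfDoublesSketchToEstimate` post aggregator:

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "all",
  "intervals": ["2022-01-01/2023-01-01"],
  "aggregations": [
    {
      "type": "arrayOfDoublesSketch",
      "name": "users_theta",
      "fieldName": "users_theta",
      "nominalEntries": 16384,
      "numberOfValues": 2
    }
  ],
  "postAggregations": [
    {
      "type": "arrayOfDoublesSketchToEstimate",
      "name": "unique_users",
      "field": { "type": "fieldAccess", "fieldName": "users_theta" }
    }
  ]
}
```

The aggregation merges sketches across rows, and the post aggregator returns the estimated number of distinct keys.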

### Post Aggregators

3 changes: 2 additions & 1 deletion docs/multi-stage-query/concepts.md
@@ -233,7 +233,8 @@ happens:
The [`maxNumTasks`](./reference.md#context-parameters) query parameter determines the maximum number of tasks your
query will use, including the one `query_controller` task. Generally, queries perform better with more workers. The
lowest possible value of `maxNumTasks` is two (one worker and one controller). Do not set this higher than the number of
free slots available in your cluster; doing so will result in a [TaskStartTimeout](reference.md#error_TaskStartTimeout) error.
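For example, `maxNumTasks` is set in the query context when submitting a query to the SQL task endpoint. A minimal sketch with a hypothetical query and a value of 3, meaning one controller and up to two workers:

```json
{
  "query": "INSERT INTO target SELECT * FROM source PARTITIONED BY ALL",
  "context": {
    "maxNumTasks": 3
  }
}
```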

When [reading external data](#extern), EXTERN can read multiple files in parallel across
different worker tasks. However, EXTERN does not split individual files across multiple worker tasks. If you have a
8 changes: 5 additions & 3 deletions docs/multi-stage-query/known-issues.md
@@ -33,16 +33,18 @@ sidebar_label: Known issues

- Worker task stage outputs are stored in the working directory given by `druid.indexer.task.baseDir`. Stages that
generate a large amount of output data may exhaust all available disk space. In this case, the query fails with
an [UnknownError](./reference.md#error_UnknownError) with a message including "No space left on device".

## SELECT

- SELECT from a Druid datasource does not include unpublished real-time data.

- GROUPING SETS and UNION ALL are not implemented. Queries using these features return a
[QueryNotSupported](reference.md#error_QueryNotSupported) error.

- For some COUNT DISTINCT queries, you'll encounter a [QueryNotSupported](reference.md#error_QueryNotSupported) error that includes `Must not have 'subtotalsSpec'` as one of its causes. This is caused by the planner attempting to use GROUPING SETS, which are not implemented.

- The numeric varieties of the EARLIEST and LATEST aggregators do not work properly. Attempting to use the numeric
varieties of these aggregators lead to an error like