[Backport] backport ux bug fixes to 25 (#13533)
* Web console: add arrayOfDoublesSketch and other small fixes (#13486)
* add padding and keywords
* add arrayOfDoubles
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update docs/development/extensions-core/datasketches-tuple.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* partition int
* fix docs
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Web console: improve compaction status display (#13523)
* improve compaction status display
* even more accurate
* fix snapshot
* MSQ: Improve TooManyBuckets error message, improve error docs. (#13525)
1) Edited the TooManyBuckets error message to mention PARTITIONED BY
   instead of segmentGranularity.
2) Added error-code-specific anchors in the docs.
3) Add information to various error codes in the docs about common
   causes and solutions.
* update error anchors (#13527)
* update snapshot
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
vogievetsky committed Dec 10, 2022
1 parent 977792d commit 348c9f6
Showing 29 changed files with 372 additions and 120 deletions.
43 changes: 38 additions & 5 deletions docs/development/extensions-core/datasketches-tuple.md
@@ -39,19 +39,52 @@ druid.extensions.loadList=["druid-datasketches"]
"name" : <output_name>,
"fieldName" : <metric_name>,
"nominalEntries": <number>,
"numberOfValues" : <number>,
"metricColumns" : <array of strings>
"metricColumns" : <array of strings>,
"numberOfValues" : <number>
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "arrayOfDoublesSketch"|yes|
|name|String representing the output column to store sketch values.|yes|
|fieldName|A String for the name of the input field.|yes|
|nominalEntries|Parameter that determines the accuracy and size of the sketch. A higher number means higher accuracy but more space to store sketches. Must be a power of 2. See [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for details.|no, defaults to 16384|
|metricColumns|When building sketches from raw data, an array of input columns that contain numeric values to associate with each distinct key. If not provided, `fieldName` is assumed to be an `arrayOfDoublesSketch`.|no|
|numberOfValues|Number of values associated with each distinct key.|no, defaults to the length of `metricColumns` if provided, and 1 otherwise|

You can use the `arrayOfDoublesSketch` aggregator to:

- Build a sketch from raw data. In this case, set `metricColumns` to an array of input column names.
- Build a sketch from an existing `ArrayOfDoubles` sketch. In this case, leave `metricColumns` unset and set `fieldName` to an `ArrayOfDoubles` sketch with `numberOfValues` doubles. You must base64 encode `ArrayOfDoubles` sketches at ingestion time.

#### Example on top of raw data

Compute a sketch of unique users. For each user, store the `added` and `deleted` scores. The new sketch column will be called `users_theta`.

```json
{
"type": "arrayOfDoublesSketch",
"name": "users_theta",
"fieldName": "user",
"nominalEntries": 16384,
"metricColumns": ["added", "deleted"],
}
```
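For context, the raw-data example above would apply to input rows shaped like the following. The timestamps and values are illustrative, not from the source:

```json
{"timestamp": "2022-12-10T00:00:00Z", "user": "alice", "added": 102.0, "deleted": 12.0}
{"timestamp": "2022-12-10T00:01:00Z", "user": "bob", "added": 5.0, "deleted": 0.0}
```

Each distinct `user` becomes a key in the sketch, with its `added` and `deleted` values attached as the two doubles.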

#### Example ingesting a precomputed sketch column

Ingest a sketch column called `user_sketches` containing base64 encoded sketches with two doubles per entry, and store it in a column called `users_theta`.

```json
{
"type": "arrayOfDoublesSketch",
"name": "users_theta",
"fieldName": "user_sketches",
"nominalEntries": 16384,
"numberOfValues": 2,
}
```
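At query time, you can merge the ingested sketches with the same aggregator and extract a distinct-count estimate with a post aggregator. A minimal sketch of a native `timeseries` query, assuming a hypothetical `wikipedia` datasource and interval, and the `arrayOfDoublesSketchToEstimate` post aggregator:

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "all",
  "intervals": ["2022-01-01/2023-01-01"],
  "aggregations": [
    {
      "type": "arrayOfDoublesSketch",
      "name": "users_theta",
      "fieldName": "users_theta",
      "nominalEntries": 16384,
      "numberOfValues": 2
    }
  ],
  "postAggregations": [
    {
      "type": "arrayOfDoublesSketchToEstimate",
      "name": "unique_users",
      "field": { "type": "fieldAccess", "fieldName": "users_theta" }
    }
  ]
}
```

The aggregation merges sketches across rows, and the post aggregator returns the estimated number of distinct keys.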

### Post Aggregators

3 changes: 2 additions & 1 deletion docs/multi-stage-query/concepts.md
@@ -233,7 +233,8 @@ happens:
The [`maxNumTasks`](./reference.md#context-parameters) query parameter determines the maximum number of tasks your
query will use, including the one `query_controller` task. Generally, queries perform better with more workers. The
lowest possible value of `maxNumTasks` is two (one worker and one controller). Do not set this higher than the number of
free slots available in your cluster; doing so will result in a [TaskStartTimeout](reference.md#error_TaskStartTimeout) error.
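For example, `maxNumTasks` is set in the query context when submitting a query to the SQL task endpoint. A minimal sketch with a hypothetical query and a value of 3, meaning one controller and up to two workers:

```json
{
  "query": "INSERT INTO target SELECT * FROM source PARTITIONED BY ALL",
  "context": {
    "maxNumTasks": 3
  }
}
```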

When [reading external data](#extern), EXTERN can read multiple files in parallel across
different worker tasks. However, EXTERN does not split individual files across multiple worker tasks. If you have a
8 changes: 5 additions & 3 deletions docs/multi-stage-query/known-issues.md
@@ -33,16 +33,18 @@ sidebar_label: Known issues

- Worker task stage outputs are stored in the working directory given by `druid.indexer.task.baseDir`. Stages that
generate a large amount of output data may exhaust all available disk space. In this case, the query fails with
an [UnknownError](./reference.md#error_UnknownError) with a message including "No space left on device".

## SELECT

- SELECT from a Druid datasource does not include unpublished real-time data.

- GROUPING SETS and UNION ALL are not implemented. Queries using these features return a
[QueryNotSupported](reference.md#error_QueryNotSupported) error.

- For some COUNT DISTINCT queries, you'll encounter a [QueryNotSupported](reference.md#error_QueryNotSupported) error that includes `Must not have 'subtotalsSpec'` as one of its causes. This is caused by the planner attempting to use GROUPING SETS, which are not implemented.

- The numeric varieties of the EARLIEST and LATEST aggregators do not work properly. Attempting to use the numeric
varieties of these aggregators lead to an error like