build 29.0.0 and 29.0.1 (#475)
317brian committed May 27, 2024
1 parent 412fdba commit dd4219c
Showing 2,214 changed files with 139,537 additions and 16,246 deletions.
@@ -165,13 +165,7 @@ Authorizer that requests should be directed to.<br />

##### Credential iterations and API performance

As noted above, `credentialIterations` determines the number of iterations used to hash a password. A higher number increases security, but costs more in terms of CPU utilization.

This cost affects API performance, including query times. The default setting of 10000 is intentionally high to prevent attackers from using brute force to guess passwords.

You can decrease the number of iterations to speed up API response times, but it may expose your system to dictionary attacks. Therefore, only reduce the number of iterations if your environment fits one of the following conditions:
- **All** passwords are long and random, which makes them as safe as a randomly generated token.
- You have secured network access to Druid so that no attacker can execute a dictionary attack against it.
As noted above, the value of `credentialIterations` determines the number of iterations used to hash a password. A higher number of iterations increases security. The default value of 10,000 is intentionally high to prevent attackers from using brute force to guess passwords. We recommend that you don't lower this value. Druid caches the hash of up to 1000 passwords used in the last hour to ensure that having a large number of iterations does not meaningfully impact query performance.

If Druid uses the default credentials validator (i.e., `credentialsValidator.type=metadata`), changing the `credentialIterations` value affects the number of hashing iterations only for users created after the change or for users who subsequently update their passwords via the `/druid-ext/basic-security/authentication/db/basic/users/{userName}/credentials` endpoint. If Druid uses the `ldap` validator, the change applies to any user at next log in (as well as to new users or users who update their passwords).
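
As an illustration, with the metadata validator you can force an existing user's password to be rehashed under a new `credentialIterations` setting by resubmitting it to the credentials endpoint above. The following is a minimal sketch of the request body to POST to that endpoint, with the user name substituted into the path; the password value here is a placeholder, not a recommendation:

```json
{
  "password": "a-long-random-example-passphrase"
}
```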

3 changes: 3 additions & 0 deletions docs/29.0.0/multi-stage-query/known-issues.md
@@ -53,6 +53,9 @@ differs from SQL standard behavior, where columns are inserted based on position
including the `createBitmapIndex` and `multiValueHandling` [dimension](../ingestion/ingestion-spec.md#dimension-objects)
properties, and the `indexSpec` [`tuningConfig`](../ingestion/ingestion-spec.md#tuningconfig) property.

- Queries using `EXTERN` to export data sometimes do not contain all the results. Certain rows may be missing from the created files.

## `EXTERN` Function

- The [schemaless dimensions](../ingestion/ingestion-spec.md#inclusions-and-exclusions)
2 changes: 1 addition & 1 deletion docs/29.0.0/multi-stage-query/reference.md
@@ -336,7 +336,7 @@ The following table lists the context parameters for the MSQ task engine:
| `faultTolerance` | SELECT, INSERT, REPLACE<br /><br /> Whether to turn on fault tolerance mode or not. Failed workers are retried based on [Limits](#limits). Cannot be used when `durableShuffleStorage` is explicitly set to false. | `false` |
| `selectDestination` | SELECT<br /><br /> Controls where the final result of the SELECT query is written. <br />Use `taskReport` (the default) to write SELECT results to the task report. <b>This is not scalable, since the task report grows very large for large result sets.</b> <br/>Use `durableStorage` to write results to a durable storage location. <b>For large result sets, it's recommended to use `durableStorage`.</b> To configure durable storage, see the [durable storage](#durable-storage) section. | `taskReport` |
| `waitUntilSegmentsLoad` | INSERT, REPLACE<br /><br /> If set, the ingest query waits for the generated segments to be loaded before exiting; otherwise, it exits without waiting. The task and live reports contain information about the status of loading segments if this flag is set. This ensures that any queries made after the ingestion exits include results from the ingestion. The drawback is that the controller task stalls until the segments are loaded. | `false` |
| `includeSegmentSource` | SELECT, INSERT, REPLACE<br /><br /> Controls the sources, which will be queried for results in addition to the segments present on deep storage. Can be `NONE` or `REALTIME`. If this value is `NONE`, only non-realtime (published and used) segments will be downloaded from deep storage. If this value is `REALTIME`, results will also be included from realtime tasks. | `NONE` |
| `includeSegmentSource` | SELECT, INSERT, REPLACE<br /><br /> Controls the sources that are queried for results, in addition to the segments present on deep storage. Can be `NONE` or `REALTIME`. If this value is `NONE`, only non-realtime (published and used) segments are downloaded from deep storage. If this value is `REALTIME`, results from realtime tasks are also included. Do not use `REALTIME` while writing data into the same datasource it is read from, as this could lead to incorrect data or missing results. | `NONE` |
| `rowsPerPage` | SELECT<br /><br />The number of rows per page to target. The actual number of rows per page may be somewhat higher or lower than this number. In most cases, use the default.<br /> This property comes into effect only when `selectDestination` is set to `durableStorage` | 100000 |
| `failOnEmptyInsert` | INSERT or REPLACE<br /><br /> When set to false (the default), an INSERT query generating no output rows will be no-op, and a REPLACE query generating no output rows will delete all data that matches the OVERWRITE clause. When set to true, an ingest query generating no output rows will throw an `InsertCannotBeEmpty` fault. | `false` |
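
As an illustration of how these parameters are supplied, the sketch below sets a few of them in the `context` object of an MSQ request body. The query, datasource name, and values are placeholder assumptions, and `selectDestination: durableStorage` additionally requires durable storage to be configured as described above:

```json
{
  "query": "SELECT channel, COUNT(*) AS edits FROM wikipedia GROUP BY channel",
  "context": {
    "selectDestination": "durableStorage",
    "rowsPerPage": 50000,
    "faultTolerance": true
  }
}
```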

12 changes: 9 additions & 3 deletions docs/29.0.0/querying/sql-translation.md
@@ -71,7 +71,7 @@ EXPLAIN PLAN statements return:
- a `RESOURCES` column that describes the resources used in the query
- an `ATTRIBUTES` column that describes the attributes of the query, including:
- `statementType`: the SQL statement type
- `targetDataSource`: the target datasource in an INSERT or REPLACE statement
- `targetDataSource`: a JSON object representing the target datasource in an INSERT or REPLACE statement
- `partitionedBy`: the time-based partitioning granularity in an INSERT or REPLACE statement
- `clusteredBy`: the clustering columns in an INSERT or REPLACE statement
- `replaceTimeChunks`: the time chunks in a REPLACE statement
@@ -444,7 +444,10 @@ The above EXPLAIN PLAN returns the following result:
],
{
"statementType": "INSERT",
"targetDataSource": "wikipedia",
"targetDataSource": {
"type":"table",
"tableName":"wikipedia"
},
"partitionedBy": {
"type": "all"
}
@@ -665,7 +668,10 @@ The above EXPLAIN PLAN query returns the following result:
],
{
"statementType": "REPLACE",
"targetDataSource": "wikipedia",
"targetDataSource": {
"type":"table",
"tableName":"wikipedia"
},
"partitionedBy": "DAY",
"clusteredBy": ["cityName","countryName"],
"replaceTimeChunks": "all"
29 changes: 29 additions & 0 deletions docs/29.0.0/release-info/release-notes.md
@@ -555,6 +555,35 @@ Improved the Iceberg extension as follows:

### Upgrade notes

#### Changes to the `targetDataSource` payload in the explain plan for MSQ queries

Druid 29 has a breaking change to EXPLAIN PLAN for INSERT and REPLACE MSQ queries.
In the attributes field returned as part of the result of an explain query, the value of the key `targetDataSource` changed from a string to a JSON object.
This change is only present in Druid 29 and is not present in earlier or later versions.

If the target is a datasource, the JSON object in the returned plan has the following structure:
```json
{
  "targetDataSource": {
    "type": "table",
    "tableName": "wikipedia"
  }
}
```

If the target is an external location for export, the JSON object in the returned plan has the following structure:
```json
{
  "targetDataSource": {
    "type": "external",
    "storageConnectorProvider": {
      "type": "<export-type>",
      "exportPath": "<export-path>"
    }
  }
}
```
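
For comparison, in releases other than Druid 29 the same attribute remains a plain string naming the target datasource, as in this sketch:

```json
{
  "targetDataSource": "wikipedia"
}
```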

#### Changed `equals` filter for native queries

When you use the [equality filter](https://druid.apache.org/docs/latest/querying/filters#equality-filter) in native queries, mixed-type `auto` columns that contain arrays must now be filtered as their presenting type. This means that if any rows are arrays (for example, the segment metadata and `information_schema` report the type as some array type), then native queries must also filter them as that array type.
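
For example, the following is a sketch of a native equality filter matching an array value on a hypothetical mixed-type `auto` column named `tags`; the column name and values are illustrative assumptions:

```json
{
  "type": "equals",
  "column": "tags",
  "matchValueType": "ARRAY<STRING>",
  "matchValue": ["a", "b"]
}
```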
4 changes: 4 additions & 0 deletions docs/29.0.0/release-info/upgrade-notes.md
@@ -30,6 +30,10 @@ For the full release notes for a specific version, see the [releases page](https

### Upgrade notes

#### Changes to the `targetDataSource` payload in the explain plan for MSQ queries

In the attributes field returned as part of the result of an EXPLAIN query for an INSERT or REPLACE MSQ query, the value of the key `targetDataSource` changed from a string to a JSON object. This change is only present in Druid 29 and is not present in earlier or later versions.

#### Changed `equals` filter for native queries

When you use the [equality filter](https://druid.apache.org/docs/latest/querying/filters#equality-filter) in native queries, mixed-type `auto` columns that contain arrays must now be filtered as their presenting type. This means that if any rows are arrays (for example, the segment metadata and `information_schema` report the type as some array type), then native queries must also filter them as that array type.
44 changes: 44 additions & 0 deletions docs/29.0.1/api-reference/api-reference.md
@@ -0,0 +1,44 @@
---
id: api-reference
title: API reference
sidebar_label: Overview
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->


This topic is an index to the Apache Druid API documentation.

## HTTP APIs
* [Druid SQL queries](./sql-api.md) to submit SQL queries using the Druid SQL API.
* [SQL-based ingestion](./sql-ingestion-api.md) to submit SQL-based batch ingestion requests.
* [JSON querying](./json-querying-api.md) to submit JSON-based native queries.
* [Tasks](./tasks-api.md) to manage data ingestion operations.
* [Supervisors](./supervisor-api.md) to manage supervisors for data ingestion lifecycle and data processing.
* [Retention rules](./retention-rules-api.md) to define and manage data retention rules across datasources.
* [Data management](./data-management-api.md) to manage data segments.
* [Automatic compaction](./automatic-compaction-api.md) to optimize segment sizes after ingestion.
* [Lookups](./lookups-api.md) to manage and modify key-value datasources.
* [Service status](./service-status-api.md) to monitor components within the Druid cluster.
* [Dynamic configuration](./dynamic-configuration-api.md) to configure the behavior of the Coordinator and Overlord processes.
* [Legacy metadata](./legacy-metadata-api.md) to retrieve datasource metadata.

## Java APIs
* [SQL JDBC driver](./sql-jdbc.md) to connect to Druid and make Druid SQL queries using the Avatica JDBC driver.