Apache Druid 25.0.0 contains over 300 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 51 contributors.
See the complete set of changes for additional details.
# Highlights
# MSQ task engine now production ready
The multi-stage query (MSQ) task engine used for SQL-based ingestion is now production ready. Use it for any supported workloads. For more information, see the following pages:
# Simplified Druid deployments
The new `start-druid` script greatly simplifies deploying any combination of Druid services on a single server. It comes pre-packaged with the required configs and can be used to launch a fully functional Druid cluster simply by invoking `./start-druid`. For experienced Druids, it also gives complete control over the runtime properties and JVM arguments to have a cluster that exactly fits your needs.
The `start-druid` script deprecates the existing profiles such as `start-micro-quickstart` and `start-nano-quickstart`. These profiles may be removed in future releases. For more information, see Single server deployment.
# String dictionary compression (experimental)
Added support for front coded string dictionaries for smaller string columns, leading to reduced segment sizes with only minor performance penalties for most Druid queries.
This can be enabled by setting `IndexSpec.stringDictionaryEncoding` to `{"type":"frontCoded", "bucketSize": 4}`, where `bucketSize` is any power of 2 less than or equal to 128. Setting this property instructs indexing tasks to write segments using compressed dictionaries of the specified bucket size.
Any segment written using string dictionary compression is not readable by older versions of Druid.
For more information, see Front coding.
#12277
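To illustrate the idea behind front coding, here is a minimal sketch. It is not Druid's segment format or implementation. It assumes that sorted strings are grouped into buckets of `bucket_size`, that the first value of each bucket is stored whole, and that each later value stores only the length of the prefix it shares with that first value plus the remaining suffix. The function names are hypothetical.

```python
from os.path import commonprefix


def front_code(sorted_values, bucket_size=4):
    """Encode sorted strings as buckets of (first_value, [(prefix_len, suffix), ...])."""
    # The release notes require a power of 2 that is at most 128.
    if bucket_size < 1 or bucket_size > 128 or bucket_size & (bucket_size - 1):
        raise ValueError("bucket_size must be a power of 2 <= 128")
    buckets = []
    for start in range(0, len(sorted_values), bucket_size):
        first, *rest = sorted_values[start:start + bucket_size]
        entries = []
        for value in rest:
            # Store only what differs from the first value in the bucket.
            prefix_len = len(commonprefix([first, value]))
            entries.append((prefix_len, value[prefix_len:]))
        buckets.append((first, entries))
    return buckets


def front_decode(buckets):
    """Rebuild the original sorted list from the encoded buckets."""
    values = []
    for first, entries in buckets:
        values.append(first)
        values.extend(first[:prefix_len] + suffix for prefix_len, suffix in entries)
    return values


values = sorted(["druid", "druidism", "drum", "drumming", "dry"])
encoded = front_code(values, bucket_size=4)
print(encoded)
```

Larger buckets share one full value among more entries, which saves space but means more entries may need to be decoded to reach a given value.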
# Kubernetes-native tasks
Druid can now use Kubernetes to launch and manage tasks, eliminating the need for middle managers.
To use this feature, enable the druid-kubernetes-overlord-extensions in the extensions load list for your Overlord process.
#13156
# Hadoop-3 compatible binary
Druid now comes packaged as a dedicated binary for Hadoop-3 users, which contains Hadoop-3 compatible jars. If you do not use Hadoop-3 with your Druid cluster, you may continue using the classic binary.
# Multi-stage query (MSQ) task engine
# MSQ enabled for Docker
The MSQ task engine is now enabled for Docker by default.
#13069
# Query history
Multi-stage queries no longer show up in the Query history dialog. They are still available in the Recent query tasks panel.
# Limit on CLUSTERED BY columns
When using the MSQ task engine to ingest data, the number of columns that can be passed in the CLUSTERED BY clause is now limited to 1500.
#13352
# Support for string dictionary compression
The MSQ task engine supports the front-coding of String dictionaries for better compression. This can be enabled for INSERT or REPLACE statements by setting `indexSpec` to a valid JSON string in the query context.
#13275
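As a rough sketch of what such a query context could look like, the snippet below builds a request body with `indexSpec` passed as a JSON string. The SQL statement and the table names are placeholders, not taken from the release notes, and the snippet only prints the payload rather than submitting it.

```python
import json

# Front-coded dictionary settings, using the values quoted in these notes.
index_spec = {"stringDictionaryEncoding": {"type": "frontCoded", "bucketSize": 4}}

# Hypothetical REPLACE statement; the table names are placeholders.
sql = (
    'REPLACE INTO "example_table" OVERWRITE ALL '
    'SELECT * FROM "example_source" PARTITIONED BY DAY'
)

payload = {
    "query": sql,
    # The notes describe indexSpec as a valid JSON string in the query context.
    "context": {"indexSpec": json.dumps(index_spec)},
}

print(json.dumps(payload, indent=2))
```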
# Sketch merging mode
Workers can now gather key statistics, used to generate partition boundaries, either sequentially or in parallel. Set `clusterStatisticsMergeMode` to `PARALLEL`, `SEQUENTIAL`, or `AUTO` in the query context to use the corresponding sketch merging mode. For more information, see Sketch merging mode.
#13205
# Performance and operational improvements
Added `pendingTasks` and `runningTasks` to the worker report. See Query task status information for related web console changes.
#13263
# Querying
# Async reads for JDBC
Prevented JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests.
#13196
# Improved algorithm to check values of an IN filter
To accommodate large value sets arising from large IN filters or from joins pushed down as IN filters, Druid now uses a sorted merge algorithm for merging the set and dictionary for larger values.
#13133
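The general idea of a sorted merge can be sketched as follows. This is an illustration of the technique, not Druid's code: it assumes both the IN value set and the column dictionary are available as sorted lists, and the function name is hypothetical.

```python
def matching_dictionary_ids(sorted_in_values, sorted_dictionary):
    """Return indexes of dictionary entries that also appear in the IN value set.

    Both inputs must be sorted; the two lists are walked in a single pass.
    """
    matches = []
    i = j = 0
    while i < len(sorted_in_values) and j < len(sorted_dictionary):
        wanted, candidate = sorted_in_values[i], sorted_dictionary[j]
        if wanted == candidate:
            matches.append(j)
            i += 1
            j += 1
        elif wanted < candidate:
            i += 1  # this IN value is not in the dictionary
        else:
            j += 1  # this dictionary entry is not in the IN set
    return matches


dictionary = ["apple", "banana", "cherry", "grape", "mango"]
in_values = sorted({"mango", "banana", "kiwi"})
print(matching_dictionary_ids(in_values, dictionary))
```

The single pass costs time proportional to the combined length of both lists, which tends to beat one dictionary lookup per value once the value set becomes large.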
# Enhanced query context security
Added the following configuration properties that refine the query context security model controlled by `druid.auth.authorizeQueryContextParams`:

- `druid.auth.unsecuredContextKeys`: A JSON list of query context keys that do not require a security check.
- `druid.auth.securedContextKeys`: A JSON list of query context keys that do require a security check.

If both are set, `unsecuredContextKeys` acts as exceptions to `securedContextKeys`.
#13071
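One way to read the interaction of the two lists is sketched below. It is an interpretation, not Druid's authorizer code. In particular, it assumes that when `securedContextKeys` is not set, every key outside `unsecuredContextKeys` still requires a check. The function name is hypothetical.

```python
def requires_security_check(key, unsecured_keys=None, secured_keys=None):
    """Decide whether a query context key needs a security check."""
    # Keys listed as unsecured are always exempt, even if also listed as secured.
    if key in set(unsecured_keys or []):
        return False
    # If a secured list is given, only keys on it are checked.
    if secured_keys is not None:
        return key in set(secured_keys)
    # Assumption: with no secured list, all remaining keys are checked.
    return True


print(requires_security_check("priority", ["priority"], ["priority", "timeout"]))
print(requires_security_check("timeout", ["priority"], ["priority", "timeout"]))
```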
# HTTP response headers
The HTTP response for a SQL query now correctly sets response headers, the same as for a native query.
#13052
# Metrics
# New metrics
The following metrics have been newly added. For more details, see the complete list of Druid metrics.
# Batched segment allocation
These metrics pertain to batched segment allocation. All of them relate to `segmentAllocate` actions.

| Metric | Dimensions |
|---|---|
| `task/action/batch/runTime` | `dataSource`, `taskActionType=segmentAllocate` |
| `task/action/batch/queueTime` | `dataSource`, `taskActionType=segmentAllocate` |
| `task/action/batch/size` | `dataSource`, `taskActionType=segmentAllocate` |
| `task/action/batch/attempts` | `dataSource`, `taskActionType=segmentAllocate` |
| `task/action/success/count` | `dataSource`, `taskId`, `taskType`, `taskActionType=segmentAllocate` |
| `task/action/failed/count` | `dataSource`, `taskId`, `taskType`, `taskActionType=segmentAllocate` |

# Streaming ingestion
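
| Metric | Description | Dimensions |
|---|---|---|
| `ingest/kafka/partitionLag` | Partition-wise lag between the offsets consumed by the Kafka indexing tasks and latest offsets in Kafka brokers. Minimum emission period for this metric is a minute. | `dataSource`, `stream`, `partition` |
| `ingest/kinesis/partitionLag/time` | Partition-wise lag time in milliseconds between the current message sequence number consumed by the Kinesis indexing tasks and latest sequence number in Kinesis. Minimum emission period for this metric is a minute. | `dataSource`, `stream`, `partition` |
| `ingest/pause/time` | Milliseconds spent by a task in a paused state without ingesting. | `dataSource`, `taskId`, `taskType` |
| `ingest/handoff/time` | Total time taken in milliseconds for handing off a given set of published segments. | `dataSource`, `taskId`, `taskType` |

#13238
#13331
#13313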
# Other improvements
- Added the `taskActionType` dimension, which may take values such as `segmentAllocate`, `segmentTransactionalInsert`, etc. This dimension is reported for `task/action/run/time` and the new batched segment allocation metrics. #13333
- `namespace/cache/heapSizeInBytes` for global cached lookups now accounts for the `String` object overhead of 40 bytes. #13219
- `jvm/gc/cpu` has been fixed to report nanoseconds instead of milliseconds. #13383

# Nested columns
# Nested columns performance improvement
Improved `NestedDataColumnSerializer` to no longer explicitly write null values to the field writers for the missing values of every row. Instead, passing the row counter is moved to the field writers so that they can backfill null values in bulk.
#13101
# Support for more formats
Druid nested columns and the associated JSON transform functions now support Avro, ORC, and Parquet.
#13325
#13375
# Refactored a datasource before unnest
When data requires "flattening" during processing, the operator now takes in an array and then flattens the array into N (N=number of elements in the array) rows where each row has one of the values from the array.
#13085
# Ingestion
# Improved filtering for cloud objects
You can now stop at arbitrary subfolders using glob syntax in the `ioConfig.inputSource.filter` field for native batch ingestion from cloud storage, such as S3.
#13027
# Async task client for streaming ingestion
You can now enable asynchronous communication between the stream supervisor and indexing tasks by setting `chatAsync` to true in the `tuningConfig`. The async task client uses its own internal thread pool and thus ignores the `chatThreads` property.
#13354
# Improved handling of JSON data with streaming ingestion
You can now better control how Druid reads JSON data for streaming ingestion by setting the following fields in the input format specification:

- `assumeNewlineDelimited` to parse lines of JSON independently.
- `useJsonNodeReader` to retain valid JSON events when parsing multi-line JSON events when a parsing exception occurs.

The web console has been updated to include these options.
#13089
# Ingesting from an idle Kafka stream
When a Kafka stream becomes inactive, the supervisor ingesting from it can be configured to stop creating new indexing tasks. The supervisor automatically resumes creation of new indexing tasks once the stream becomes active again. Set the property `dataSchema.ioConfig.idleConfig.enabled` to true in the respective supervisor spec or set `druid.supervisor.idleConfig.enabled` on the Overlord to enable this behavior. Please see the following for details:
#13144
# Kafka Consumer improvement
You can now configure the Kafka Consumer's custom deserializer after its instantiation.
#13097
# Kafka supervisor logging
Kafka supervisor logs are now less noisy. The supervisors now log events at the DEBUG level instead of INFO.
#13392
# Fixed Overlord leader election
Fixed a problem where Overlord leader election failed due to lock reacquisition issues. Druid now fails these tasks and clears all locks so that the Overlord leader election isn't blocked.
#13172
# Support for inline protobuf descriptor
Added a new `inline` type `protoBytesDecoder` that allows a user to pass inline the contents of a Protobuf descriptor file, encoded as a Base64 string.
#13192
# Duplicate notices
For streaming ingestion, notices that are the same as one already in queue won't be enqueued. This will help reduce notice queue size.
#13334
# Sampling from stream input now respects the configured timeout
Fixed a problem where sampling from a stream input, such as Kafka or Kinesis, failed to respect the configured timeout when the stream had no records available. You can now set the maximum amount of time in which the entry iterator will return results.
#13296
# Streaming tasks resume on Overlord switch
Fixed a problem where streaming ingestion tasks continued to run until their duration elapsed after the Overlord leader had issued a pause to the tasks. Now, when the Overlord switch occurs right after it has issued a pause to the task, the task remains in a paused state even after the Overlord re-election.
#13223
# Fixed Parquet list conversion
Fixed an issue with Parquet list conversion, where lists of complex objects could unexpectedly be wrapped in an extra object, appearing as `[{"element":<actual_list_element>},{"element":<another_one>}...]` instead of the direct list. This changes the behavior of the parquet reader for lists of structured objects to be consistent with other parquet logical list conversions. The data is now fetched directly, more closely matching its expected structure.
#13294
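The difference between the old and new shapes can be pictured with a small sketch. This is not the Parquet reader's code. It only shows how a list wrapped in `{"element": ...}` objects maps to the direct list, which may help when comparing data ingested before and after the fix. The helper name is hypothetical.

```python
def unwrap_parquet_list(values):
    """Strip the single-key {"element": ...} wrapper from each list item, if present."""
    unwrapped = []
    for item in values:
        if isinstance(item, dict) and set(item) == {"element"}:
            unwrapped.append(item["element"])
        else:
            unwrapped.append(item)  # already in the direct form
    return unwrapped


old_shape = [{"element": {"id": 1}}, {"element": {"id": 2}}]
new_shape = unwrap_parquet_list(old_shape)
print(new_shape)
```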
# Introduced a tree type to flattenSpec
Introduced a `tree` type to `flattenSpec`. In the event that a simple hierarchical lookup is required, the `tree` type allows for faster JSON parsing than `jq` and `path` parsing types.
#12177
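A simple hierarchical lookup of this kind amounts to walking a fixed list of keys, with no expression to parse. The sketch below illustrates that idea only. The field layout with `name` and `nodes` is an assumption modeled on other `flattenSpec` field types, and the helper name is hypothetical.

```python
def tree_lookup(record, nodes):
    """Follow a fixed list of keys through nested dicts; return None if any key is missing."""
    current = record
    for key in nodes:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Assumed shape of a tree field definition; the key names are not confirmed by these notes.
field = {"type": "tree", "name": "city", "nodes": ["user", "address", "city"]}
event = {"user": {"address": {"city": "Lisbon"}}}
print(tree_lookup(event, field["nodes"]))
```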
# Operations
# Compaction
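Compaction behavior has changed to reduce the time and disk space it takes:

- When segments need to be fetched, download them one at a time and delete them when Druid is done with them. This still takes time but minimizes the required disk space.
- Don't fetch segments on the main compact task when they aren't needed. If the user provides a full `granularitySpec`, `dimensionsSpec`, and `metricsSpec`, Druid skips fetching segments.

For more information, see the documentation on Compaction and Automatic compaction.
#13280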
# Idle configs for the Supervisor
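You can now set the Supervisor to idle, which is useful for freeing up slots so that autoscaling can be more effective.
To configure the idle behavior, use the following properties:

| Property | Description | Default |
|---|---|---|
| `druid.supervisor.idleConfig.enabled` | (Cluster wide) If `true`, supervisor can become idle if there is no data on input stream/topic for some time. | `false` |
| `druid.supervisor.idleConfig.inactiveAfterMillis` | (Cluster wide) Supervisor is marked as idle if all existing data has been read from input topic and no new data has been published for `inactiveAfterMillis` milliseconds. | `600_000` |
| `inactiveAfterMillis` | (Individual Supervisor) Supervisor is marked as idle if all existing data has been read from input topic and no new data has been published for `inactiveAfterMillis` milliseconds. | `600_000` |

#13311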
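The idle condition described above can be sketched as a small predicate. It illustrates the documented rule only and is not the supervisor's implementation. The function and argument names are hypothetical, and the defaults mirror the values listed in these notes.

```python
def is_idle(all_data_read, millis_since_last_publish,
            enabled=False, inactive_after_millis=600_000):
    """Return True when a supervisor would be considered idle under the stated rule."""
    if not enabled:
        return False  # idle behavior is off by default
    # Idle only when nothing is left to read and the stream has been quiet long enough.
    return all_data_read and millis_since_last_publish >= inactive_after_millis


print(is_idle(True, 900_000, enabled=True))
print(is_idle(True, 900_000))
```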
# Improved supervisor termination
Fixed issues with delayed supervisor termination during certain transient states.
#13072
# Backoff for HttpPostEmitter
The `HttpPostEmitter` option now has a backoff. This means that there should be less noise in the logs and lower CPU usage if you use this option for logging.
#12102
# DumpSegment tool for nested columns
The DumpSegment tool can now be used on nested columns with the `--dump nested` option.
For more information, see dump-segment tool.
#13356
# Segment loading and balancing
# Batched segment allocation
Segment allocation on the Overlord can take some time to finish, which can cause ingestion lag while a task waits for segments to be allocated. Performing segment allocation in batches can help improve performance.
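There are two new properties that affect how Druid performs segment allocation:

| Property | Description | Default |
|---|---|---|
| `druid.indexer.tasklock.batchSegmentAllocation` | If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average `task/action/run/time`. See batching `segmentAllocate` actions for details. | `false` |
| `druid.indexer.tasklock.batchAllocationWaitTime` | Number of milliseconds after Druid adds the first segment allocate action to a batch, until it executes the batch. Allows the batch to add more requests and improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled. | `500` |

In addition to these properties, there are new metrics to track batch segment allocation. For more information, see New metrics for segment allocation.
For more information, see the following:

- Batching `segmentAllocate` actions

#13369
#13503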
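The effect of the wait time can be pictured with a toy simulation. It is a simplification, not the Overlord's logic: it assumes a single queue in which the first request opens a batch, every request arriving before the wait time elapses joins that batch, and the next request after that opens a new one. The function name is hypothetical.

```python
def batch_requests(arrival_times_ms, wait_time_ms=500):
    """Group request arrival times into batches opened by the first request in each."""
    batches = []
    current = []
    deadline = None
    for arrival in sorted(arrival_times_ms):
        if deadline is None or arrival >= deadline:
            # The previous batch has already executed; start a new one.
            if current:
                batches.append(current)
            current = [arrival]
            deadline = arrival + wait_time_ms
        else:
            current.append(arrival)
    if current:
        batches.append(current)
    return batches


print(batch_requests([0, 120, 480, 510, 700, 1500]))
```

A longer wait time lets each batch absorb more requests at the cost of extra latency for the first request in the batch.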
# Improved cachingCost balancer strategy
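The `cachingCost` balancer strategy now behaves more similarly to the `cost` strategy. When computing the cost of moving a segment to a server, the following calculations are performed:

- Subtract the self cost of a segment if it is being served by the target server
- Subtract the cost of segments that are marked to be dropped

#13321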
# Faster segment assignment
You can now use a round-robin segment strategy to speed up initial segment assignments. Set `useRoundRobinSegmentAssignment` to `true` in the Coordinator dynamic config to enable this feature.
#13367
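Round-robin assignment in general means handing out items to targets in rotation, without computing a cost for each placement. The sketch below shows that generic technique, not the Coordinator's implementation. The segment and server names are made up for the example.

```python
from itertools import cycle


def assign_round_robin(segments, servers):
    """Assign each segment to the next server in rotation."""
    assignment = {server: [] for server in servers}
    for segment, server in zip(segments, cycle(servers)):
        assignment[server].append(segment)
    return assignment


print(assign_round_robin(["s1", "s2", "s3", "s4", "s5"], ["h1", "h2"]))
```

Because no per-segment cost is computed, the work per assignment is constant, which is why this approach suits the initial load of many segments.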
# Default to batch sampling for balancing segments
Batch sampling is now the default method for sampling segments during balancing as it performs significantly better than the alternative when there is a large number of used segments in the cluster.
As part of this change, the following have been deprecated and will be removed in future releases:

- `useBatchedSegmentSampler`
- `percentOfSegmentsToConsiderPerMove`

# Remove unused property
The unused coordinator property `druid.coordinator.loadqueuepeon.repeatDelay` has been removed. Use only `druid.coordinator.loadqueuepeon.http.repeatDelay` to configure repeat delay for the HTTP-based segment loading queue.
#13391
# Avoid segment over-replication
Improved the process of checking server inventory to prevent over-replication of segments during segment balancing.
#13114
# Provided service specific log4j overrides in containerized deployments
Provided an option to override log4j configs setup at the service level directories so that it works with Druid-operator based deployments.
#13020
# Various Docker improvements
- Uses the `gcr.io/distroless/java11-debian11` image as base by default.
- Added `bash-static` to the Docker image so that scripts that require bash can be executed.
- Updated an image from `3.8.4-jdk-11-slim` to `3.8.6-jdk-11-slim`.
- Updated an image from `amd64/busybox:1.30.0-glibc` to `busybox:1.35.0-glibc`.

#13059
# Enabled cleaner JSON for various input sources and formats
Added `JsonInclude` to various properties, to avoid population of default values in serialized JSON.
#13064
# Improved direct memory check on startup
Improved direct memory check on startup by providing better support for Java 9+ in `RuntimeInfo`, and clearer log messages where validation fails.
#13207
# Improved the run time of the MarkAsUnusedOvershadowedSegments duty
Improved the run time of the `MarkAsUnusedOvershadowedSegments` duty by iterating over all overshadowed segments and marking segments as unused in batches.
#13287
# Web console
# Delete an interval
You can now pick an interval to delete from a dropdown in the kill task dialog.
#13431
# Removed the old query view
The old query view is removed. Use the new query view with tabs.
For more information, see Web console.
#13169
# Filter column values in query results
The web console now allows you to add to existing filters for a selected column.
#13169
# Support for Kafka lookups in the web-console
Added support for Kafka-based lookups rendering and input in the web console.
#13098
# Query task status information
The web console now exposes a textual indication about running and pending tasks when a query is stuck due to lack of task slots.
#13291
# Extensions
# Extension optimization
Optimized the `compareTo` function in `CompressedBigDecimal`.
#13086
# CompressedBigDecimal cleanup and extension
Removed unnecessary generic type from CompressedBigDecimal, added support for number input types, added support for reading aggregator input types directly (uningested data), and fixed scaling bug in buffer aggregator.
#13048
# Support for Kubernetes discovery
Added `POD_NAME` and `POD_NAMESPACE` env variables to all Kubernetes Deployments and StatefulSets.
Helm deployment is now compatible with `druid-kubernetes-extension`.
#13262
# Docs
# Jupyter Notebook tutorials
We released our first Jupyter Notebook-based tutorial to learn the basics of the Druid API. Download the notebook and follow along with the tutorial to learn how to get basic cluster information, ingest data, and query data.
For more information, see Jupyter Notebook tutorials.
#13342
#13345
# Dependency updates
# Updated Kafka version
Updated the Apache Kafka core dependency to version 3.3.1.
#13176
# Docker improvements
Updated dependencies for the Druid image for Docker, including JRE 11. Docker BuildKit cache is enabled to speed up building.
#13059
# Upgrading to 25.0.0
Consider the following changes and updates when upgrading from Druid 24.0.x to 25.0.0. If you're updating from an earlier version, see the release notes of the relevant intermediate versions.
# Default HTTP-based segment discovery and task management
The default segment discovery method now uses HTTP instead of ZooKeeper.
This update changes the defaults for the following properties:

| Property | New default | Previous default |
|---|---|---|
| `druid.serverview.type` for segment management | `http` | `batch` |
| `druid.coordinator.loadqueuepeon.type` for segment management | `http` | `curator` |
| `druid.indexer.runner.type` for the Overlord | `httpRemote` | `local` |

To use ZooKeeper instead of HTTP, change the values for the properties back to the previous defaults. ZooKeeper-based implementations for these properties are deprecated and will be removed in a subsequent release.
#13092
# Finalizing HLL and quantiles sketch aggregates
Previously, the aggregation functions for HLL and quantiles sketches returned either sketches or numbers when finalized, depending on where they were in the native query plan.
Druid no longer finalizes aggregators in the following two cases:

- aggregators appear in the outer level of a query
- aggregators are used as input to an expression or finalizing-field-access post-aggregator

This change aligns the behavior of HLL and quantiles sketches with theta sketches.
To restore the old behavior, you can set `sqlFinalizeOuterSketches=true` in the query context.
#13247
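For a single query, restoring the old behavior could look like the request body sketched below. The table and column names are placeholders, and the snippet only prints the payload rather than sending it.

```python
import json

payload = {
    # Hypothetical query; "example_table" and "user_id" are placeholders.
    "query": 'SELECT DS_HLL("user_id") FROM "example_table"',
    # Context flag named in these notes for restoring the previous finalization behavior.
    "context": {"sqlFinalizeOuterSketches": True},
}

print(json.dumps(payload, indent=2))
```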
# Kill tasks mark segments as unused only if specified
When you issue a kill task, Druid marks the underlying segments as unused only if explicitly specified. For more information, see the API reference.
#13104
# Incompatible changes
# Upgrade curator to 5.3.0
Apache Curator has been upgraded to version 5.3.0. This version drops support for ZooKeeper 3.4, but Druid already officially dropped that support in 0.22. Curator 5.3.0 also removes support for Exhibitor, so all related configurations and tests have been removed.
#12939
# Fixed Parquet list conversion
The behavior of the parquet reader for lists of structured objects has been changed to be consistent with other parquet logical list conversions. The data is now fetched directly, more closely matching its expected structure. See parquet list conversion for more details.
#13294
# Credits
Thanks to everyone who contributed to this release!
@317brian
@599166320
@a2l007
@abhagraw
@abhishekagarwal87
@adarshsanjeev
@adelcast
@AlexanderSaydakov
@amaechler
@AmatyaAvadhanula
@ApoorvGuptaAi
@arvindanugula
@asdf2014
@churromorales
@clintropolis
@cloventt
@cristian-popa
@cryptoe
@dampcake
@dependabot[bot]
@didip
@ektravel
@eshengit
@findingrish
@FrankChen021
@gianm
@hnakamor
@hosswald
@imply-cheddar
@jasonk000
@jon-wei
@Junge-401
@kfaraz
@LakshSingla
@mcbrewster
@paul-rogers
@petermarshallio
@rash67
@rohangarg
@sachidananda007
@santosh-d3vpl3x
@senthilkv
@somu-imply
@techdocsmith
@tejaswini-imply
@vogievetsky
@vtlim
@wcc526
@writer-jill
@xvrl
@zachjsh