Skip to content

@clintropolis clintropolis released this Jun 10, 2021

Apache Druid 0.21.1 is a bug fix release that fixes a few regressions with the 0.21 release. The first is an issue with the published Docker image, which causes containers to fail to start due to volume permission issues, described in #11166 as fixed in #11167. This release also fixes an issue caused by a bug in the upgraded Jetty version which was released in 0.21, described in #11206 and fixed in #11207. Finally, a web console regression related to field validation has been added in #11228.

# Bug fixes

#11167 fix docker volume permissions
#11207 Upgrade jetty version
#11228 Web console: Fix required field treatment
#11299 Fix permission problems in docker

# Credits

Thanks to everyone who contributed to this release!


22 people reacted
Assets 2
Jun 2, 2021
[maven-release-plugin] copy for tag druid-0.21.1-rc2
May 14, 2021
[maven-release-plugin] copy for tag druid-0.21.1-rc1

@jihoonson jihoonson released this Apr 28, 2021

Apache Druid 0.21.0 contains around 120 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 36 contributors. Refer to the complete list of changes and everything tagged to the milestone for further details.

# New features

# Operation

# Service discovery and leader election based on Kubernetes

The new Kubernetes extension supports service discovery and leader election based on Kubernetes. This extension works in conjunction with the HTTP-based server view (druid.serverview.type=http) and task management (druid.indexer.runner.type=httpRemote) to allow you to run a Druid cluster with zero ZooKeeper dependencies. This extension is still experimental. See Kubernetes extension for more details.


# New dynamic coordinator configuration to limit the number of segments when finding a candidate segment for segment balancing

You can set the percentOfSegmentsToConsiderPerMove to limit the number of segments considered when picking a candidate segment to move. The candidates are searched up to maxSegmentsToMove * 2 times. This new configuration prevents Druid from iterating through all available segments to speed up the segment balancing process, especially if you have lots of available segments in your cluster. See Coordinator dynamic configuration for more details.


# status and selfDiscovered endpoints for Indexers

The Indexer now supports status and selfDiscovered endpoints. See Processor information APIs for details.


# Querying

# New grouping aggregator function

You can use the new grouping aggregator SQL function with GROUPING SETS or CUBE to indicate which grouping dimensions are included in the current grouping set. See Aggregation functions for more details.


# Improved missing argument handling in expressions and functions

Expression processing now can be vectorized when inputs are missing. For example a non-existent column. When an argument is missing in an expression, Druid can now infer the proper type of result based on non-null arguments. For instance, for longColumn + nonExistentColumn, nonExistentColumn is treated as (long) 0 instead of (double) 0.0. Finally, in default null handling mode, math functions can produce output properly by treating missing arguments as zeros.


# Allow zero period for TIMESTAMPADD

TIMESTAMPADD function now allows zero period. This functionality is required for some BI tools such as Tableau.


# Ingestion

# Native parallel ingestion no longer requires explicit intervals

Parallel task no longer requires you to set explicit intervals in granularitySpec. If intervals are missing, the parallel task executes an extra step for input sampling which collects the intervals to index.


# Old Kafka version support

Druid now supports Apache Kafka older than 0.11. To read from an old version of Kafka, set the isolation.level to read_uncommitted in consumerProperties. Only have been tested up until this release. See Kafka supervisor configurations for details.


Multi-phase segment merge for native batch ingestion

A new tuningConfig, maxColumnsToMerge, controls how many segments can be merged at the same time in the task. This configuration can be useful to avoid high memory pressure during the merge. See tuningConfig for native batch ingestion for more details.


# Native re-ingestion is less memory intensive

Parallel tasks now sort segments by ID before assigning them to subtasks. This sorting minimizes the number of time chunks for each subtask to handle. As a result, each subtask is expected to use less memory, especially when a single Parallel task is issued to re-ingest segments covering a long time period.


# Web console

# Updated and improved web console styles

The new web console styles make better use of the Druid brand colors and standardize paddings and margins throughout. The icon and background colors are now derived from the Druid logo.



# Partitioning information is available in the web console

The web console now shows datasource partitioning information on the new Segment granularity and Partitioning columns.

Segment granularity column in the Datasources tab


Partitioning column in the Segments tab



# The column order in the Schema table matches the dimensionsSpec

The Schema table now reflects the dimension ordering in the dimensionsSpec.



# Metrics

# Coordinator duty runtime metrics

The coordinator performs several 'duty' tasks. For example segment balancing, loading new segments, etc. Now there are two new metrics to help you analyze how fast the Coordinator is executing these duties.

  • coordinator/time: the time for an individual duty to execute
  • coordinator/global/time: the time for the whole duties runnable to execute


# Query timeout metric

A new metric provides the number of timed out queries. Previously timed out queries were treated as interrupted and included in the query/interrupted/count (see Changed HTTP status codes for query errors for more details).

query/timeout/count: the number of timed out queries during the emission period


# Shuffle metrics for batch ingestion

Two new metrics provide shuffle statistics for MiddleManagers and Indexers. These metrics have the supervisorTaskId as their dimension.

  • ingest/shuffle/bytes: number of bytes shuffled per emission period
  • ingest/shuffle/requests: number of shuffle requests per emission period

To enable the shuffle metrics, add org.apache.druid.indexing.worker.shuffle.ShuffleMonitor in druid.monitoring.monitors. See Shuffle metrics for more details.


# New clock-drift safe metrics monitor scheduler

The default metrics monitor scheduler is implemented based on ScheduledThreadPoolExecutor which is prone to unbounded clock drift. A new monitor scheduler, ClockDriftSafeMonitorScheduler, overcomes this limitation. To use the new scheduler, set druid.monitoring.schedulerClassName to in the file.


# Others

# New extension for a password provider based on AWS RDS token

A new PasswordProvider type allows access to AWS RDS DB instances using temporary AWS tokens. This extension can be useful when an RDS is used as Druid's metadata store. See AWS RDS extension for more details.


# The sys.servers table shows leaders

A new long-typed column is_leader in the sys.servers table indicates whether or not the server is the leader.


# druid-influxdb-emitter extension supports the HTTPS protocol

See Influxdb emitter extension for new configurations.


# Docker

# Small docker image

The docker image size is reduced by half by eliminating unnecessary duplication.


# Development

# Extensible Kafka consumer properties via a new DynamicConfigProvider

A new class DynamicConfigProvider enables fetching consumer properties at runtime. For instance, you can use DynamicConfigProvider fetch bootstrap.servers from location such as a local environment variable if it is not static. Currently, only a map-based config provider is supported by default. See DynamicConfigProvider for how to implement a custom config provider.


# Bug fixes

Druid 0.21.0 contains 30 bug fixes, you can see the complete list here.

# Post-aggregator computation with subtotals

Before 0.21.0, the query fails with an error when you use post aggregators with sub-totals. Now this bug is fixed and you can use post aggregators with subtotals.


# Indexers announce themselves as segment servers

In 0.19.0 and 0.20.0, Indexers could not process queries against streaming data as they did not announce themselves as segment servers. They are fixed to announce themselves properly in 0.21.0.


# Validity check for segment files in historicals

Historicals now perform validity check after they download segment files and re-download automatically if those files are crashed.


# StorageLocationSelectorStrategy injection failure is fixed

The injection failure while reading the configurations of StorageLocationSelectorStrategy is fixed.


# Upgrading to 0.21.0

Consider the following changes and updates when upgrading from Druid 0.20.0 to 0.21.0. If you're updating from an earlier version than 0.20.0, see the release notes of the relevant intermediate versions.

# Improved HTTP status codes for query errors

Before this release, Druid returned the "internal error (500)" for most of the query errors. Now Druid returns different error codes based on their cause. The following table lists the errors and their corresponding codes that has changed:

Exception Description Old code New code
SqlParseException and ValidationException from Calcite Query planning failed 500 400
QueryTimeoutException Query execution didn't finish in timeout 500 504
ResourceLimitExceededException Query asked more resources than configured threshold 500 400
InsufficientResourceException Query failed to schedule because of lack of merge buffers available at the time when it was submitted 500 429, merged to QueryCapacityExceededException
QueryUnsupportedException Unsupported functionality 400 501

There is also a new query metric for query timeout errors. See New query timeout metric for more details.


# Query interrupted metric

query/interrupted/count no longer counts the queries that timed out. These queries are counted by query/timeout/count.

# context dimension in query metrics

context is now a default dimension emitted for all query metrics. context is a JSON-formatted string containing the query context for the query that the emitted metric refers to. The addition of a dimension that was not previously alters some metrics emitted by Druid. You should plan to handle this new context dimension in your metrics pipeline. Since the dimension is a JSON-formatted string, a common solution is to parse the dimension and either flatten it or extract the bits you want and discard the full JSON-formatted string blob.


# Deprecated support for Apache ZooKeeper 3.4

As ZooKeeper 3.4 has been end-of-life for a while, support for ZooKeeper 3.4 is deprecated in 0.21.0 and will be removed in the near future.


# Consistent serialization format and column naming convention for the sys.segments table

All columns in the sys.segments table are now serialized in the JSON format to make them consistent with other system tables. Column names now use the same "snake case" convention.


# Known issues

# Known security vulnerability in the Thrift library

The Thrift extension can be useful for ingesting files of the Thrift format into Druid. However, there is a known security vulnerability in the version of the Thrift library that Druid uses. The vulerability can be exploitable by ingesting maliciously crafted Thrift files when you use Indexers. We recommend granting the DATASOURCE WRITE permission to only trusted users.

# Permission issues in running the docker-based Druid cluster

If you run the Druid docker cluster for the first time in your machine, using the 0.21.0 image can create internal directories with the root account. As a result, Druid services can fail due lack of permissions. This issue is filed in #11166.

If you are using docker compose, you can use the below commands to work around this issue. These commands will create internal directories first using an old image and then start services using the 0.21.0 image.

$ cd ${PREV_SRC_DIR}
$ docker-compose -f distribution/docker/docker-compose.yml create
$ cd ${0.21.0_SRC_DIR}
$ docker-compose -f distribution/docker/docker-compose.yml up

If you are not using docker compose, you can directly pass the volume parameter for /opt/druid/var when you start services using the 0.21.0 image. For example, you can run the command below to start the coordinator service.

$ docker run -v /path/to/host/dir:/opt/druid/var apache/druid:0.21.0 coordinator

For a full list of open issues, please see

# Credits

Thanks to everyone who contributed to this release!


Assets 2
Apr 16, 2021
[maven-release-plugin] copy for tag druid-0.21.0-rc1

@jihoonson jihoonson released this Mar 29, 2021

Apache Druid 0.20.2 introduces new configurations to address CVE-2021-26919: Authenticated users can execute arbitrary code from malicious MySQL database systems. Users are recommended to enable new configurations in the below to mitigate vulnerable JDBC connection properties. These configurations will be applied to all JDBC connections for ingestion and lookups, but not for metadata store. See security configurations for more details.

  • druid.access.jdbc.enforceAllowedProperties: When true, Druid applies druid.access.jdbc.allowedProperties to JDBC connections starting with jdbc:postgresql: or jdbc:mysql:. When false, Druid allows any kind of JDBC connections without JDBC property validation. This config is set to false by default to not break rolling upgrade. This config is deprecated now and can be removed in a future release. The allow list will be always enforced in that case.
  • druid.access.jdbc.allowedProperties: Defines a list of allowed JDBC properties. Druid always enforces the list for all JDBC connections starting with jdbc:postgresql: or jdbc:mysql: if druid.access.jdbc.enforceAllowedProperties is set to true. This option is tested against MySQL connector 5.1.48 and PostgreSQL connector 42.2.14. Other connector versions might not work.
  • druid.access.jdbc.allowUnknownJdbcUrlFormat: When false, Druid only accepts JDBC connections starting with jdbc:postgresql: or jdbc:mysql:. When true, Druid allows JDBC connections to any kind of database, but only enforces druid.access.jdbc.allowedProperties for PostgreSQL and MySQL.
Assets 2
Mar 25, 2021
[maven-release-plugin] copy for tag druid-0.20.2-rc1

@jihoonson jihoonson released this Jan 29, 2021

Apache Druid 0.20.1 is a bug fix release that addresses CVE-2021-25646: Authenticated users can override system configurations in their requests which allows them to execute arbitrary code.

# Known issues

# Incorrect Druid version in docker-compose.yml

The Druid version is specified as 0.20.0 in the docker-compose.yml file. We recommend to update the version to 0.20.1 before you run a Druid cluster using docker compose.

Assets 2
Jan 27, 2021
[maven-release-plugin] copy for tag druid-0.20.1-rc1

@jon-wei jon-wei released this Oct 17, 2020

Apache Druid 0.20.0 contains around 160 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 36 contributors. Refer to the complete list of changes and everything tagged to the milestone for further details.

# New Features

# Ingestion

# Combining InputSource

A new combining InputSource has been added, allowing the user to combine multiple input sources during ingestion. Please see for more details.


# Automatically determine numShards for parallel ingestion hash partitioning

When hash partitioning is used in parallel batch ingestion, it is no longer necessary to specify numShards in the partition spec. Druid can now automatically determine a number of shards by scanning the data in a new ingestion phase that determines the cardinalities of the partitioning key.


# Subtask file count limits for parallel batch ingestion

The size-based splitHintSpec now supports a new maxNumFiles parameter, which limits how many files can be assigned to individual subtasks in parallel batch ingestion.

The segment-based splitHintSpec used for reingesting data from existing Druid segments also has a new maxNumSegments parameter which functions similarly.

Please see for more details.


# Task slot usage metrics

New task slot usage metrics have been added. Please see the entries for the taskSlot metrics at for more details.


# Compaction

# Support for all partitioning schemes for auto-compaction

A partitioning spec can now be defined for auto-compaction, allowing users to repartition their data at compaction time. Please see the documentation for the new partitionsSpec property in the compaction tuningConfig for more details:


# Auto-compaction status API

A new coordinator API which shows the status of auto-compaction for a datasource has been added. The new API shows whether auto-compaction is enabled for a datasource, and a summary of how far compaction has progressed.

The web console has also been updated to show this information:

Please see for details on the new API, and for information on new related compaction metrics.


# Querying

# Query segment pruning with hash partitioning

Druid now supports query-time segment pruning (excluding certain segments as read candidates for a query) for hash partitioned segments. This optimization applies when all of the partitionDimensions specified in the hash partition spec during ingestion time are present in the filter set of a query, and the filters in the query filter on discrete values of the partitionDimensions (e.g., selector filters). Segment pruning with hash partitioning is not supported with non-discrete filters such as bound filters.

For existing users with existing segments, you will need to reingest those segments to take advantage of this new feature, as the segment pruning requires a partitionFunction to be stored together with the segments, which does not exist in segments created by older versions of Druid. It is not necessary to specify the partitionFunction explicitly, as the default is the same partition function that was used in prior versions of Druid.

Note that segments created with a default partitionDimensions value (partition by all dimensions + the time column) cannot be pruned in this manner, the segments need to be created with an explicit partitionDimensions.


# Vectorization

To enable vectorization features, please set the druid.query.default.context.vectorizeVirtualColumns property to true or set the vectorize property in the query context. Please see for more information.

# Vectorization support for expression virtual columns

Expression virtual columns now have vectorization support (depending on the expressions being used), which an results in a 3-5x performance improvement in some cases.

Please see for details on the specific expressions that support vectorization.


# More vectorization support for aggregators

Vectorization support has been added for several aggregation types: numeric min/max aggregators, variance aggregators, ANY aggregators, and aggregators from the druid-histogram extension.

#10260 - numeric min/max
#10304 - histogram
#10338 - ANY
#10390 - variance

We've observed about a 1.3x to 1.8x performance improvement in some cases with vectorization enabled for the min, max, and ANY aggregator, and about 1.04x to 1.07x wuth the histogram aggregator.

# offset parameter for GroupBy and Scan queries

It is now possible set an offset parameter for GroupBy and Scan queries, which tells Druid to skip a number of rows when returning results. Please see and for details.


# OFFSET clause for SQL queries

Druid SQL queries now support an OFFSET clause. Please see for details.


# Substring search operators

Druid has added new substring search operators in its expression language and for SQL queries.

Please see documentation for CONTAINS_STRING and ICONTAINS_STRING string functions for Druid SQL ( and documentation for contains_string and icontains_string for the Druid expression language (

We've observed about a 2.5x performance improvement in some cases by using these functions instead of STRPOS.


# UNION ALL operator for SQL queries

Druid SQL queries now support the UNION ALL operator, which fuses the results of multiple queries together. Please see for details on what query shapes are supported by this operator.


# Cluster-wide default query context settings

It is now possible to set cluster-wide default query context properties by adding a configuration of the form druid.query.override.default.context.*, with * replaced by the property name.


# Other features

# Improved retention rules UI

The retention rules UI in the web console has been improved. It now provides suggestions and basic validation in the period dropdown, shows the cluster default rules, and makes editing the default rules more accessible.


# Redis cache extension enhancements

The Redis cache extension now supports Redis Cluster, selecting which database is used, connecting to password-protected servers, and period-style configurations for the expiration and timeout properties.


# Disable sending server version in response headers

It is now possible to disable sending of server version information in Druid's response headers.

This is controlled by a new property druid.server.http.sendServerVersion, which defaults to true.


# Specify byte-based configuration properties with units

Druid now supports units for specifying byte-based configuration properties, e.g.:


equivalent to


Please see for more details.


# Bug fixes

# Fix query correctness issue when historical has no segment timeline

Druid 0.20.0 fixes a query correctness issue when a broker issues a query expecting a historical to have certain segments for a datasource, but the historical when queried does not actually have any segments for that datasource (e.g., they were all unloaded before the historical processed the query). Prior to 0.20.0, the query would return successfully but without the results from the segments that were missing in the manner described previously. In 0.20.0, queries will now fail in such situations.


# Fix issue preventing result-level cache from being populated

Druid 0.20.0 fixes an issue introduced in 0.19.0 (#10337) which can prevent query caches from being populated when result-level caching is enabled.


# Fix for variance aggregator ordering

The variance aggregator previously used an incorrect comparator that compared using an aggregator's internal count variable instead of the variance.


# Fix incorrect caching for groupBy queries with limit specs

Druid 0.20.0 fixes an issues with groupBy queries and caching, where the limitSpec of the query was not considered in the cache key, leading to potentially incorrect results if queries that are identical except for the limitSpec are issued.


# Fix for stringFirst and stringLast with rollup enabled

#7243 has been resolved, the stringFirst and stringLast aggregators no longer cause an exception when used during ingestion with rollup enabled.


# Upgrading to Druid 0.20.0

Please be aware of the following considerations when upgrading from 0.19.0 to 0.20.0. If you're updating from an earlier version than 0.19.0, please see the release notes of the relevant intermediate versions.

# Default maxSize

druid.server.maxSize will now default to the sum of maxSize values defined within the druid.segmentCache.locations. The user can still provide a custom value for druid.server.maxSize which will take precedence over the default value.


# Compaction and kill task ID changes

Compaction and kill tasks issued by the coordinator will now have their task IDs prefixed by coordinator-issued, while user-issued kill tasks will be prefixed by api-issued.


# New size limits for parallel ingestion split hint specs

The size-based and segment-based splitHintSpec for parallel batch ingestion now apply a default file/segment limit of 1000 per subtask, controlled by the maxNumFiles and maxNumSegments respectively.


# New PostAggregator and AggregatorFactory methods

Users who have developed an extension with custom PostAggregator or AggregatorFactory implementions will need to update their extensions, as these two interfaces have new methods defined in 0.20.0.

PostAggregator now has a new method:

  ValueType getType();

To support type information on PostAggregator, AggregatorFactory also has 2 new methods:

  public abstract ValueType getType();

  public abstract ValueType getFinalizedType();

Please see #9638 for more details on the interface changes.

# New Expr-related methods

Users who have developed an extension with custom Expr implementions will need to update their extensions, as Expr and related interfaces hae changed in 0.20.0. Please see the PR below for details:


# More accurate query/cpu/time metric

In 0.20.0, the accuracy of the query/cpu/time metric has been improved. Previously, it did not account for certain portions of work during query processing, described in more detail in the following PR:


# New audit log service metric columns

If you are using audit logging, please be aware that new columns have been added to the audit log service metric (comment, remote_address, and created_date). An optional payload column has also been added, which can be enabled by setting druid.audit.manager.includePayloadAsDimensionInMetric to true.


# sqlQueryContext in request logs

If you are using query request logging, the request log events will now include the sqlQueryContext for SQL queries.


# Additional per-segment state in metadata store

Hash-partitioned segments created by Druid 0.20.0 will now have additional partitionFunction data in the metadata store.

Additionally, compaction tasks will now store additional per-segment information in the metadata store, used for tracking compaction history.


# Known issues

# druid.segmentCache.locationSelectorStrategy injection failure

Specifying a value for druid.segmentCache.locationSelectorStrategy prevents services from starting due to an injection error. Please see #10348 for more details.

# Resource leak in web console data sampler

When a timeout occurs while sampling data in the web console, internal resources created to read from the input source are not properly closed. Please see #10467 for more information.

# Credits

Thanks to everyone who contributed to this release!


Assets 2