druid-0.11.0
Druid 0.11.0 contains over a hundred performance improvements, stability improvements, and bug fixes from almost 40 contributors. This release adds two major security features: TLS support and extension points for authentication and authorization.
Major new features include:
- TLS (a.k.a. SSL) support
- Extension points for authentication and authorization
- Double columns support
- cachingCost Balancer Strategy
- jq expression support in JSON parser
- Redis cache extension
- GroupBy performance improvements
- Various improvements to Druid SQL
The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.11.0
Documentation for this release is at: http://druid.io/docs/0.11.0/
Highlights
TLS support
Druid now supports TLS, enabling encrypted client and inter-node communications. Please see http://druid.io/docs/0.11.0/operations/tls-support.html for details on configuration and related extensions.
Authentication/authorization extension points
Extension points for authenticating and authorizing requests have been added to Druid. Please see http://druid.io/docs/0.11.0/configuration/auth.html for information on configuration and extension implementation.
The existing Kerberos authentication extension has been updated to implement the new Authenticator interface. If you are using the Kerberos extension, please see the "Kerberos configuration changes" section under "Updating from 0.10.1 and earlier" for more information.
Double columns support
Druid now supports Double type aggregator columns. Please see http://druid.io/docs/0.11.0/querying/aggregations.html for documentation on the new Double aggregators.
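As a rough illustration, the Python sketch below builds doubleSum, doubleMin and doubleMax aggregator specs as dicts that serialize to the JSON form used in queries and ingestion specs. The aggregator type names come from the aggregations documentation linked above; the helper function, the output names and the "value" input column are hypothetical placeholders.

```python
import json

def double_aggregators(field_name):
    """Hypothetical helper: doubleSum/doubleMin/doubleMax specs over one column."""
    return [
        {"type": "doubleSum", "name": field_name + "_sum", "fieldName": field_name},
        {"type": "doubleMin", "name": field_name + "_min", "fieldName": field_name},
        {"type": "doubleMax", "name": field_name + "_max", "fieldName": field_name},
    ]

# "value" is a placeholder metric column name, not something Druid defines.
aggs = double_aggregators("value")
print(json.dumps(aggs, indent=2))
```

The resulting list would go in the "aggregations" array of a query or in the metricsSpec of an ingestion spec.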
cachingCost Balancer Strategy
Users upgrading to 0.11.0 are encouraged to try the new cachingCost segment balancing strategy on their coordinators. This strategy offers large performance improvements over the existing cost balancer strategy, and it is planned to become the default strategy in the release following 0.11.0.
This strategy can be selected by setting the following property on coordinators:
druid.coordinator.balancer.strategy=cachingCost
Added by @dgolitsyn in #4731
jq expression support in JSON parser
Druid's JSON input parser now supports jq expressions using jackson-jq, enabling more input transforms before ingestion. Please see http://druid.io/docs/0.11.0/ingestion/flatten-json.html for more details.
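A minimal sketch of a flattenSpec that mixes an existing JSONPath ("path") field with a new jq field, built as a Python dict. It assumes the field type "jq" with an "expr" property, as described in the flatten-json documentation linked above; the field names and expressions are hypothetical.

```python
import json

# Hypothetical flattenSpec: "expr" holds a JSONPath expression for "path"
# fields and a jq expression for "jq" fields.
flatten_spec = {
    "useFieldDiscovery": True,
    "fields": [
        {"type": "path", "name": "foo_bar", "expr": "$.foo.bar"},
        # jq: take the second element of the "food" array under "thing".
        {"type": "jq", "name": "first_food", "expr": ".thing.food[1]"},
    ],
}

# Placeholder parseSpec fragment; timestampSpec/dimensionsSpec omitted.
parse_spec_fragment = {"format": "json", "flattenSpec": flatten_spec}
print(json.dumps(parse_spec_fragment, indent=2))
```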
Redis cache extension
A new Redis-based cache implementation has been added as an extension by @QiuMM in #4615. Please refer to that pull request for more details.
GroupBy performance improvements
Several new performance optimizations have been added to the GroupBy query by @jihoonson in the following PRs:
- #4660: Parallel sort for ConcurrentGrouper
- #4576: Array-based aggregation for groupBy query
- #4668: Add IntGrouper to avoid unnecessary boxing/unboxing in array-based aggregation
PR #4660 offers a general improvement by parallelizing the sorting of partial results, while PRs #4576 and #4668 offer significant improvements when grouping on a single String column.
SQL improvements
Various improvements and features have been added to Druid SQL by @gianm in the following PRs:
- #4750: TRIM support
- #4720: Rounding for count distinct
- #4561: Metrics for SQL queries
- #4360: SQL expressions support
And much more!
The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.11.0
Updating from 0.10.1 and earlier
Please see below for changes between 0.10.1 and 0.11.0 that you should be aware of before upgrading. If you're updating from a version earlier than 0.10.1, please see the release notes of the relevant intermediate versions for additional notes.
Upgrading coordinators and overlords
The following patch changes the way coordinator-to-overlord redirects are handled: #5037
The overlord leader election algorithm has changed in 0.11.0: #4699.
As a result of the two patches above, special care is needed when upgrading Coordinator or Overlord to 0.11.0. All coordinators and overlords must be shut down and upgraded together.
For example, to upgrade coordinators, you would shut down all coordinators, upgrade them to 0.11.0, and then start them. Overlords should be upgraded in a similar way.
During the upgrade process, there must not be any time period where a non-0.11.0 coordinator or overlord is running simultaneously with a 0.11.0 coordinator or overlord.
Note that at least one overlord should be brought up as quickly as possible after shutting them all down, so that peons, Tranquility, etc. can continue to work after some retries.
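The rule above (no moment where a pre-0.11.0 and a 0.11.0 coordinator or overlord run at the same time) can be checked mechanically before an upgrade. A minimal sketch, assuming a hypothetical upgrade plan written as an ordered list of stop/start events; the function, node names and event format are illustrative only and are not part of Druid.

```python
def plan_is_safe(initial, events):
    """Hypothetical check: True if no pre-0.11.0 node ever runs alongside a 0.11.0 node.

    initial: dict of node name -> version running before the plan starts.
    events:  ordered list of ("start", node, version) or ("stop", node, None).
    All coordinators and overlords are tracked together, as the notes require.
    """
    running = dict(initial)
    for action, node, version in events:
        if action == "start":
            running[node] = version
        elif action == "stop":
            running.pop(node, None)
        else:
            raise ValueError("unknown action: " + action)
        has_new = any(v == "0.11.0" for v in running.values())
        has_old = any(v != "0.11.0" for v in running.values())
        if has_new and has_old:
            return False
    return True

initial = {"coord1": "0.10.1", "coord2": "0.10.1"}  # hypothetical nodes
# Safe: stop every coordinator before starting any upgraded one.
safe = [("stop", "coord1", None), ("stop", "coord2", None),
        ("start", "coord1", "0.11.0"), ("start", "coord2", "0.11.0")]
# Unsafe: a rolling restart leaves coord2 on 0.10.1 while coord1 runs 0.11.0.
rolling = [("stop", "coord1", None), ("start", "coord1", "0.11.0"),
           ("stop", "coord2", None), ("start", "coord2", "0.11.0")]
print(plan_is_safe(initial, safe))     # True
print(plan_is_safe(initial, rolling))  # False
```

The familiar rolling-restart pattern is exactly what the check rejects, which is why coordinators and overlords are shut down as a group for this upgrade.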
Also note that the druid.zk.paths.indexer.leaderLatchPath property is no longer used.
Service name changes
In earlier versions of Druid, "/" characters in service names defined by druid.service were replaced by ":" characters because these service names were used in ZooKeeper paths. Druid 0.11.0 no longer performs these character replacements.
Example 1: If the old configuration had a broker with the service name test/broker:
druid.service=test/broker
and a Router was configured assuming that "/" would be replaced with ":" in the broker service name:
druid.router.tierToBrokerMap={"hot":"test:broker","_default_tier":"test:broker"}
then the Router configuration should be updated to remove that assumption:
druid.router.tierToBrokerMap={"hot":"test/broker","_default_tier":"test/broker"}
Example 2: If the old configuration had an overlord with the service name test/overlord, then the value of druid.coordinator.asOverlord.overlordService or druid.selectors.indexing.serviceName should be test/overlord, not test:overlord.
Example 3: If the old configuration had an overlord with the service name test:overlord, then the value of druid.coordinator.asOverlord.overlordService or druid.selectors.indexing.serviceName should be test:overlord, not test/overlord.
The following service name-related configurations are also affected and should be updated to exactly match the value of the druid.service property on the node being discovered:
- druid.coordinator.asOverlord.overlordService
- druid.selectors.coordinator.serviceName
- druid.selectors.indexing.serviceName
- druid.router.defaultBrokerServiceName
- druid.router.coordinatorServiceName
- druid.router.tierToBrokerMap
Please see #4992 for more details.
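Because 0.11.0 matches these values against druid.service verbatim, a quick pre-upgrade check is to look for configured values that match no service name exactly but would match one after the old "/" to ":" substitution. A minimal sketch, assuming you have already gathered the druid.service values and the service-name properties into plain Python structures; the function name and inputs are hypothetical.

```python
def find_stale_service_names(service_names, references):
    """Hypothetical check for references relying on the pre-0.11.0 '/' -> ':' replacement.

    service_names: set of druid.service values across the cluster.
    references:    dict of property name -> referenced service name.
    Returns {property: suggested value} for each reference that matches no
    service exactly but matches one once the old replacement is undone.
    """
    legacy = {name.replace("/", ":"): name for name in service_names}
    stale = {}
    for prop, value in references.items():
        if value not in service_names and value in legacy:
            stale[prop] = legacy[value]
    return stale

services = {"test/broker", "test/overlord"}
refs = {
    "druid.router.defaultBrokerServiceName": "test:broker",  # relies on old replacement
    "druid.selectors.indexing.serviceName": "test/overlord",  # already correct
}
print(find_stale_service_names(services, refs))
# {'druid.router.defaultBrokerServiceName': 'test/broker'}
```

For druid.router.tierToBrokerMap, each value in the map would be passed as a separate reference. A name that genuinely contains ":" (Example 3 above) matches exactly and is left alone.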
Kerberos configuration changes
The Kerberos authentication configuration format has changed as a result of the new interfaces introduced by #4271. Please refer to http://druid.io/docs/0.11.0/development/extensions-core/druid-kerberos.html for the new configuration properties.
Users can point the Kerberos authenticator's authorizerName to an instance of an "allowAll" authorizer to replicate the pre-0.11.0 behavior of a cluster using Kerberos authentication with no authorization.
Lookups API path changes
The paths for the lookups configuration API have changed due to #5058.
Configuration paths that had the form /druid/coordinator/v1/lookups now have the form /druid/coordinator/v1/lookups/config.
Please see http://druid.io/docs/0.11.0/querying/lookups.html for the current API.
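For client scripts that build lookups configuration URLs as strings, the change amounts to inserting /config after the /druid/coordinator/v1/lookups prefix. A minimal sketch, assuming the caller only passes pre-0.11.0 configuration paths (other endpoints should be checked against the lookups documentation above); the function name and the "__default" tier in the example are illustrative.

```python
OLD_PREFIX = "/druid/coordinator/v1/lookups"
NEW_PREFIX = OLD_PREFIX + "/config"

def to_new_lookup_config_path(path):
    """Hypothetical helper: rewrite a pre-0.11.0 lookups config path.

    Apply it once per path; it does not detect paths that were already migrated.
    """
    if path == OLD_PREFIX or path.startswith(OLD_PREFIX + "/"):
        return NEW_PREFIX + path[len(OLD_PREFIX):]
    raise ValueError("not a lookups configuration path: " + path)

print(to_new_lookup_config_path("/druid/coordinator/v1/lookups"))
# /druid/coordinator/v1/lookups/config
print(to_new_lookup_config_path("/druid/coordinator/v1/lookups/__default"))
# /druid/coordinator/v1/lookups/config/__default
```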
Migrating to Double columns
Prior to 0.11.0, the Double* aggregators stored column values on disk as floats while performing aggregations using double representations.
PR #4491 allows the Double aggregators to store column values on disk as Doubles. Due to concerns related to rolling updates and version downgrades, this behavior is disabled by default and Druid will continue to store Double aggregators on disk as floats.
To enable Double column storage, set the following property in the common runtime properties:
druid.indexing.doubleStorage=double
Users should not set this property during an initial rolling upgrade to 0.11.0, as any nodes running pre-0.11.0 Druid will not be able to handle Double columns created during the upgrade period. Users will also need to reindex any segments with Double columns if downgrading from 0.11.0 to an older version. Please see #4944 and #4605 for more information.
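To see what the default float storage costs, the sketch below round-trips values through a 32-bit float using Python's struct module. This roughly mirrors a Double aggregator's value being written to disk as a float; it is an illustration of the precision loss, not Druid code.

```python
import struct

def as_float32(x):
    """Round-trip a 64-bit Python float through 32-bit (float) storage."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(as_float32(0.1))         # 0.10000000149011612
print(as_float32(16777217.0))  # 16777216.0, since 2**24 + 1 has no exact float32 form
```

The second case shows why large sums (for example, big counters aggregated with doubleSum) can drift once stored as floats, which is the gap that druid.indexing.doubleStorage=double closes.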
Scan query changes
The Scan query has been moved from extensions-contrib to core Druid. As part of this migration (#4751), the Scan query's handling of the time column has changed.
The time column is now returned as "__time" rather than "timestamp"; it is no longer included unless you specifically ask for it in your "columns"; and it is returned as a long rather than a string.
Users can revert the Scan query's time handling to the legacy extension behavior by setting "legacy" : true in their queries, or by setting the property druid.query.scan.legacy = true. This is meant to provide a migration path for users who were formerly using the contrib extension.
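A minimal sketch of Scan query bodies under the new and legacy time handling, built as Python dicts. Since the time column is no longer returned unless requested, the new-style query lists "__time" in "columns"; the legacy query sets the per-query "legacy" flag described above. The helper function, the "wikipedia" data source, the interval and the "page" column are hypothetical.

```python
import json

def scan_query(datasource, interval, columns, legacy=False):
    """Hypothetical helper building a Scan query body."""
    query = {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": [interval],
        "columns": columns,
    }
    if legacy:
        # Per-query switch back to the contrib extension's time handling.
        query["legacy"] = True
    return query

# New behaviour: request "__time" explicitly to get the timestamp as a long.
new_style = scan_query("wikipedia", "2017-01-01/2017-01-02", ["__time", "page"])
# Legacy behaviour: time comes back as "timestamp" (a string), as before.
legacy_style = scan_query("wikipedia", "2017-01-01/2017-01-02", ["page"], legacy=True)
print(json.dumps(new_style))
print(json.dumps(legacy_style))
```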
Extension Interface Changes
Aggregator double column support
The Aggregator interface has gained a getDouble() method, which by default casts the result of getFloat(). The getDouble() method should be re-implemented for any custom aggregators that can support doubles.
See #4595 for more details.
QueryRunner interface change
The QueryRunner interface has changed: the old run() method has been removed and replaced by a new method that accepts a QueryPlus object.
Custom query extensions will need to implement the new interface.
Please see #4184 and #4482 for more details.
Filter interface change
The Filter.getBitmapResult() method no longer has a default implementation: #4481
Custom filter extensions now need to provide their own implementation of getBitmapResult().
Other Notes
jvm/gc/time metric
The jvm/gc/time metric is no longer emitted; it has been replaced by a new metric named jvm/gc/cpu for the reasons described in #4480.
Default worker select strategy
Please note that the default worker select strategy has changed from fillCapacity to equalDistribution. This change was introduced in 0.10.1, the previous release, but was not mentioned in the 0.10.1 release notes, so it is called out here.
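For intuition, a minimal sketch contrasting the two strategies, assuming the behaviour described in Druid's indexing service configuration documentation: fillCapacity assigns a task to the eligible worker that is already running the most tasks, while equalDistribution assigns it to the worker with the most available capacity. The functions, worker names and capacities are hypothetical, and the real implementations handle more cases.

```python
def fill_capacity(workers):
    """Pick the busiest worker that still has room (packs workers full)."""
    eligible = [w for w in workers if w["used"] < w["capacity"]]
    if not eligible:
        return None
    return max(eligible, key=lambda w: w["used"])["name"]

def equal_distribution(workers):
    """Pick the worker with the most free capacity (spreads load)."""
    eligible = [w for w in workers if w["used"] < w["capacity"]]
    if not eligible:
        return None
    return max(eligible, key=lambda w: w["capacity"] - w["used"])["name"]

workers = [  # hypothetical middle managers
    {"name": "worker-a", "capacity": 4, "used": 3},
    {"name": "worker-b", "capacity": 4, "used": 1},
]
print(fill_capacity(workers))       # worker-a
print(equal_distribution(workers))  # worker-b
```

Clusters that relied on fillCapacity packing tasks onto a few workers (for example, so idle workers can be scaled away) may want to set the strategy explicitly after upgrading.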
V8 segment creation removed
Druid now always builds V9 segments. Creating V8 segments is no longer supported, and the buildV9Directly property for ingestion tasks has been removed.
Please see #4420 for more details.
LogLevelAdjuster removed
Please note that the LogLevelAdjuster has been removed: #4236
Users who configured log levels through MBeans should configure Log4j 2 through JMX instead.
Credits
Thanks to everyone who contributed to this release!
@a2l007
@akashdw
@Andy256
@asifmansoora
@b-slim
@benvogan
@blugowski
@chrisgavin
@dclim
@dgolitsyn
@drcrallen
@egor-ryashin
@erikdubbelboer
@Fokko
@fuji-151a
@gaodayue
@gianm
@ginoledesma
@himanshug
@hzy001
@jihoonson
@jon-wei
@kevinconaway
@knoguchi
@leiwangx
@leventov
@michalmisiewicz
@niketh
@pjain1
@praveev
@QiuMM
@scan-the-automator
@solimant
@SpotXPeterCunningham
@tkyaw
@wywlds
@xanec
@yuusaku-t
@zhangxinyu1