@jon-wei released this Aug 23, 2017

Druid 0.10.1 contains hundreds of performance improvements, stability improvements, and bug fixes from over 40 contributors. Major new features include:

  • Large performance improvements and additional query metrics for TopN queries
  • The ability to push down limit clauses for GroupBy queries
  • More accurate query timeout handling
  • Hadoop indexing support for the Amazon S3A filesystem
  • Support for ingesting Protobuf data
  • A new Firehose that can read input via HTTP
  • Improved disk space management when indexing from cloud stores
  • Various improvements to coordinator lookups management
  • A new Kafka metrics emitter
  • A new column comparison filter
  • Various improvements to Druid SQL

If you are upgrading from a previous version of Druid, please see "Updating from 0.10.0 and earlier" below for upgrade notes, including some backwards incompatible changes.

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.10.1

Documentation for this release is at: http://druid.io/docs/0.10.1/


TopN performance improvements

Processing for TopN queries with 1-2 aggregators on historical nodes is now 2-4 times faster. This is accomplished with new runtime inspection logic that generates monomorphic implementations of query processing classes, reducing polymorphism in the TopN query execution path.

Added in a series of PRs described here by @leventov: #3798.

Limit clause push down for GroupBy

Druid can now optimize limit clauses in GroupBy queries by distributing the limit application to historical/realtime nodes, applying the limit to partial result sets before they are sent to the broker for merging. This reduces network traffic within the cluster and reduces the merging workload on broker nodes. Please refer to http://druid.io/docs/0.10.1/querying/groupbyquery.html#query-context for more information.

Added in #3873 by @jon-wei.
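
The effect of pushing the limit down can be illustrated with a small simulation. This is a toy model and not Druid's implementation: the function names and the sample rows are made up for illustration, and the sketch orders results by the grouping key, a case in which a per-node limit cannot drop a group that belongs in the final result.

```python
from collections import Counter


def aggregate_on_node(rows, limit=None):
    """Sum values per grouping key on one data node.

    If `limit` is given, keep only the first `limit` groups in key order,
    mimicking a limit applied to the partial result before it is sent on.
    """
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    ordered = sorted(totals.items())
    return ordered if limit is None else ordered[:limit]


def merge_on_broker(partials, limit):
    """Merge the partial results from all nodes and apply the final limit."""
    merged = Counter()
    for partial in partials:
        for key, value in partial:
            merged[key] += value
    return sorted(merged.items())[:limit]


# Hypothetical (grouping key, metric value) rows held by two data nodes.
node_rows = [
    [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 5)],
    [("b", 1), ("c", 1), ("e", 7), ("f", 2)],
]
LIMIT = 2

without_pushdown = [aggregate_on_node(rows) for rows in node_rows]
with_pushdown = [aggregate_on_node(rows, LIMIT) for rows in node_rows]

# Number of partial-result rows each approach ships to the broker.
rows_sent_without = sum(len(partial) for partial in without_pushdown)
rows_sent_with = sum(len(partial) for partial in with_pushdown)

result_without = merge_on_broker(without_pushdown, LIMIT)
result_with = merge_on_broker(with_pushdown, LIMIT)
```

In this example both approaches produce the same final rows, while the pushed-down variant ships half as many partial rows to the merging step.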

Hadoop indexing support for Amazon S3A

Amazon's S3A filesystem is now supported for deep storage and as an input source for batch ingestion tasks. Please refer to <> for documentation.

Added in #4116 by @b-slim.

Protobuf 3.0 support and other enhancements

Support for ingesting Protobuf 3.0 data has been added, along with other enhancements such as reading Protobuf descriptors from a URL. Protobuf-supporting code has been moved into its own core extension as well. See http://druid.io/docs/0.10.1/development/extensions-core/protobuf.html for documentation.

Added in #4039 by @knoguchi.

HTTP Firehose

A new Firehose for realtime ingestion that reads data from a list of URLs via HTTP has been added. Please see http://druid.io/docs/latest/ingestion/firehose.html#httpfirehose for documentation.

Added in #4297 by @jihoonson.
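
As a rough sketch, the firehose section of an ingestion spec could be assembled as follows. The "type" and "uris" field names follow my reading of the firehose documentation linked above and should be checked against it; the helper name, the scheme check, and the example URLs are assumptions made for illustration.

```python
import json
from urllib.parse import urlparse


def http_firehose(uris):
    """Build a firehose spec dict that reads the given URLs over HTTP."""
    for uri in uris:
        # Assumption of this sketch: only http and https URLs are accepted.
        if urlparse(uri).scheme not in ("http", "https"):
            raise ValueError(f"not an HTTP(S) URL: {uri}")
    return {"type": "http", "uris": list(uris)}


# Placeholder URLs; replace with real data locations.
firehose = http_firehose([
    "http://example.com/events-1.json",
    "http://example.com/events-2.json",
])
print(json.dumps(firehose, indent=2))
```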

Improved disk space management for realtime indexing from cloud stores

The Firehose implementations for Microsoft Azure, Rackspace Cloud Files, Google Cloud Storage, and Amazon S3 now support caching and prefetching of data. These firehoses can now operate on portions of the input data and pull new data as needed, instead of having to fully read the firehose's input to disk.

Please refer to the following links for documentation:

Added in #4193 by @jihoonson.
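
The disk-space benefit of fetching input in portions can be seen in a toy model. This is not the firehose implementation: the function, the `max_fetch_bytes` parameter, and the object sizes are hypothetical, and the model only tracks how many bytes sit on local disk at once.

```python
from collections import deque


def peak_disk_usage(object_sizes, max_fetch_bytes=None):
    """Return the peak bytes held on local disk while reading objects in order.

    With max_fetch_bytes=None, everything is downloaded before reading starts.
    Otherwise objects are fetched ahead only while the bytes on disk stay at or
    under the cap (one object is always allowed so that progress is possible),
    and each object is deleted as soon as it has been read.
    """
    if max_fetch_bytes is None:
        return sum(object_sizes)

    pending = deque(object_sizes)
    on_disk = deque()
    used = peak = 0
    while pending or on_disk:
        # Prefetch while there is room under the cap.
        while pending and (not on_disk or used + pending[0] <= max_fetch_bytes):
            size = pending.popleft()
            on_disk.append(size)
            used += size
            peak = max(peak, used)
        # Read and discard the oldest fetched object.
        used -= on_disk.popleft()
    return peak


sizes = [40, 40, 40, 40]  # hypothetical object sizes in bytes
full_download_peak = peak_disk_usage(sizes)
bounded_peak = peak_disk_usage(sizes, max_fetch_bytes=100)
```

With these sizes the full download needs room for all four objects at once, while the bounded variant never holds more than two.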

Improvements to coordinator lookups management

Several enhancements have been made to the state management/synchronization logic for query-time lookups, including versioning of lookup specs. Please see http://druid.io/docs/0.10.1/querying/lookups.html for documentation.

Added in #3855 by @himanshug.
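
A versioned lookup spec might look like the sketch below. The payload shape (tier, then lookup name, then "version" and "lookupExtractorFactory") and the rule that an update needs a lexicographically higher version string follow my reading of the lookups documentation linked above; please verify both against it. The lookup name, the map contents, and the `is_newer` helper are hypothetical.

```python
import json

# Tier name -> lookup name -> versioned lookup spec.
lookup_update = {
    "__default": {
        "country_code": {  # hypothetical lookup name
            "version": "v0001",
            "lookupExtractorFactory": {
                "type": "map",
                "map": {"US": "United States", "DE": "Germany"},
            },
        }
    }
}


def is_newer(candidate_version, current_version):
    """Compare version strings lexicographically, as plain strings."""
    return candidate_version > current_version


print(json.dumps(lookup_update, indent=2))
```

Because the comparison is on strings, "v10" sorts before "v9". Zero-padded versions such as "v0010" avoid that surprise.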

Kafka metrics emitter

A new metrics emitter that sends metrics data to Kafka in JSON format has been added. See http://druid.io/docs/0.10.1/development/extensions-contrib/kafka-emitter.html for documentation.

Added in #3860 by @dkhwangbo.

Column comparison filter

A new column comparison filter has been added. This filter allows the user to compare values across columns within a row, like a "WHERE columnA = columnB" clause in SQL. See http://druid.io/docs/0.10.1/querying/filters.html#column-comparison-filter for documentation.

Added in #3928 by @erikdubbelboer.
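
As an illustration, the filter can be written as the JSON object below, using the "columnComparison" type and "dimensions" field from the filter documentation linked above. The `matches` helper is a hypothetical local stand-in that shows the intended meaning on single-valued rows; it is not how Druid evaluates the filter, and the column names and rows are placeholders.

```python
import json

# Filter comparing two columns of the same row; column names are placeholders.
column_comparison_filter = {
    "type": "columnComparison",
    "dimensions": ["columnA", "columnB"],
}


def matches(row, filter_spec):
    """Return True when all listed columns hold the same value in this row."""
    values = [row.get(name) for name in filter_spec["dimensions"]]
    return all(value == values[0] for value in values)


rows = [
    {"columnA": "x", "columnB": "x"},
    {"columnA": "x", "columnB": "y"},
]
kept = [row for row in rows if matches(row, column_comparison_filter)]
print(json.dumps(column_comparison_filter, indent=2))
```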

Druid SQL improvements

Druid 0.10.1 has a number of enhancements to Druid SQL, such as support for lookups (PRs by @gianm):

#4368 - More forgiving Avatica server
#4109 - Support for another form of filtered aggregator
#4085 - Rule to collapse sort chains
#4055 - Add SQL REGEXP_EXTRACT function
#3991 - Make row extractions extensible and add one for lookups
#4028 - Support for coercing to DECIMAL
#3999 - Ability to generate exact distinct count queries

Other performance improvements

Druid 0.10.1 has a number of other performance improvements, including:

#4364 - Uncompress streams without having to download to tmp first, by @niketh
#4315 - Server selector improvement, by @dgolitsyn
#4110 - Remove "granularity" from IngestSegmentFirehose, by @gianm
#4038 - serialize DateTime As Long to improve json serde performance, by @kaijianding

And much more!

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.10.1

Updating from 0.10.0 and earlier

Please see below for changes between 0.10.0 and 0.10.1 that you should be aware of before upgrading. If you're updating from a version earlier than 0.10.0, please also see the release notes of the relevant intermediate versions.

Deprecation of support for Hadoop versions < 2.6.0

To add support for Amazon's S3A filesystem, Druid is now built against Hadoop 2.7.3 libraries, and we are deprecating support for Hadoop versions older than 2.6.0.

For users running a Hadoop version older than 2.6.0, it is possible to continue running Druid 0.10.1 with the older Hadoop version using a workaround.

The user would need to downgrade hadoop.compile.version in the main Druid pom.xml, remove the hadoop-aws dependency from pom.xml in the druid-hdfs-storage core extension, and then rebuild Druid.

Users are strongly encouraged to upgrade their Hadoop clusters to a 2.6.0+ version as of this release, as support for Hadoop <2.6.0 may be dropped completely in future releases.

Users who wish to use Hadoop 2.7.3 as the default for ingestion tasks should double-check any existing druid.indexer.task.defaultHadoopCoordinates configuration.
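
A small check along these lines could flag configured coordinates that point at a deprecated Hadoop version. It assumes the coordinates are Maven-style "group:artifact:version" strings; the helper names and the sample values are hypothetical.

```python
import re


def hadoop_version(coordinate):
    """Extract (major, minor, patch) from a 'group:artifact:version' string."""
    version = coordinate.split(":")[-1]
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version from {coordinate!r}")
    return tuple(int(part) for part in match.groups())


def deprecated_coordinates(coordinates, minimum=(2, 6, 0)):
    """Return the coordinates whose Hadoop version is older than `minimum`."""
    return [c for c in coordinates if hadoop_version(c) < minimum]


# Hypothetical values for druid.indexer.task.defaultHadoopCoordinates.
configured = [
    "org.apache.hadoop:hadoop-client:2.3.0",
    "org.apache.hadoop:hadoop-client:2.7.3",
]
flagged = deprecated_coordinates(configured)
```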

Kafka Broker Changes

Due to changes from #4115, the Kafka indexing service is no longer compatible with version 0.9.x Kafka brokers. Users will need to upgrade their Kafka brokers to a 0.10.x version.

Coordinator Lookup Management Changes

#3855 introduces various improvements to coordinator lookup propagation behavior. Please see http://druid.io/docs/0.10.1/querying/lookups.html for details. Note the changes to the coordinator HTTP API for lookup management.

If lookups were used in the prior deployment, then as part of the upgrade to 0.10.1 all coordinators should be stopped, upgraded, and then started with version 0.10.1 together, rather than being upgraded one at a time. There should never be a situation where one coordinator is running 0.10.0 while another coordinator is running 0.10.1.

During the course of the cluster upgrade, lookup query nodes will report an error message starting with "got notice to load lookup [LookupExtractorFactoryContainer{version='null'". This is not an actual error; it is a side effect of the update. See #4603 for details.

Off-heap query-time lookup cache

Please note that the off-heap query-time lookup cache is broken at this time because of an excessive memory use issue, and must not be used:

Default worker select strategy

Please note that the default worker select strategy has changed from fillCapacity to equalDistribution.
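
The practical difference between the two strategies can be sketched with a simplified model. It assumes that fillCapacity prefers the busiest worker that still has room and that equalDistribution prefers the worker with the most free capacity; the `Worker` class and helper functions are hypothetical and much simpler than the real scheduling logic.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    capacity: int
    used: int = 0

    @property
    def free(self):
        return self.capacity - self.used


def fill_capacity(workers):
    """Pick the busiest worker that still has a free slot."""
    candidates = [w for w in workers if w.free > 0]
    return max(candidates, key=lambda w: w.used, default=None)


def equal_distribution(workers):
    """Pick the worker with the most free slots."""
    candidates = [w for w in workers if w.free > 0]
    return max(candidates, key=lambda w: w.free, default=None)


def assign(strategy, task_count):
    """Assign tasks one by one to three idle workers and report slots used."""
    workers = [Worker("w1", 3), Worker("w2", 3), Worker("w3", 3)]
    for _ in range(task_count):
        chosen = strategy(workers)
        if chosen is None:  # no free slots anywhere
            break
        chosen.used += 1
    return {w.name: w.used for w in workers}


packed = assign(fill_capacity, 3)
spread = assign(equal_distribution, 3)
```

In this model three tasks land on a single worker under the old default, and on three different workers under the new one, so deployments that relied on packing may want to set the strategy explicitly.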

Rolling updates

The standard Druid update process described at http://druid.io/docs/0.10.1/operations/rolling-updates.html should be followed for rolling updates.


Thanks to everyone who contributed to this release!