Druid 0.9.2 release notes #3503

Closed
gianm opened this issue Sep 23, 2016 · 12 comments

gianm commented Sep 23, 2016

DRAFT

Druid 0.9.2 contains hundreds of performance improvements, stability improvements, and bug fixes from over 30 contributors. Major new features include a new groupBy engine, the ability to disable rollup at ingestion time, the ability to filter on longs, new encoding options for long-typed columns, performance improvements for HyperUnique and DataSketches, a query cache implementation based on Caffeine, a new lookup extension exposing fine-grained caching strategies, support for reading ORC files, and new aggregators for variance and standard deviation.

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.9.2

Documentation for this release is here: http://druid.io/docs/0.9.2/

Highlights

New groupBy engine

Druid now includes a new groupBy engine, rewritten from the ground up for better performance and memory management. Benchmarks show a 2–5x performance boost on our test datasets. The new engine also supports strict limits on memory usage and the option to spill to disk when memory is exhausted, avoiding result row count limitations and potential OOMEs generated by the previous engine.

The new engine is off by default, but you can enable it through configuration or query context parameters. We intend to enable it by default in a future version of Druid.

See "implementation details" on http://druid.io/docs/0.9.2/querying/groupbyquery.html#implementation-details for documentation and configuration.

Added in #2998 by @gianm.

Ability to disable rollup

Since its inception, Druid has had a concept of "dimensions" and "metrics" that applies both at ingestion time and at query time. Druid is one of the few databases that supports aggregation at data loading time, which we call "rollup". But for some use cases, ingestion-time rollup is not desired, and it's better to load the original data as-is. With rollup disabled, Druid creates one row for each input row.

Query-time aggregation is, of course, still supported through the groupBy, topN, and timeseries queries.

See the "rollup" flag on http://druid.io/docs/0.9.2/ingestion/index.html for documentation. By default, rollup remains enabled.

Added in #3020 by @kaijianding.

Ability to filter on longs

Druid now supports sophisticated filtering on integer-typed columns, including long metrics and the special __time column. This opens up a number of new capabilities.

Druid does not yet support grouping on longs. We intend to add this capability in a future release.
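
For illustration, a selector filter applied directly to a long-typed metric in a timeseries query might look like the sketch below (the datasource and column names are hypothetical):

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "hour",
  "intervals": ["2016-09-01/2016-10-01"],
  "filter": {
    "type": "selector",
    "dimension": "delta",
    "value": "0"
  },
  "aggregations": [
    { "type": "count", "name": "rows" }
  ]
}
```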

Added in #3180 by @jon-wei.

New long encodings

Until now, all integer-typed columns in Druid, including long metrics and the special __time column, were stored as 64-bit longs optionally compressed in blocks with LZ4. Druid 0.9.2 adds new encoding options which, in many cases, can reduce file sizes and improve performance:

  • Long encoding option "auto", which uses table or delta encoding where possible to store values in fewer than 64 bits per row. The default "longs" encoding always uses 64 bits per value.
  • Compression option "none", which is like the old "uncompressed" option, except it offers a speedup by bypassing block copying.

The default remains "longs" encoding + "lz4" compression. In our testing, two options that often yield useful benefits are "auto" + "lz4" (generally smaller than longs + lz4) and "auto" + "none" (generally faster than longs + lz4, file size impact varies). See the PR for full test results.

See "metricCompression" and "longEncoding" on http://druid.io/docs/0.9.2/ingestion/batch-ingestion.html for documentation.

Added in #3148 by @acslk.

Sketch performance improvements

New extensions

And much more!

The full list of changes is here: https://github.com/druid-io/druid/pulls?utf8=%E2%9C%93&q=is%3Apr%20is%3Aclosed%20milestone%3A0.9.2

Updating from 0.9.1.1

Rolling updates

The standard Druid update process described at http://druid.io/docs/0.9.2/operations/rolling-updates.html should be followed for rolling updates.

Query time lookups

The druid-namespace-lookup extension, which was deprecated in 0.9.1 in favor of druid-lookups-cached-global, has been removed in 0.9.2. If you are using druid-namespace-lookup, migrate to druid-lookups-cached-global before upgrading to 0.9.2. See our migration guide for details: http://druid.io/docs/0.9.1.1/development/extensions-core/namespaced-lookup.html#transitioning-to-lookups-cached-global

Other notes

Please note the following changes:

  • Druid now ships with Guice 4.1.0 rather than 4.0-beta (Update to guice-4.1.0. #3222). This conflicts with the Guice version shipped in some Hadoop distributions, so for Hadoop indexing you may need to adjust your mapreduce.job.classloader or mapreduce.job.user.classpath.first options; in our testing this has been an effective workaround (a config sketch follows this list). See http://druid.io/docs/0.9.2/operations/other-hadoop.html for details.
  • If you are using Roaring bitmaps, note that compressRunOnSerialization now defaults to true. As a result, segments written will not be readable by Druid 0.8.1 or earlier. If you need segments written by Druid 0.9.2 to be readable by 0.8.1, and you are using Roaring bitmaps, you must set compressRunOnSerialization = false. By default, bitmaps are Concise, not Roaring, so this point will not apply to you unless you overrode that. See Configurable compressRunOnSerialization for Roaring bitmaps. #3228 for details.
  • If you use the new long encoding or compression options, segments written by Druid will not be readable by any version older than 0.9.2. If you don't use the new options, segments will remain backwards compatible.
  • If you are using the experimental Kafka indexing service, there is a known issue that may cause task supervision to hang when it tries to stop all running tasks simultaneously during the upgrade process. To prevent this, shut down all supervisors and wait for the indexing tasks to complete before updating your overlord. Alternatively, as a workaround, you can set chatThreads in the supervisor tuning configuration to a value greater than the number of running tasks.
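
For the Guice/Hadoop point above, here is a minimal sketch of the relevant part of a Hadoop indexing task's tuningConfig (the value shown is the workaround that has worked in our testing; adjust for your Hadoop distribution, and see the other-hadoop documentation linked above):

```json
{
  "type": "hadoop",
  "jobProperties": {
    "mapreduce.job.classloader": "true"
  }
}
```

On some distributions, setting mapreduce.job.user.classpath.first to "true" instead may be the more appropriate choice.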

Credits

Thanks to everyone who contributed to this release!

@acslk
@AlexanderSaydakov
@ashishawasthi
@b-slim
@chtefi
@dclim
@drcrallen
@du00cs
@ecesena
@erikdubbelboer
@fjy
@Fokko
@gianm
@giaosudau
@guobingkun
@gvsmirnov
@hamlet-lee
@himanshug
@HyukjinKwon
@jaehc
@jianran
@jon-wei
@kaijianding
@leventov
@linbojin
@michaelschiff
@navis
@nishantmonu51
@pjain1
@rajk-tetration
@SainathB
@sirpkt
@vogievetsky
@xvrl
@yuppie-flu

@MdeArcayne

Documentation for v0.9.2-rc1 is not available right now; any chance you could upload it?


gianm commented Sep 30, 2016

@MdeArcayne, we haven't actually finished releasing 0.9.2-rc1 due to some technical difficulties with the web site; see https://groups.google.com/forum/#!topic/druid-development/aYBS7wQHho8.

Hope to have this done soon and announced on the mailing list.


MdeArcayne commented Sep 30, 2016

@gianm Thanks for the quick answer, keep up the good work!

@drcrallen

[screenshot: query performance chart, captured 2016-10-14]

Performance results look good on our query systems for rc1. Pretty significant topN query performance improvement.

@drcrallen

  • lower is better

@giaosudau

@drcrallen It would be nice if you could also show the groupBy comparison.
Thanks.

@drcrallen

@giaosudau We don't use group-by in a production environment.

@fjy

fjy commented Oct 17, 2016

@giaosudau In our internal benchmarks, groupBys are 2-5x faster.

@drcrallen

Note: Upgrade ordering is very important here to ensure cardinality aggregators work appropriately after #3406


gianm commented Dec 1, 2016

Final notes up at https://github.com/druid-io/druid/releases/tag/druid-0.9.2

gianm closed this as completed Dec 1, 2016
@sascha-coenen

I wonder what the topN performance improvement in the above screenshot is due to. Is it the new long encodings or the loop unrolling?

We ran some early tests that aren't finished yet; while the improvement for HyperLogLog was immediately noticeable, the segment scan times for topN queries seemed to be mostly the same as before. Is there something that needs to be specifically taken care of, like special JVM inlining directives or the new segment encoding options, or are there particular topN query scenarios that bring out the performance difference more than others?
Thanks
