Druid 0.9.0

Released by @gianm on Apr 12, 2016

Druid 0.9.0 introduces an update to the extension system that requires configuration changes. In addition, over 400 pull requests were merged between 0.8.3 and 0.9.0. Below we highlight the more important changes in this release.

Full list of changes is here: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed

Updating from 0.8.x

Extensions

In Druid 0.9, we have refactored the extension loading mechanism. The main reason for this change is to let Druid load extensions from the local file system without having to download artifacts from the internet at runtime.

To learn all about the new extension loading mechanism, see Include extensions and Include Hadoop Dependencies. If you are impatient, here is the summary.

The following properties have been deprecated:
druid.extensions.coordinates
druid.extensions.remoteRepositories
druid.extensions.localRepository
druid.extensions.defaultVersion

Instead, specify druid.extensions.loadList, druid.extensions.directory and druid.extensions.hadoopDependenciesDir.

druid.extensions.loadList specifies the list of extensions that will be loaded by Druid at runtime. An example would be druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage"].

druid.extensions.directory specifies the directory where all the extensions live. An example would be druid.extensions.directory=/xxx/extensions.

Note that the mysql-metadata-storage extension is not packaged in the Druid distribution due to licensing issues. You will have to manually download it from druid.io, decompress it, and put it in the extensions directory you specified.

druid.extensions.hadoopDependenciesDir specifies the directory where all the Hadoop dependencies live. An example would be druid.extensions.hadoopDependenciesDir=/xxx/hadoop-dependencies. Note: the way to specify which Hadoop version to use has not changed, so you only need to make sure the Hadoop version you want to use exists under /xxx/hadoop-dependencies.

You might now wonder whether you have to manually put extensions inside /xxx/extensions and /xxx/hadoop-dependencies. The answer is no; we have already created them for you. Download the latest Druid tarball at http://druid.io/downloads.html. Unpack it and you will see extensions and hadoop-dependencies folders there. Simply copy them to /xxx/extensions and /xxx/hadoop-dependencies respectively, and you are all set!
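If you want to check an existing runtime.properties file against the property changes summarized above, a small script along the following lines may help. This is only a sketch: the property names are taken from these notes, but the function names are hypothetical, the sample values are illustrative, and the parser handles only simple key=value lines rather than the full Java properties format.

```python
# Property names come from the release notes above. The parsing is a
# simplified reading of the .properties format: no line continuations,
# no escape handling, and only "=" as the separator.
DEPRECATED = (
    "druid.extensions.coordinates",
    "druid.extensions.remoteRepositories",
    "druid.extensions.localRepository",
    "druid.extensions.defaultVersion",
)
REPLACEMENTS = (
    "druid.extensions.loadList",
    "druid.extensions.directory",
    "druid.extensions.hadoopDependenciesDir",
)


def parse_properties(text):
    """Return a dict of key -> value for simple `key=value` lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_extension_config(text):
    """Return (deprecated keys that are set, replacement keys that are not set)."""
    props = parse_properties(text)
    found = [key for key in DEPRECATED if key in props]
    not_set = [key for key in REPLACEMENTS if key not in props]
    return found, not_set


if __name__ == "__main__":
    # Illustrative sample only; the values are placeholders.
    sample = """
    druid.extensions.coordinates=[]
    druid.extensions.directory=/xxx/extensions
    """
    found, not_set = check_extension_config(sample)
    print("deprecated properties still set:", found)
    print("new properties not set:", not_set)
```

A new property that is reported as not set is not necessarily an error; whether you need to set each one depends on your deployment.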

If the extension or the Hadoop dependency you want to load is not included in the core extensions, you can use pull-deps to download it to your extension directory.

If you want to load your own extension, you can first run mvn install to install it into your local repository, and then use pull-deps to download it to your extension directory.

Please feel free to leave any questions regarding the migration.

Extensions have also been split into core and contrib extensions. Core extensions are maintained by Druid committers and are packaged as part of the download tarball. Contrib extensions are community maintained and can be installed as needed. For more information, please see here.

Ordering of Dimensions

Through Druid 0.8.x, the order of dimensions given at indexing time did not affect the way data was indexed. Rows were ordered first by timestamp, then by dimension values, in lexicographical order of dimension names.

As of Druid 0.9.0, Druid respects the given dimension order and will order rows first by timestamp, then by dimension values, in the given dimension order.

This means segments may now vary in size depending on the order in which dimensions are given. Specifying a dimension with many unique values first may result in worse compression than specifying dimensions with repeating values first.
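To make the difference concrete, here is a toy model of the two ordering rules described above. It is not Druid's indexing code: the sort_rows helper, the row layout, and the dimension names are all made up for illustration, and it ignores details such as missing dimension values.

```python
def sort_rows(rows, dimension_order):
    """Sort rows by timestamp, then by dimension values in the given order."""
    return sorted(
        rows,
        key=lambda row: (row["timestamp"],) + tuple(row[d] for d in dimension_order),
    )


if __name__ == "__main__":
    # Hypothetical rows that all share one timestamp.
    rows = [
        {"timestamp": 0, "country": "US", "page": "b"},
        {"timestamp": 0, "country": "CA", "page": "c"},
        {"timestamp": 0, "country": "US", "page": "a"},
    ]
    given_order = ["page", "country"]

    # 0.8.x behavior: dimensions compared in lexicographical order of their names.
    old = sort_rows(rows, sorted(given_order))
    # 0.9.0 behavior: dimensions compared in the order given.
    new = sort_rows(rows, given_order)

    print("0.8.x:", [(r["country"], r["page"]) for r in old])
    print("0.9.0:", [(r["page"], r["country"]) for r in new])
```

With these three rows, the 0.8.x rule orders the pages c, a, b (because country is compared first), while the 0.9.0 rule orders them a, b, c.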

Min/Max Aggregators no longer supported, use doubleMin/doubleMax instead

As indicated in the 0.8.3 release notes, min/max aggregators have been removed in favor of doubleMin, doubleMax, longMin, and longMax aggregators.

If you have any issues starting up because of this, please see #2749.
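If you have stored query or ingestion specs that still use the removed aggregator types, a rewrite along these lines is one possible starting point. It is a sketch under assumptions: the min to doubleMin and max to doubleMax mapping follows the heading above, the migrate_aggregators helper and the sample field names are hypothetical, and the script only looks at a flat list of aggregators (it does not descend into aggregators nested inside other aggregators).

```python
import json

# Mapping taken from the heading above. For integer metrics, longMin/longMax
# may be the better choice; this sketch does not try to detect that.
RENAMES = {"min": "doubleMin", "max": "doubleMax"}


def migrate_aggregators(aggregators):
    """Return a copy of an aggregator list with removed types renamed."""
    migrated = []
    for agg in aggregators:
        agg = dict(agg)  # shallow copy so the input is left untouched
        if agg.get("type") in RENAMES:
            agg["type"] = RENAMES[agg["type"]]
        migrated.append(agg)
    return migrated


if __name__ == "__main__":
    # Hypothetical aggregator list; names and fields are placeholders.
    spec = json.loads("""
    [
      {"type": "count", "name": "rows"},
      {"type": "max", "name": "max_latency", "fieldName": "latency"}
    ]
    """)
    print(json.dumps(migrate_aggregators(spec), indent=2))
```

Review the output before using it, since the right replacement depends on whether each metric is a double or a long.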

Configuration changes

druid.indexer.task.baseDir and druid.indexer.task.baseTaskDir now default to using the standard Java temporary directory specified by the java.io.tmpdir system property, instead of /tmp.

Other issues to be aware of: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3A%22Release+Notes%22

and

https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AIncompatible

New Features

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AFeature

#1719 Add Rackspace Cloud Files Deep Storage Extension
#1858 Support avro ingestion for realtime & hadoop batch indexing
#1873 add ability to express CONCAT as an extractionFn
#1921 Add docs and benchmark for JSON flattening parser
#1936 adding Upper/Lower Bound Filter
#1978 Graphite emitter
#1986 Preserve dimension order across indexes during ingestion
#2008 Regex search query
#2014 Support descending time ordering for time series query
#2043 Add dimension selector support for groupby/having filter
#2076 adding lower and upper extraction fn
#2209 support cascade execution of extraction filters in extraction dimension spec
#2221 Allow change minTopNThreshold per topN query
#2264 Adding custom mapper for json processing exception
#2271 time-descending result of select queries
#2258 acl for zookeeper is added

Improvements

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3AImprovement

#984 Use thread priorities. (aka set nice values for background-like tasks)
#1638 Remove Maven client at runtime + Provide a way to load Druid extensions through local file system
#1728 Store AggregatorFactory[] in segment metadata
#1988 support multiple intervals in dataSource inputSpec
#2006 Preserve dimension order across indexes during ingestion
#2047 optimize InputRowSerde
#2075 Configurable value replacement on match failure for RegexExtractionFn
#2079 reduce bytearray copy to minimal optimize VSizeIndexedWriter
#2084 minor optimize IndexMerger's MMappedIndexRowIterable
#2094 Simplifying dimension merging
#2107 More efficient SegmentMetadataQuery
#2111 optimize create inverted indexes
#2138 build v9 directly
#2228 Improve heap usage for IncrementalIndex
#2261 Prioritize loading of segments based on segment interval
#2306 More specific null/empty str handling in IndexMerger

Bug Fixes

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ABug

Documentation

Full list: https://github.com/druid-io/druid/issues?q=milestone%3A0.9.0+is%3Aclosed+label%3ADocumentation

#2100 doc update to make it easy to find how to do re-indexing or delta ingestion
#2186 Add intro developer docs
#2279 Some more multitenancy docs
#2364 Add more docs around timezone handling
#2216 Completely rework the Druid getting started process

Thanks to everyone who contributed to this release!
@fjy
@xvrl
@drcrallen
@pjain1
@chtefi
@liubin
@salsakran
@jaebinyo
@erikdubbelboer
@gianm
@bjozet
@navis
@AlexanderSaydakov
@himanshug
@guobingkun
@abbondanza
@binlijin
@rasahner
@jon-wei
@CHOIJAEHONG1
@loganlinn
@michaelschiff
@himank
@nishantmonu51
@sirpkt
@duilio
@pdeva
@KurtYoung
@mangesh-pardeshi
@dclim
@desaianuj
@stevemns
@b-slim
@cheddar
@jkukul
@AdrieanKhisbe
@liuqiyun
@codingwhatever
@clintropolis
@zhxiaogg
@rohitkochar
@itsmee
@Angelmmiguel
@noddi
@se7entyse7en
@zhaown
@genevien
