Apache Druid 0.15.0-incubating contains over 250 new features, performance/stability/documentation improvements, and bug fixes from 39 contributors. Major new features and improvements include:

  • New Data Loader UI
  • Support transactional Kafka topic
  • New Moving Average query
  • Time ordering for Scan query
  • New Moments Sketch aggregator
  • SQL enhancements
  • Light lookup module for routers
  • Core ORC extension
  • Core GCP extension
  • Documentation improvements

The full list of changes is here: https://github.com/apache/incubator-druid/pulls?q=is%3Apr+is%3Aclosed+milestone%3A0.15.0

Documentation for this release is at: http://druid.apache.org/docs/0.15.0-incubating/

Highlights

New Data Loader UI (Batch indexing part)

Druid has a new Data Loader UI, which is integrated with the Druid Console. The Data Loader shows sampled data so that you can easily verify the ingestion spec, and it generates the final ingestion spec automatically. Users can now issue batch index tasks without writing a JSON spec by hand.

Added by @vogievetsky and @dclim in #7572 and #7531, respectively.

Support Kafka Transactional Topics

The Kafka indexing service now supports Kafka Transactional Topics.

Please note that only Kafka 0.11.0 or later versions are supported after this change.

Added by @surekhasaharan in #6496.

New Moving Average Query

A new query type was introduced to compute moving averages.

Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-contrib/moving-average-query.html for more details.

Added by @yurmix in #6430.

Time Ordering for Scan Query

The Scan query type now supports time ordering. Please see http://druid.apache.org/docs/0.15.0-incubating/querying/scan-query.html#time-ordering for more details.

Added by @justinborromeo in #7133.
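
As a rough illustration, the sketch below builds a native Scan query that requests time-ordered rows. It assumes, based on the linked documentation, that ordering is requested through an "order" field ("ascending", "descending", or "none") and that __time must be listed in "columns" for ordering to apply. The helper name, datasource, and interval are hypothetical, and the result is only printed, not sent to a cluster.

```python
import json


def build_time_ordered_scan(datasource, interval, columns, limit, order="ascending"):
    """Build a native Scan query dict that asks for time-ordered rows.

    Assumption: the "order" field and the __time requirement follow the
    0.15.0 Scan query documentation; check the docs for row-limit rules.
    """
    if order not in ("ascending", "descending", "none"):
        raise ValueError("order must be 'ascending', 'descending', or 'none'")
    cols = list(columns)
    if "__time" not in cols:
        # Time ordering is assumed to need __time among the requested columns.
        cols.insert(0, "__time")
    return {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": [interval],
        "columns": cols,
        "limit": limit,
        "order": order,
    }


# Hypothetical datasource and interval, for illustration only.
query = build_time_ordered_scan("wikipedia", "2019-01-01/2019-01-02", ["page"], 100)
print(json.dumps(query, indent=2))
```

The resulting JSON could then be submitted like any other native query.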

New Moments Sketch Aggregator

The Moments Sketch is a new sketch type for approximate quantile computation. Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-contrib/momentsketch-quantiles.html for more details.

Added by @edgan8 in #6581.

SQL enhancements

The Druid community has been striving to enhance SQL support, and it is no longer experimental.

New SQL functions

Autocomplete in Druid Console

Druid Console now supports autocomplete for SQL.

Added by @shuqi7 in #7244.

Time-ordered scan support for SQL

Druid SQL now supports time-ordered Scan queries.

Added by @justinborromeo in #7373.

Lookups view added to the web console

You can now configure your lookups from the web console directly.

Added by @shuqi7 in #7259.

Misc web console improvements

"NoSQL" mode : #7493 [@shuqi7]

The web console now has a backup mode that lets it function as well as it can when DruidSQL is disabled or unavailable.

Added compaction configuration dialog : #7242 [@shuqi7]

You can now configure the auto compaction settings for a data source from the Datasource view.

Auto wrap query with limit : #7449 [@vogievetsky]

The console query view will now (by default) wrap DruidSQL queries with a SELECT * FROM (...) LIMIT 1000, allowing you to enter queries like SELECT * FROM your_table without worrying about the impact on the cluster. You can still send 'raw' queries by selecting the option from the ... menu.
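
The wrapping described above can be pictured with a small sketch. This is not the console's actual implementation, only an illustration of the rewrite it describes; the function name wrap_with_limit is hypothetical.

```python
def wrap_with_limit(sql, limit=1000):
    """Wrap a SQL query in an outer SELECT with a LIMIT.

    Illustrative only: mirrors the SELECT * FROM (...) LIMIT 1000 pattern,
    not the web console's real code.
    """
    # Drop surrounding whitespace and a trailing semicolon so the query
    # can be nested as a subquery.
    inner = sql.strip().rstrip(";").strip()
    return f"SELECT * FROM ({inner}) LIMIT {limit}"


print(wrap_with_limit("SELECT * FROM your_table;"))
```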

SQL explain query : #7402 [@shuqi7]

You can now click on the ... menu in the query view to get an explanation of the DruidSQL query.

Surface is_overshadowed as a column in the segments table #7555 , #7425 [@shuqi7][@surekhasaharan]

The is_overshadowed column indicates whether a segment is overshadowed by other published segments. It can be useful for seeing which segments should be loaded by historicals. Please see http://druid.apache.org/docs/0.15.0-incubating/querying/sql.html for more details.
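
As a minimal sketch, the snippet below builds a Druid SQL request body that lists overshadowed segments for one datasource. It assumes the segments table is exposed as sys.segments with segment_id and datasource columns, that is_overshadowed is represented as 1/0, and that the body would be POSTed to the SQL endpoint described in the linked documentation. The function name and datasource are hypothetical, and nothing is sent over the network here.

```python
import json


def overshadowed_segments_request(datasource):
    """Return a JSON request body for a Druid SQL query on sys.segments.

    Assumptions: column names follow the 0.15.0 SQL docs, and booleans
    in system tables are stored as 1 (true) or 0 (false).
    """
    # Escape single quotes so the name is a valid SQL string literal.
    safe = datasource.replace("'", "''")
    sql = (
        'SELECT "segment_id", "is_overshadowed" FROM sys.segments '
        f"WHERE \"datasource\" = '{safe}' AND \"is_overshadowed\" = 1"
    )
    return json.dumps({"query": sql})


# Hypothetical datasource name, for illustration only.
print(overshadowed_segments_request("wikipedia"))
```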

Improved status UI for actions on tasks, supervisors, and datasources : #7528 [@shuqi7]

This PR condenses the actions list into a tidy menu and lets you see the detailed status for supervisors and tasks. New actions for datasources around loading and dropping data by interval have also been added.

Light Lookup Module for Routers

A light lookup module was introduced for Routers, which now need only a minimal amount of memory. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/basic-cluster-tuning.html#router for basic memory tuning.

Added by @clintropolis in #7222.

Core ORC extension

The ORC extension has been promoted to a core extension. Please read the 'Updating from 0.14.0-incubating and earlier' section below if you are using the ORC extension in an earlier version of Druid.

Added by @clintropolis in #7138.

Core GCP extension

The GCP extension has been promoted to a core extension. Please read the 'Updating from 0.14.0-incubating and earlier' section below if you are using the GCP extension in an earlier version of Druid.

Added by @drcrallen in #6953.

Documentation Improvements

Single-machine deployment example configurations and scripts

Several configurations and scripts were added for easy single machine setup. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/single-server.html for details.

Added by @jon-wei in #7590.

Tool for migrating from local deep storage/Derby metadata

A new tool was added for easy migration from single machine to a cluster environment. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/deep-storage-migration.html for details.

Added by @jon-wei in #7598.

Basic tuning guide

A basic tuning guide was added to the documentation. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/basic-cluster-tuning.html for details.

Added by @jon-wei in #7629.

Security Improvement

Querying the Druid system tables now requires only the necessary permissions instead of read permission for the whole sys database. Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-core/druid-basic-security.html for details.

Added by @jon-wei in #7579.

Deprecated/removed

Drop support for automatic segment merge

Automatic segment merge by the coordinator is no longer supported. Please use auto compaction instead.

Added by @jihoonson in #6883.

Drop support for insert-segment-to-db tool

In Druid 0.14.x and earlier, Druid stored segment metadata (the descriptor.json file) in deep storage in addition to the metadata store. As of 0.15.0, Druid no longer stores the segment metadata file in deep storage. As a result, the insert-segment-to-db tool is no longer supported, since it relies on the descriptor.json files in deep storage. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/insert-segment-db.html for details.

Please note that the kill task will fail if you're using HDFS as deep storage and the descriptor.json file is missing in 0.14.x or earlier versions.

Added by @jihoonson in #6911.

Removed "useFallback" configuration for SQL

This option was removed because it generated unscalable query plans and did not work with some SQL functions.

Added by @gianm in #7567.

Removed a public API in CompressionUtils for extension developers

public static void gunzip(File pulledFile, File outDir) was removed in #6908 by @clintropolis.

Other behavior changes

Coordinator await initialization before finishing startup

A new configuration (druid.coordinator.segment.awaitInitializationOnStart) was added to make the Coordinator wait for segment view initialization before finishing startup. This option is enabled by default.

Added by @QiuMM in #6847.

Coordinator API behavior change

The coordinator periodically polls segment metadata from the metadata store and caches it in memory. In Druid 0.14.x and earlier, removing segments via the coordinator APIs (/druid/coordinator/v1/datasources/{dataSourceName} and /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}) immediately updated the in-memory segment cache as well as the metadata store. In 0.15.0, the cache is updated only on each poll rather than immediately on removal. Until the next poll updates the cache, the APIs below can still return segments that were removed via the above API calls.

  • /druid/coordinator/v1/metadata/datasources/{dataSourceName}
  • /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments
  • /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments/{segmentId}
  • /druid/coordinator/v1/metadata/datasources
  • /druid/coordinator/v1/loadstatus

The below metrics can also contain removed segments via the above API calls until the cache is updated in the next poll.

  • segment/unavailable/count
  • segment/underReplicated/count

This behavior was changed in #7595 by @surekhasaharan.
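
Clients that remove a segment and then read it back may therefore want to poll until the cache catches up. The sketch below is one hedged way to do that. The helper wait_until_absent is hypothetical, and list_segment_ids stands in for whatever call you use to fetch segment IDs from one of the APIs above; here it is simulated with a fake so the example runs without a cluster.

```python
import time


def wait_until_absent(list_segment_ids, segment_id, timeout_s=120.0,
                      interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until segment_id no longer appears in list_segment_ids().

    Returns True once the segment is gone, or False if timeout_s elapses
    first. list_segment_ids is a caller-supplied callable (an assumption
    of this sketch), e.g. a wrapper around a coordinator metadata API.
    """
    deadline = clock() + timeout_s
    while True:
        if segment_id not in list_segment_ids():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated responses: the removed segment "seg-a" is still listed for two
# polls, then disappears once the coordinator cache has been refreshed.
responses = iter([["seg-a", "seg-b"], ["seg-a", "seg-b"], ["seg-b"]])
gone = wait_until_absent(lambda: next(responses), "seg-a", sleep=lambda s: None)
print(gone)
```

A sensible timeout depends on the coordinator's poll period for your cluster.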

Listing Lookup API change

The /druid/coordinator/v1/lookups/config API now returns a list of tiers currently active in the cluster in addition to ones known in the dynamic configuration.

Added by @clintropolis in #7647.

Zookeeper loss

With a new configuration (druid.zk.service.terminateDruidProcessOnConnectFail), Druid processes can terminate themselves when they lose their connection to ZooKeeper.

Added by @michael-trelinski in #6740.

Updating from 0.14.0-incubating and earlier

Minimum compatible Kafka version change for Kafka Indexing Service

Only Kafka 0.11.x or later versions are supported after #6496. Please consider updating your Kafka version if you're using an older one.

ORC extension changes

The ORC extension has been promoted to a core extension. When deploying 0.15.0-incubating, please ensure that your extensions directory does not have any older versions of the druid-orc-extensions extension.

Additionally, even though the new core extension can index any data the old contrib extension could, the JSON spec for the ingestion task is incompatible and will need to be modified to work with the newer core extension.

To migrate to 0.15.0-incubating:

  • In inputSpec of ioConfig, inputFormat must be changed from "org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat" to
    "org.apache.orc.mapreduce.OrcInputFormat"
  • The contrib extension supported a typeString property, which provided the schema of the
    ORC file. It was essentially required to get the types correct, but notably not the column names, which facilitated column renaming. In the core extension, column renaming can be achieved with flattenSpec expressions.
  • The contrib extension supported a mapFieldNameFormat property, which provided a way to specify a dimension to flatten OrcMap columns with primitive types. This functionality has also been replaced with flattenSpec expressions.

For more details and examples, please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-core/orc.html.
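
The migration steps above can be partly automated. The following sketch rewrites inputFormat and flags the properties that need manual conversion to flattenSpec expressions. It assumes the spec is a dict with ioConfig.inputSpec and dataSchema.parser at the top level (in other words, the inner "spec" object of a Hadoop ingestion task) and that typeString and mapFieldNameFormat live on the parser. Both locations are assumptions of this sketch, and migrate_orc_spec is a hypothetical helper, not part of Druid.

```python
import copy

OLD_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat"
NEW_INPUT_FORMAT = "org.apache.orc.mapreduce.OrcInputFormat"
# Contrib-only properties that the core extension replaces with flattenSpec.
REMOVED_PARSER_PROPERTIES = ("typeString", "mapFieldNameFormat")


def migrate_orc_spec(spec):
    """Return (new_spec, warnings) without modifying the input spec.

    Only inputFormat is rewritten automatically; flattenSpec expressions
    must still be written by hand.
    """
    new_spec = copy.deepcopy(spec)
    warnings = []
    input_spec = new_spec.get("ioConfig", {}).get("inputSpec", {})
    if input_spec.get("inputFormat") == OLD_INPUT_FORMAT:
        input_spec["inputFormat"] = NEW_INPUT_FORMAT
    # Assumed location of the contrib-only properties: dataSchema.parser.
    parser = new_spec.get("dataSchema", {}).get("parser", {})
    for prop in REMOVED_PARSER_PROPERTIES:
        if prop in parser:
            warnings.append(f"parser.{prop} must be replaced with flattenSpec expressions")
    return new_spec, warnings


# Hypothetical, heavily trimmed spec for illustration only.
old_spec = {
    "ioConfig": {"inputSpec": {"type": "static", "inputFormat": OLD_INPUT_FORMAT}},
    "dataSchema": {"parser": {"type": "orc", "typeString": "struct<ts:string,page:string>"}},
}
migrated, notes = migrate_orc_spec(old_spec)
print(migrated["ioConfig"]["inputSpec"]["inputFormat"])
print(notes)
```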

GCP extension changes

The GCP extension has been promoted to a core extension. When deploying 0.15.0-incubating, please ensure that your extensions directory does not have any older versions of the druid-google-extensions extension.

Dropped auto segment merge

The coordinator configuration for auto segment merge (druid.coordinator.merge.on) is no longer supported. Please use auto compaction instead.
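
Before upgrading, it may help to scan your configuration for the dropped property. The sketch below is a hypothetical helper, not a Druid tool. It assumes simple key=value lines in a runtime.properties-style file and reports the line numbers where druid.coordinator.merge.on is still set.

```python
def find_removed_properties(properties_text, removed=("druid.coordinator.merge.on",)):
    """Return (line_number, key) pairs for removed properties that are still set.

    Assumes plain key=value lines; comment lines start with '#' or '!'.
    """
    hits = []
    for lineno, raw in enumerate(properties_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key = line.split("=", 1)[0].strip()
        if key in removed:
            hits.append((lineno, key))
    return hits


# Hypothetical coordinator configuration for illustration only.
sample = "# coordinator settings\ndruid.coordinator.merge.on=true\ndruid.port=8081\n"
print(find_removed_properties(sample))
```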

Removed descriptor.json metadata file in deep storage

The segment metadata file (descriptor.json) is no longer stored in deep storage. If you are using HDFS as your deep storage and need to roll back to 0.14.x or earlier, please be aware that the kill task could fail because of the missing descriptor.json files.

Credits

Thanks to everyone who contributed to this release!

@a2l007
@asdf2014
@capistrant
@clintropolis
@dampcake
@dclim
@donbowman
@drcrallen
@Dylan1312
@edgan8
@es1220
@esevastyanov
@FaxianZhao
@fjy
@gianm
@glasser
@hpandeycodeit
@jihoonson
@jon-wei
@jorbay-au
@justinborromeo
@kamaci
@KazuhitoT
@leventov
@lxqfy
@michael-trelinski
@peferron
@puneetjaiswal
@QiuMM
@richardstartin
@samarthjain
@scrawfor
@shuqi7
@surekhasaharan
@venkatramanp
@vogievetsky
@xueyumusic
@xvrl
@yurmix