Skip to content

druid-0.16.0-incubating

Compare
Choose a tag to compare
@clintropolis clintropolis released this 25 Sep 00:36
· 4475 commits to master since this release

Apache Druid 0.16.0-incubating contains over 350 new features, performance enhancements, bug fixes, and major documentation improvements from 50 contributors. Check out the complete list of changes and everything tagged to the milestone.

Highlights

# Performance

# 'Vectorized' query processing

An experimental 'vectorized' query execution engine is new in 0.16.0, which can provide a speed increase in the range of 1.3-3x for timeseries and group by v2 queries. It operates on the principle of batching operations on rows instead of processing a single row at a time, e.g. iterating bitmaps in batches instead of per row, reading column values in batches, filtering in batches, aggregating values in batches, and so on. This results in significantly fewer method calls, better memory locality, and increased cache efficiency.

This is an experimental feature, but we view it as the path forward for Druid query processing and are excited for feedback as we continue to improve and fill out missing features in upcoming releases.

  • Only timeseries and groupBy have vectorized engines.
  • GroupBy doesn't handle multi-value dimensions or granularity other than "all" yet.
  • Vector cursors cannot handle virtual columns or descending order.
  • Expressions are not supported anywhere: not as inputs to aggregators, in virtual functions, or in filters.
  • Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
  • Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
  • Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs).

The feature can be enabled by setting "vectorize": true your query context (the default is false). This works both for Druid SQL and for native queries. When set to true, vectorization will be used if possible; otherwise, Druid will fall back to its non-vectorized query engine. You can also set it to "force", which will return an error if the query cannot be fully vectorized. This is helpful for confirming that vectorization is indeed being used.

You can control the block size during execution by setting the vectorSize query context parameter (default is 1000).

#7093
#6794

# GroupBy array-based result rows

groupBy v2 queries now use an array-based representation of result rows, rather than the map-based representation used by prior versions of Druid. This provides faster generation and processing of result sets. Out of the box this change is invisible and backwards-compatible; you will not have to change any configuration to reap the benefits of this more efficient format, and it will have no impact on cached results. Internally this format will always be utilized automatically by the broker in the queries that it issues to historicals. By default the results will be translated back to the existing 'map' based format at the broker before sending them back to the client.

However, if you would like to avoid the overhead of this translation, and get even faster results,resultAsArray may be set on the query context to directly pass through the new array based result row format. The schema is as follows, in order:

  • Timestamp (optional; only if granularity != ALL)
  • Dimensions (in order)
  • Aggregators (in order)
  • Post-aggregators (optional; in order, if present)

#8118
#8196

# Additional performance enhancements

The complete set of pull requests tagged as performance enhancements for 0.16 can be found here.

# "Minor" compaction

Users of the Kafka indexing service and compaction and who get a trickle of late data, can find a huge improvement in the form of a new concept called 'minor' compaction. Enabled by internal changes to how data segments are versioned, minor compaction is based on the idea of 'segment' based locking at indexing time instead of the current Druid locking behavior (which is now referred to as 'time chunk' locking). Segment locking as you might expect allows only the segments which are being compacted to be locked, while still allowing new 'appending' indexing tasks (like Kafka indexing tasks) to continue to run and create new segments, simulataneously. This is a big deal if you get a lot of late data, because the current behavior results in compaction tasks starving as higher priority realtime tasks hog the locks. This prevention of compaction tasks from optimizing the datasources segment sizes results in reduced overall performance.

To enable segment locking, you will need to set forceTimeChunkLock to false in the task context, or set druid.indexer.tasklock.forceTimeChunkLock=false in the Overlord configuration. However, beware, after enabling this feature, due to the changes in segment versioning, there is no rollback path built in, so once you upgrade to 0.16, you cannot downgrade to an older version of Druid. Because of this, we highly recommend confirming that Druid 0.16 is stable in your cluster before enabling this feature.

It has a humble name, but the changes of minor compaction run deep, and it is not possible to adequately describe the mechanisms that drive this in these release notes, so check out the proposal and PR for more details.

#7491
#7547

# Druid "indexer" process

The new Indexer process is an alternative to the MiddleManager + Peon task execution system. Instead of forking a separate JVM process per-task, the Indexer runs tasks as separate threads within a single JVM process. The Indexer is designed to be easier to configure and deploy compared to the MiddleManager + Peon system and to better enable resource sharing across tasks.

The advantage of the Indexer is that it allows query processing resources, lookups, cached authentication/authorization information, and much more to be shared between all running indexing task threads, giving each individual task access to a larger pool of resources and far fewer redundant actions done than is possible with the Peon model of execution where each task is isolated in its own process.

Using Indexer does come with one downside: the loss of process isolation provided by Peon processes means that a single task can potentially affect all running indexing tasks on that Indexer. The druid.worker.globalIngestionHeapLimitBytes and druid.worker.numConcurrentMerges configurations are meant to help minimize this. Additionally, task logs for indexer processes will be inline with the Indexer process log, and not persisted to deep storage.

You can start using indexing by supplying server indexer as the command-line argument to org.apache.druid.cli.Main when starting the service. To use Indexer in place of a MiddleManager and Peon, you should be able to adapt values from the configuration into the Indexer configuration, lifting druid.indexer.fork.property. configurations directly to the Indexer, and sizing heap and direct memory based on the Peon sizes multiplied by the number of task slots (unlike a MiddleManager, it does not accept the configurations druid.indexer.runner.javaOpts or druid.indexer.runner.javaOptsArray). See the indexer documentation for details.

#8107

# Native parallel batch indexing with shuffle

In 0.16.0, Druid's index_parallel native parallel batch indexing task now supports 'perfect' rollup with the implementation of a 2 stage shuffle process.

Tasks in stage 1 perform a secondary partitioning of rows on top of the standard time based partitioning of segment granularity, creating an intermediary data segment for each partition. Stage 2 tasks are each assigned a set of the partitionings created during stage 1, and will collect and combine the set of intermediary data segments which belong to that partitioning, allowing it to achieve complete rollup when building the final segments. At this time, only hash-based partitioning is supported.

This can be enabled by setting forceGuaranteedRollup to true in the tuningConfig; numShards in partitionsSpec and intervals in granularitySpec must also be set.

The Druid MiddleManager (or the new Indexer) processes have a new responsibility for these indexing tasks, serving the intermediary partition segments output of stage 1 into the stage 2 tasks, so depending on configuration and cluster size, the MiddleManager jvm configuration might need to be adjusted to increase heap allocation and http threads. These numbers are expected to scale with cluster size, as all MiddleManager or Indexer processes involved in a shuffle will need the ability to communicate with each other, but we do not expect the footprint to be significantly larger than it is currently. Optimistically we suggest trying with your existing configurations, and bumping up heap and http thread count only if issues are encountered.

#8061

# Web console

# Data loader

The console data loader, introduced in 0.15, has been expanded in 0.16 to support Kafka, Kinesis, segment reindexing, and even "inline" data which can be pasted in directly.

DataLoaderKafka

#7643
#8181
#8056
#7947

# SQL query view

The query view on the web console has received a major upgrade for 0.16, transitioning into an interactive point-and-click SQL editor. There is a new column picker sidebar and the result output table let you directly manipulate the SQL query without needing to actually type SQL.

rn_sql_query_view

There is also a query history view and a the ability to fully edit the query context.

#7905
#7934
#8251
#7816

# Servers view

The "Data servers" view has been renamed to "Servers", and now displays a list of all Druid processes which are discovered as part of the cluster, using the sys.servers system table.

rn_servers_view

This should make it much more convenient to at a glance ensure that all your Druid servers are up and reporting to the rest of the cluster.

#7770
#7654

# Datasources view

The datasource view adds a new segment timeline visualization, allowing segment size and distribution to be gauged visually.

rn_datasources_view

If you previously have used what is now the legacy coordinator console, do not be alarmed, the timeline is reversed and the newest segments are now on the right side of the visualization!

#8202

# Tasks view

The tasks view has been significantly improved to assist a cluster operators ability to manage Druid supervisors and indexing tasks. The supervisors display now includes status information to allow determining their overall status and displaying error messaging if a supervisor is in error.

#7428
#7799

# SQL and native query enhancements

# SQL and native expression support for multi-value string columns

Druid SQL and native query expression 'virtual columns' can now correctly utilize multi-value string columns, either operating on them as individual VARCHAR values for parity with native Druid query behavior, or as an array like type, with a new set of multi-value string functions. Note that so far this is only supported in query time expressions; use of multi-value expressions at ingestion time via a transformSpec will be available in a future release. The complete list of added functions and behavior changes is far too large to list here, so check out the SQL and expression documentation for more details.

#7525
#7588
#7950
#7973
#7974
#8011

# TIMESTAMPDIFF

To complement TIMESTAMPADD which can modify timestamps by time units, a TIMESTAMPDIFF which can compute the signed number of time units between two timestamps. For syntax information, check out the SQL documentation

#7695

# TIME_CEIL

TIME_CEIL, a time specific, more flexible version of the CEIL function, has also been added to Druid 0.16.0. This function can round a timestamp up by an ISO8601 period, like P3M (quarters) or PT12H (half-days), optionally for a specific timezone. See SQL documentation for additional information.

#8027

# IPv4

Druid 0.16.0 also adds specialized SQL operators and native expressions for dealing with IPv4 internet addresses in dotted-decimal string or integer format. The new operators are IPV4_MATCH(address, subnet), IPV4_PARSE(address), and IPV4_STRINGIFY(address), which can match IP addresses to subnets in CIDR notation, translate dotted-decimal string format to integer format, and translate integer format into dotted-decimal string format, respectively. See SQL documentation for details.

#8223

# NVL

To increase permissiveness and SQL compatibility with users coming to Druid with experience in other databases, an alias for the COALESCE function has been added in the form of NVL, which some SQL dialects use instead. See SQL documentation for details

#7965

# long/double/float sum/min/max aggregator support for string columns

Long, double, and float sum, min, and max aggregators now will permissively work when used with string columns, performing a best-effort parsing to try and translate them to the correct numerical type.

#8243
#8319

# Official Docker image

An official "convenience binary" Docker image will now be offered for every release starting with 0.16, available at https://hub.docker.com/r/apache/incubator-druid.

# Refreshed website documentation

The documentation section of the Druid website has received a major upgrade, transitioning to using Docusaurus, which creates much more beautiful and functional pages than we have currently. Each documentation page now has a left-hand collapsible table of contents showing the outline of the overall docs, and a right-hand table of contents showing the outline of that particular doc page, vastly improving navigability.

Alongside this, the ingestion documentation has been totally refreshed, beginning with a new ingestion/index.md doc that introduces all the key ingestion spec concepts, and describes the most popular ingestion methods. This was a much needed rework of many existing documentation pages, which had grown organically over time and have become difficult to follow, into a simpler set of fewer, larger, more cross-referenced pages. They are also a bit more 'opinionated', pushing new people towards Kafka, Kinesis, Hadoop, and native batch ingestion. They discuss Tranquility but don't present it as something highly recommended since it is effectively in minimal maintenance mode.

Check out these improvements (and many more) here.

#8311

Extensions

# druid-datasketches

The druid-datasketches extension, built on top of Apache Datasketches (incubating), has been expanded with 3 new post aggregators, quantilesDoublesSketchToRank which computes an approximation to the rank of a given value that is the fraction of the distribution less than that value, and quantilesDoublesSketchToCDF which computes an approximation to the Cumulative Distribution Function given an array of split points that define the edges of the bins.

Another post aggregation, thetaSketchToString which will print a summary of sketch has been added to assist in debugging. See Datasketches extension documentation to learn more.

#7550
#7937

The HLLSketch aggregator has been improved with a query-time only round option to support rounding values into whole numbers, to give it feature parity with the built-in cardinality and hyperUnique aggregators.

#8023

Finally, users of HllSketch should also see a performance improvement due to some changes made which allow Druid to precompute an empty sketch and copy that into the aggregation buffers, greatly decreasing time to initialize the aggregator during query processing.

#8194

# druid-stats

The druid-stats core extension has been enhanced with SQL support, exposing VAR_POP and VAR_SAMP to compute variance population and sample with the variance aggregator, as well as STDDEV_POP and STDDEV_SAMPto compute standard deviation population and sample using the standard deviation post aggregator. Additionally, VARIANCE and STDDEV functions are added as aliases for VAR_SAMP and STDDEV_SAMP respectively. See SQL documentation and stats extension documentation for more details.

#7801

# druid-histogram

The 'fixed bucket' histogram aggregator of the druid-histogram extension has a new added property finalizeAsBase64Binary that enables serializing the resulting histogram as a base64 string instead of a human readable JSON summary, making it consistent with the approximate histogram aggregator of the same extension. See the documentation for further information.

#7784

# statsd-emitter

The statsd-emitter extension has a new option, druid.emitter.statsd.dogstatsdServiceAsTag which enables emitting the Druid service name as a 'tag', such as druid_service:druid/broker instead of part of the metric name as in druid.broker.query.time. druid is used as the prefix. so the example of druid.broker.query.time would instead be druid.query.time, allowing consolidation metrics across Druid service types and discriminating by tag instead. druid.emitter.statsd.dogstatsd must be set to true for this setting to take effect. See the documentation for more details.

#8238
#8472

# New extension: druid-tdigestsketch

A new set of approximate sketch aggregators for computing quantiles and the like and based on t-digest has been added in Druid 0.16. T-digest was designed for parallel programming use cases like distributed aggregations or map reduce jobs by making combining two intermediate t-digests easy and efficient. It serves to complement existing algorithms provided by the Apache Datasketches extension and moments sketch extension. See the extension documentation for more details.

#7303
#7331

# New extension: druid-influxdb-emitter

A new Druid emitter extension to allow sending Druid metrics to influxdb over HTTP has also been added in 0.16. Currently this emitter only emits service metric events to InfluxDB (See Druid metrics for a list of metrics). When a metric event is fired it is added to a queue of events. After a configurable amount of time, the events on the queue are transformed to InfluxDB's line protocol and POSTed to the InfluxDB HTTP API. The entire queue is flushed at this point. The queue is also flushed as the emitter is shutdown. See the extension docs for details.

#7717

# Fine tuning your workloads

In addition to the experimental vectorized query engine and new indexer process type, 0.16 also has some additional features available to allow potentially fine tuning indexing and query performance via experimentation.

# Control of indexing intermediary segment compression

First up, the ability to independently control what compression is used (or disable it) when persisting intermediary segments during indexing. This configuration available to the indexSpec property, and can be added to tuningConfig as:

"indexSpecForIntermediatePersists": {
  "dimensionCompression": "uncompressed",
  "metricCompression": "none"
}

for example to disable compression entirely for intermediary segments. One potential reason to consider 'uncompressed' intermediary segments is to ease up on the amount of Java 'direct' memory required to perform the final merge of intermediary segments before they are published and pushed to deep storage, as reading data from uncompressed columns does not require the 64kb direct buffers which are used to decode lz4 and other encoded columns. Of course this is a trade-off of storage space and page cache footprint, so we recommend experimenting with this before settling on a configuration to use for your production workloads.

#7919

# Control Filter Bitmap Index Utilization

Bitmap indexes are usually a huge performance boost for Druid, but under some scenarios can result in slower query speeds, particularly in cases of computationally expensive filters on very high cardinality dimensions. In Druid 0.16, a new mechanism to provide some manual control over when bitmap indexes are utilized, and when a filter will be done as a row scan are now in place, and available on a per filter, per query basis. Most filters will accept a new property, filterTuning, which might look something like this:

"filterTuning": {
  "useBitmapIndex": true
}

useBitmapIndex if set to false will disallow a filter to utilize bitmap indexes. This property is optional, and default behavior if filterTuning is not supplied remains unchanged. Note that this feature is not documented in user facing documentation, considered experimental, and subject to change in any future release.

#8209

# Request logging

If you would have liked to enable Druid request logging, but use Druid SQL and find them a bit too chatty due to all the metadata queries, you are luck with 0.16 due to a new configuration option that allows selectively muting specific types of queries from request logging. The option, druid.request.logging.mutedQueryTypes, accepts a list of "queryType" strings as defined by Druid's native JSON query API, and defaults to an empty list (so nothing is ignored). For example,

druid.request.logging.mutedQueryTypes=["segmentMetadata", "timeBoundary"]

would mute all request logs for segmentMetadata and timeBoundary queries.

#7562

# Upgrading to Druid 0.16.0

# S3 task log storage

After upgrading to 0.16.0, the task logs will require the same S3 permissions as pushing segments to deep storage, which has some additional logic to possibly get the bucket owner (by fetching the bucket ACL) and set the ACL of the segment object to give the bucket owner full access. If you wish to avoid providing these additional permissions, the existing behavior can be retained by setting druid.indexer.logs.disableAcl=true.

#7907

# Druid SQL

Druid SQL is now enabled on the router by default. Note that this may put some additional pressure on broker processes, due to additional metadata queries required to maintain datasource schemas for query planning, and can be disabled to retain the previous behavior by setting druid.sql.enable=false.

#7808

# Druid SQL lookup function

Druid SQL queries that use the LOOKUP function will now take advantage of injective lookups that are a one-to-one transformation, which allows an optimization to be performed that avoids pushing down lookup evaluation to historical processes, instead allowing the transformation to be done later. The drawback of this change is that lookups which are incorrectly defined as being one-to-one, but are not in fact a one-to-one transformation at evaluation time will produce incorrect query results.

#7655

# Coordinator metadata API changes

The metadata API provided by the coordinator has been modified to provide more consistent HTTP responses. The changes are minor, but backwards incompatible if you depend on a specific API.

  • POST /druid/coordinator/v1/datasources/{dataSourceName}
  • DELETE /druid/coordinator/v1/datasources/{dataSourceName}
  • POST /druid/coordinator/v1/datasources/{dataSourceName}/markUnused
  • POST /druid/coordinator/v1/datasources/{dataSourceName}/markUsed

now return a JSON object of the form {"numChangedSegments": N} instead of 204 (empty response) when no segments were changed. On the other
hand, 500 (server error) is not returned instead of 204 (empty response).

  • POST /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}
  • DELETE /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}

now return a JSON object of the form {"segmentStateChanged": true/false}.

/druid/coordinator/v1/metadata/datasources?includeDisabled is now /druid/coordinator/v1/metadata/datasources?includeUnused. includeDisabled is still accepted, but a warning is emitted in a log.

#7653

# Compaction tasks

The keepSegmentGranularity option would create compaction tasks that ignored interval boundary of segments before compacting them, but was deprecated because it was not very useful in most cases. If you did find this behavior useful, it can still be achieved by setting segmentGranularity to ALL.

#7747

There is a known issue with all Druid versions that support 'auto' compaction via the Coordinator, where auto compaction can get stuck repeatedly trying the same interval. This issue can now be mitigated in 0.16 by setting targetCompactionSizeBytes, but is still present when using maxRowsPerSegment or maxTotalRows. A future version of Druid will fully resolve this issue.

#8481
#8495

# Indexing spec

All native indexing task types now use PartitonsSpec to define secondary partitioning to be consistent with other indexing task types. The changes are not backwards incompatible, but users should consider migrating maxRowsPerSegment, maxTotalRows, numShards, and partitionDimensions to the partitions spec of native, kafka, and kinesis indexing tasks.

#8141

# Kafka indexing

Incremental publishing of segments for the Apache Kafka indexing service was introduced in Druid 0.12, at which time the existing Kafka indexing service was retained as a 'legacy' format to allow rolling update from older versions of Druid. This 'legacy' codebase has now been removed in Druid 0.16.0, which means that a rolling-update from a Druid version older than 0.12.0 is not supported.

#7735

# Avro extension

The fromPigAvroStorage option has been removed from the Apache Avro extension in Druid 0.16, in order to clean up dependencies. This option provided a special transformation of GenericData.Array, but this column type will now be handled by the default List implementation.

#7810

# Kerberos extension

druid.auth.authenticator.kerberos.excludedPaths superseded by druid.auth.unsecuredPaths which applies across all authenticators/authorizers, so use this configuration property instead.

#7745

# Zookeeper

The setting druid.zk.service.terminateDruidProcessOnConnectFail has been removed and this is now the default behavior - Druid processes will exit when an unhandled Curator client exception occurs instead of continuing to run in a 'zombie' state.

Additionally, a new optional setting druid.zk.service.connectionTimeoutMs, which wires into the Curator client connectionTimeoutMs, setting is now available to customize. By default it will continue to use the Curator default of 15000 milliseconds.

#8458

# Realtime process

The deprecated standalone 'realtime' process has been removed from Druid. Use of this has been discouraged for some time, but now middleManager and the new index processes should process all of your realtime indexing tasks.

#7915
#8020

Credits

Thanks to everyone who contributed to this release!

@a2l007
@abossert
@AlexanderSaydakov
@AlexandreYang
@andrewluotechnologies
@asdf2014
@awelsh93
@blugowski
@capistrant
@ccaominh
@clintropolis
@dclim
@dene14
@dinsaw
@Dylan1312
@esevastyanov
@fjy
@Fokko
@gianm
@gocho1
@himanshug
@ilhanadiyaman
@jennyzzz
@jihoonson
@Jinstarry
@jon-wei
@justinborromeo
@kamaci
@khwj
@legoscia
@leventov
@litao91
@lml2468
@mcbrewster
@nieroda
@nishantmonu51
@pjain1
@pphust
@samarthjain
@SandishKumarHN
@sashidhar
@satybald
@sekingme
@shuqi7
@surekhasaharan
@viongpanzi
@vogievetsky
@xueyumusic
@xvrl
@yurmix