Druid 0.7.0 - Stable

@xvrl released this 24 Feb 17:37

Updating to Druid 0.7.0 – Things to Be Aware Of

  • New ingestion spec

    Druid 0.7.0 requires a new ingestion spec format. Druid 0.6.172 supports both the old and new formats and includes a script to convert from the old format to the new one. This script can be run with 'tools convertSpec' using the same Main class used to run Druid nodes. You can update your Druid cluster to 0.6.172, convert your ingestion specs to the new format, and then update to Druid 0.7.0. If you update your cluster to Druid 0.7.0 directly, make sure your real-time ingestion pipeline understands the new spec.
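
    As a minimal sketch (the classpath layout and the --in/--out flags are assumptions, not verified arguments; check the tool's help output), the conversion might look like:

      # Hypothetical invocation of the spec converter shipped with 0.6.172;
      # flag names and file paths are illustrative, not authoritative.
      java -cp "config/_common:lib/*" io.druid.cli.Main tools convertSpec \
        --in /path/to/old_ingestion_spec.json \
        --out /path/to/new_ingestion_spec.json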

  • MySQL is no longer the default metadata storage

    Druid now defaults to an embedded Apache Derby instance, which was chosen mainly for testability purposes. However, we do not recommend using Derby in production. For anything other than testing, please use MySQL or PostgreSQL metadata storage.

    Configuration parameters for metadata storage have been renamed from druid.db to druid.metadata.storage, and an additional property druid.metadata.storage.type=<mysql|postgresql> is required to use anything other than Derby.

    The convertProps tool can assist you in converting all 0.6.x properties to 0.7 properties.
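
    For example, the metadata storage properties would move roughly as follows (the connector property names below are assumptions based on the 0.7 configuration docs; verify them against your version):

      # 0.6.x (old prefix):
      # druid.db.connector.connectURI=jdbc:mysql://metadata.example.com:3306/druid
      # 0.7.0 (new prefix, plus the new type property):
      druid.metadata.storage.type=mysql
      druid.metadata.storage.connector.connectURI=jdbc:mysql://metadata.example.com:3306/druid
      druid.metadata.storage.connector.user=druid
      druid.metadata.storage.connector.password=diurd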

  • Druid is now case-sensitive

    Druid column names are now case-sensitive. We previously tried to be case-insensitive for queries and case-preserving for data, but we decided to make this change as there were numerous bugs related to various casing problems.

    If you are upgrading from version 0.6.x:

    1. Please make sure the column casing in your queries matches the casing of your column names in your data and update your queries accordingly.
    2. One very important thing to note is that 0.6 internally lower-cased all column names at ingestion time and query time. In 0.7 this is no longer the case; however, we still strongly recommend that you use lowercase column names in 0.7 for simplicity.
    3. If you are currently ingesting data with mixed case column names as part of your data or ingestion schema:
      • for TSV or CSV data, simply lower-case your column names in your schema when you update to 0.7.0.
      • for JSON data with mixed-case fields where you were not explicitly specifying column names, you can use the jsonLowerCase parseSpec to lower-case the data for you at ingestion time and maintain backwards compatibility (see the sketch after this list).

    For all other parse specs, you will need to lower-case the metric/aggregator names if you were using mixed case before.
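
    A minimal sketch of such a parser block follows (the exact casing of the format string and the surrounding required fields should be checked against the 0.7 ingestion docs; the column names here are illustrative):

      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "jsonLowercase",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["page", "language"] }
        }
      }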

  • Batch segment announcement is now the default

    Druid now uses batch segment announcement by default for all nodes. If you are already using batch segment announcement, you should be all set.

    If you have not yet updated to using batch segment announcement, please read this guide in the forum on how to update your current 0.6.x cluster to use batch announcement first.

  • Kafka 0.7.x removed in favor of Kafka 0.8.x

    If you are using Kafka 0.7, you will have to build the kafka-seven extension manually. It is commented out in the build, because Kafka 0.7 is not available in Maven Central. The Kafka 0.8 (kafka-eight) extension is unaffected.
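
    A rough sketch of the manual build (the module path and Maven invocation are assumptions about the source tree, not verified steps):

      # Sketch only: un-comment the kafka-seven module in the top-level pom.xml,
      # then build just that module. Kafka 0.7 itself must be resolvable locally,
      # since it is not available in Maven Central.
      mvn install -pl kafka-seven -am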

  • Coordinator endpoint changes

    Numerous coordinator endpoints have changed. Please refer to the coordinator documentation for the current endpoints.

    In particular:

    1. /info on the coordinator has been removed.
    2. /health on historical nodes has been removed.
  • Separate jar required for com.metamx.metrics.SysMonitor

    If you currently have com.metamx.metrics.SysMonitor as part of your druid.monitoring.monitors configuration and would like to keep it, you will have to add the SIGAR library jar to your classpath.

    Alternatively, you can simply remove com.metamx.metrics.SysMonitor if you do not rely on the sys/.* metrics.

    We had to remove the direct dependency on SIGAR in order to move Druid artifacts to Maven Central, since SIGAR is currently not available there.
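
    As a sketch (the jar name, version, and directory layout are assumptions), placing the SIGAR jar in a directory already on the node's classpath is enough:

      # Hypothetical layout: sigar-1.6.4.jar downloaded separately into lib/,
      # which the usual launch command already includes on the classpath.
      java -cp "config/_common:lib/*" io.druid.cli.Main server historical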

  • Update Procedure

    If you are running a version of Druid older than 0.6.172, please upgrade to 0.6.172 first. See the 0.6.172 release notes for instructions.

    In order to ensure a smooth rolling upgrade without downtime, nodes must be updated in the following order:

    1. historical nodes
    2. indexing service/real-time nodes
    3. router nodes (if you have any)
    4. broker nodes
    5. coordinator nodes

New Features

  • Long metric column support

    Druid can now store metrics as 64-bit long columns. Until now, Druid stored all metrics as single-precision floating point values, which could introduce rounding errors and unexpected results in queries using longSum aggregators, especially groupBy queries.

  • Pluggable metadata storage

    MySQL, PostgreSQL, and Derby (for testing) are now supported out of the box. Derby only supports a single master and should not be used in production deployments that require high availability; use MySQL or PostgreSQL with failover for that.

  • Simplified data ingestion API

    We have completely redone Druid's data ingestion API; batch and real-time ingestion now share a single spec format (see the ingestion spec notes above). A skeleton of the new layout is sketched below.
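
    The following skeleton shows the new top-level shape (all values are illustrative and the required fields are abridged; consult the 0.7 ingestion docs for a complete spec):

      {
        "dataSchema": {
          "dataSource": "example_source",
          "parser": { "type": "string", "parseSpec": { "format": "json" } },
          "metricsSpec": [ { "type": "longSum", "name": "value", "fieldName": "value" } ],
          "granularitySpec": { "type": "uniform", "segmentGranularity": "DAY" }
        },
        "ioConfig": { "type": "realtime" },
        "tuningConfig": { "type": "realtime" }
      }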

  • Switch compression for metric columns from LZF to LZ4

    Initial performance tests show LZ4 may be between 15% and 25% faster than LZF, and that it results in segments about 3-5% smaller on typical data sets.

  • Configurable inverted bitmap indexes

    Druid now supports Roaring Bitmaps in addition to the default Concise Bitmaps. Initial performance tests show Roaring may be up to 20% faster for certain types of queries, at the expense of segments being 20% larger on average.
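
    A sketch of opting into Roaring via a task's tuningConfig (the indexSpec placement is an assumption based on the 0.7 docs; verify against your version):

      "tuningConfig": {
        "type": "index",
        "indexSpec": { "bitmap": { "type": "roaring" } }
      }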

  • Integration tests

    We have added a set of integration tests that use Docker to spin up a Druid cluster to run a series of indexing and query tests.

  • New Druid Coordinator console

    We introduced a new Druid console that should hopefully provide a better overview of the status of your cluster and be a bit more scalable if you have hundreds of thousands of segments. We plan to expand this console to provide more information about the current state of a Druid cluster.

  • Query Result Context

    Query responses can now carry a result context in the response headers, which can report errors encountered during a query. We are currently using this feature for internal retries, but hope to expand it to report more information back to clients.

Improvements

  • Faster query speeds

    Lots of speed improvements thanks to the faster compression format, small optimizations in column structure, optimizations for queries with multiple aggregations, and numerous groupBy query performance improvements. Overall, some queries can be up to twice as fast using the new index format.

  • Druid artifacts in Maven Central

    Druid artifacts are now available in Maven Central to make your own builds and deployments easier.

  • Common Configuration File

    Druid now has a common.runtime.properties where you can declare all global properties as well as all of your external dependencies. This avoids repeated configuration across multiple nodes and will hopefully make setting up a Druid cluster a little less painful.
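
    A sketch of what might live there (the ZooKeeper and extension property names are assumptions based on the 0.7 configuration docs; check them against your version):

      # common.runtime.properties, shared by every node type
      druid.extensions.coordinates=["io.druid.extensions:mysql-metadata-storage"]
      druid.zk.service.host=zk.example.com
      druid.metadata.storage.type=mysql
      druid.metadata.storage.connector.connectURI=jdbc:mysql://metadata.example.com:3306/druid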

  • Default host names, port and service names

    Default host names, ports, and service names for all nodes means a lot less configuration is required upfront if you are happy with the defaults. It also means you can run all node types on a single machine without fiddling with port conflicts.

  • Druid column names are now case-sensitive

    Death to casing bugs. Be aware of the dangers of updating to 0.7.0 if you have mixed case columns and are using 0.6.x. See above for more details.

  • Query Retries

    Druid will now automatically retry queries for certain classes of failures.

  • Background caching

    For certain types of queries, especially those that involve distinct (HyperLogLog) counts, this can improve performance by upwards of 20%. Background caching is disabled by default.

  • Reduced coordinator memory usage

    Reduced coordinator memory usage (by up to 50%). This fixes a problem where a coordinator would sometimes lose leadership due to frequent GCs.

  • Metrics can now be emitted to SSL endpoints

  • Additional AWS credentials support. Thanks @gnethercutt

  • Additional persist and throttle metrics for real-time ingestion

    This should help diagnose when real-time ingestion is being throttled and how long persists are taking. These metrics provide a good indication of when it is time to scale up real-time ingestion.

  • Broker initialization endpoint

    Brokers now provide a status endpoint at /druid/broker/v1/loadstatus to indicate whether they are ready to be queried, making rolling upgrades and restarts easier.
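
    For example, a rolling-restart script might poll the endpoint before moving on to the next node (the host is a placeholder; 8082 is assumed to be the default broker port, so adjust for your configuration):

      # Poll until the broker reports it is ready to serve queries.
      curl http://broker.example.com:8082/druid/broker/v1/loadstatus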

Bug Fixes

  • Support multiple shards of the same datasource on the same realtime node. Thanks @zhaown
  • HDFS task logs should now work as expected. Thanks @flowbehappy.
  • Possible deadlock condition fixed in the Broker.
  • Various fixes for GZIP compression when returning results.
  • druid.host should now support IPv6 addresses as well.

Documentation

  • New tutorials.
  • New ingestion documentation.
  • New configuration documentation.
  • Improvements to rule documentation. Thanks @mrijke

Known issues

  • Merging segments with different types of bitmap indexes is currently not possible, so if you have both types of indexes in your cluster, you must leave druid.coordinator.merge.on set to false (the default value of the config).
  • #1045 Issue with GroupBy queries with complex aggregations and post-aggregations using the same name
  • druid-io/druid-api#38 If you are using longSum in your ingestion spec, having floating point data may throw exceptions.