
Installing and Setting Up Druid


As of this writing we use Druid version 0.9.0. Download and unzip a version of Druid. You will also need ZooKeeper; we are using zookeeper-3.4.6. We keep a top-level folder named druid; underneath it, ZooKeeper is installed alongside different versions of Druid (for example, druid-0.8.2). Use this structure if you want to use the helper start/stop procedures listed on this page.
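
The resulting layout looks roughly like this (the version numbers are the ones mentioned above; yours may differ):

druid/
  zookeeper-3.4.6/
  druid-0.8.2/
  druid-0.9.0/      <- <druid_home>; start-all.sh and stop-all.sh live here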

Setup MySQL

Druid needs a relational database for its metadata storage. Here we describe how to set up MySQL; for other databases, see the Druid metadata storage documentation.

Assuming you have root access to your MySQL DB, the setup steps are:

  1. Start a CLI session: mysql -u root -p
  2. Inside the session, issue the following commands:
CREATE DATABASE druid DEFAULT CHARACTER SET utf8;

CREATE USER 'druid'@'%' IDENTIFIED BY 'diurd';
GRANT ALL PRIVILEGES ON *.* TO 'druid'@'%' WITH GRANT OPTION;

CREATE USER 'druid'@'localhost' IDENTIFIED BY 'diurd';
GRANT ALL PRIVILEGES ON *.* TO 'druid'@'localhost' WITH GRANT OPTION;

FLUSH PRIVILEGES;
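
To sanity-check the account before moving on (a quick verification step, not part of the original instructions):

# Connect as the druid user and run a trivial query against the new DB.
mysql -u druid -pdiurd druid -e "SELECT 1;"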

Configuration Settings for Druid

The settings we use for a dev environment are listed below. For more information on these settings, see the Druid production settings and configuration pages.

In addition, we set up helper start and stop scripts. The sequence we use to start the Druid services is:

cd <druid_home>
../zookeeper-3.4.6/bin/zkServer.sh stop
../zookeeper-3.4.6/bin/zkCleanup.sh 
../zookeeper-3.4.6/bin/zkServer.sh start
./start-all.sh 

The script to stop all the services is ./stop-all.sh.
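
The start-all.sh and stop-all.sh scripts are not reproduced on this page. What follows is a minimal sketch of what they might look like, assuming the standard Druid 0.9 launcher (io.druid.cli.Main server <type>) and a config/<service> directory holding the per-service properties listed below; the -Xmx value and directory names are illustrative assumptions, not the original scripts.

#!/bin/bash
# start-all.sh (sketch, not the original): launch each Druid service
# in the background. Assumes it runs from <druid_home>, per-service
# configs under config/<service>, shared settings in config/_common,
# and Druid's jars in lib/.
mkdir -p logs
for svc in coordinator overlord historical broker realtime; do
  nohup java -Xmx1g -cp "config/_common:config/$svc:lib/*" \
    io.druid.cli.Main server "$svc" > "logs/$svc.log" 2>&1 &
  echo $! > "logs/$svc.pid"
done

#!/bin/bash
# stop-all.sh (sketch): stop the services started by start-all.sh,
# using the pid files it wrote.
for svc in coordinator overlord historical broker realtime; do
  if [ -f "logs/$svc.pid" ]; then
    kill "$(cat "logs/$svc.pid")" && rm "logs/$svc.pid"
  fi
done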

Common

# Extensions (no deep storage model is listed - using local fs for deep storage - not recommended for production)
druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight","io.druid.extensions:mysql-metadata-storage"]

# Zookeeper
druid.zk.service.host=localhost

# Metadata Storage (mysql)
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd

# Deep storage (local filesystem for examples - don't use this in production)
druid.storage.type=local
druid.storage.storageDirectory=/Users/hbutani/druid/localStorage

# Query Cache (we use a simple 10mb heap-based local cache on the broker)
druid.cache.type=local
druid.cache.sizeInBytes=10000000

# Indexing service discovery
druid.selectors.indexing.serviceName=overlord

# Monitoring (disabled for examples, if you enable SysMonitor, make sure to include sigar jar in your cp)
# druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]

# Metrics logging (disabled for examples - change this to logging or http in production)
druid.emitter=noop
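
Each block of settings on this page goes into its own properties file. Assuming the conventional per-service layout (the same one the start-all.sh sketch above expects; adjust the paths to your install), the Common block above is shared and each service reads its own runtime.properties:

config/_common/common.runtime.properties    # the Common block above
config/broker/runtime.properties            # Broker
config/coordinator/runtime.properties       # Coordinator
config/historical/runtime.properties        # Historical
config/overlord/runtime.properties          # Overlord
config/realtime/runtime.properties          # Realtime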

Broker

druid.service=broker

# We enable using the local query cache here
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true

# For prod: set numThreads = # cores - 1, and sizeBytes to 512mb
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1

Coordinator

# Default host: localhost. Default port: 8081. If you run each node type on its own node in production, you should override these values to be IP:8080
#druid.host=localhost
#druid.port=8081
druid.service=coordinator

# The coordinator begins assignment operations after the start delay.
# We override the default here to start things up faster for examples.
# In production you should use PT5M or PT10M
druid.coordinator.startDelay=PT70s

Historical

# Default host: localhost. Default port: 8083. If you run each node type on its own node in production, you should override these values to be IP:8080
#druid.host=localhost
#druid.port=8083
druid.service=historical


# Our intermediate buffer is also very small so longer topNs will be slow.
# In prod: set sizeBytes = 512mb
druid.processing.buffer.sizeBytes=100000000
# We can only scan 1 segment in parallel with these configs.
# In prod: set numThreads = # cores - 1
druid.processing.numThreads=1

# maxSize should reflect the performance you want.
# Druid memory maps segments.
# memory_for_segments = total_memory - heap_size - (processing.buffer.sizeBytes * (processing.numThreads+1)) - JVM overhead (~1G)
# The greater the memory/disk ratio, the better performance you should see
druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize": 10000000000}]
druid.server.maxSize=10000000000
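
As a worked example of the memory_for_segments formula above (the machine sizes are illustrative assumptions, not recommendations): with 16 GB of RAM, an 8 GB heap, the 100 MB buffer and single processing thread configured above, and ~1 GB of JVM overhead:

# All values in MB; the 16 GB / 8 GB figures are made up for illustration.
echo $(( 16*1024 - 8*1024 - 100*(1+1) - 1024 ))   # => 6968 MB left for segments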

Overlord

# Default host: localhost. Default port: 8090. If you run each node type on its own node in production, you should override these values to be IP:8080
#druid.host=localhost
#druid.port=8090
druid.service=overlord

# Run the overlord in local mode with a single peon to execute tasks
# This is not recommended for production.
druid.indexer.queue.startDelay=PT0M
# This setting is too small for real production workloads
druid.indexer.runner.javaOpts="-server -Xmx4g"
# These settings are also too small for real production workloads
# Please see our recommended production settings in the docs (http://druid.io/docs/latest/Production-Cluster-Configuration.html)
druid.indexer.fork.property.druid.processing.numThreads=4
druid.indexer.fork.property.druid.computation.buffer.size=500000000
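
With the overlord running, indexing tasks are submitted to it over HTTP. For example, assuming the default port 8090 and a task spec saved as task.json (a placeholder file name):

curl -X POST -H 'Content-Type: application/json' \
  -d @task.json http://localhost:8090/druid/indexer/v1/task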

Realtime

# Default host: localhost. Default port: 8084. If you run each node type on its own node in production, you should override these values to be IP:8080
#druid.host=localhost
#druid.port=8084
druid.service=realtime

# We can only scan 1 segment in parallel with these configs.
# Our intermediate buffer is also very small so longer topNs will be slow.
# In production sizeBytes should be 512mb, and numThreads should be # cores - 1
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1

# Enable realtime monitoring
# druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor","io.druid.segment.realtime.RealtimeMetricsMonitor"]
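
Once everything is started, a quick smoke test is to hit each service's /status endpoint. The ports below are the defaults noted in the comments above, plus 8082 for the broker (its Druid 0.9 default, not stated on this page):

for port in 8081 8082 8083 8084 8090; do
  echo "-- port $port"
  curl -s "http://localhost:$port/status" && echo
done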