Setup Guide

First, start Druid from the apache-druid-26.0.0 folder.


Then start the Kafka server (version/folder kafka_2.13-2.7.0):

./bin/ config/

Both run on the same ZooKeeper instance, so the order matters.

Then run the file to populate the queue on the topic mentioned in the code.

When shutting down, stop the Kafka server first, then the Druid server.

Druid Service SQL calls

SQL calls can be made to localhost:8888, for example:

{
  "query" : "SELECT TIME_FLOOR(\"__time\", 'P1D') AS \"__time_by_day\" , AVG(\"tempValue\") AS \"avg_tempValue\" FROM \"test1\" GROUP BY 1 ORDER BY 2 DESC"
}

This query computes the average temperature per day (the 'P1D' period), ordered from the highest average to the lowest.

To send the request, save the query in a query.json file and run:

curl -X POST -H'Content-Type: application/json' http://localhost:8888/druid/v2/sql/ -d @query.json
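The same request can be sent from a script. The following is a minimal sketch using only the Python standard library, assuming the router is reachable at localhost:8888 as in the curl command. The function names and the `__time_bucket` alias are illustrative and not part of the project; the `period` parameter shows how the bucket size could be switched, for example to 'PT1H' for hourly averages.

```python
import json
import urllib.request

# Assumption: same endpoint as the curl command above.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql/"


def build_avg_temp_query(period="P1D", datasource="test1"):
    """Return SQL for the average tempValue per time bucket.

    `period` is an ISO 8601 period: 'P1D' buckets by day, 'PT1H' by hour.
    """
    return (
        f"SELECT TIME_FLOOR(\"__time\", '{period}') AS \"__time_bucket\", "
        f"AVG(\"tempValue\") AS \"avg_tempValue\" "
        f"FROM \"{datasource}\" GROUP BY 1 ORDER BY 2 DESC"
    )


def build_sql_request(sql, url=DRUID_SQL_URL):
    """Wrap the SQL text in the JSON body that the SQL endpoint expects."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRUID_SQL_URL):
    """Send the query and parse the JSON response (needs a running Druid)."""
    with urllib.request.urlopen(build_sql_request(sql, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Druid running, `run_query(build_avg_temp_query("PT1H"))` would request hourly averages instead of daily ones.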

Running Kafka and Druid on a Linux machine

For Druid to work, port 8080 on the Linux machine must not be in use. We can verify this with:

lsof -i :8080

If some process is using the port, we can kill it (if possible) with:

kill -9 <process id from lsof command>
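The port check can also be scripted before starting Druid. This is a rough sketch that only tests whether something accepts TCP connections on the IPv4 loopback address, so it is narrower than lsof; the function name is hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(8080)` returning True suggests that another process should be stopped first.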

For Kafka to accept connections from hosts other than localhost, we need to change advertised.listeners in the Kafka broker configuration:

advertised.listeners = PLAINTEXT://

Here, the IP address provided is that of the Linux machine running the Kafka and Druid instances.
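As an illustration of that change, the sketch below rewrites the advertised.listeners entry in the text of a properties file. The helper name is hypothetical, port 9092 is assumed because it is Kafka's default, and 192.0.2.10 is a placeholder address rather than the real machine's IP.

```python
def set_advertised_listeners(properties_text, host, port=9092):
    """Return properties text with advertised.listeners pointing at host:port.

    An existing entry (active or commented out) is replaced in place;
    if none exists, the entry is appended at the end.
    """
    new_line = f"advertised.listeners=PLAINTEXT://{host}:{port}"
    result = []
    replaced = False
    for line in properties_text.splitlines():
        # Strip a leading '#' so "#advertised.listeners=..." is matched too.
        key = line.lstrip("# \t").split("=", 1)[0].strip()
        if key == "advertised.listeners":
            if not replaced:
                result.append(new_line)
                replaced = True
            # Later duplicates are dropped so only one entry remains.
            continue
        result.append(line)
    if not replaced:
        result.append(new_line)
    return "\n".join(result) + "\n"
```

Calling `set_advertised_listeners(text, "192.0.2.10")` on the file contents yields a line of the form `advertised.listeners=PLAINTEXT://192.0.2.10:9092`.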

Setting up Hadoop as the deep storage

We first download and extract Hadoop version 2.10.2, then change etc/hadoop/core-site.xml to:


and etc/hadoop/hdfs-site.xml to:


Afterwards, we need to format the filesystem for the NameNode:

 bin/hdfs namenode -format

To start and stop the DFS daemons:

## to start
## to stop

We can make directories in the Hadoop Distributed File System using:

bin/hdfs dfs -mkdir input
# we can make a root path like
bin/hdfs dfs -mkdir -p /user/root

We create two paths for the Druid project:

  • druid/segments, to store data segments
  • druid/index-logs, to store the indexing task logs.
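Creating the two paths can be sketched as follows, assuming they live under /user/root (as the HDFS URIs in the Druid configuration suggest) and that the script runs from the Hadoop folder. The helper names are illustrative.

```python
import subprocess

# Assumption: both paths sit under /user/root/druid in HDFS.
DRUID_HDFS_ROOT = "/user/root/druid"


def mkdir_commands(root=DRUID_HDFS_ROOT, subdirs=("segments", "index-logs")):
    """Build one 'hdfs dfs -mkdir -p' command per Druid directory."""
    return [["bin/hdfs", "dfs", "-mkdir", "-p", f"{root}/{name}"] for name in subdirs]


def create_druid_dirs(hadoop_home="."):
    """Run the commands from the Hadoop folder (needs a running HDFS)."""
    for command in mkdir_commands():
        subprocess.run(command, cwd=hadoop_home, check=True)
```

The `-p` flag creates missing parent directories, so /user/root does not need to exist beforehand.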

Next, we copy all the .xml files from Hadoop's etc/hadoop/ directory to conf/druid/single-server/nano-quickstart/_common in the Druid installation.

Then we change the settings in conf/druid/single-server/nano-quickstart/_common/ so that the deep storage type is hdfs and the segment directory is hdfs://localhost:9000/user/root/druid/segments.

For the indexing logs, we set druid.indexer.logs.type = hdfs, with the log directory hdfs://localhost:9000/user/root/druid/index-logs.
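The settings described here can be sketched as a small generator. Only druid.indexer.logs.type appears in the text; the other three property names are the ones documented for Druid's HDFS deep storage extension and are an assumption about what was actually changed. The function names are hypothetical.

```python
def hdfs_deep_storage_properties(namenode="hdfs://localhost:9000", base="/user/root/druid"):
    """Return the deep storage and indexing log settings as a dict.

    Assumption: property names follow Druid's HDFS extension documentation.
    """
    return {
        "druid.storage.type": "hdfs",
        "druid.storage.storageDirectory": f"{namenode}{base}/segments",
        "druid.indexer.logs.type": "hdfs",
        "druid.indexer.logs.directory": f"{namenode}{base}/index-logs",
    }


def render_properties(props):
    """Render the dict as key=value lines for a properties file."""
    return "".join(f"{key}={value}\n" for key, value in props.items())
```

Druid's HDFS support is provided by the druid-hdfs-storage extension, so that extension would also need to be listed in druid.extensions.loadList for these settings to take effect.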

Running the entire application

First start HDFS, then Druid, and then Kafka:

# 1. inside the hadoop2.10.2 folder

# 2. inside the druid folder

# 3. inside the kafka folder
./bin/ config/

In its current state, the application works and uses HDFS as its deep storage.

Clustered Deployment

  • We copied the config files from nano-quickstart into the cluster config.

  • All the files are the same except indexer/runtime.configuratio, which is not present in nano-quickstart.

Startup for the clustered deployment:

  • First, start the HDFS server on the Ubuntu server:

    • sbin/ [from the hadoop folder in druid project]
  • Next, start the master server, data server, and query server:

    • bin/start-cluster-master-with-zk-server, inside DruidClusterProject

    • bin/start-cluster-data-server, inside DruidClusterProject-data

    • bin/start-cluster-query-server, inside DruidClusterProject-query

  • Start the Kafka broker:

    • ./bin/ config/, inside the kafka-2_13-3.5.0 folder
# for the master server in DruidClusterProject
# for the data server in the DruidClusterProject-data
# for the query server in the DruidClusterProject-query

The order matters: start the master first, then data, and then query.
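The startup order can be captured in a small launcher. This is only a sketch: the directory names are taken from the text and may differ from the real layout, the function name is hypothetical, and the script starts the processes one after another without waiting for each service to become ready.

```python
import subprocess

# (working directory, command) pairs in the required order: master, data, query.
# Assumption: directory names as mentioned in the text, relative to the current folder.
STARTUP_ORDER = [
    ("DruidClusterProject", "bin/start-cluster-master-with-zk-server"),
    ("DruidClusterProject-data", "bin/start-cluster-data-server"),
    ("DruidClusterProject-query", "bin/start-cluster-query-server"),
]


def start_cluster(order=STARTUP_ORDER, launch=subprocess.Popen):
    """Launch each server in order and return whatever `launch` returns.

    `launch` is injectable so the order can be checked without starting anything.
    """
    processes = []
    for directory, command in order:
        processes.append(launch([command], cwd=directory))
    return processes
```

Passing a stand-in for `launch` makes it possible to dry-run the sequence before using it on the real servers.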

Appendix: useful commands

To find where the Java SDK is:

whereis javac

To get the complete path:

readlink -f /usr/bin/javac

Here, "/usr/bin/javac" is the output of the command above (whereis javac).

To list all the places where Java SDKs are installed:

update-alternatives --list java
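For use in scripts, the whereis and readlink steps can be approximated with the Python standard library. This is a sketch with a hypothetical function name; it searches only the directories on PATH, which is narrower than whereis.

```python
import os
import shutil


def locate_binary(name="javac"):
    """Return the fully resolved path of `name` on PATH, or None if absent."""
    found = shutil.which(name)
    if found is None:
        return None
    # realpath follows symlinks, similar to 'readlink -f'.
    return os.path.realpath(found)
```

Calling `locate_binary()` returns the resolved location of javac when a JDK is installed, and None otherwise.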


