Skip to content
darxriggs edited this page Dec 2, 2012 · 12 revisions

Cluster composition

Logical Composition

The general logical composition of a Druid cluster is:

  1. Compute daemons (at least 2, but the more, the merrier; reads from block store)
  2. Master daemons (at least one, two for high availability, 3 or more for capacity)
  3. Broker daemons (at least one, two for high availability, 3 or more for capacity)
  4. Zookeeper daemons (at least one, 3 for high availability, 5 for jumbo)
  5. MySQL (or equiv.) for metadata service (at least one, 2 for a hot backup)
  6. blob store for holding data segments (such as S3)
  7. optional Realtime (1 or more for real-time data ingestion, writes to block store)
  8. optional, periodic or episodic indexer Batch-ingestion (writes to block store)

Sizing for Cores and RAM

The Compute and Broker nodes will use as many cores as are available, depending on usage, so it is best to keep these on dedicated machines. The upper limit of effectively utilized cores is not well characterized yet and would depend on types of queries, query load, and the schema. Compute daemons should have a heap a size of at least 1GB per core for normal usage, but could be squeezed into a smaller heap for testing. Since in-memory caching is essential for good performance, even more RAM is better. Broker nodes will use RAM for caching, so they do more than just route queries.

The effective utilization of cores by Zookeeper, MySQL, and Master nodes is likely to be between 1 and 2 for each process/daemon, so these could potentially share a machine with lots of cores. These daemons work with heap a size between 500MB and 1GB.

Storage

Indexed segments should be kept in a permanent store accessible by all nodes like AWS S3 or HDFS or equivalent. Currently Druid supports S3, but this will be extended soon.

Local disk (“ephemeral” on AWS EC2) for caching is recommended over network mounted storage (example of mounted: AWS EBS, Elastic Block Store) in order to avoid network delays during times of heavy usage. If your data center is suitably provisioned for networked storage, perhaps with separate LAN/NICs just for storage, then mounted might work fine.

Absolute Minimum Physical Layout

As a special case, the absolute minimum setup is one of the standalone examples for realtime ingestion and querying; see RealtimeStandaloneMain that can easily run on one machine with one core and 1GB RAM.

Minimum Physical Layout, for Experimental Testing with 4GB of RAM

A minimal physical layout for a 1 or 2 core machine with 4GB of RAM, no realtime, and single user, experimental use is:

  1. node1: Master + metadata service + zookeeper + Compute
  2. transient nodes: indexer

The Broker is not required when there is only one Compute (the Compute daemon can service queries directly). Obviously, more cores, up to 4, will improve single-user request latency performance.

This setup is only reasonable to prove that a configuration works. It would not be worthwhile to use this layout for performance measurement, of course, and the active data set size is minimal.

Comfortable Physical Layout, for Pilot Project on AWS-EC2 “small” or larger machines

A minimal physical layout not constrained by cores that demonstrates parallel querying and realtime, using AWS-EC2 “small”/m1.small (one core, with 1.7GB of RAM) or larger, no realtime, is:

  1. node1: Master (m1.small)
  2. node2: metadata service (m1.small)
  3. node3: zookeeper (m1.small)
  4. node4: Broker (m1.small or m1.medium or m1.large)
  5. node5: Compute (m1.small or m1.medium or m1.large)
  6. node6: Compute (m1.small or m1.medium or m1.large)
  7. node7: Realtime (m1.small or m1.medium or m1.large)
  8. transient nodes: indexer

This layout naturally lends itself to adding more RAM and core to Compute nodes, and to adding many more Compute nodes. Depending on the actual load, the Master, metadata server, and zookeeper might need to use larger machines.

The machine size “flavors” are using AWS/EC2 terminology for descriptive purposes only and is not meant to imply that AWS/EC2 is required or recommended. Another cloud provider or your own hardware can also work.

High Availability Physical Layout

An HA layout allows full rolling restarts and heavy volume:

  1. node1: Master (m1.small or m1.medium or m1.large)
  2. node2: Master (m1.small or m1.medium or m1.large)
  3. node3: metadata service (c1.medium or m1.large)
  4. node4: metadata service (c1.medium or m1.large)
  5. node5: zookeeper (c1.medium)
  6. node6: zookeeper (c1.medium)
  7. node7: zookeeper (c1.medium)
  8. node8: Broker (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
  9. node9: Broker (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
  10. node10: Compute (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
  11. node11: Compute (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
  12. node12: Realtime (m1.small or m1.medium or m1.large or m2.xlarge or m2.2xlarge or m2.4xlarge)
  13. transient nodes: indexer

The machine size “flavors” are using AWS/EC2 terminology for descriptive purposes only and is not meant to imply that AWS/EC2 is required or recommended. Another cloud provider or your own hardware can also work.

Setup

Setting up a cluster is essentially just firing up all of the nodes you want with the proper configuration. One thing to be aware of is that there are a few properties in the configuration that potentially need to be set individually for each process:


druid.server.type=historical|realtime
druid.host=someHostOrIPaddrWithPort
druid.port=8080

druid.server.type should be set to “historical” for your compute nodes and realtime for the realtime nodes. The master will only assign segments to a “historical” node and the broker has some intelligence around its ability to cache results when talking to a realtime node. This does not need to be set for the master or the broker.

druid.host should be set to the hostname and port that can be used to talk to the given server process. Basically, someone should be able to send a request to http://${druid.host}/ and actually talk to the process.

druid.port should be set to the port that the server should listen on. In the vast majority of cases, this port should be the same as what is on druid.host.

Build/Run

We are working on some init.d-style scripts to fire up the individual nodes and when that is done, we will update this documentation to discuss how those can be used to fire up the various instances.

That said, there is a lot of value in understanding what scripts are doing for you, so this is the basest description of how to run the processes.

The simplest way to build and run from the repository is to run mvn package from the base directory and then take druid-services/target/druid-services-*-selfcontained.jar and push that around to your machines; the jar does not need to be expanded, and since it contains the main() methods for each kind of service, it is not invoked with java -jar. It can be run from a normal java command-line by just including it on the classpath and then giving it the main class that you want to run. For example one instance of the Compute node/service can be started like this:


java -Duser.timezone=UTC -Dfile.encoding=UTF-8 -cp compute/runtime.properties:druid-services/target/druid-services-*-selfcontained.jar com.metamx.druid.http.ComputeMain

The following table shows the possible services and fully qualified class for main().

service main class
Realtime com.metamx.druid.http.RealtimeMain
Master com.metamx.druid.http.MasterMain
Broker com.metamx.druid.http.BrokerMain
Compute com.metamx.druid.http.ComputeMain