
Add basic tuning guide, getting started page, updated clustering docs #7629

Merged
19 commits merged into apache:master on May 16, 2019

Conversation

jon-wei (Contributor) commented on May 10, 2019

This PR:

- Adds a basic cluster tuning guide at operations/basic-cluster-tuning.md
- Updates tutorials/cluster.md with information on migrating from a single-server deployment, and changes the examples to use the new bin/start-cluster-* scripts and configurations under examples/conf/druid/cluster
- Adds a "Getting Started" page that links to the design/ingestion overview, the single-machine quickstarts, the clustering guide, and the community page

To estimate total memory usage of the Historical under these guidelines:

- Heap: `(0.5GB * number of CPU cores) + (2 * total size of lookup maps) + druid.cache.sizeInBytes`
- Direct Memory: `(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes`
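
As a rough worked example of how these two formulas combine (the core count, lookup size, cache size, and buffer settings below are hypothetical illustrations, not recommendations):

```
# Hypothetical 16-core Historical with 1GB of lookup maps (illustrative values only)
# runtime.properties
druid.processing.numThreads=16
druid.processing.numMergeBuffers=4
druid.processing.buffer.sizeBytes=500000000
druid.cache.sizeInBytes=2000000000

# jvm.config
# Heap   = (0.5GB * 16 cores) + (2 * 1GB lookups) + 2GB cache = 12GB
# Direct = (16 threads + 4 merge buffers + 1) * 500MB = 10.5GB, rounded up
-Xms12g
-Xmx12g
-XX:MaxDirectMemorySize=11g
```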
Member:

Free space for the page cache seems important enough that I think it should be listed here as part of the total calculation of Historical memory usage.

Contributor Author:

Added a note on the importance of free system memory here


The Broker heap requirements scale based on the number of segments in the cluster, and the total data size of the segments.

The heap size will vary based on data size and usage patterns, but 4GB to 8GB is a good starting point for a small or medium cluster. For a rough estimate of memory requirements on the high end, very large clusters on the order of 100 nodes may need Broker heaps of 30GB-60GB.
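
For instance, a Broker in a small cluster might start from a heap in that 4GB-8GB range (a hypothetical jvm.config excerpt, not a recommendation):

```
# Hypothetical Broker jvm.config for a small or medium cluster
-Xms8g
-Xmx8g
```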
Member:

Should we clarify what we consider a "small or medium cluster"?

Contributor Author (jon-wei), May 10, 2019:

Added clarification here (~15 nodes or less)


The biggest contributions to heap usage on Brokers are:
- Partial unmerged query results from Historicals and Tasks
- The segment timeline
Member:

It might be worth mentioning that this also includes the locations of all the segments on all Historicals and realtime tasks, but I'm not sure how to do that concisely.

Contributor Author:

I added an explanation of the timeline and cached metadata


#### Number of Brokers

A 1:15 ratio of Brokers to Historicals is a reasonable starting point (this is not a hard rule).
Member:

It might be obvious, but it's probably worth calling out the exception that if you need HA for queries, you should apply that ratio after the first 2 Brokers.

Contributor Author:

Added note on HA


The heap requirements of the Overlord scale with the number of servers, segments, and tasks in the cluster.

You can set the Overlord heap to the same size as your Broker heap, or slightly smaller: both services have to process cluster-wide state and answer API requests about this state.
Member:

This doesn't sound right to me; an Overlord isn't as heavyweight as a Coordinator or Broker.

Contributor Author:

Adjusted the Overlord recommendation to 25%-50% of the Coordinator heap
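
A hedged sketch of that adjusted guideline, assuming a hypothetical Coordinator sized at 15GB (the Overlord then lands at roughly 25%-50% of that):

```
# Hypothetical jvm.config heaps (illustrative only)
# Coordinator
-Xms15g
-Xmx15g
# Overlord: roughly 25%-50% of the Coordinator heap
-Xms6g
-Xmx6g
```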


Please see the [General Connection Pool Guidelines](#general-connection-pool-guidelines) section for an overview of connection pool configuration.

On the Brokers, please ensure that the sum of `druid.broker.http.numConnections` across all the Brokers is slightly lower than the value of `druid.server.http.numThreads` on your Historicals.
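
As a hypothetical illustration, with 3 Brokers each configured as below, the cluster-wide total is 3 * 20 = 60 connections, so the server-side pools they talk to should be sized a bit above that (see the Historical `druid.server.http.numThreads` sketch further down):

```
# Hypothetical runtime.properties on each of 3 Brokers (illustrative value)
druid.broker.http.numConnections=20
```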
Member:

Should this be just historicals or also include realtime tasks?

Contributor Author:

Added Tasks here


`druid.processing.buffer.sizeBytes` is a closely related property that controls the size of the off-heap buffers allocated to the processing threads.

One buffer is allocated for each processing thread. A size between 500MB and 1GB (the default) is a reasonable choice for general use.
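
For example (hypothetical values, not defaults), a process with 8 processing threads, 2 merge buffers, and 500MB buffers needs about (8 + 2 + 1) * 500MB ≈ 5.5GB of direct memory:

```
# Hypothetical runtime.properties (illustrative sizing)
druid.processing.numThreads=8
druid.processing.numMergeBuffers=2
druid.processing.buffer.sizeBytes=500000000

# jvm.config: direct memory must cover (8 + 2 + 1) * 500MB = 5.5GB
-XX:MaxDirectMemorySize=6g
```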
Member:

1G isn't the default; it is currently auto-calculated according to the formula if the amount of direct memory can be detected, for Java 8 at least.

Contributor Author:

Removed mention of default

fjy added this to the 0.15.0 milestone on May 10, 2019

For Historicals, `druid.server.http.numThreads` should be set to a value slightly higher than the sum of `druid.broker.http.numConnections` across all the Brokers in the cluster.

Tuning the cluster so that each Historical can accept 50 queries and 10 non-queries is a reasonable starting point.
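
Continuing the hypothetical 3-Broker sketch above (3 * 20 = 60 total Broker connections), a matching Historical configuration might look like this (illustrative only, not a recommendation):

```
# Hypothetical Historical runtime.properties
# Slightly above the 60 total Broker connections, leaving headroom
# for non-query requests such as status and health checks.
druid.server.http.numThreads=65
```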


This statement is not clear to me. What is the definition of queries vs. non-queries, and how do you configure a number for each?

Contributor Author:

I updated the "General Connection Pool Guidelines" section with a definition of queries vs. non-queries, and added a note that exact tuning depends on your inflow vs. drain rate for requests, which in turn depends on the specific queries you're running.


In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration as well.
In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration.
Member:

Probably multiple query servers too for fault tolerance on the user side of things?

Contributor Author:

Adjusted this to recommend multiple master/query servers in production while dropping the example to one master server

```
-XX:MaxDirectMemorySize=12g
-Xms12g
-Xmx12g
-XX:MaxDirectMemorySize=6g
```
Member:

I think you've sized this to run on an 8-core, 32GB machine. Should you add some more merge buffers or larger buffer sizes to use up a bit more of the space, and give any of the rest to the heap for breathing room?

Contributor Author:

I doubled the number of merge buffers


The example cluster above is chosen as a single example out of many possible ways to size a Druid cluster.

You can choose smaller or larger hardware, and fewer or more servers, for your specific needs and constraints.
Member:

I think it might also be worth mentioning here that advanced use cases can choose not to co-locate services in order to scale out the workload. It doesn't need to be very detailed, and it isn't incompatible with the following part that directs the reader to the basic tuning guide, which describes how all the processes use resources independently. Since this section is about making choices, there seems to be no reason to leave it out.

Contributor Author:

Added a note on not colocating


In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration as well.
In production, we recommend deploying multiple Master servers and multiple Query servers in a fault-tolerant configuration based on your specific fault-tolerance needs, but you can get started quickly with one Master and one Query server and add more servers later.
Contributor:

What about migrating the metadata store from local derby to something else? I think we should talk about HA and the metadata store

Contributor:

Nevermind, it is covered later

fjy (Contributor) commented May 15, 2019:

👍

jihoonson (Contributor) left a comment:

These documents look very useful. Thanks @jon-wei.


Segments are memory-mapped by Historical processes using any available free system memory (i.e., memory not used by the Historical JVM and heap/direct memory buffers or other processes on the system). Segments that are not currently in memory will be paged from disk when queried.

Therefore, `druid.server.maxSize` should be set such that a Historical is not allocated an excessive amount of segment data. As the value of (`free system memory` / `druid.server.maxSize`) increases, a greater proportion of segments can be kept in memory, allowing for better query performance.
Contributor:

Should we also mention that druid.server.maxSize should be the sum of druid.segmentCache.locations across all cache locations?

Contributor Author:

Added info about druid.segmentCache.locations and how it relates to druid.server.maxSize
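
A hedged sketch of how these settings might line up, assuming a hypothetical Historical on a machine with 128GB of RAM where heap plus direct memory take roughly 28GB, leaving about 100GB of free memory for the page cache (the path and sizes below are illustrative):

```
# Hypothetical Historical segment cache sizing (illustrative numbers and path)
# druid.server.maxSize matches the total capacity of the segment cache locations.
# Free memory (~100GB) vs. maxSize (400GB) is roughly 1:4, so about a quarter of
# the assigned segment data can sit in the page cache at any time.
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":400000000000}]
druid.server.maxSize=400000000000
```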


Tuning the cluster so that each Task can accept 50 queries and 10 non-queries is a reasonable starting point.

#### SSD storage
Contributor:

How about moving this section to above Task Configurations since this is also a configuration for middleManagers?

Contributor Author:

Moved to MM section


The TopN and GroupBy queries use these buffers to store intermediate computed results. As the buffer size increases, more data can be processed in a single pass.

## GroupBy Merging Buffers
Contributor:

Should it be groupBy v2 because it's the only one using merging buffers?

Contributor Author:

Added v2 here

jihoonson (Contributor) left a comment:

+1 after CI. @jon-wei the CI failure looks fixed in #7667. Would you please merge master and trigger the CI?

jihoonson (Contributor):

Oh wait, this is a doc change. I'm changing my vote to just +1.

fjy merged commit d667655 into apache:master on May 16, 2019
jihoonson (Contributor):

I'm backporting this to 0.15.0-incubating since it contains changes for some scripts and tutorial configurations.

jon-wei added a commit to jon-wei/druid that referenced this pull request May 17, 2019
Add basic tuning guide, getting started page, updated clustering docs (apache#7629)

* Add basic tuning guide, getting started page, updated clustering docs

* Add note about caching, fix tutorial paths

* Adjust hadoop wording

* Add license

* Tweak

* Shrink overlord heaps, fix tutorial urls

* Tweak xlarge peon, update peon sizing

* Update Data peon buffer size

* Fix cluster start scripts

* Add upper level _common to classpath

* Fix cluster data/query confs

* Address PR comments

* Elaborate on connection pools

* PR comments

* Increase druid.broker.http.maxQueuedBytes

* Add guidelines for broker backpressure

* PR comments
jon-wei added a commit that referenced this pull request May 17, 2019
Add basic tuning guide, getting started page, updated clustering docs (#7629) (#7684)
jihoonson pushed a commit to implydata/druid-public that referenced this pull request Jun 4, 2019
Add basic tuning guide, getting started page, updated clustering docs (apache#7629) (apache#7684)
jihoonson pushed a commit to implydata/druid-public that referenced this pull request Jun 26, 2019
Add basic tuning guide, getting started page, updated clustering docs (apache#7629)