Add basic tuning guide, getting started page, updated clustering docs #7629
Conversation
To estimate total memory usage of the Historical under these guidelines:

- Heap: `(0.5GB * number of CPU cores) + (2 * total size of lookup maps) + druid.cache.sizeInBytes`
- Direct Memory: `(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes`
Free space for the page cache seems important enough that I think it should be listed here as going into the total calculation of Historical memory usage.
Added a note on the importance of free system memory here
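For illustration, here is a hypothetical worked example of those formulas; the core count, lookup size, cache size, and buffer settings below are assumptions for illustration, not recommendations:

```properties
# Hypothetical 16-core Historical with 1GB of lookup maps (illustrative values only)
druid.processing.numThreads=16
druid.processing.numMergeBuffers=4
druid.processing.buffer.sizeBytes=500000000
druid.cache.sizeInBytes=2000000000

# Heap   = (0.5GB * 16 cores) + (2 * 1GB lookups) + 2GB cache = 12GB  -> -Xms12g -Xmx12g
# Direct = (16 threads + 4 merge buffers + 1) * 500MB          = 10.5GB -> -XX:MaxDirectMemorySize of at least 11g
```

Remaining free system memory beyond these allocations is what the page cache can use for memory-mapped segments.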
The Broker heap requirements scale based on the number of segments in the cluster, and the total data size of the segments.

The heap size will vary based on data size and usage patterns, but 4G to 8G is a good starting point for a small or medium cluster. For a rough estimate of memory requirements on the high end, very large clusters with a node count on the order of ~100 nodes may need Broker heaps of 30GB-60GB.
Should we clarify what we consider a "small or medium cluster"?
Added clarification here (~15 nodes or less)
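As a rough, hedged sketch of that starting point, a Broker in a small cluster (roughly 15 servers or fewer) might be given a heap in the middle of the 4G-8G range; the exact value depends on your data size and query patterns:

```
-Xms6g
-Xmx6g
```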
The biggest contributions to heap usage on Brokers are:
- Partial unmerged query results from Historicals and Tasks
- The segment timeline
It might be worth including that this also covers the locations of all the segments on all Historicals and realtime tasks, but I'm not sure how to do that concisely.
I added an explanation of the timeline and cached metadata
#### Number of Brokers

A 1:15 ratio of Brokers to Historicals is a reasonable starting point (this is not a hard rule).
It might be obvious, but it's probably worth calling out the exception that if you need HA for queries, you should apply that ratio only after the first 2 Brokers.
Added note on HA
The heap requirements of the Overlord scale with the number of servers, segments, and tasks in the cluster.

You can set the Overlord heap to the same size as your Broker heap, or slightly smaller: both services have to process cluster-wide state and answer API requests about this state.
This doesn't sound right to me; an Overlord isn't as heavyweight as a Coordinator or Broker.
Adjusted the Overlord recommendation to 25%-50% of the Coordinator heap
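As a hedged example of that guideline (the 12GB Coordinator heap assumed here is purely for illustration), an Overlord heap at roughly a third of the Coordinator's would look like:

```
-Xms4g
-Xmx4g
```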
Please see the [General Connection Pool Guidelines](#general-connection-pool-guidelines) section for an overview of connection pool configuration.

On the Brokers, please ensure that the sum of `druid.broker.http.numConnections` across all the Brokers is slightly lower than the value of `druid.server.http.numThreads` on your Historicals.
Should this be just historicals or also include realtime tasks?
Added Tasks here
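As a hypothetical sketch of how these properties relate (a 2-Broker cluster is assumed and all numbers are illustrative):

```properties
# On each of 2 Brokers: 2 * 25 = 50 total query connections opened to each data server
druid.broker.http.numConnections=25

# On each Historical and Task: slightly higher than that sum of 50
druid.server.http.numThreads=60
```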
`druid.processing.buffer.sizeBytes` is a closely related property that controls the size of the off-heap buffers allocated to the processing threads.

One buffer is allocated for each processing thread. A size between 500MB and 1GB (the default) is a reasonable choice for general use.
1G isn't the default; it is currently auto-calculated according to the formula if Druid can detect the amount of direct memory, for Java 8 at least.
Removed mention of default
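If you do want to pin the buffer size explicitly rather than relying on the auto-calculation, a hedged example (500MB, an illustrative value) would be:

```properties
druid.processing.buffer.sizeBytes=500000000
```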
For Historicals, `druid.server.http.numThreads` should be set to a value slightly higher than the sum of `druid.broker.http.numConnections` across all the Brokers in the cluster.

Tuning the cluster so that each Historical can accept 50 queries and 10 non-queries is a reasonable starting point.
This statement is not clear to me. What is the definition of queries vs. non-queries, and how do you configure a number for each?
I updated the "General Connection Pool Guidelines" section with a definition of queries vs. non-queries, and added a note that exact tuning depends on your inflow vs. drain rate for requests, which in turn depends on the specific queries you're running.
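A hedged illustration of that 50-query/10-non-query starting point on each Historical (the split is a budgeting assumption, not a separate pair of settings):

```properties
# ~50 threads for query connections from Brokers, plus ~10 of headroom for non-query requests
druid.server.http.numThreads=60
```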
docs/content/tutorials/cluster.md (outdated)
- In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration as well.
+ In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration.
Probably multiple query servers too for fault tolerance on the user side of things?
Adjusted this to recommend multiple master/query servers in production while dropping the example to one master server
-XX:MaxDirectMemorySize=12g
-Xms12g
-Xmx12g
-XX:MaxDirectMemorySize=6g
I think you've sized this to run on an 8-core, 32GB machine. Should you add some more merge buffers or larger buffer sizes to use up a bit more of the space, and give any of the rest to the heap for breathing room?
I doubled the number of merge buffers
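A hedged sketch of what that kind of change looks like in the server's runtime.properties (the before/after counts here are illustrative assumptions, not the exact values in the PR):

```properties
# e.g. doubling the merge buffer count from 4 to 8 to use more of the available direct memory
druid.processing.numMergeBuffers=8
```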
docs/content/tutorials/cluster.md (outdated)
The example cluster above is chosen as a single example out of many possible ways to size a Druid cluster.

You can choose smaller/larger hardware or less/more servers for your specific needs and constraints.
I think it might also be worth mentioning here that advanced use cases can choose not to co-locate services, to scale out the workload. It doesn't need to be very detailed; it isn't incompatible with the following part that directs the reader to the basic tuning guide, which describes how all the processes use resources independently, so there seems to be no reason to leave it out since this section is about making choices.
Added a note on not colocating
- In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration as well.
+ In production, we recommend deploying multiple Master servers and multiple Query servers in a fault-tolerant configuration based on your specific fault-tolerance needs, but you can get started quickly with one Master and one Query server and add more servers later.
What about migrating the metadata store from local Derby to something else? I think we should talk about HA and the metadata store.
Never mind, it is covered later.
👍
This document looks very useful. Thanks @jon-wei.
Segments are memory-mapped by Historical processes using any available free system memory (i.e., memory not used by the Historical JVM and heap/direct memory buffers or other processes on the system). Segments that are not currently in memory will be paged from disk when queried.

Therefore, `druid.server.maxSize` should be set such that a Historical is not allocated an excessive amount of segment data. As the value of (`free system memory` / `druid.server.maxSize`) increases, a greater proportion of segments can be kept in memory, allowing for better query performance.
Should we also mention that `druid.server.maxSize` should be the sum of `druid.segmentCache.locations` across all cache locations?
Added info about `druid.segmentCache.locations` and how it relates to `druid.server.maxSize`
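A hypothetical sketch of how the two properties line up (the paths and sizes below are illustrative assumptions):

```properties
# Two segment cache locations totaling 300GB
druid.segmentCache.locations=[{"path":"/mnt/disk1/druid/segment-cache","maxSize":150000000000},{"path":"/mnt/disk2/druid/segment-cache","maxSize":150000000000}]

# druid.server.maxSize should match the total capacity of the cache locations
druid.server.maxSize=300000000000
```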
Tuning the cluster so that each Task can accept 50 queries and 10 non-queries is a reasonable starting point.

#### SSD storage
How about moving this section above `Task Configurations`, since this is also a configuration for MiddleManagers?
Moved to MM section
The TopN and GroupBy queries use these buffers to store intermediate computed results. As the buffer size increases, more data can be processed in a single pass.

## GroupBy Merging Buffers
Should it be groupBy v2 because it's the only one using merging buffers?
Added `v2` here
Oh wait, this is a doc change. I'm changing my vote to just +1.
I'm backporting this to 0.15.0-incubating since it contains changes for some scripts and tutorial configurations.
…apache#7629) (#7684)

* Add basic tuning guide, getting started page, updated clustering docs
* Add note about caching, fix tutorial paths
* Adjust hadoop wording
* Add license
* Tweak
* Shrink overlord heaps, fix tutorial urls
* Tweak xlarge peon, update peon sizing
* Update Data peon buffer size
* Fix cluster start scripts
* Add upper level _common to classpath
* Fix cluster data/query confs
* Address PR comments
* Elaborate on connection pools
* PR comments
* Increase druid.broker.http.maxQueuedBytes
* Add guidelines for broker backpressure
* PR comments
This PR:

* Adds `operations/basic-cluster-tuning.md`
* Updates `tutorials/cluster.md` with information on migrating from a single-server deployment, and changes the examples to use the new `bin/start-cluster-*` scripts and configurations under `examples/conf/druid/cluster`