
Add basic tuning guide, getting started page, updated clustering docs #7629

Merged
19 commits merged into apache:master on May 16, 2019

Conversation

jon-wei (Contributor) commented on May 10, 2019

This PR:

- Adds a basic cluster tuning guide at operations/basic-cluster-tuning.md
- Updates tutorials/cluster.md with information on migrating from a single-server deployment, and changes the examples to use the new bin/start-cluster-* scripts and configurations under examples/conf/druid/cluster
- Adds a "Getting Started" page that links to the design/ingestion overview, the single-machine quickstarts, the clustering guide, and the community page

To estimate total memory usage of the Historical under these guidelines:

- Heap: `(0.5GB * number of CPU cores) + (2 * total size of lookup maps) + druid.cache.sizeInBytes`
- Direct Memory: `(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes`
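
As a rough worked example of how these two formulas combine (the core count, lookup size, cache size, and buffer settings below are hypothetical illustrations, not recommendations):

```
# Hypothetical 16-core Historical with 1GB of lookup maps (illustrative values only)
# runtime.properties
druid.processing.numThreads=16
druid.processing.numMergeBuffers=4
druid.processing.buffer.sizeBytes=500000000
druid.cache.sizeInBytes=2000000000

# jvm.config
# Heap   = (0.5GB * 16 cores) + (2 * 1GB lookups) + 2GB cache = 12GB
# Direct = (16 threads + 4 merge buffers + 1) * 500MB = 10.5GB, rounded up
-Xms12g
-Xmx12g
-XX:MaxDirectMemorySize=11g
```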
Member:

Free space for the page cache seems important enough that I think it should be listed here as part of the total calculation of Historical memory usage.

Contributor Author:

Added a note on the importance of free system memory here


The Broker heap requirements scale based on the number of segments in the cluster, and the total data size of the segments.

The heap size will vary based on data size and usage patterns, but 4GB to 8GB is a good starting point for a small or medium cluster. For a rough estimate of memory requirements on the high end, very large clusters on the order of 100 nodes may need Broker heaps of 30GB-60GB.
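
For instance, a Broker in a small cluster might start from a heap in that 4GB-8GB range (a hypothetical jvm.config excerpt, not a recommendation):

```
# Hypothetical Broker jvm.config for a small or medium cluster
-Xms8g
-Xmx8g
```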
Member:

Should we clarify what we consider a "small or medium cluster"?

Contributor Author (jon-wei), May 10, 2019:

Added clarification here (~15 nodes or less)


The biggest contributions to heap usage on Brokers are:
- Partial unmerged query results from Historicals and Tasks
- The segment timeline
Member:

It might be worth mentioning that this also includes the locations of all the segments on all Historicals and realtime tasks, but I'm not sure how to do that concisely.

Contributor Author:

I added an explanation of the timeline and cached metadata


#### Number of Brokers

A 1:15 ratio of Brokers to Historicals is a reasonable starting point (this is not a hard rule).
Member:

It might be obvious, but it's probably worth calling out the exception that if you need HA for queries, you should apply that ratio after the first 2 Brokers.

Contributor Author:

Added note on HA


The heap requirements of the Overlord scale with the number of servers, segments, and tasks in the cluster.

You can set the Overlord heap to the same size as your Broker heap, or slightly smaller: both services have to process cluster-wide state and answer API requests about this state.
Member:

This doesn't sound right to me; an Overlord isn't as heavyweight as a Coordinator or Broker.

Contributor Author:

Adjusted the Overlord recommendation to 25%-50% of the Coordinator heap
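
A hedged sketch of that adjusted guideline, assuming a hypothetical Coordinator sized at 15GB (the Overlord then lands at roughly 25%-50% of that):

```
# Hypothetical jvm.config heaps (illustrative only)
# Coordinator
-Xms15g
-Xmx15g
# Overlord: roughly 25%-50% of the Coordinator heap
-Xms6g
-Xmx6g
```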


Please see the [General Connection Pool Guidelines](#general-connection-pool-guidelines) section for an overview of connection pool configuration.

On the Brokers, please ensure that the sum of `druid.broker.http.numConnections` across all the Brokers is slightly lower than the value of `druid.server.http.numThreads` on your Historicals.
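
As a hypothetical illustration, with 3 Brokers each configured as below, the cluster-wide total is 3 * 20 = 60 connections, so the server-side pools they talk to should be sized a bit above that (see the Historical `druid.server.http.numThreads` sketch further down):

```
# Hypothetical runtime.properties on each of 3 Brokers (illustrative value)
druid.broker.http.numConnections=20
```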
Member:

Should this be just historicals or also include realtime tasks?

Contributor Author:

Added Tasks here


`druid.processing.buffer.sizeBytes` is a closely related property that controls the size of the off-heap buffers allocated to the processing threads.

One buffer is allocated for each processing thread. A size between 500MB and 1GB (the default) is a reasonable choice for general use.
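
For example (hypothetical values, not defaults), a process with 8 processing threads, 2 merge buffers, and 500MB buffers needs about (8 + 2 + 1) * 500MB ≈ 5.5GB of direct memory:

```
# Hypothetical runtime.properties (illustrative sizing)
druid.processing.numThreads=8
druid.processing.numMergeBuffers=2
druid.processing.buffer.sizeBytes=500000000

# jvm.config: direct memory must cover (8 + 2 + 1) * 500MB = 5.5GB
-XX:MaxDirectMemorySize=6g
```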
Member:

1G isn't the default; it is currently auto-calculated according to the formula if the amount of direct memory can be detected, for Java 8 at least.

Contributor Author:

Removed mention of default

fjy added this to the 0.15.0 milestone on May 10, 2019

For Historicals, `druid.server.http.numThreads` should be set to a value slightly higher than the sum of `druid.broker.http.numConnections` across all the Brokers in the cluster.

Tuning the cluster so that each Historical can accept 50 queries and 10 non-queries is a reasonable starting point.
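
Continuing the hypothetical 3-Broker sketch above (3 * 20 = 60 total Broker connections), a matching Historical configuration might look like this (illustrative only, not a recommendation):

```
# Hypothetical Historical runtime.properties
# Slightly above the 60 total Broker connections, leaving headroom
# for non-query requests such as status and health checks.
druid.server.http.numThreads=65
```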


This statement is not clear to me. What is the definition of queries vs. non-queries, and how do you configure a number for each?

Contributor Author:

I updated the "General Connection Pool Guidelines" section with a definition of queries vs. non-queries, and added a note that exact tuning depends on your inflow vs. drain rate for requests, which in turn depends on the specific queries you're running.


In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration as well.
In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration.
Member:

Probably multiple query servers too for fault tolerance on the user side of things?

Contributor Author:

Adjusted this to recommend multiple master/query servers in production while dropping the example to one master server

```
-XX:MaxDirectMemorySize=12g
-Xms12g
-Xmx12g
-XX:MaxDirectMemorySize=6g
```
Member:

I think you've sized this to run on an 8-core, 32GB machine. Should you add some more merge buffers or larger buffer sizes to use up a bit more of the space, and give any of the rest to the heap for breathing room?

Contributor Author:

I doubled the number of merge buffers


The example cluster above is chosen as a single example out of many possible ways to size a Druid cluster.

You can choose smaller or larger hardware, and fewer or more servers, for your specific needs and constraints.
Member:

I think it might also be worth mentioning here that advanced use cases can choose not to co-locate services in order to scale out the workload. It doesn't need to be very detailed, and it isn't incompatible with the following part that directs the reader to the basic tuning guide, which describes how all the processes use resources independently. Since this section is about making choices, there seems to be no reason to leave it out.

Contributor Author:

Added a note on not colocating


In production, we recommend deploying multiple Master servers with Coordinator and Overlord processes in a fault-tolerant configuration as well.
In production, we recommend deploying multiple Master servers and multiple Query servers in a fault-tolerant configuration based on your specific fault-tolerance needs, but you can get started quickly with one Master and one Query server and add more servers later.
Contributor:

What about migrating the metadata store from local derby to something else? I think we should talk about HA and the metadata store

Contributor:

Nevermind, it is covered later

fjy (Contributor) commented May 15, 2019:

👍

jihoonson (Contributor) left a comment:

These documents look very useful. Thanks @jon-wei.


Segments are memory-mapped by Historical processes using any available free system memory (i.e., memory not used by the Historical JVM and heap/direct memory buffers or other processes on the system). Segments that are not currently in memory will be paged from disk when queried.

Therefore, `druid.server.maxSize` should be set such that a Historical is not allocated an excessive amount of segment data. As the value of (`free system memory` / `druid.server.maxSize`) increases, a greater proportion of segments can be kept in memory, allowing for better query performance.
Contributor:

Should we also mention that druid.server.maxSize should be the sum of druid.segmentCache.locations across all cache locations?

Contributor Author:

Added info about druid.segmentCache.locations and how it relates to druid.server.maxSize
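
A hedged sketch of how these settings might line up, assuming a hypothetical Historical on a machine with 128GB of RAM where heap plus direct memory take roughly 28GB, leaving about 100GB of free memory for the page cache (the path and sizes below are illustrative):

```
# Hypothetical Historical segment cache sizing (illustrative numbers and path)
# druid.server.maxSize matches the total capacity of the segment cache locations.
# Free memory (~100GB) vs. maxSize (400GB) is roughly 1:4, so about a quarter of
# the assigned segment data can sit in the page cache at any time.
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":400000000000}]
druid.server.maxSize=400000000000
```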


Tuning the cluster so that each Task can accept 50 queries and 10 non-queries is a reasonable starting point.

#### SSD storage
Contributor:

How about moving this section to above Task Configurations since this is also a configuration for middleManagers?

Contributor Author:

Moved to MM section


The TopN and GroupBy queries use these buffers to store intermediate computed results. As the buffer size increases, more data can be processed in a single pass.

## GroupBy Merging Buffers
Contributor:

Should it be groupBy v2 because it's the only one using merging buffers?

Contributor Author:

Added v2 here

jihoonson (Contributor) left a comment:

+1 after CI. @jon-wei the CI failure looks fixed in #7667. Would you please merge master and trigger the CI?

jihoonson (Contributor):

Oh wait, this is a doc change. I'm changing my vote to just +1.

fjy merged commit d667655 into apache:master on May 16, 2019
jihoonson (Contributor):

I'm backporting this to 0.15.0-incubating since it contains changes for some scripts and tutorial configurations.

jon-wei added a commit to jon-wei/druid that referenced this pull request May 17, 2019
Add basic tuning guide, getting started page, updated clustering docs (apache#7629)

* Add basic tuning guide, getting started page, updated clustering docs

* Add note about caching, fix tutorial paths

* Adjust hadoop wording

* Add license

* Tweak

* Shrink overlord heaps, fix tutorial urls

* Tweak xlarge peon, update peon sizing

* Update Data peon buffer size

* Fix cluster start scripts

* Add upper level _common to classpath

* Fix cluster data/query confs

* Address PR comments

* Elaborate on connection pools

* PR comments

* Increase druid.broker.http.maxQueuedBytes

* Add guidelines for broker backpressure

* PR comments
jon-wei added a commit that referenced this pull request May 17, 2019
Add basic tuning guide, getting started page, updated clustering docs (#7629) (#7684)
jihoonson pushed a commit to implydata/druid-public that referenced this pull request Jun 4, 2019
Add basic tuning guide, getting started page, updated clustering docs (apache#7629) (apache#7684)
jihoonson pushed a commit to implydata/druid-public that referenced this pull request Jun 26, 2019
Add basic tuning guide, getting started page, updated clustering docs (apache#7629)