Clarify hardware recs for throughput #5472
Conversation
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell and @rkruze)
Force-pushed 4edce5d to e546174.
- The ideal configuration is 4-16 vCPUs, 8-64 GB memory nodes (2-4 GB of memory per vCPU).
- To optimize for throughput, use larger nodes, up to 16 vCPUs and 64 GB of RAM. Based on internal testing results, 16 vCPUs is the sweet spot for OLTP workloads.
- To increase throughput further, add more nodes to the cluster instead of increasing node size; higher vCPU counts have [NUMA](https://en.wikipedia.org/wiki/Non-uniform_memory_access) (non-uniform memory access) implications.
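The sizing guidance above amounts to a simple range check. A minimal sketch (the thresholds mirror the recommended ranges quoted in the diff; the function name is hypothetical):

```python
def within_recommended_shape(vcpus: int, ram_gb: float) -> bool:
    """Check a node shape against the recommended ranges:
    4-16 vCPUs, 8-64 GB of RAM, and 2-4 GB of RAM per vCPU."""
    if not 4 <= vcpus <= 16:
        return False
    if not 8 <= ram_gb <= 64:
        return False
    return 2 <= ram_gb / vcpus <= 4

# 16 vCPUs with 64 GB (4 GB/vCPU) is the throughput sweet spot
print(within_recommended_shape(16, 64))   # True
# A 32-vCPU node falls outside the range; add nodes instead
print(within_recommended_shape(32, 128))  # False
```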
I'd remove this note about NUMA. We haven't explored this but we have no reason to believe it's a major concern at this point. The immediate reason why going to 32 vCPUs and beyond loses efficiency is simple mutex contention.
Also for the record, on AWS m5/c5 nodes, you can go up to 48 vCPUs in a single NUMA group, so NUMA isn't a concern on this platform until you get to 72 vCPUs.
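The arithmetic behind that comment can be sketched as follows, assuming a fixed NUMA group size (48 vCPUs per group on AWS m5/c5, per the comment; the helper name is hypothetical):

```python
import math

def numa_groups(vcpus: int, vcpus_per_group: int = 48) -> int:
    """Number of NUMA groups an instance spans, assuming each
    group holds up to `vcpus_per_group` vCPUs."""
    return math.ceil(vcpus / vcpus_per_group)

print(numa_groups(32))  # 1 -- fits in a single NUMA group
print(numa_groups(72))  # 2 -- cross-NUMA memory access becomes possible
```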
Force-pushed e546174 to 289fbb2.
Force-pushed 289fbb2 to 25c1407.
Reviewed 10 of 10 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jseldess)
v19.1/deploy-cockroachdb-on-aws.md, line 72 at r2 (raw file):
- Run at least 3 nodes to ensure survivability.
- Use `m` (general purpose), `c` (compute-optimized), or `i` (storage-optimized) [instances](https://aws.amazon.com/ec2/instance-types/), with SSD-backed [EBS volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) or [Instance Store volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html). For example, Cockroach Labs has used `c5d.8xlarge` (36 vCPUs and 72 GiB of RAM per instance, NVMe SSD) for internal testing.
Our most-tested configuration is still the 4xlarge, I think.
v19.1/deploy-cockroachdb-on-google-cloud-platform.md, line 61 at r2 (raw file):
- Run at least 3 nodes to [ensure survivability](recommended-production-settings.html#topology).
- Use `n1-standard` or `n1-highcpu` [predefined VMs](https://cloud.google.com/compute/pricing#predefined_machine_types), or [custom VMs](https://cloud.google.com/compute/pricing#custommachinetypepricing), with [Local SSDs](https://cloud.google.com/compute/docs/disks/#localssds) or [SSD persistent disks](https://cloud.google.com/compute/docs/disks/#pdspecs). For example, Cockroach Labs has used `n1-standard-32` (32 vCPUs and 60 GB of RAM per VM, local SSD) for internal testing.
Ditto, for n1-standard-16.
Increase optimal vCPUs to 32. Update GCP and AWS recs as well. Holding off on Azure until the upcoming cloud report, at which point we should revisit our hardware recs more broadly. Fixes #4711.
Force-pushed 25c1407 to 595b04c.