
Conversation


@jseldess jseldess commented Sep 24, 2019

Increase optimal vCPUs to 32.

Update GCP and AWS recs as well. Holding off on Azure until
the upcoming cloud report, at which point we should revisit
our hardware recs more broadly.

Fixes #4711.

@jseldess jseldess requested review from bdarnell and rkruze September 24, 2019 18:54
@cockroach-teamcity left a comment

This change is Reviewable


@rkruze rkruze left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell and @rkruze)

@jseldess jseldess force-pushed the prod-checklist-update branch from 4edce5d to e546174 on September 24, 2019 18:59
- The ideal configuration is 4-16 vCPUs, 8-64 GB memory nodes (2-4 GB of memory per vCPU).
- To optimize for throughput, use larger nodes, up to 16 vCPUs and 64 GB of RAM. Based on internal testing results, 16 vCPUs is the sweet spot for OLTP workloads.

To increase throughput further, add more nodes to the cluster instead of increasing node size; higher vCPU counts have [NUMA](https://en.wikipedia.org/wiki/Non-uniform_memory_access) (non-uniform memory access) implications.

I'd remove this note about NUMA. We haven't explored this but we have no reason to believe it's a major concern at this point. The immediate reason why going to 32 vCPUs and beyond loses efficiency is simple mutex contention.


Also for the record, on AWS m5/c5 nodes, you can go up to 48 vCPUs in a single NUMA group, so NUMA isn't a concern on this platform until you get to 72 vCPUs.
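For context, a quick way to confirm how many NUMA nodes a given instance actually exposes is with standard Linux tooling. A minimal sketch, assuming `numactl` is installed on the host (this example is illustrative, not part of the PR):

```shell
# Show the NUMA node count and which vCPUs map to each node.
lscpu | grep -i 'numa'

# More detail: per-node CPU lists, memory sizes, and inter-node distances.
numactl --hardware
```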

@jseldess jseldess force-pushed the prod-checklist-update branch from e546174 to 289fbb2 on September 24, 2019 20:18
@jseldess jseldess force-pushed the prod-checklist-update branch from 289fbb2 to 25c1407 on September 24, 2019 20:21
@bdarnell bdarnell left a comment


Reviewed 10 of 10 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jseldess)


v19.1/deploy-cockroachdb-on-aws.md, line 72 at r2 (raw file):

- Run at least 3 nodes to ensure survivability.

- Use `m` (general purpose), `c` (compute-optimized), or `i` (storage-optimized) [instances](https://aws.amazon.com/ec2/instance-types/), with SSD-backed [EBS volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) or [Instance Store volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html). For example, Cockroach Labs has used `c5d.8xlarge` (36 vCPUs and 72 GiB of RAM per instance, NVMe SSD) for internal testing.

Our most-tested configuration is still the 4xlarge, I think.
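For illustration, launching that configuration with the AWS CLI might look like the sketch below; the AMI ID and key name are placeholders, not values from this PR:

```shell
# Launch three c5d.4xlarge instances (AMI ID and key name are hypothetical).
aws ec2 run-instances \
  --image-id ami-EXAMPLE \
  --instance-type c5d.4xlarge \
  --count 3 \
  --key-name my-key
```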


v19.1/deploy-cockroachdb-on-google-cloud-platform.md, line 61 at r2 (raw file):

- Run at least 3 nodes to [ensure survivability](recommended-production-settings.html#topology).

- Use `n1-standard` or `n1-highcpu` [predefined VMs](https://cloud.google.com/compute/pricing#predefined_machine_types), or [custom VMs](https://cloud.google.com/compute/pricing#custommachinetypepricing), with [Local SSDs](https://cloud.google.com/compute/docs/disks/#localssds) or [SSD persistent disks](https://cloud.google.com/compute/docs/disks/#pdspecs). For example, Cockroach Labs has used `n1-standard-32` (32 vCPUs and 60 GB of RAM per VM, local SSD) for internal testing.

Ditto, for n1-standard-16.
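For illustration, creating an `n1-standard-16` VM with a local NVMe SSD via `gcloud` might look like the following sketch; the instance name, image family, and image project are assumptions, not values from this PR:

```shell
# Create an n1-standard-16 VM with one local NVMe SSD
# (instance name and image settings are hypothetical).
gcloud compute instances create cockroach-node-1 \
  --machine-type=n1-standard-16 \
  --local-ssd=interface=NVME \
  --image-family=ubuntu-1804-lts \
  --image-project=ubuntu-os-cloud
```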

@jseldess jseldess force-pushed the prod-checklist-update branch from 25c1407 to 595b04c on September 25, 2019 01:19
@jseldess jseldess merged commit 62f2864 into master Sep 25, 2019
@jseldess jseldess deleted the prod-checklist-update branch September 25, 2019 01:26