doc/start: update hardware recs #47109
Conversation
This PR picks up the parts of ceph#44466 that were not merged back in January, when that pull request was raised. Matters added here:

* improved organization of the material
* emphasis on IOPs per core over cores per OSD

Signed-off-by: Zac Dover <zac.dover@gmail.com>
Nice, a worthy improvement to the slippery topic of hardware recommendations.
I've made a few comments/suggestions.
separate hosts to avoid resource contention.
CephFS metadata servers (MDS) are CPU-intensive. CephFS metadata servers (MDS)
should therefore have quad-core (or better) CPUs and high clock rates (GHz). OSD
I think we might do well to be clearer that MDS nodes benefit a lot more from clock rate than from core count, so a 4-core 3.5 GHz model would be preferable to an 8-core 2.5 GHz SKU. I think the current MDS may be single-threaded, so maybe something like "(MDS) don't need more than 4 cores, but should have as high a clock rate (GHz) as possible". Or "frequency" instead of "clock rate"; I think in terms of the latter, but the former might be more common with our audience.
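A back-of-envelope sketch of the trade-off this comment describes; the single-threaded assumption and both SKUs come from the comment itself, and the arithmetic is illustrative only, not a measured Ceph result:

```python
# Back-of-envelope comparison for a (mostly) single-threaded MDS:
# metadata throughput tracks per-core clock rate, not core count.
# usable_threads=1 is the single-threaded assumption from the comment.

def effective_ghz(cores: int, ghz: float, usable_threads: int = 1) -> float:
    """Clock rate actually available to the MDS hot path."""
    return min(cores, usable_threads) * ghz

print(effective_ghz(cores=4, ghz=3.5))  # 3.5 -- the 4-core SKU wins
print(effective_ghz(cores=8, ghz=2.5))  # 2.5 -- the extra cores sit idle
```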
nodes need enough processing power to run the RADOS service, to calculate data
placement with CRUSH, to replicate data, and to maintain their own copies of the
cluster map.
Do we want to mention EC parity / hash computation?
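For context on the cost being raised here: erasure-coded pools spend CPU computing parity chunks on every write. A minimal sketch of the idea using plain XOR parity (the RAID-5-style special case; Ceph's erasure-code plugins implement more general Reed-Solomon codes, so this is illustrative only):

```python
# Minimal illustration of parity computation: XOR parity over k data
# chunks. Ceph's EC plugins (e.g. jerasure) use Reed-Solomon coding,
# which is more general, but the per-byte CPU cost pattern is similar.

def xor_parity(chunks: list[bytes]) -> bytes:
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
print(xor_parity(data))  # b'\x15*' -> bytes 0x15, 0x2a
```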
the number of cores per OSD, but this cores-per-OSD metric is no longer as
useful as the number of cycles per IOP and the number of IOPs per OSD. For
example, for NVMe drives, Ceph can easily utilize five or six cores on real
clusters, and up to about fourteen cores on single OSDs in isolation. So cores
From discussion with the good Mr. Nelson I know what isolation means here, but I might ask whether that info is useful to our readers, or whether it might confuse them. I'm also often uncertain whether we're talking about physical cores or [hyper]threads; I suspect these numbers are the latter.
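To make the cycles-per-IOP framing concrete, here is a hedged sizing sketch; every constant is an assumed figure chosen to land near the five-or-six-cores number in the hunk above, not a measured Ceph value:

```python
# Back-of-envelope sizing in the "cycles per IOP" frame proposed above.
# All constants are illustrative assumptions, not measurements.

CPU_HZ = 3.0e9          # assumed 3.0 GHz core clock
CYCLES_PER_IOP = 2.0e5  # assumed CPU cost of one small-block IOP
TARGET_IOPS = 80_000    # assumed per-OSD target on an NVMe device

cores_needed = TARGET_IOPS * CYCLES_PER_IOP / CPU_HZ
print(f"~{cores_needed:.1f} cores per OSD")  # ~5.3 with these assumptions
```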
modest processors. If your host machines will run CPU-intensive processes in
addition to Ceph daemons, make sure that you have enough processing power to
run both the CPU-intensive processes and the Ceph daemons. (OpenStack Nova is
one such example of a CPU-intensive process.) We recommend that you run
Maybe drop the parens, since that's a standalone sentence? Or am I being sententious? Maybe word this as "OpenStack nova-compute or Proxmox" -- we seem to see a growing population of Ceph users by virtue of converged Proxmox deployments.
I think this should be changed to qemu-kvm as an example, and not OpenStack Nova. Nova itself isn't very CPU-intensive, but qemu-kvm would cover almost all use cases where Ceph would be co-located with virtual machines, including Kubernetes situations where VMs and OSDs reside on the same host.
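A rough way to sanity-check the converged scenario these comments describe; all inputs are illustrative assumptions, not recommendations:

```python
# Rough headroom check for a converged host running OSDs alongside
# qemu-kvm guests. Every input below is an illustrative assumption.

HOST_THREADS = 64     # assumed hyperthreads on the host
OSDS = 8              # assumed NVMe OSDs on this host
THREADS_PER_OSD = 5   # assumed peak OSD CPU use (see hunk above)
VM_VCPUS = 24         # assumed vCPUs handed to qemu-kvm guests
SYSTEM_RESERVE = 4    # assumed OS/network/monitoring overhead

demand = OSDS * THREADS_PER_OSD + VM_VCPUS + SYSTEM_RESERVE
print(f"peak demand: {demand} of {HOST_THREADS} threads")
if demand > HOST_THREADS:
    print("undersized: OSDs and guests will contend under load")
```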
See Zac's comment below about a follow-up issue.
https://tracker.ceph.com/issues/55938 - Anthony's comments are collected in this tracker bug, which is the June 2022 hardware recommendations documentation tracker (the page on which I track all mid-2022 hardware recommendations documentation updates).