
doc/start: update hardware recs #47109

Merged

Conversation

Contributor

@zdover23 zdover23 commented Jul 14, 2022

This PR picks up the parts of #44466 that were not merged back in January, when that pull request was raised.

Matters added here:

  • improved organization of the CPU section
  • emphasis on IOPs per core over cores per OSD

Signed-off-by: Zac Dover <zac.dover@gmail.com>

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@zdover23 zdover23 requested a review from a team as a code owner July 14, 2022 22:57
Contributor

@anthonyeleven anthonyeleven left a comment


Nice, a worthy improvement to the slippery topic of hardware recommendations.
I've made a few comments/suggestions.

separate hosts to avoid resource contention.

CephFS metadata servers (MDS) are CPU-intensive. CephFS metadata servers (MDS)
should therefore have quad-core (or better) CPUs and high clock rates (GHz). OSD
Contributor

I think we might do well to be clearer that MDS nodes benefit a lot more from clock rate than from core count, so a 4-core 3.5 GHz model would be preferable to an 8-core 2.5 GHz SKU. I think the current MDS may be single-threaded, so maybe something like "(MDS) don't need more than 4 cores, but should have as high a clock rate (GHz) as possible". Or "frequency" instead of "clock rate"; I think in terms of the latter, but the former might be more common with our audience.
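[Editor's note: to make the trade-off concrete, here is a minimal sketch of the arithmetic; the two SKUs and the single-hot-thread assumption are illustrative placeholders, not figures from this PR.]

    # Why a largely single-threaded MDS favours clock rate over core count.
    # Both SKUs below are hypothetical examples.
    skus = {
        "4-core @ 3.5 GHz": (4, 3.5),
        "8-core @ 2.5 GHz": (8, 2.5),
    }

    for name, (cores, ghz) in skus.items():
        # A single busy MDS thread is bounded by per-core frequency;
        # aggregate GHz only helps if the work parallelizes.
        print(f"{name}: single-thread {ghz} GHz, aggregate {cores * ghz:.1f} GHz")

    # The 4-core part wins on the metric that matters for a mostly
    # single-threaded MDS, despite its lower aggregate figure.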

should therefore have quad-core (or better) CPUs and high clock rates (GHz). OSD
nodes need enough processing power to run the RADOS service, to calculate data
placement with CRUSH, to replicate data, and to maintain their own copies of the
cluster map.
Contributor

Do we want to mention EC parity / hash computation?
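[Editor's note: for readers unfamiliar with why that would matter: parity computation is per-byte CPU work on the write path. A toy sketch follows, using plain XOR parity (a degenerate k+1 code), not Ceph's actual jerasure/ISA-L plugins.]

    # Toy XOR parity over k data chunks (a k+1 code), to show the shape of the
    # per-byte CPU work that erasure coding adds. Ceph's real EC plugins are far
    # more sophisticated, but the cost still scales with bytes written.
    def xor_parity(chunks):
        parity = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                parity[i] ^= b          # one CPU operation per byte per chunk
        return bytes(parity)

    data = [bytes([n]) * 4096 for n in (1, 2, 4)]   # k = 3 equal-sized chunks
    print(xor_parity(data)[:4])                      # b'\x07\x07\x07\x07'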

the number of cores per OSD, but this cores-per-OSD metric is no longer as
useful a metric as the number of cycles per IOP and the number of IOPs per OSD.
For example, for NVMe drives, Ceph can easily utilize five or six cores on real
clusters and up to about fourteen cores on single OSDs in isolation. So cores
Contributor

From discussion with the good Mr. Nelson I know what isolation means here, but I might ask if that info is useful to our readers, or if it might confuse them. I also am often uncertain re whether we're talking about physical cores or [hyper] threads; I suspect these numbers are the latter.
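[Editor's note: for readers who want to see how the cycles-per-IOP framing gets used, here is a back-of-the-envelope sketch; every number below is a hypothetical placeholder, not a benchmark result from this PR.]

    # Core budget per OSD in the "cycles per IOP" framing.
    # Placeholder figures only -- substitute numbers from your own benchmarks.
    clock_hz       = 3.0e9      # 3.0 GHz per core (or hyperthread)
    cycles_per_iop = 250_000    # assumed OSD-path cost of one small random IOP
    target_iops    = 60_000     # desired IOPs from one NVMe-backed OSD

    cores_needed = target_iops * cycles_per_iop / clock_hz
    print(f"~{cores_needed:.1f} cores/threads per OSD")   # ~5.0 here

With these placeholders the result lands in the same ballpark as the "five or six cores on real clusters" figure quoted in the doc text above, which is the point of the new framing: the CPU budget follows from the IOPs target rather than from a fixed cores-per-OSD rule.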

modest processors. If your host machines will run CPU-intensive processes in
addition to Ceph daemons, make sure that you have enough processing power to
run both the CPU-intensive processes and the Ceph daemons. (OpenStack Nova is
one such example of a CPU-intensive process.) We recommend that you run
Contributor

Maybe drop the parens, since that's a standalone sentence? Or am I being sententious? Maybe word this as "OpenStack nova-compute or Proxmox" -- we seem to see a growing population of Ceph users by virtue of converged Proxmox deployments.


@mheler mheler Jul 16, 2022


I think this should be changed to qemu-kvm as an example, and not OpenStack Nova. Nova itself isn't very CPU-intensive, but qemu-kvm would cover almost all use cases where Ceph would be co-located with virtual machines, including Kubernetes situations where VMs and OSDs reside on the same host.
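[Editor's note: to illustrate the co-location concern concretely, here is a rough CPU-budget sketch for a converged host; all figures are made-up examples, not recommendations from this PR.]

    # Hypothetical thread budget for a host running OSDs next to qemu-kvm guests.
    host_threads    = 64   # total hyperthreads on the host
    threads_per_osd = 5    # per-OSD allowance (see the cycles-per-IOP sketch above)
    osds_on_host    = 8
    reserved_os     = 4    # kernel, networking, monitoring agents

    vm_threads = host_threads - osds_on_host * threads_per_osd - reserved_os
    print(f"threads left for qemu-kvm guests: {vm_threads}")   # 64 - 40 - 4 = 20

    # If guests are overcommitted past this figure, VMs and OSDs contend for
    # CPU, which is exactly the failure mode the doc text warns about.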

Contributor

See Zac's comment below about a followup issue

@neha-ojha neha-ojha requested a review from markhpc July 15, 2022 14:32
@zdover23 zdover23 merged commit f43c7a6 into ceph:main Jul 16, 2022
@zdover23
Contributor Author

https://tracker.ceph.com/issues/55938 - Anthony's comments are collected in this tracker ticket, the June 2022 hardware recommendations documentation tracker, which is the page on which I track all mid-2022 hardware recommendations documentation updates.

@zdover23
Contributor Author

#47122 - Pacific backport
#47123 - Quincy backport
