
doc/rados: edit placement-groups.rst (2 of x) #51991

Conversation

@zdover23 (Contributor) commented Jun 10, 2023

Edit doc/rados/operations/placement-groups.rst.

https://tracker.ceph.com/issues/58485

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@anthonyeleven (Contributor) left a comment

You know the drill

Setting a Minimum Number of PGs and a Maximum Number of PGs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a minimum is set, then Ceph will not itself reduce (or recommend that you

anthonyeleven (Contributor):

s/or/nor/

zdover23 (Contributor Author):

Accepted.
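
For context, the minimum and maximum discussed in this section correspond to the per-pool ``pg_num_min`` and ``pg_num_max`` properties that the autoscaler honors. A minimal sketch of how they might be set, assuming a hypothetical pool named "mypool":

  # Floor: the autoscaler will not reduce (nor recommend reducing) pg_num below this value.
  ceph osd pool set mypool pg_num_min 64
  # Ceiling: the autoscaler will not raise (nor recommend raising) pg_num above this value.
  ceph osd pool set mypool pg_num_max 512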

for you based on how much data is stored in the pool (see above, :ref:`pg-autoscaler`).
If you opt not to specify ``pg_num`` in this command, the cluster uses the PG
autoscaler to automatically configure the parameter in accordance with the
amount of data that is stored in the pool (see :ref:`pg-autoscaler` above).

anthonyeleven (Contributor):

Actually, won't it use osd_pool_default_pg_num? As written this assumes that the autoscaler is enabled.
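
One way to see which behavior applies on a given cluster (an illustrative sketch; "mypool" is a hypothetical pool name):

  # Is the autoscaler managing pg_num for new pools? (on / warn / off)
  ceph config get mon osd_pool_default_pg_autoscale_mode
  # The static default used for pg_num when it is omitted and the autoscaler is not in charge.
  ceph config get mon osd_pool_default_pg_num
  # Per-pool autoscaler state and targets.
  ceph osd pool autoscale-status
  ceph osd pool get mypool pg_autoscale_mode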

the additional of the balancer (which is also enabled by default), a
value of more like 50 PGs per OSD is probably reasonable. The
challenge (which the autoscaler normally does for you), is to:
The traditional rule of thumb has held that there should be 100 PGs per OSD.

anthonyeleven (Contributor):

Tradition was 200, retconned to 100 a few years ago. I personally disagree with the party line here, but this isn't about me. Also, is rule of thumb a thing that might not be known to non-native English readers?

Suggest

Without the balancer, approximately 100 PG replicas on each OSD is the suggested target. With the balancer, however, an initial target of 50 PG replicas on each OSD is reasonable.

zdover23 (Contributor Author):

Accepted with slight modifications.
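
The arithmetic behind those per-OSD targets can be sketched for a hypothetical cluster (figures are illustrative, not recommendations):

  # 10 OSDs, replicated pools with size 3, target of roughly 100 PG replicas per OSD (no balancer):
  echo $(( (10 * 100) / 3 ))   # prints 333, i.e. ~333 PGs in total across all pools
  # Rounded to a nearby power of two: 256 or 512.
  # With the balancer and a ~50 PG-replica-per-OSD target, the result halves (~166, i.e. 128 or 256).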

consideration
- the number of PGs per pool should be proportional to the amount of
data in the pool
- there should be 50-100 PGs per pool, taking into account the

anthonyeleven (Contributor):

s/pool/OSD/

zdover23 (Contributor Author):

Accepted.

- the number of PGs per pool should be proportional to the amount of
data in the pool
- there should be 50-100 PGs per pool, taking into account the
replication overhead or erasure-coding fan-out of each PG across OSDs

anthonyeleven (Contributor):

each PG's replicas

zdover23 (Contributor Author):

Accepted.
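
The per-OSD replica counts that result from this fan-out can also be inspected directly; in recent Ceph releases the PGS column of the following output reports how many PG replicas each OSD currently holds:

  ceph osd df tree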


As long as there are one or two orders of magnitude more PGs than OSDs, the
distribution is likely to be even. For example: 256 PGs for 3 OSDs, 512 PGs for
10 OSDs, or 1024 PGs for 10 OSDs.

anthonyeleven (Contributor):

Maybe add a note about how pg_num not being a power of 2 can work against uniform distribution. Don't go into why, if someone really cares they can read chapter 8 of my book, so long as they pre-dose with their favorite headache medication.

zdover23 (Contributor Author):

This is covered later on this same page (and will be edited in the "3 of x" PR in this series).
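
A quick way to check whether a pool's ``pg_num`` is already a power of two (sketch; assumes a hypothetical pool named "mypool"):

  ceph osd pool get mypool pg_num      # typically prints something like "pg_num: 256"
  # A power of two has exactly one bit set, so n & (n - 1) is zero:
  n=256; (( (n & (n - 1)) == 0 )) && echo "power of two" || echo "not a power of two"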

10 OSDs, or 1024 PGs for 10 OSDs.

However, uneven data distribution can emerge due to factors other than the
ratio of OSDs to PGs. For example, since CRUSH does not take into account the

anthonyeleven (Contributor):

PGs to OSDs

zdover23 (Contributor Author):

Accepted.


However, uneven data distribution can emerge due to factors other than the
ratio of OSDs to PGs. For example, since CRUSH does not take into account the
size of the objects, the presence of a few very large objects can create an

anthonyeleven (Contributor):

size of RADOS objects

zdover23 (Contributor Author):

Accepted in all instances.

at all times and even more during recovery. Sharing this overhead by
clustering objects within a placement group is one of the main reasons
they exist.
Every PG in the cluster imposes additional memory, network, and CPU demands

anthonyeleven (Contributor):

s/additional//

zdover23 (Contributor Author):

Accepted.

clustering objects within a placement group is one of the main reasons
they exist.
Every PG in the cluster imposes additional memory, network, and CPU demands
upon OSDs and MONs. These needs must be met at all times and are still more

anthonyeleven (Contributor):

and are increased during recovery.

zdover23 (Contributor Author):

Accepted.
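
This per-PG overhead is also why the cluster warns when the number of PG replicas per OSD grows too large. A rough way to check the current totals against that threshold (option name as an assumption about recent releases):

  ceph -s                                   # overall PG count and states
  ceph config get mon mon_max_pg_per_osd    # per-OSD PG replica limit behind the health warning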

Edit doc/rados/operations/placement-groups.rst.

https://tracker.ceph.com/issues/58485

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
@zdover23 force-pushed the wip-doc-2023-06-09-rados-operations-placement-groups-2-of-x branch from 04a439d to 8bdd271 on June 11, 2023 at 00:06
@zdover23 merged commit 9f217cf into ceph:main on June 11, 2023
11 checks passed

zdover23 (Contributor Author):

#51996 - Reef backport
#51997 - Quincy backport
