doc/rados/operations/health-checks: osd section #16611

Merged
merged 5 commits into from Aug 1, 2017

5 participants
@liewegas
Member

liewegas commented Jul 26, 2017

First paragraph: explain what the error means.

Second or later paragraph: describe steps to fix or mitigate.

Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas requested a review from jcsp Jul 26, 2017

@liewegas liewegas added this to the luminous milestone Jul 26, 2017

@liewegas liewegas force-pushed the liewegas:wip-doc-health branch from 97774be to cfded30 Jul 26, 2017

One or more cluster flags of interest has been set. These flags include:

* *full* - the cluster is flagged as full and cannot service writes
* *pauserd*, *pausewr*, *pauserec* - paused reads, writes, or

@xiexingguo

xiexingguo Jul 27, 2017

Member

pauserec is not available(functionable)

@liewegas liewegas force-pushed the liewegas:wip-doc-health branch from cfded30 to 6ea0b97 Jul 27, 2017

@liewegas liewegas requested a review from jdurgin Jul 27, 2017

@jcsp

jcsp approved these changes Jul 28, 2017

Contributor

jcsp left a comment

All looks good to me


OSD_OUT_OF_ORDER_FULL
_____________________

The utilization thresholds for `backfillfull`, `nearfull`, `full`,
and/or `failsafe_full` are not ascending. In particular, we expect
`backfillfull < nearful`, `nearfull < full`, and `full <

@jdurgin

jdurgin Aug 1, 2017

Member

typo: missing l in nearful


ceph osd dump | grep full_ratio

A short-term workaround restore write availability is to raise the full

@jdurgin

jdurgin Aug 1, 2017

Member

s/restore/to restore/

The CRUSH map is using an older, non-optimal method for calculating
intermediate weight values for ``straw`` buckets.

The CRUSH map should be update dot use the newer method

@jdurgin

jdurgin Aug 1, 2017

Member

s/ dot/d to/

@tchaikov

tchaikov Aug 1, 2017

Contributor

s/use/using/


You can either raise the pool quota with::

ceph osd pool set-quota <poolname> max_object <num-objects>

@jdurgin

jdurgin Aug 1, 2017

Member

'max_object' should be plural

is normally and indication that the PG count was increased without
also increasing the placement behavior.

This is sometime done deliberately to separate out the `split` step

@jdurgin

jdurgin Aug 1, 2017

Member

'sometime' should be plural


SMALLER_PGP_NUM
_______________

One or more pools has a ``pgp_num`` value less than ``pg_num``. This
is normally and indication that the PG count was increased without

@tchaikov

tchaikov Aug 1, 2017

Contributor

s/and/an/

liewegas added some commits Jul 27, 2017

doc/rados/operations/health-checks: osd section

First paragraph: explain what the error means.

Second or later paragraph: describe steps to fix or mitigate.

Signed-off-by: Sage Weil <sage@redhat.com>

mon/PGMap: 'incomplete' means data is unavailable

Well, data is certainly unavailable, and may also be
degraded in the sense that we can't peer.  I think
unavailable is the more severe of the two, though, so
let's put it there!

Signed-off-by: Sage Weil <sage@redhat.com>

mon/PGMap: put the _toofull states under DEGRADED_FULL

I think this was an oversight?

Signed-off-by: Sage Weil <sage@redhat.com>

mon/PGMap: include which pgs have unfound objects in detail

Signed-off-by: Sage Weil <sage@redhat.com>

doc/rados/operations/health-checks: add PG health check commentary

Include a link to pg-repair.rst, although there is no
content there yet.

Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas force-pushed the liewegas:wip-doc-health branch from 1813df0 to dbb1dd3 Aug 1, 2017

@liewegas

Member Author

liewegas commented Aug 1, 2017

fixed typos, thanks!

@liewegas liewegas merged commit 0afffa5 into ceph:master Aug 1, 2017

0 of 4 checks passed

Signed-off-by: checking if commits are signed
Unmodified Submodules: checking if PR has modified submodules
make check: running make check
make check (arm64): running make check

@liewegas liewegas deleted the liewegas:wip-doc-health branch Aug 1, 2017
