doc: document repair/scrub features #9032

Merged
tchaikov merged 2 commits into ceph:master from tchaikov:wip-15786 on Jan 10, 2017

4 participants

@theanalyst theanalyst commented on an outdated diff May 10, 2016
doc/rados/troubleshooting/troubleshooting-pg.rst
-due to an error during scrubbing. If the inconsistency is due to disk errors,
-check your disks.
+due to an error during scrubbing. As always, we can identify the inconsistent
+placement group(s) with::
+
+ $ ceph health detail
+ HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
+ pg 0.6 is active+clean+inconsistent, acting [0,1,2]
+ 1 scrub errors
+
+Or if you prefer inspecting the output in a programmatic way::
+
+ $ rados list-inconsistent-pg rbd --format=json
+ ["0.6"]
+
+There is only one consistent state, but in the worse case, we could have
@theanalyst
theanalyst May 10, 2016 Member

s/worse/worst

@theanalyst
Member

lgtm apart from the nit

@tchaikov
Contributor

thanks @theanalyst

changelog

  • s/worse/worst/
@tchaikov
Contributor

@dzafman this PR follows the output format from #8983

@dachary
Member
dachary commented Nov 22, 2016

jenkins test this please (jenkins hung)

@dzafman

Update based on current code. See also the following json-schema files.

../doc/rados/command/list-inconsistent-obj.json
../doc/rados/command/list-inconsistent-snap.json

+
+Or if you prefer inspecting the output in a programmatic way::
+
+ $ rados list-inconsistent-pg rbd --format=json
@dzafman
dzafman Nov 23, 2016 Member

--format=json is optional here because JSON is the default output format.
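
For scripting, the array printed by the command can be consumed directly. A minimal Python sketch, assuming the output is a flat JSON array of PG ids as in the example output quoted in this PR; the helper name `parse_inconsistent_pgs` is hypothetical, and the sample string stands in for the captured stdout of the real command:

```python
import json


def parse_inconsistent_pgs(raw):
    """Parse the text printed by `rados list-inconsistent-pg <pool>`.

    Assumption: the output is a flat JSON array of PG id strings,
    for example ["0.6"].
    """
    pgs = json.loads(raw)
    if not isinstance(pgs, list):
        raise ValueError("expected a JSON array of PG ids")
    return [str(pg) for pg in pgs]


# Sample copied from the example output in this PR; in practice `raw`
# would be the captured stdout of the rados command.
sample = '["0.6"]'
print(parse_inconsistent_pgs(sample))
```

Each returned PG id could then be passed to `rados list-inconsistent-obj` for the per-object details.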

+different inconsistencies in multiple perspectives found in more than one
+object. If an object named ``foo`` in PG ``0.6`` is truncated, we will have::
+
+ $ rados list-inconsistent-obj 0.6 --format=json-pretty
@dzafman
dzafman Nov 23, 2016 Member

This output has changed since ceph/ceph#9613 merged.

@dachary dachary removed their assignment Nov 23, 2016
@tchaikov tchaikov assigned tchaikov and unassigned dzafman Nov 24, 2016
@tchaikov
Contributor

@dzafman comments addressed and repushed.

@tchaikov tchaikov assigned dzafman and unassigned tchaikov Nov 30, 2016
@dzafman

A few suggested improvements. We should also follow up with more examples unless you want to do it now.

+ inconsistencies.
+* The inconsistencies fall into two categories:
+
+ * ``errors``: inconsistencies between different shards:
@dzafman
dzafman Nov 30, 2016 Member

These errors indicate inconsistencies between shards without a determination of which shard(s) are bad. Check the errors in the shards array, if available, to pinpoint the problem.

+
+ * ``data_digest_mismatch``: the digest of the replica read from OSD.2 is different from the ones of OSD.0 and OSD.1
+ * ``size_mismatch``: the size of the replica read from OSD.2 is 0, while the size reported by OSD.0 and OSD.1 is 968.
+ * ``union_shard_errors``: inconsistencies between the data and metadata read from a certain shard:
@dzafman
dzafman Nov 30, 2016 Member

union_shard_errors is just the union of all shard-specific errors in the shards array. This includes errors like read_error. Look at the shards array to determine which shard has which error(s).
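
A rough Python sketch of that relationship, for anyone scripting against the output. It assumes each inconsistent entry is a dict with a `shards` list whose items carry `osd` and `errors` keys, plus a top-level `union_shard_errors` list; the json-schema files are the authority on the real layout. The helper names and the sample entry are hypothetical, modelled on the truncated-object case in this PR:

```python
def shards_with_errors(inconsistent):
    """Map each OSD id to its shard-specific errors, skipping clean shards."""
    return {
        shard["osd"]: list(shard.get("errors", []))
        for shard in inconsistent.get("shards", [])
        if shard.get("errors")
    }


def union_of_shard_errors(inconsistent):
    """Recompute the union of all shard-specific errors, sorted for stable output."""
    found = set()
    for shard in inconsistent.get("shards", []):
        found.update(shard.get("errors", []))
    return sorted(found)


# Hypothetical entry: the key names are assumptions, not verified output.
entry = {
    "errors": ["data_digest_mismatch", "size_mismatch"],
    "union_shard_errors": ["data_digest_mismatch_oi", "size_mismatch_oi"],
    "shards": [
        {"osd": 0, "errors": [], "size": 968},
        {"osd": 1, "errors": [], "size": 968},
        {"osd": 2, "errors": ["data_digest_mismatch_oi", "size_mismatch_oi"], "size": 0},
    ],
}

# Which shard has which error(s)
print(shards_with_errors(entry))
# The recomputed union should agree with the reported union_shard_errors
print(union_of_shard_errors(entry) == sorted(entry["union_shard_errors"]))
```

The per-shard map is what pinpoints the bad replica; the union only says which kinds of shard errors occurred somewhere.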

+ * ``data_digest_mismatch``: the digest of the replica read from OSD.2 is different from the ones of OSD.0 and OSD.1
+ * ``size_mismatch``: the size of the replica read from OSD.2 is 0, while the size reported by OSD.0 and OSD.1 is 968.
+ * ``union_shard_errors``: inconsistencies between the data and metadata read from a certain shard:
+
@dzafman
dzafman Nov 30, 2016 Member

Errors in the shards array are set on the specific shard that has the problem.

+ * ``size_mismatch``: the size of the replica read from OSD.2 is 0, while the size reported by OSD.0 and OSD.1 is 968.
+ * ``union_shard_errors``: inconsistencies between the data and metadata read from a certain shard:
+
+ * ``data_digest_mismatch_oi``: the digest stored in the object-info is not ``0xffffffff``, which is calculated from the shard read from OSD.2
@dzafman
dzafman Nov 30, 2016 Member

The errors ending in _oi indicate a comparison with selected_object_info.
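
As a small illustration of that naming convention, a Python sketch that separates the two kinds of shard error by suffix. The helper name `split_oi_errors` is hypothetical, and treating the `_oi` suffix as the only marker is an assumption drawn from this comment rather than from the code:

```python
def split_oi_errors(shard_errors):
    """Split shard errors into (compared with selected_object_info, other).

    Assumption: an error name ending in "_oi" marks a comparison with
    selected_object_info; anything else is a different shard-level error.
    """
    oi = [e for e in shard_errors if e.endswith("_oi")]
    other = [e for e in shard_errors if not e.endswith("_oi")]
    return oi, other


oi_errors, other_errors = split_oi_errors(["data_digest_mismatch_oi", "read_error"])
print(oi_errors)
print(other_errors)
```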

@tchaikov tchaikov assigned tchaikov and unassigned dzafman Dec 1, 2016
tchaikov added some commits May 10, 2016
@tchaikov tchaikov doc: update rados.8 with list-inconsistent-* commands
Signed-off-by: Kefu Chai <kchai@redhat.com>
2203a13
@tchaikov tchaikov doc: update troubleshooting-pg with scrub/repair feature
update rados/troubleshooting/troubleshooting-pg.rst with the
scrub/repair feature

Fixes: http://tracker.ceph.com/issues/15786
Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
fa1f4ce
@tchaikov
Contributor
tchaikov commented Jan 6, 2017

@dzafman could you take a look again?

  • incorporated the improvement you suggested.
  • use "``" instead of "`" to quote the monospaced words.

i will add more examples when playing with #8931 and #9203.

@dzafman
dzafman approved these changes Jan 9, 2017
@tchaikov tchaikov merged commit 327791f into ceph:master Jan 10, 2017

3 checks passed

  • Signed-off-by: all commits in this PR are signed
  • Unmodifed Submodules: submodules for project are unmodified
  • default: Build finished.
@tchaikov tchaikov deleted the tchaikov:wip-15786 branch Jan 10, 2017