Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/scrub: Add stats to PG dump for number of objects scrubbed #42735

Merged
merged 4 commits into from
Jan 14, 2022

Conversation

amathuria
Copy link
Contributor

Addition of a new column in PG dump, OBJECTS_SCRUBBED, which keeps track of the number of objects scrubbed.

Signed-off-by: Aishwarya Mathuria amathuri@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@jdurgin
Copy link
Member

jdurgin commented Aug 17, 2021

can you add a test for this? perhaps the existing scrub tests could check that the field is populated?

@@ -1279,6 +1279,8 @@ void PgScrubber::scrub_compare_maps()
{
dout(10) << __func__ << " has maps, analyzing" << dendl;

int64_t primary_objects = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed (esp. so far from the usage point)

@@ -1319,6 +1321,8 @@ void PgScrubber::scrub_compare_maps()

dout(2) << __func__ << ": primary (" << m_pg->get_primary() << ") has "
<< m_primary_scrubmap.objects.size() << " items" << dendl;
primary_objects = m_primary_scrubmap.objects.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int primary_objects = ...

src/osd/PG.h Show resolved Hide resolved
@neha-ojha
Copy link
Member

jenkins test api

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@stale
Copy link

stale bot commented Jan 9, 2022

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added stale and removed stale labels Jan 9, 2022
Addition of a new column in PG dump, OBJECTS_SCRUBBED, which keeps track of the number of objects scrubbed.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
…er of objects in a PG once a scrub finishes

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
Copy link
Contributor

@ronen-fr ronen-fr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from minor comments.
'primary_objects' fix is the one I would consider a must.


TESTDATA="testdata.$$"

setup $dir || return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that standard_scrub_cluster() in qa/standalone/scrub/scrub_helpers.sh might save you from some of this boilerplate startup code

@@ -1370,6 +1370,8 @@ void PgScrubber::scrub_compare_maps()
{
dout(10) << __func__ << " has maps, analyzing" << dendl;

int64_t primary_objects = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just declare primary_objects where it's used? (if at all)

@@ -1410,6 +1412,8 @@ void PgScrubber::scrub_compare_maps()

dout(2) << __func__ << ": primary (" << m_pg->get_primary() << ") has "
<< m_primary_scrubmap.objects.size() << " items" << dendl;
primary_objects = m_primary_scrubmap.objects.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just skip 'primary_objects' altogether (the only good reason to have it would
have been to use it in the dout() in the previous line)

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
@amathuria
Copy link
Contributor Author

The pg dump output will now look like this:
Before scrub:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES     OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE         STATE_STAMP                      VERSION  REPORTED  UP       UP_PRIMARY  ACTING   ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP                      LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN  LAST_SCRUB_DURATION  SCRUB_SCHEDULING  OBJECTS_SCRUBBED
1.63          10                   0         0          0        0  41943040            0           0   10        10  active+clean  2022-01-13T18:03:34.044849+0530    18'10     18:21  [1,0,2]           1  [1,0,2]               1         0'0  2022-01-13T18:03:32.887271+0530              0'0  2022-01-13T18:03:32.887271+0530              0                    0  queued for scrub                 0
1.62           3                   0         0          0        0  12582912            0           0    3         3  active+clean  2022-01-13T18:03:34.023840+0530     18'3     18:14  [2,0,1]           2  [2,0,1]               2         0'0  2022-01-13T18:03:32.887271+0530              0'0  2022-01-13T18:03:32.887271+0530              0                    0  queued for scrub                 0
1.61           5                   0         0          0        0  20971520            0           0    5         5  active+clean  2022-01-13T18:03:34.066341+0530     18'5     18:16  [0,1,2]           0  [0,1,2]               0         0'0  2022-01-13T18:03:32.887271+0530              0'0  2022-01-13T18:03:32.887271+0530              0                    0  queued for scrub                 0

After scrub:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES     OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE         STATE_STAMP                      VERSION  REPORTED  UP       UP_PRIMARY  ACTING   ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP                      LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN  LAST_SCRUB_DURATION  SCRUB_SCHEDULING  OBJECTS_SCRUBBED
1.63          10                   0         0          0        0  41943040            0           0   10        10  active+clean  2022-01-13T18:06:12.230989+0530    18'10     18:30  [1,0,2]           1  [1,0,2]               1       18'10  2022-01-13T18:06:12.230810+0530              0'0  2022-01-13T18:03:32.887271+0530              0                    1  scrubbing for 0s                10
1.62           3                   0         0          0        0  12582912            0           0    3         3  active+clean  2022-01-13T18:05:53.489342+0530     18'3     18:23  [2,0,1]           2  [2,0,1]               2        18'3  2022-01-13T18:05:53.489059+0530              0'0  2022-01-13T18:03:32.887271+0530              0                    1  scrubbing for 0s                 3
1.61           5                   0         0          0        0  20971520            0           0    5         5  active+clean  2022-01-13T18:06:24.108837+0530     18'5     18:25  [0,1,2]           0  [0,1,2]               0        18'5  2022-01-13T18:06:24.108653+0530              0'0  2022-01-13T18:03:32.887271+0530              0                    1  scrubbing for 0s                 5

@ronen-fr
Copy link
Contributor

ronen-fr commented Jan 14, 2022

the "scrubbing for 0 seconds" in the final listing is strange. Not sure if caused by this PR or not, but if so - seems an additional stat update is needed

@amathuria
Copy link
Contributor Author

amathuria commented Jan 14, 2022

the "scrubbing for 0 seconds" in the final listing is strange. Not sure if caused by this PR or not, but if so - seems an additional stat update is needed

@ronen-fr I don't think it is caused by this PR but I think it's happening in cases where the scrub duration is lesser than 1 second. https://github.com/ceph/ceph/blob/master/src/osd/scrubber/pg_scrubber.cc#L2031 we are trying to get the difference in seconds and that might be returning 0 in such a scenario.
LAST_SCRUB_DURATION looks okay because we do a ceill here : https://github.com/ceph/ceph/blob/master/src/osd/scrubber/pg_scrubber.cc#L2138

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
@ronen-fr
Copy link
Contributor

My problem is with the "scrubbing for" persisting after scrubbing is done. It should have read "scheduled ...". That's why I fear we're missing an update.

@amathuria
Copy link
Contributor Author

My problem is with the "scrubbing for" persisting after scrubbing is done. It should have read "scheduled ...". That's why I fear we're missing an update.
Ah I see what you mean. Yes, I agree with that.

@amathuria
Copy link
Contributor Author

http://pulpito.front.sepia.ceph.com/yuriw-2022-01-13_18:06:52-rados-wip-yuri3-testing-2022-01-13-0809-distro-default-smithi/

Failures unrelated, tracked in:
https://tracker.ceph.com/issues/44587
https://tracker.ceph.com/issues/53843
https://tracker.ceph.com/issues/43887
https://tracker.ceph.com/issues/53807
https://tracker.ceph.com/issues/53886
https://tracker.ceph.com/issues/53424
https://tracker.ceph.com/issues/50830

Details:
Bug #44587: failed to write 34473 to cgroup.procs
Bug #53843: mgr/dashboard: Error - yargs parser supports a minimum Node.js version of 12
Bug #43887: ceph_test_rados_delete_pools_parallel failure
Bug #53807: Dead jobs in rados/cephadm/smoke-roleless
Bug #53886: ansible: Failed to update apt cache
Bug #53424: CEPHADM_DAEMON_PLACE_FAIL in orch:cephadm/mgr-nfs-upgrade/
Bug #50830: rgw-ingress does not install

@yuriw
Copy link
Contributor

yuriw commented Jan 14, 2022

jenkins test api

@ronen-fr
Copy link
Contributor

jenkins test make check

@neha-ojha
Copy link
Member

jenkins test api

@yuriw yuriw merged commit e419a29 into ceph:master Jan 14, 2022
ronen-fr added a commit to ronen-fr/ceph that referenced this pull request Jan 22, 2022
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
ronen-fr added a commit to ronen-fr/ceph that referenced this pull request Jan 22, 2022
A refactoring of the scrub backend code (all Scrub related code
that checks/manipulates objects' (meta)data).

Part of the refactoring was breaking long chunks of code into
separate functions. In order to avoid passing endless lists
of state parameters from/to these new methods, the scrub backend
state is now maintained within its own objects.

a change to note:
Instead of two scrub-map collections - one that holds all
incoming maps, and one that holds pointers to all of those incoming
plus the one we (the Primary) manage - just use one collection
for both incoming and our own.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Split from "osd/scrub: minor fixes split from main "scrub backend" commit"

Split from "osd/scrub: scrub components embedded documentation"

fix

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

osd/scrub: fix the scrubber backend to include all PR ceph#42735

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
ronen-fr added a commit to ronen-fr/ceph that referenced this pull request Jan 23, 2022
to publish the last scrub status report.
The change is needed following the merge of
PR ceph#42735.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
yuriw pushed a commit that referenced this pull request Jan 28, 2022
to publish the last scrub status report.
The change is needed following the merge of
PR #42735.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
yuriw pushed a commit that referenced this pull request Jan 28, 2022
A refactoring of the scrub backend code (all Scrub related code
that checks/manipulates objects' (meta)data).

Part of the refactoring was breaking long chunks of code into
separate functions. In order to avoid passing endless lists
of state parameters from/to these new methods, the scrub backend
state is now maintained within its own objects.

a change to note:
Instead of two scrub-map collections - one that holds all
incoming maps, and one that holds pointers to all of those incoming
plus the one we (the Primary) manage - just use one collection
for both incoming and our own.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Split from "osd/scrub: minor fixes split from main "scrub backend" commit"

Split from "osd/scrub: scrub components embedded documentation"

fix

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

osd/scrub: fix the scrubber backend to include all PR #42735

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
ljflores pushed a commit to ljflores/ceph that referenced this pull request Feb 14, 2022
to publish the last scrub status report.
The change is needed following the merge of
PR ceph#42735.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
ljflores pushed a commit to ljflores/ceph that referenced this pull request Feb 14, 2022
A refactoring of the scrub backend code (all Scrub related code
that checks/manipulates objects' (meta)data).

Part of the refactoring was breaking long chunks of code into
separate functions. In order to avoid passing endless lists
of state parameters from/to these new methods, the scrub backend
state is now maintained within its own objects.

a change to note:
Instead of two scrub-map collections - one that holds all
incoming maps, and one that holds pointers to all of those incoming
plus the one we (the Primary) manage - just use one collection
for both incoming and our own.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Split from "osd/scrub: minor fixes split from main "scrub backend" commit"

Split from "osd/scrub: scrub components embedded documentation"

fix

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

osd/scrub: fix the scrubber backend to include all PR ceph#42735

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
ronen-fr added a commit to ronen-fr/ceph that referenced this pull request Mar 23, 2022
to publish the last scrub status report.
The change is needed following the merge of
PR ceph#42735.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
(cherry picked from commit ab032e9)
ronen-fr added a commit to ronen-fr/ceph that referenced this pull request Jun 20, 2023
A refactoring of the scrub backend code (all Scrub related code
that checks/manipulates objects' (meta)data).

Part of the refactoring was breaking long chunks of code into
separate functions. In order to avoid passing endless lists
of state parameters from/to these new methods, the scrub backend
state is now maintained within its own objects.

a change to note:
Instead of two scrub-map collections - one that holds all
incoming maps, and one that holds pointers to all of those incoming
plus the one we (the Primary) manage - just use one collection
for both incoming and our own.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Split from "osd/scrub: minor fixes split from main "scrub backend" commit"

Split from "osd/scrub: scrub components embedded documentation"

fix

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

osd/scrub: fix the scrubber backend to include all PR ceph#42735

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants