jewel: osd: Implement asynchronous scrub sleep #15529

Merged
merged 2 commits into from Jun 22, 2017

6 participants
@badone
Contributor

commented Jun 7, 2017

badone added some commits Apr 24, 2017

osd: Implement asynchronous scrub sleep
Rather than blocking the main op queue just do an async sleep.

Fixes: http://tracker.ceph.com/issues/19497

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
(cherry picked from commit 7af3e86)
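
To illustrate the idea in the commit message above, here is a minimal Python sketch (not the Ceph C++ change itself) of a scrub that reschedules itself through a timer instead of sleeping inside the queue worker. Every name in it (`run_worker`, `make_scrub`, `client_op`, `sleep_s`) is hypothetical, and `threading.Timer` is used only because it is the shortest way to show a non-blocking sleep.

```python
import queue
import threading
import time


def run_worker(ops: "queue.Queue", stop: threading.Event, log: list) -> None:
    """Single worker draining the op queue (a stand-in for the OSD op threads)."""
    while not stop.is_set():
        try:
            op = ops.get(timeout=0.05)
        except queue.Empty:
            continue
        op(ops, log)


def make_scrub(chunk: int, total: int, sleep_s: float):
    """Build a scrub op that handles one chunk, then requeues the next chunk
    from a timer callback rather than calling time.sleep() in the worker."""
    def scrub(ops, log):
        log.append(f"scrub chunk {chunk}")
        if chunk < total:
            nxt = make_scrub(chunk + 1, total, sleep_s)
            # The worker returns immediately; the timer requeues the scrub later.
            timer = threading.Timer(sleep_s, ops.put, args=(nxt,))
            timer.daemon = True
            timer.start()
    return scrub


def client_op(ops, log):
    """Hypothetical client request that must not wait behind the scrub sleep."""
    log.append("client op")


def demo(sleep_s: float = 0.2) -> list:
    ops, stop, log = queue.Queue(), threading.Event(), []
    worker = threading.Thread(target=run_worker, args=(ops, stop, log), daemon=True)
    worker.start()
    ops.put(make_scrub(1, 2, sleep_s))
    ops.put(client_op)
    time.sleep(sleep_s * 3)  # leave time for the timer to fire
    stop.set()
    worker.join()
    return log


if __name__ == "__main__":
    print(demo())
```

With the defaults, the client op is logged between the two scrub chunks, because the worker is free during the scrub sleep.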
osd: Move scrub sleep timer to osdservice
PR 14886 erroneously creates a scrub sleep timer for every pg resulting
in a proliferation of threads. Move the timer to the osd service so
there can be only one.

Fixes: http://tracker.ceph.com/issues/19986

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
(cherry picked from commit f110a82)

Conflicts:
        src/osd/PG.cc - ceph_clock_now requires a CephContext argument
        in Jewel
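
The second commit's point, one timer shared by all PGs instead of one timer (and thread) per PG, can be sketched roughly as follows. This is a Python illustration under assumed names (`SharedTimer`, `add_event_after`, `demo`), not the code in the PR: a single background thread serves a heap of deadlines for every caller.

```python
import heapq
import itertools
import threading
import time


class SharedTimer:
    """One background thread serving timed callbacks for many callers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._heap = []                  # entries: (deadline, seq, callback)
        self._seq = itertools.count()    # tie-breaker so callbacks are never compared
        self._stopped = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add_event_after(self, delay_s, callback):
        """Schedule callback to run once, delay_s seconds from now."""
        with self._cond:
            deadline = time.monotonic() + delay_s
            heapq.heappush(self._heap, (deadline, next(self._seq), callback))
            self._cond.notify()

    def shutdown(self):
        """Stop the timer thread; events still pending are dropped."""
        with self._cond:
            self._stopped = True
            self._cond.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                while not self._stopped:
                    if not self._heap:
                        self._cond.wait()
                        continue
                    wait = self._heap[0][0] - time.monotonic()
                    if wait <= 0:
                        break
                    self._cond.wait(timeout=wait)
                if self._stopped:
                    return
                _, _, callback = heapq.heappop(self._heap)
            callback()  # run outside the lock


def demo(num_pgs: int = 8, delay_s: float = 0.05):
    timer = SharedTimer()
    before = threading.active_count()
    done = []
    for pg in range(num_pgs):
        timer.add_event_after(delay_s, lambda pg=pg: done.append(pg))
    threads_added = threading.active_count() - before
    time.sleep(delay_s * 4)
    timer.shutdown()
    return sorted(done), threads_added
```

In `demo`, scheduling a sleep for each of the hypothetical PGs adds no threads beyond the one the shared timer already owns.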

@badone badone added this to the jewel milestone Jun 7, 2017

@jdurgin

jdurgin approved these changes Jun 7, 2017

@badone

Contributor Author

commented Jun 18, 2017

Jenkins retest this please

@smithfarm smithfarm changed the title from "Jewel: osd: Implement asynchronous scrub sleep" to "jewel: osd: Implement asynchronous scrub sleep" Jun 20, 2017

@smithfarm

Contributor

commented Jun 22, 2017

@jdurgin @badone This passed a rados suite with two failures, one of which is scrub-related. See http://pulpito.front.sepia.ceph.com/smithfarm-2017-06-22_11:36:26-rados-wip-jewel-backports-distro-basic-smithi/

I'll be grateful for any help you can provide diagnosing these failures.

@smithfarm

Contributor

commented Jun 22, 2017

Note that this is one of the backports requested for inclusion in 10.2.8.

@jdurgin

Member

commented Jun 22, 2017

@smithfarm http://pulpito.front.sepia.ceph.com/smithfarm-2017-06-22_11:36:26-rados-wip-jewel-backports-distro-basic-smithi/1316533/ failed due to running out of space on osd.3:

2017-06-22 13:25:29.894162 7f9bbcce7700 10 filestore(/var/lib/ceph/osd/ceph-3) error opening file /var/lib/ceph/osd/ceph-3/current/0.d_head/DIR_D/DIR_9/DIR_2/obj-sgQ6vtDUfFGJMMT__head_D616229D__0 with flags=66:
 (28) No space left on device
2017-06-22 13:25:29.894174 7f9bbcce7700  0 filestore(/var/lib/ceph/osd/ceph-3) write couldn't open 0.d_head/#0:b944686b:::obj-sgQ6vtDUfFGJMMT:head#: (28) No space left on device
2017-06-22 13:25:29.894180 7f9bbcce7700 10 filestore(/var/lib/ceph/osd/ceph-3) write 0.d_head/#0:b944686b:::obj-sgQ6vtDUfFGJMMT:head# 81523~1 = -28
2017-06-22 13:25:29.894183 7f9bbcce7700  0 filestore(/var/lib/ceph/osd/ceph-3)  error (28) No space left on device not handled on operation 0x7f9bdabfb060 (95452.1.0, or op 0, counting from 0)
2017-06-22 13:25:29.894186 7f9bbcce7700  0 filestore(/var/lib/ceph/osd/ceph-3) ENOSPC handling not implemented

This has popped up more often recently on the smithi nodes because of their smaller disks. It is not a bug in the backports.
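
For readers decoding the numbers in the log excerpt above, a small Python sketch follows. It assumes a Linux host, where errno 28 is ENOSPC and, on x86-64, `flags=66` corresponds to `O_RDWR | O_CREAT` (2 | 64). The helper names are hypothetical and only the standard `errno` and `os` modules are used.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Map a (possibly negated) errno value, such as the -28 returned by the
    filestore write above, to its symbolic name and message."""
    code = abs(code)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def decode_open_flags(flags: int) -> list:
    """List the common open(2) flags set in an integer such as flags=66.
    Flag values are platform-specific, so this uses the host's os.O_* values."""
    access_mask = os.O_RDONLY | os.O_WRONLY | os.O_RDWR
    access_names = {
        os.O_RDONLY: "O_RDONLY",
        os.O_WRONLY: "O_WRONLY",
        os.O_RDWR: "O_RDWR",
    }
    names = [access_names.get(flags & access_mask, "?")]
    for name in ("O_CREAT", "O_EXCL", "O_TRUNC", "O_APPEND"):
        if flags & getattr(os, name):
            names.append(name)
    return names


if __name__ == "__main__":
    print(describe_errno(-28))
    print(decode_open_flags(66))
```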

The rgw failure is unrelated to this pr - it's failing slo and dlo tests, which I'm not sure are implemented in hammer or jewel - @yehudasa ?

@yehudasa

Member

commented Jun 22, 2017

@jdurgin @badone I assume that is running against the swift master branch? We added a couple of tests recently, and these would fail in jewel.

@smithfarm

Contributor

commented Jun 22, 2017

I assume that is running against the swift master branch?

Indeed. The swift task in jewel does:

git clone git://git.ceph.com/git/swift.git /home/ubuntu/cephtest/swift
radosgw-admin to create a user
radosgw-admin to create another user
cd /home/ubuntu/cephtest/swift && ./bootstrap

I looked at https://github.com/ceph/swift and I see only two branches: "master" and "stable/diablo". Could you make a "jewel" branch that doesn't have the incompatible tests? I'll patch the jewel task to use it.
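
As a rough sketch of what pinning the task to a branch could look like, the Python below builds the clone and bootstrap commands listed above with an optional branch. It is not the teuthology task code: the function name and the `branch` parameter are hypothetical, the repository URL and directory are taken from the steps above, and the `radosgw-admin` user-creation steps are omitted because their arguments are not shown in this thread.

```python
from typing import List, Optional, Tuple

SWIFT_REPO = "git://git.ceph.com/git/swift.git"   # URL as quoted in the steps above
CLONE_DIR = "/home/ubuntu/cephtest/swift"          # directory as quoted above

# Each command is (working directory or None, argv).
Command = Tuple[Optional[str], List[str]]


def swift_setup_commands(branch: Optional[str] = None) -> List[Command]:
    """Return the clone and bootstrap commands, optionally pinned to a branch.

    With branch=None this mirrors the current behaviour (default branch).
    """
    clone = ["git", "clone"]
    if branch:
        # `git clone --branch <name>` checks out the named branch after cloning.
        clone += ["--branch", branch]
    clone += [SWIFT_REPO, CLONE_DIR]
    return [(None, clone), (CLONE_DIR, ["./bootstrap"])]


if __name__ == "__main__":
    for cwd, argv in swift_setup_commands(branch="jewel"):
        prefix = f"cd {cwd} && " if cwd else ""
        print(prefix + " ".join(argv))
```

Keeping the branch as a parameter would let each stable release test against a matching set of swift tests without changing the default for master.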

@smithfarm

Contributor

commented Jun 22, 2017

@jdurgin Given that the failures are unrelated, can this be merged?

@smithfarm smithfarm merged commit 38af498 into ceph:jewel Jun 22, 2017

5 checks passed

Signed-off-by: all commits in this PR are signed
Unmodifed Submodules: submodules for project are unmodified
Unmodified Submodules: submodules for project are unmodified
default: Build finished.
make check: make check succeeded
@smithfarm

Contributor

commented Jun 22, 2017

Sorry, I didn't notice it was already reviewed.

@cbodley

Contributor

commented Jun 22, 2017

@smithfarm i pushed a ceph-jewel branch with those new slo/dlo tests removed

@smithfarm

Contributor

commented Jun 22, 2017

Argh!!! The swift task is in teuthology!

@smithfarm

Contributor

commented Jun 22, 2017

Can we simply transplant it into ceph/ceph.git master/kraken/jewel/hammer ?

@cbodley

Contributor

commented Jun 22, 2017

@smithfarm i also thought that was strange. i support moving it into ceph

@smithfarm

Contributor

commented Jun 22, 2017

@cbodley I'm discussing that with @zmc now. The move has to be done in a way that preserves git history.

@smithfarm

Contributor

commented Jun 22, 2017

Agreed that I will do the move tomorrow.

@smithfarm

Contributor

commented Jun 22, 2017

Since teuthology tasks in teuthology itself take precedence over ones in ceph of the same name, we can add the task to the ceph branches, first, and rip out the task from teuthology, second.
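
The ordering argument can be shown with a toy lookup in Python. This is only a model of the precedence rule as described in the comment above, not teuthology's actual loader; `resolve_task` and the source labels are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

TaskTable = Dict[str, Callable[[], str]]


def resolve_task(name: str, sources: Sequence[Tuple[str, TaskTable]]):
    """Return (source label, task) for the first source that defines `name`.

    `sources` is ordered by precedence, highest first.
    """
    for label, tasks in sources:
        if name in tasks:
            return label, tasks[name]
    raise KeyError(f"no task named {name!r}")


def demo():
    teuthology = {"swift": lambda: "swift task from teuthology"}
    ceph: TaskTable = {}
    sources = [("teuthology", teuthology), ("ceph", ceph)]
    seen = []

    # Step 1: add the task to ceph. teuthology still wins, so nothing changes.
    ceph["swift"] = lambda: "swift task from ceph"
    seen.append(resolve_task("swift", sources)[0])

    # Step 2: remove it from teuthology. The ceph copy now takes over.
    del teuthology["swift"]
    seen.append(resolve_task("swift", sources)[0])
    return seen
```

Doing the two steps in this order means the lookup never fails: the task always resolves to one copy or the other.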

@smithfarm

Contributor

commented Jun 23, 2017

Phase one:

@badone

Contributor Author

commented Jun 23, 2017

"The rgw failure is unrelated to this pr"

Should discussion and work to do with the rgw failure take place in a tracker rather than a merged and unrelated PR? Just asking...

@badone badone deleted the badone:wip-async-sleep-timer-fix-jewel branch Jun 23, 2017

@smithfarm

Contributor

commented Jun 23, 2017

@badone Sorry for the hijack. Let's move the discussion to http://tracker.ceph.com/issues/20392

@badone

Contributor Author

commented Jun 23, 2017

@smithfarm Not a problem of course mate, just thought it seemed a little odd.

@smithfarm smithfarm changed the title jewel: osd: Implement asynchronous scrub sleep jewel: osd: Implement asynchronous scrub sleep Jul 11, 2017
