osd/PG: Add two new mClock implementations of the PG sharded operator queue #14997

Merged
merged 1 commit into ceph:master on Jun 26, 2017

Conversation

@ivancich
Member

ivancich commented May 8, 2017

Create an mClock priority queue, which can in turn be used for two new implementations of the PG shards operator queue. The first (mClockOpClassQueue) prioritizes operations based on which class they belong to (recovery, scrub, snaptrim, client op, osd subop). The second (mClockClientQueue) also incorporates the client identifier, in order to promote fairness between clients.

Before merging, I would like to add tests (src/test/common/test_mclock_priority_queue.cc).
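
For orientation, here is a minimal, hypothetical sketch of the distinction between the two queues; the type and field names below are illustrative only and are not this PR's actual declarations.

```cpp
#include <cstdint>

// Illustrative sketch only -- not the PR's actual types.
enum class op_class_t { client_op, osd_subop, snaptrim, scrub, recovery };

// An mClockOpClassQueue-style queue schedules purely per operation class,
// so every client shares each class's reservation/weight/limit settings.
struct opclass_key_t {
  op_class_t klass;
};

// An mClockClientQueue-style queue also folds in a client identifier, so
// distinct clients of the same class get separate dmclock tags, which is
// what promotes fairness between clients.
struct client_key_t {
  op_class_t klass;
  uint64_t client_id;
};
```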

@ivancich ivancich requested review from liewegas and jdurgin May 8, 2017

  } else {
-   return prioritized;
+   return io_queue::prioritized;

@TsaiJin

TsaiJin May 17, 2017

Contributor

It would be better if you also added comments about this osd_op_queue type in config_opt.h.

@ivancich

ivancich May 17, 2017

Member

I added some comments to config_opt.h about the mclock priority queues.

@shengqiu001

shengqiu001 Jun 21, 2017

I am a little confused after reading the code. May I ask whether the reservation and limit settings, such as osd_op_queue_mclock_client_op_res, actually function in your enqueue and dequeue methods?

@ivancich

ivancich Jun 22, 2017

Member

@shengqiu001 I attempted to answer your question by replying to your post on the ceph-devel mailing list. That response can be found here: http://marc.info/?l=ceph-devel&m=149807908226510&w=2 . Please let me know if anything is unclear. Thanks!
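
To sketch the general idea behind that reply (illustrative only; the struct below is not this PR's actual code): the per-class reservation/weight/limit options are read into the QoS parameters that the dmclock scheduler consults when it tags operations at enqueue time, and the dequeue path then picks the operation whose tag is eligible first, so the options influence both methods indirectly.

```cpp
// Illustrative only -- not the PR's actual code.
struct qos_params_t {
  double reservation;  // e.g. osd_op_queue_mclock_client_op_res
  double weight;       // e.g. osd_op_queue_mclock_client_op_wgt
  double limit;        // e.g. osd_op_queue_mclock_client_op_lim (0 is treated as "no limit")
};

// A per-class lookup like this is what enqueue would consult when computing
// dmclock tags; dequeue then honors those tags when choosing the next op.
qos_params_t client_op_qos(double res, double wgt, double lim) {
  return {res, wgt, lim};
}
```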

-  // Ops will be removed f evaluates to true, f may have sideeffects
+  // Ops will be removed if f evaluates to true; f may have side effects.
+  // Must visit operations from back to front, due to its use with
+  // OSD::SharedOpQueue::Pred.
   virtual void remove_by_filter(

@liewegas

liewegas May 17, 2017

Member

Note that the OSD no longer needs or uses this remove_by_filter thing (as of 3cc4827), unless it's been reintroduced by this series...

@ivancich

ivancich May 17, 2017

Member

OK, I'll clean up the OpQueue interface to remove that pure virtual method and clean up the subclasses. I use that function in mClockPriorityQueue, though, so I'll keep it there. It's how the two integrations bridge certain functionality.
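
As a generic illustration of the contract in the comment quoted above (this is only a sketch of the idea, not the actual OpQueue or mClockPriorityQueue signature), a filtered removal that visits operations from back to front could look like this:

```cpp
#include <cstddef>
#include <deque>

// Illustrative sketch: remove every element for which pred returns true,
// visiting from back to front as the quoted comment requires. Erasing at
// index i never disturbs the not-yet-visited indices below it.
template <typename T, typename Pred>
void remove_by_filter(std::deque<T>& ops, Pred pred) {
  for (std::size_t i = ops.size(); i-- > 0; ) {
    if (pred(ops[i])) {
      ops.erase(ops.begin() + i);
    }
  }
}
```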

@liewegas

This looks good to me. My only request is that we squash things down a bit to clean up the history before merge!

Also, I suggest removing the filter_* methods since they are not used.

@ivancich

Member

ivancich commented May 17, 2017

In addition to cleaning up remove_by_filter, I'll also squash the commits.

@ivancich

Member

ivancich commented May 17, 2017

I believe this version addresses all concerns raised above.

@ivancich ivancich changed the title from DNM osd/PG: Add two new mClock implementations of the PG sharded operator queue to osd/PG: Add two new mClock implementations of the PG sharded operator queue May 17, 2017

@ivancich ivancich changed the title from osd/PG: Add two new mClock implementations of the PG sharded operator queue to DNM osd/PG: Add two new mClock implementations of the PG sharded operator queue May 19, 2017

@ivancich

Member

ivancich commented May 19, 2017

I switched back to DNM since Brad Hubbard let me know about this: http://tracker.ceph.com/issues/19979. I will look into it Friday AM.

@TsaiJin

Contributor

TsaiJin commented May 19, 2017

Hi, ivancich. In your implementation, you provide the same QoS for every client. Do you have any plans to support a specific QoS for a specific client?

@ivancich

Member

ivancich commented May 19, 2017

Hello TsaiJin. Yes. One member of the community has experimented with per-pool settings, so all clients of a given pool would have dmclock tag settings different from clients using other pools. And my next integration will allow clients to use a tag, where each tag would have its own settings.

@ivancich

Member

ivancich commented May 19, 2017

Regarding the Redmine issue that Brad Hubbard created (http://tracker.ceph.com/issues/19979), the issue is that the dmclock tests use gtest's EXPECT_DEATH_IF_SUPPORTED feature, in which a test expects a failed assert to take place. So the test passes, but a segfault and core dump occur.

Is there any issue with this and should gtest's (EXPECT|ASSERT)_DEATH* features be avoided?
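
For readers unfamiliar with the feature, here is a minimal, self-contained illustration of the pattern in question (the helper function is made up for this example): gtest runs the statement in a child process and expects it to die, so the test itself passes even though the aborting child can still leave a core dump behind on some platforms.

```cpp
#include <cassert>
#include "gtest/gtest.h"

// Hypothetical helper, used only for this illustration; a failed assert
// calls abort() (assuming asserts are enabled, i.e. NDEBUG is not defined).
static void require_non_negative(int n) {
  assert(n >= 0);
}

TEST(DeathTestExample, FailedAssertAborts) {
  // The statement runs in a forked child; the child aborts (and may dump
  // core), while the parent records the test as passing.
  EXPECT_DEATH_IF_SUPPORTED(require_non_negative(-1), "");
}
```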

@ivancich ivancich changed the title from DNM osd/PG: Add two new mClock implementations of the PG sharded operator queue to osd/PG: Add two new mClock implementations of the PG sharded operator queue May 19, 2017

@tchaikov

Contributor

tchaikov commented May 19, 2017

@ivancich We also use {EXPECT,ASSERT}_DEATH() in ceph, and all of them are guarded using PrCtl (process control). But I am not sure how dmclock can use it, as it lives in ceph/src, not in dmclock. Or perhaps we should not run dmclock's tests as part of ceph's "make check"?
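
The general technique behind such a guard (a generic sketch of the idea, not ceph's actual PrCtl helper) is to mark the process non-dumpable for the duration of the death test, so the intentionally aborting child does not leave a core file. A test would construct one of these in scope before invoking EXPECT_DEATH_IF_SUPPORTED.

```cpp
#include <sys/prctl.h>

// Generic RAII guard (illustrative; not ceph's PrCtl): while an instance is
// alive the process is marked non-dumpable, and a forked death-test child
// inherits that flag, so its intentional abort produces no core file.
struct SuppressCoreDumps {
  int saved;
  SuppressCoreDumps() : saved(prctl(PR_GET_DUMPABLE)) {
    prctl(PR_SET_DUMPABLE, 0);
  }
  ~SuppressCoreDumps() {
    if (saved >= 0) {
      prctl(PR_SET_DUMPABLE, saved);
    }
  }
};
```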

TEST_F(MClockClientQueueTest, TestSize) {
  ASSERT_TRUE(q.empty());
  ASSERT_EQ(q.length(), 0u);

@tchaikov

tchaikov May 19, 2017

Contributor

Nit: the LHS is the expected value, and the RHS is the actual one.
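
Applying gtest's documented ASSERT_EQ(expected, actual) ordering, the quoted check would become:

```cpp
  // expected value first, actual value second
  ASSERT_EQ(0u, q.length());
```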

@ivancich

ivancich May 19, 2017

Member

I've addressed this and am testing it. I'll push the changes here after the tests.

@ivancich

Member

ivancich commented May 19, 2017

@tchaikov Thanks for the pointer to PrCtl. I've checked it out. I could incorporate a copy of it into dmClock. Or we could take dmClock out of ceph's "make check" as you suggest. What do others think?

@ivancich

Member

ivancich commented May 22, 2017

dmclock's tests are now detached from ceph's tests.

@liewegas liewegas added the needs-qa label May 22, 2017

# note: add_test is not being called, so dmclock tests aren't part
# of ceph tests
add_subdirectory(dmclock/test)
add_subdirectory(dmclock/support/test)
endif(WITH_TESTS)

@smithfarm

smithfarm May 22, 2017

Contributor

The purpose of WITH_TESTS is not to conditionalize the cmake tests - it's to conditionalize the build of the binaries packaged in the ceph-test RPM/deb (i.e. the binaries you are dropping in this PR).

(See https://github.com/ceph/ceph/blob/master/ceph.spec.in#L22 and https://github.com/ceph/ceph/blob/master/ceph.spec.in#L24 and check out how these bconds are used.)

@smithfarm

Contributor

smithfarm commented May 22, 2017

I'm concerned that the dmClock feature as currently merged in master does not use the WITH_TESTS cmake conditional correctly. See http://tracker.ceph.com/issues/19987 . . .

I am currently working on a fix (see #15174) but have not yet hit upon the right combination of modifications to the dmClock cmake configuration.

Completely removing the dmClock tests (i.e. not building anything under src/dmclock/test regardless of the setting of WITH_TESTS) would be much preferable!

add_subdirectory(src)
add_subdirectory(sim)
add_subdirectory(support)
enable_testing()

@smithfarm

smithfarm May 22, 2017

Contributor

This whole block would possibly also need to be inside an if(WITH_DMCLOCK_TESTS) conditional.

@ivancich

ivancich May 22, 2017

Member

I don't think so, as in this PR src/dmclock/CMakeLists.txt is never read by ceph's cmake.

@ivancich

ivancich May 22, 2017

Member

Please take a fresh look at this PR as the cmake files have changed substantially.

DESTINATION bin)
# note: add_test is not being called, so dmclock tests aren't part
# of ceph tests
add_subdirectory(dmclock/test)

@smithfarm

smithfarm May 22, 2017

Contributor

Ah, I see that "add_subdirectory(dmclock)" has been dropped, but we still have "add_subdirectory(dmclock/test)", which causes cmake to run https://github.com/ivancich/ceph-fork/blob/wip-bring-in-dmclock-p2-gtest/src/dmclock/test/CMakeLists.txt - or am I missing something?

@ivancich

ivancich May 22, 2017

Member

Only if WITH_TESTS is true.

@smithfarm

smithfarm May 22, 2017

Contributor

That's the whole problem - WITH_TESTS is true by default. See https://github.com/ceph/ceph/blob/master/CMakeLists.txt#L453

(Actually, I need the dmclock tests to not be built regardless of the setting of WITH_TESTS. That's why I'm asking you to change this conditional to use something else, like WITH_DMCLOCK_TESTS. Thanks for considering this request.)

@smithfarm

smithfarm May 22, 2017

Contributor

This is a polite request, by the way :-) I'm happy to continue discussing it. I'm also preparing a PR against master to demonstrate what I mean.

@ivancich

ivancich May 22, 2017

Member

I understand and appreciate your efforts. If you could try out the current version of this PR, it would be helpful, I think. I believe what happens is that if WITH_TESTS is true, it brings in dmclock's cmake files that deal with tests. However, the tests will not be built, as they're not part of any dependencies. The user would have the option of running "make dmclock-tests", for example, to build them. Does that meet your needs without violating any established rules?

Out of curiosity, I'm interested in knowing why you do not want the tests to be built.

Also, I'm still not sure why the conditionals looking to see if gtest was a target didn't seem to work for you when you were examining the effects of the earlier PR.

@smithfarm

smithfarm May 22, 2017

Contributor

> Out of curiosity, I'm interested in knowing why you do not want the tests to be built.

The failure with WITH_TESTS=ON was: https://paste2.org/paJcefcb

I will give this PR a try and let you know the result.

@smithfarm

smithfarm May 23, 2017

Contributor

> I will give this PR a try and let you know the result.

Here [1] is the result of building this PR - if you click on the link at [1] and look at the "Build Results" area on the right: under the heading "ceph" are the WITH_TESTS=OFF builds, and the builds under "ceph-test" are WITH_TESTS=ON. And you're right - the build does succeed on the newer openSUSE Tumbleweed, but we also need it to succeed on Leap 42.3, which is exhibiting the https://paste2.org/paJcefcb failure.

When you click on any build result (e.g. "failed") it takes you to a page showing the last few lines of the build log. From there you can click on another link to get the full build log, at the beginning of which it shows which packages/versions are installed in the build environment. Possibly the different result is caused by different versions of cmake in Leap 42.3 and Tumbleweed?

It's interesting that the WITH_TESTS=ON build succeeds in CentOS 7.3 - maybe the failure is reproducible in CentOS 7.2, though?

[1] https://build.opensuse.org/package/show/home:smithfarm:branches:filesystems:ceph:luminous/ceph

@smithfarm

Contributor

smithfarm commented May 25, 2017

@ivancich That last commit resolves the build failure in Leap 42.3 - thanks!

@ivancich

Member

ivancich commented May 30, 2017

@liewegas The default rados.yaml in ceph-qa-suite contains "osd op queue: debug_random".

I'll dig into the error.

@wjwithagen

Contributor

wjwithagen commented May 30, 2017

@ivancich
Perhaps you can split this PR into 2 PRs?

  1. fixing the testing stuff
  2. the addition of new code

And I'll test 1) first.

@ivancich

Member

ivancich commented May 31, 2017

@wjwithagen So I've made a PR with just the commits that fix the testing: #15375.

@ivancich ivancich changed the title from osd/PG: Add two new mClock implementations of the PG sharded operator queue to DNM osd/PG: Add two new mClock implementations of the PG sharded operator queue Jun 7, 2017

@ivancich

Member

ivancich commented Jun 7, 2017

Given the recent merging of the dmclock test and valgrind fixes into master, I'm rebasing and testing. Switched to DNM until that process is done.

@tchaikov

Contributor

tchaikov commented Jun 20, 2017

@ivancich Is it planned that we will get this change into luminous? Its release is approaching.

osd/PG: Add two new mClock implementations of the PG sharded operator queue

Create an mClock priority queue, which can in turn be used for two new
implementations of the PG shards operator queue. The first
(mClockOpClassQueue) prioritizes operations based on which class they
belong to (recovery, scrub, snaptrim, client op, osd subop). The
second (mClockClientQueue) also incorporates the client identifier, in
order to promote fairness between clients.

In addition, also remove OpQueue's remove_by_filter and all possible
associated subclass implementations and tests.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
@smithfarm

Contributor

smithfarm commented Jun 23, 2017

@ivancich Can you resolve the conflicts, please, and can this be merged ASAP?

@ivancich

Member

ivancich commented Jun 23, 2017

I've done quite a bit of testing with the rados suite. It fails occasionally, in the same ways that master fails occasionally, so I don't think the problems are with the dmclock integration code. I believe it's ready for QA.

@ivancich ivancich changed the title from DNM osd/PG: Add two new mClock implementations of the PG sharded operator queue to osd/PG: Add two new mClock implementations of the PG sharded operator queue Jun 23, 2017

@tchaikov tchaikov added the needs-qa label Jun 23, 2017

@smithfarm

Contributor

smithfarm commented Jun 24, 2017

This is no longer urgent from my side - I only now noticed that the openSUSE build fixes were split off into #15375 which has already been merged.

@tchaikov

Contributor

tchaikov commented Jun 26, 2017

@liewegas No leaks other than http://tracker.ceph.com/issues/20419 were found in the rados run. Can we merge this?

@liewegas liewegas merged commit d96491b into ceph:master Jun 26, 2017

4 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
arm64 make check: arm64 make check succeeded
make check: make check succeeded

@wjwithagen

Contributor

wjwithagen commented Jun 27, 2017

@ivancich @tchaikov

Something is sorely missing for the test to complete successfully on FreeBSD. Perhaps it is because not everything related to dmclock is built? Ivan, would you mind suggesting things to test? ATM I've commented the test out in my Jenkins/FreeBSD runs:

        Start  57: unittest_mclock_priority_queue
Could not find executable
/home/jenkins/workspace/ceph-master/build/bin/unittest_mclock_priority_queue
Looked in the following places:
/home/jenkins/workspace/ceph-master/build/bin/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/Release/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/Release/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/Debug/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/Debug/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/MinSizeRel/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/MinSizeRel/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/RelWithDebInfo/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/RelWithDebInfo/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/Deployment/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/Deployment/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/Development/unittest_mclock_priority_queue
/home/jenkins/workspace/ceph-master/build/bin/Development/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/Release/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/Release/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/Debug/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/Debug/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/MinSizeRel/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/MinSizeRel/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/RelWithDebInfo/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/RelWithDebInfo/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/Deployment/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/Deployment/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/Development/unittest_mclock_priority_queue
home/jenkins/workspace/ceph-master/build/bin/Development/unittest_mclock_priority_queue
Unable to find executable:
/home/jenkins/workspace/ceph-master/build/bin/unittest_mclock_priority_queue
 43/169 Test  #57: unittest_mclock_priority_queue ..........***Not Run    0.00 sec

@tchaikov

Contributor

tchaikov commented Jun 28, 2017

@wjwithagen please make sure you run "make tests" before "make check" or "ctest".

@wjwithagen

Contributor

wjwithagen commented Jun 29, 2017

@tchaikov
Missed your comment, but I will check with my Jenkins tests.

@wjwithagen

Contributor

wjwithagen commented Jun 29, 2017

@tchaikov @ivancich
Right, that fixes it. Thanks for the hint!

@dingdangzhang

Contributor

dingdangzhang commented Jul 5, 2017

@ivancich, I am confused after testing the mclock_opclass queue for client IO vs. recovery IO. I only set the following parameters:

OPTION(osd_op_queue, OPT_STR, "mclock_opclass")

OPTION(osd_op_queue_mclock_client_op_res, OPT_DOUBLE, 1000.0)
OPTION(osd_op_queue_mclock_client_op_wgt, OPT_DOUBLE, 500.0)
OPTION(osd_op_queue_mclock_client_op_lim, OPT_DOUBLE, 0.0)

OPTION(osd_op_queue_mclock_client_op_res, OPT_DOUBLE, 1000.0)
OPTION(osd_op_queue_mclock_client_op_wgt, OPT_DOUBLE, 500.0)
OPTION(osd_op_queue_mclock_client_op_lim, OPT_DOUBLE, 2000.0)

In my understanding, recovery IOPS should not exceed 2000/s during PG recovery, but there is no limit on recovery IO, and it seriously affects client IO. Is this a problem with my configuration, or something else?
