test/rbd_mirror: "use of uninitialised value" valgrind warning #19437

Merged
merged 1 commit into from Dec 12, 2017

trociny
Contributor

@trociny trociny commented Dec 11, 2017

The `on_call` context serves as a barrier and should be completed
after the `on_start_ctx` context is assigned.

The warning was observed sporadically, e.g. when repeatedly running the
`WaitingOnNonLeaderAcquireLeader` test under valgrind.

Signed-off-by: Mykola Golub <to.my.trociny@gmail.com>
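
The ordering requirement described above can be illustrated with a minimal, self-contained sketch. This is not the Ceph test code: `Barrier`, `run_once`, and the plain `int*` standing in for a `Context*` are hypothetical names chosen for illustration. A waiter thread blocks on the barrier and then reads `on_start_ctx`; that read is well defined only because the assignment happens before the barrier is completed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a completion context used as a barrier.
class Barrier {
public:
  // Mark the barrier as completed and wake every waiter.
  void complete() {
    {
      std::lock_guard<std::mutex> lock(m_);
      done_ = true;
    }
    cv_.notify_all();
  }

  // Block until complete() has been called.
  void wait() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return done_; });
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Assumption: an int* plays the role of the on_start_ctx pointer.
int run_once() {
  Barrier on_call;
  int payload = 42;
  int* on_start_ctx = nullptr;  // not yet assigned when the waiter starts
  int observed = 0;

  std::thread waiter([&] {
    on_call.wait();            // released only after the assignment below
    observed = *on_start_ctx;  // safe: the pointer is set before complete()
  });

  on_start_ctx = &payload;  // 1. assign the context first
  on_call.complete();       // 2. then complete the barrier
  waiter.join();
  return observed;
}
```

Swapping steps 1 and 2 would let the waiter read `on_start_ctx` before it is assigned, which is the kind of access valgrind reports as a use of an uninitialised value.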
@liewegas
Member

Should this target mimic-dev1? (Is the failure in http://pulpito.ceph.com/sage-2017-12-04_18:01:15-rbd-mimic-dev1-distro-basic-smithi/)

@trociny
Contributor Author

trociny commented Dec 12, 2017

@liewegas The fix is for unittest_rbd_mirror, which we run only on jenkins, not on teuthology. I only hit it when running the test locally under valgrind, so I am not sure we need to backport this.

As for the failures you mentioned, the test scripts actually succeeded in all cases, but teuthology failed because the cluster health check failed. It does not look rbd-related to me.

@trociny
Contributor Author

trociny commented Dec 12, 2017

@liewegas One of the failures [1] was due to an OSD crash:

2017-12-04 21:46:49.512 7f1e8c538700 -1 /build/ceph-13.0.0-3692-gc30faff/src/osd/PrimaryLogPG.cc: In function 'void PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)' thread 7f1e8c538700 time 2017-12-04 21:46:49.510005
/build/ceph-13.0.0-3692-gc30faff/src/osd/PrimaryLogPG.cc: 3578: FAILED assert(0 == "out of order op")

 ceph version 13.0.0-3692-gc30faff (c30faff877234aee4ce27913540606aebeeff3ed) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x556994e73bb2]
 2: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x1667) [0x556994a7de37]
 3: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3098) [0x556994a81408]
 4: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xcef) [0x556994a834cf]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x367) [0x5569948d3017]
 6: (PGOpItem::run(OSD*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x5a) [0x556994b4aa6a]
 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xe7b) [0x5569948d9aab]
 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x7bb) [0x556994e7743b]
 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x556994e79910]
 10: (()+0x76ba) [0x7f1eab6856ba]
 11: (clone()+0x6d) [0x7f1eaa6fc3dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

And some log entries before the crash that look relevant:

2017-12-04 21:46:49.500 7f1e8c538700 10 osd.0 pg_epoch: 32 pg[2.3c( v 28'22 (0'0,28'22] local-lis/les=12/13 n=4 ec=12/12 lis/c 12/12 les/c/f 13/13/0 12/12/12) [0,1] r=0 lpr=12 crt=28'22 lcod 28'21 mlcod 28'21 active+clean] do_op osd_op(client.4156.0:267 2.3c 2:3d7abdcf:::journal.103c7da64c08:head [watch watch cookie 139870777500704] snapc 0=[] ondisk+write+known_if_redirected e32) v9 may_write may_read -> write-ordered flags ondisk+write+known_if_redirected
2017-12-04 21:46:49.500 7f1e8c538700 20 osd.0 pg_epoch: 32 pg[2.3c( v 28'22 (0'0,28'22] local-lis/les=12/13 n=4 ec=12/12 lis/c 12/12 les/c/f 13/13/0 12/12/12) [0,1] r=0 lpr=12 crt=28'22 lcod 28'21 mlcod 28'21 active+clean]  op order client.4156 tid 267 last was 183
...
2017-12-04 21:46:49.508 7f1e8c538700 10 osd.0 pg_epoch: 32 pg[2.3c( v 32'23 (0'0,32'23] local-lis/les=12/13 n=4 ec=12/12 lis/c 12/12 les/c/f 13/13/0 12/12/12) [0,1] r=0 lpr=12 crt=32'23 lcod 28'22 mlcod 28'22 active+clean]  found existing watch watch(cookie 139870777500704 30s 172.21.15.41:0/2837864525) by client.4156
2017-12-04 21:46:49.508 7f1e8c538700 10 osd.0 pg_epoch: 32 pg[2.3c( v 32'23 (0'0,32'23] local-lis/les=12/13 n=4 ec=12/12 lis/c 12/12 les/c/f 13/13/0 12/12/12) [0,1] r=0 lpr=12 crt=32'23 lcod 28'22 mlcod 28'22 active+clean]  dropping ondisk_read_lock
2017-12-04 21:46:49.508 7f1e8c538700 20 osd.0 pg_epoch: 32 pg[2.3c( v 32'23 (0'0,32'23] local-lis/les=12/13 n=4 ec=12/12 lis/c 12/12 les/c/f 13/13/0 12/12/12) [0,1] r=0 lpr=12 crt=32'23 lcod 28'22 mlcod 28'22 active+clean]  op order client.4156 tid 259 last was 267
2017-12-04 21:46:49.508 7f1e8c538700 -1 osd.0 pg_epoch: 32 pg[2.3c( v 32'23 (0'0,32'23] local-lis/les=12/13 n=4 ec=12/12 lis/c 12/12 les/c/f 13/13/0 12/12/12) [0,1] r=0 lpr=12 crt=32'23 lcod 28'22 mlcod 28'22 active+clean] bad op order, already applied 267 > this 259
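
The `op order` lines above suggest a per-client check that each incoming tid is newer than the last one applied. A minimal sketch of such a check follows, assuming a hypothetical `OpOrderTracker` class. It is not the actual `PrimaryLogPG` code, and how equal tids are treated is an assumption, since the log only shows the `last > this` case.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical tracker that remembers the last applied tid per client.
class OpOrderTracker {
public:
  // Returns false if a newer tid was already applied for this client
  // (the "already applied X > this Y" case); otherwise records the tid.
  bool check_and_record(const std::string& client, uint64_t tid) {
    auto it = last_tid_.find(client);
    if (it != last_tid_.end() && it->second > tid) {
      return false;  // out of order
    }
    last_tid_[client] = tid;
    return true;
  }

private:
  std::map<std::string, uint64_t> last_tid_;
};
```

Fed the sequence from the log for client.4156 (183, then 267, then 259), the sketch accepts the first two tids and rejects the third.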

[1] http://qa-proxy.ceph.com/teuthology/sage-2017-12-04_18:01:15-rbd-mimic-dev1-distro-basic-smithi/1927044/teuthology.log

@liewegas
Copy link
Member

liewegas commented Dec 12, 2017 via email

@dillaman dillaman left a comment

lgtm

@dillaman dillaman merged commit 548bc8b into ceph:master Dec 12, 2017
@trociny trociny deleted the wip-mock-valgrind branch December 12, 2017 15:10