Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/ReplicatedBackend: add sanity check during build_push_op() #9491

Merged
merged 1 commit into from Jul 14, 2016

Conversation

Yan-waller
Copy link
Contributor

The caller may or may not tolerate build_push_op() to return a negative result,
and we shall honour the truth that it is up to him, not us.

Also this can silent the noise from the lower level API such as
buffer, which is less clear and can be rather confusing sometimes:

1: /usr/bin/ceph-osd() [0xada722]
2: (()+0xf100) [0x7f9889f22100]
3: (gsignal()+0x37) [0x7f988893b5f7]
4: (abort()+0x148) [0x7f988893cce8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f988923f9d5]
6: (()+0x5e946) [0x7f988923d946]
7: (()+0x5e973) [0x7f988923d973]
8: (()+0x5eb93) [0x7f988923db93]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc51707]
10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x7e) [0x7a236e]
11: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*)+0x8dc) [0xa16c0c]
12: (ReplicatedBackend::prep_push(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, eversion_t, interval_set<unsigned long>&, std::map<hobject_t, interval_set<unsigned long>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, interval_set<unsigned long> > > >&, PushOp*)+0x424) [0xa17e94]
13: (ReplicatedBackend::prep_push_to_replica(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, PushOp*)+0x196) [0xa1eff6]
14: (ReplicatedBackend::start_pushes(hobject_t const&, std::tr1::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0x1cd) [0xa1fcad]
15: (C_ReplicatedBackend_OnPullComplete::finish(ThreadPool::TPHandle&)+0x153) [0xa32173]
16: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x6c5ea9]
17: (ReplicatedPG::BlessedGenContext<ThreadPool::TPHandle&>::finish(ThreadPool::TPHandle&)+0x95) [0x8acbe5]
18: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x6c5ea9]
19: (ThreadPool::WorkQueueVal<GenContext<ThreadPool::TPHandle&>*, GenContext<ThreadPool::TPHandle&>*>::_void_process(void*, ThreadPool::TPHandle&)+0x62) [0x6c7f22]
20: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0xbce296]
21: (ThreadPool::WorkThread::entry()+0x10) [0xbcf320]
22: (()+0x7dc5) [0x7f9889f1adc5]
23: (clone()+0x6d) [0x7f98889fc21d]

Signed-off-by: Yan Jun yan.jun8@zte.com.cn

@yuyuyu101
Copy link
Member

I know this change is fine. I'm just wondering could you reproduce this or give a explain why this will happen? Or it's just by accident

@Yan-waller
Copy link
Contributor Author

@yuyuyu101 I deleted some data of pgs by accident and then the osd went crashed as above...

@Yan-waller
Copy link
Contributor Author

@yuyuyu101 PING

store->getattrs(ch, ghobject_t(recovery_info.soid), out_op->attrset);
int r = store->omap_get_header(coll, ghobject_t(recovery_info.soid), &out_op->omap_header);
if(r < 0)
return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to add log here

@Yan-waller
Copy link
Contributor Author

@yuyuyu101 thanks, I have fixed it.

int r = store->omap_get_header(coll, ghobject_t(recovery_info.soid), &out_op->omap_header);
if(r < 0) {
get_parent()->clog_error() << get_info().pgid
<< " get omap header failed: " << cpp_strerror(-r) << "\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need clog

The caller may or may not tolerate build_push_op() to return a
negative result, and we shall honour the truth that it is up
to him, not us.

Also this can silent the noise from the lower level API such as
buffer, which is less clear and can be rather confusing sometimes:

1: /usr/bin/ceph-osd() [0xada722]
2: (()+0xf100) [0x7f9889f22100]
3: (gsignal()+0x37) [0x7f988893b5f7]
4: (abort()+0x148) [0x7f988893cce8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f988923f9d5]
6: (()+0x5e946) [0x7f988923d946]
7: (()+0x5e973) [0x7f988923d973]
8: (()+0x5eb93) [0x7f988923db93]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc51707]
10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x7e) [0x7a236e]
11: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*)+0x8dc) [0xa16c0c]
12: (ReplicatedBackend::prep_push(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, eversion_t, interval_set<unsigned long>&, std::map<hobject_t, interval_set<unsigned long>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, interval_set<unsigned long> > > >&, PushOp*)+0x424) [0xa17e94]
13: (ReplicatedBackend::prep_push_to_replica(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, PushOp*)+0x196) [0xa1eff6]
14: (ReplicatedBackend::start_pushes(hobject_t const&, std::tr1::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0x1cd) [0xa1fcad]
15: (C_ReplicatedBackend_OnPullComplete::finish(ThreadPool::TPHandle&)+0x153) [0xa32173]
16: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x6c5ea9]
17: (ReplicatedPG::BlessedGenContext<ThreadPool::TPHandle&>::finish(ThreadPool::TPHandle&)+0x95) [0x8acbe5]
18: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x6c5ea9]
19: (ThreadPool::WorkQueueVal<GenContext<ThreadPool::TPHandle&>*, GenContext<ThreadPool::TPHandle&>*>::_void_process(void*, ThreadPool::TPHandle&)+0x62) [0x6c7f22]
20: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0xbce296]
21: (ThreadPool::WorkThread::entry()+0x10) [0xbcf320]
22: (()+0x7dc5) [0x7f9889f1adc5]
23: (clone()+0x6d) [0x7f98889fc21d]

Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
@Yan-waller
Copy link
Contributor Author

@yuyuyu101 done, thanks a lot.

@yuyuyu101
Copy link
Member

lgtm

@Yan-waller
Copy link
Contributor Author

retest this please

@Yan-waller
Copy link
Contributor Author

@liewegas @yuriw Could you help to have this pr tested? thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants