New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd/ReplicatedBackend: add sanity check during build_push_op() #9491
Conversation
I know this change is fine. I'm just wondering could you reproduce this or give a explain why this will happen? Or it's just by accident |
@yuyuyu101 I deleted some data of pgs by accident and then the osd went crashed as above... |
@yuyuyu101 PING |
store->getattrs(ch, ghobject_t(recovery_info.soid), out_op->attrset); | ||
int r = store->omap_get_header(coll, ghobject_t(recovery_info.soid), &out_op->omap_header); | ||
if(r < 0) | ||
return r; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to add log here
7a1ea6b
to
df3d9b5
Compare
@yuyuyu101 thanks, I have fixed it. |
int r = store->omap_get_header(coll, ghobject_t(recovery_info.soid), &out_op->omap_header); | ||
if(r < 0) { | ||
get_parent()->clog_error() << get_info().pgid | ||
<< " get omap header failed: " << cpp_strerror(-r) << "\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need clog
The caller may or may not tolerate build_push_op() to return a negative result, and we shall honour the truth that it is up to him, not us. Also this can silent the noise from the lower level API such as buffer, which is less clear and can be rather confusing sometimes: 1: /usr/bin/ceph-osd() [0xada722] 2: (()+0xf100) [0x7f9889f22100] 3: (gsignal()+0x37) [0x7f988893b5f7] 4: (abort()+0x148) [0x7f988893cce8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f988923f9d5] 6: (()+0x5e946) [0x7f988923d946] 7: (()+0x5e973) [0x7f988923d973] 8: (()+0x5eb93) [0x7f988923db93] 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137) [0xc51707] 10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x7e) [0x7a236e] 11: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*)+0x8dc) [0xa16c0c] 12: (ReplicatedBackend::prep_push(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, eversion_t, interval_set<unsigned long>&, std::map<hobject_t, interval_set<unsigned long>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, interval_set<unsigned long> > > >&, PushOp*)+0x424) [0xa17e94] 13: (ReplicatedBackend::prep_push_to_replica(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, PushOp*)+0x196) [0xa1eff6] 14: (ReplicatedBackend::start_pushes(hobject_t const&, std::tr1::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0x1cd) [0xa1fcad] 15: (C_ReplicatedBackend_OnPullComplete::finish(ThreadPool::TPHandle&)+0x153) [0xa32173] 16: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x6c5ea9] 17: (ReplicatedPG::BlessedGenContext<ThreadPool::TPHandle&>::finish(ThreadPool::TPHandle&)+0x95) [0x8acbe5] 18: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x6c5ea9] 19: (ThreadPool::WorkQueueVal<GenContext<ThreadPool::TPHandle&>*, GenContext<ThreadPool::TPHandle&>*>::_void_process(void*, ThreadPool::TPHandle&)+0x62) [0x6c7f22] 20: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0xbce296] 21: (ThreadPool::WorkThread::entry()+0x10) [0xbcf320] 22: (()+0x7dc5) [0x7f9889f1adc5] 23: (clone()+0x6d) [0x7f98889fc21d] Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
df3d9b5
to
83e0944
Compare
@yuyuyu101 done, thanks a lot. |
lgtm |
retest this please |
The caller may or may not tolerate build_push_op() to return a negative result,
and we shall honour the truth that it is up to him, not us.
Also this can silent the noise from the lower level API such as
buffer, which is less clear and can be rather confusing sometimes:
Signed-off-by: Yan Jun yan.jun8@zte.com.cn