Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

luminous: osd: PG: add custom_reaction Backfilled and release reservations after bac… #23493

Merged
merged 1 commit into from Oct 25, 2018

Conversation

VictorDenisov
Copy link
Contributor

@VictorDenisov VictorDenisov commented Aug 8, 2018

http://tracker.ceph.com/issues/24333


…kfill

After backfill completes, we directly go to the Recovered state without
releasing reservations. The outstanding reservations cause double reservation
issues.

Creating a custom_reaction Backfilled, allows us to release reservations,
before transiting to the Recovered state.

Signed-off-by: Neha Ojha nojha@redhat.com
(cherry picked from commit 1abc232)

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@neha-ojha
Copy link
Member

Missing tracker reference:
Fixes: http://tracker.ceph.com/issues/24333

@neha-ojha neha-ojha changed the title PG: add custom_reaction Backfilled and release reservations after bac… luminous: PG: add custom_reaction Backfilled and release reservations after bac… Aug 9, 2018
@tchaikov tchaikov added this to the luminous milestone Aug 9, 2018
@VictorDenisov
Copy link
Contributor Author

VictorDenisov commented Aug 9, 2018

🤦‍♂️ yeah, forgot the ticket link, pardon

Copy link
Contributor

@smithfarm smithfarm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VictorDenisov Please append the following text to the commit message:


Conflicts:
        src/osd/PG.cc
        src/osd/PG.h

This signifies to "posterity" that the commit did not cherry-pick cleanly and that manual conflict resolution was undertaken. Bonus points for adding a brief explanation of the conflict and/or what was done to resolve it.

Thanks!

@yuriw
Copy link
Contributor

yuriw commented Sep 15, 2018

@yuriw
Copy link
Contributor

yuriw commented Sep 20, 2018

@VictorDenisov
Copy link
Contributor Author

Ok. Looking into it.

@VictorDenisov
Copy link
Contributor Author

@neha-ojha I'm not allowed to see the last like that you gave me. Do I need to have any permissions for it?

@neha-ojha
Copy link
Member

@VictorDenisov
Copy link
Contributor Author

Yes. I downloaded the log. Thanks.

@yuriw
Copy link
Contributor

yuriw commented Sep 26, 2018

@VictorDenisov we need this PR for 12.2.9 pls let know if it's ready

@yuriw
Copy link
Contributor

yuriw commented Sep 26, 2018

@yuriw
Copy link
Contributor

yuriw commented Sep 28, 2018

@VictorDenisov pls add "needs-qa" ta when ready

@yuriw
Copy link
Contributor

yuriw commented Oct 1, 2018

@smithfarm smithfarm dismissed their stale review October 2, 2018 19:22

@VictorDenisov For next time: the "Conflicts" section always goes below the "(cherry picked from ...)" line. Thanks!

@smithfarm
Copy link
Contributor

@VictorDenisov Try git checkout luminous ; git log --grep Conflicts for some examples of what it should look like.

@VictorDenisov
Copy link
Contributor Author

@neha-ojha Is it the error that you are referring to:

2018-09-22T16:15:16.839 INFO:tasks.ceph.osd.4.smithi144.stderr: ceph version 10.2.11-2-gc2d7b39 (c2d7b39dda3c2f8102698c48eef561b8e5853be8)
2018-09-22T16:15:16.839 INFO:tasks.ceph.osd.4.smithi144.stderr: 1: (()+0xa48b8e) [0x55fadb6cfb8e]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 2: (()+0x11390) [0x7f6dd2315390]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 3: (gsignal()+0x38) [0x7f6dd02b1428]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 4: (abort()+0x16a) [0x7f6dd02b302a]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55fadb7d180b]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 6: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0xc7) [0x55fadb135207]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 7: (()+0x4e8806) [0x55fadb16f806]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 8: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x2d6) [0x55fadb19f496]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 9: (boost::statechart::simple_state<PG::RecoveryState::RepNotRecovering, PG::RecoveryState::ReplicaActive, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb7) [0x55fadb1a0337]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 10: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x69) [0x55fadb17f809]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 11: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x395) [0x55fadb1532b5]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 12: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2d4) [0x55fadb09c444]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 13: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x25) [0x55fadb0e5395]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xdb1) [0x55fadb7c37f1]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 15: (ThreadPool::WorkThread::entry()+0x10) [0x55fadb7c48f0]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 16: (()+0x76ba) [0x7f6dd230b6ba]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 17: (clone()+0x6d) [0x7f6dd038341d]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

@smithfarm
Copy link
Contributor

smithfarm commented Oct 3, 2018

@VictorDenisov @neha-ojha The cherry-pick in this PR looks very different from the original 1abc232

I tried it, and it's too difficult ("non-trivial") for me. Maybe it would be better to close this and let @neha-ojha do the backport herself? (Some backports are better left to the developers...)

Alternatively, maybe @neha-ojha could review the cherry-pick and suggest fixes?

@VictorDenisov
Copy link
Contributor Author

This code in master underwent several refactorings which are not in luminous. Basically the code of Backfilled handler is derived from DeferBackfill handler. Though I may have missed the logic behind the refactorings in master. @neha-ojha I would like to try fixing it for several days before giving up, but if it's urgent and you decide to backport yourself please add me to your new PR - I would like to see the correct solution.

@neha-ojha
Copy link
Member

@VictorDenisov I'd be happy to let you have another go at it. In the meanwhile, I will try to investigate what the cherry-pick in luminous should look like. Does that sound reasonable @smithfarm?

…kfill

After backfill completes, we directly go to the Recovered state without
releasing reservations. The outstanding reservations cause double reservation
issues.

Creating a custom_reaction Backfilled, allows us to release reservations,
before transiting to the Recovered state.

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 1abc232)

Conflicts:
        src/osd/PG.cc
        src/osd/PG.h
@VictorDenisov
Copy link
Contributor Author

@neha-ojha I don't see how my change could cause crashed state transition. I looked at those refactorings in master that introduced cancel state: 762f3da, d4e4922. I don't see any significant changes in them. I rebased my change onto the latest luminous - maybe it can help.

@neha-ojha
Copy link
Member

@yuriw Can we please test this PR individually to confirm #23493 (comment)?

@yuriw
Copy link
Contributor

yuriw commented Oct 22, 2018

@yuriw yuriw merged commit 58bad60 into ceph:luminous Oct 25, 2018
@VictorDenisov VictorDenisov deleted the backport_24333 branch October 26, 2018 04:37
@smithfarm smithfarm changed the title luminous: PG: add custom_reaction Backfilled and release reservations after bac… luminous: osd: PG: add custom_reaction Backfilled and release reservations after bac… Oct 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants