New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

luminous: osd: PG: add custom_reaction Backfilled and release reservations after bac… #23493

Merged
merged 1 commit into from Oct 25, 2018

Conversation

Projects
None yet
6 participants
@VictorDenisov
Contributor

VictorDenisov commented Aug 8, 2018

http://tracker.ceph.com/issues/24333


…kfill

After backfill completes, we directly go to the Recovered state without
releasing reservations. The outstanding reservations cause double reservation
issues.

Creating a custom_reaction Backfilled, allows us to release reservations,
before transiting to the Recovered state.

Signed-off-by: Neha Ojha nojha@redhat.com
(cherry picked from commit 1abc232)

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug
@neha-ojha

This comment has been minimized.

Member

neha-ojha commented Aug 9, 2018

Missing tracker reference:
Fixes: http://tracker.ceph.com/issues/24333

@neha-ojha neha-ojha changed the title from PG: add custom_reaction Backfilled and release reservations after bac… to luminous: PG: add custom_reaction Backfilled and release reservations after bac… Aug 9, 2018

@tchaikov tchaikov added this to the luminous milestone Aug 9, 2018

@VictorDenisov

This comment has been minimized.

Contributor

VictorDenisov commented Aug 9, 2018

🤦‍♂️ yeah, forgot the ticket link, pardon

@yuriw yuriw requested review from tchaikov and smithfarm Aug 14, 2018

@jdurgin jdurgin added the core label Aug 20, 2018

@jdurgin jdurgin added the needs-qa label Aug 20, 2018

@smithfarm

@VictorDenisov Please append the following text to the commit message:


Conflicts:
        src/osd/PG.cc
        src/osd/PG.h

This signifies to "posterity" that the commit did not cherry-pick cleanly and that manual conflict resolution was undertaken. Bonus points for adding a brief explanation of the conflict and/or what was done to resolve it.

Thanks!

@VictorDenisov VictorDenisov force-pushed the VictorDenisov:backport_24333 branch from 4ac90c2 to 90f3ae9 Sep 12, 2018

@yuriw

This comment has been minimized.

Contributor

yuriw commented Sep 15, 2018

@yuriw

This comment has been minimized.

Contributor

yuriw commented Sep 20, 2018

@VictorDenisov

This comment has been minimized.

Contributor

VictorDenisov commented Sep 25, 2018

Ok. Looking into it.

@VictorDenisov

This comment has been minimized.

Contributor

VictorDenisov commented Sep 25, 2018

@neha-ojha I'm not allowed to see the last like that you gave me. Do I need to have any permissions for it?

@neha-ojha

This comment has been minimized.

Member

neha-ojha commented Sep 25, 2018

@VictorDenisov

This comment has been minimized.

Contributor

VictorDenisov commented Sep 25, 2018

Yes. I downloaded the log. Thanks.

@yuriw

This comment has been minimized.

Contributor

yuriw commented Sep 26, 2018

@VictorDenisov we need this PR for 12.2.9 pls let know if it's ready

@yuriw

This comment has been minimized.

Contributor

yuriw commented Sep 26, 2018

@yuriw

This comment has been minimized.

Contributor

yuriw commented Sep 28, 2018

@VictorDenisov pls add "needs-qa" ta when ready

@yuriw

This comment has been minimized.

Contributor

yuriw commented Oct 1, 2018

@VictorDenisov For next time: the "Conflicts" section always goes below the "(cherry picked from ...)" line. Thanks!

@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Oct 2, 2018

@VictorDenisov Try git checkout luminous ; git log --grep Conflicts for some examples of what it should look like.

@VictorDenisov VictorDenisov force-pushed the VictorDenisov:backport_24333 branch from 90f3ae9 to d6dcf51 Oct 3, 2018

@VictorDenisov

This comment has been minimized.

Contributor

VictorDenisov commented Oct 3, 2018

@neha-ojha Is it the error that you are referring to:

2018-09-22T16:15:16.839 INFO:tasks.ceph.osd.4.smithi144.stderr: ceph version 10.2.11-2-gc2d7b39 (c2d7b39dda3c2f8102698c48eef561b8e5853be8)
2018-09-22T16:15:16.839 INFO:tasks.ceph.osd.4.smithi144.stderr: 1: (()+0xa48b8e) [0x55fadb6cfb8e]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 2: (()+0x11390) [0x7f6dd2315390]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 3: (gsignal()+0x38) [0x7f6dd02b1428]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 4: (abort()+0x16a) [0x7f6dd02b302a]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55fadb7d180b]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 6: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0xc7) [0x55fadb135207]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 7: (()+0x4e8806) [0x55fadb16f806]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 8: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x2d6) [0x55fadb19f496]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 9: (boost::statechart::simple_state<PG::RecoveryState::RepNotRecovering, PG::RecoveryState::ReplicaActive, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb7) [0x55fadb1a0337]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 10: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x69) [0x55fadb17f809]
2018-09-22T16:15:16.840 INFO:tasks.ceph.osd.4.smithi144.stderr: 11: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x395) [0x55fadb1532b5]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 12: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2d4) [0x55fadb09c444]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 13: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x25) [0x55fadb0e5395]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xdb1) [0x55fadb7c37f1]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 15: (ThreadPool::WorkThread::entry()+0x10) [0x55fadb7c48f0]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 16: (()+0x76ba) [0x7f6dd230b6ba]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: 17: (clone()+0x6d) [0x7f6dd038341d]
2018-09-22T16:15:16.841 INFO:tasks.ceph.osd.4.smithi144.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Oct 3, 2018

@VictorDenisov @neha-ojha The cherry-pick in this PR looks very different from the original 1abc232

I tried it, and it's too difficult ("non-trivial") for me. Maybe it would be better to close this and let @neha-ojha do the backport herself? (Some backports are better left to the developers...)

Alternatively, maybe @neha-ojha could review the cherry-pick and suggest fixes?

@VictorDenisov

This comment has been minimized.

Contributor

VictorDenisov commented Oct 3, 2018

This code in master underwent several refactorings which are not in luminous. Basically the code of Backfilled handler is derived from DeferBackfill handler. Though I may have missed the logic behind the refactorings in master. @neha-ojha I would like to try fixing it for several days before giving up, but if it's urgent and you decide to backport yourself please add me to your new PR - I would like to see the correct solution.

@neha-ojha

This comment has been minimized.

Member

neha-ojha commented Oct 4, 2018

@VictorDenisov I'd be happy to let you have another go at it. In the meanwhile, I will try to investigate what the cherry-pick in luminous should look like. Does that sound reasonable @smithfarm?

PG: add custom_reaction Backfilled and release reservations after bac…
…kfill

After backfill completes, we directly go to the Recovered state without
releasing reservations. The outstanding reservations cause double reservation
issues.

Creating a custom_reaction Backfilled, allows us to release reservations,
before transiting to the Recovered state.

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 1abc232)

Conflicts:
        src/osd/PG.cc
        src/osd/PG.h

@VictorDenisov VictorDenisov force-pushed the VictorDenisov:backport_24333 branch from d6dcf51 to f512f90 Oct 9, 2018

@VictorDenisov

This comment has been minimized.

Contributor

VictorDenisov commented Oct 9, 2018

@neha-ojha I don't see how my change could cause crashed state transition. I looked at those refactorings in master that introduced cancel state: 762f3da, d4e4922. I don't see any significant changes in them. I rebased my change onto the latest luminous - maybe it can help.

@neha-ojha

This comment has been minimized.

Member

neha-ojha commented Oct 10, 2018

@yuriw Can we please test this PR individually to confirm #23493 (comment)?

@yuriw

This comment has been minimized.

Contributor

yuriw commented Oct 22, 2018

@yuriw yuriw merged commit 58bad60 into ceph:luminous Oct 25, 2018

4 checks passed

Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details

@VictorDenisov VictorDenisov deleted the VictorDenisov:backport_24333 branch Oct 26, 2018

@smithfarm smithfarm changed the title from luminous: PG: add custom_reaction Backfilled and release reservations after bac… to luminous: osd: PG: add custom_reaction Backfilled and release reservations after bac… Oct 26, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment