Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recover metablk in subtype order #200

Merged
merged 4 commits into from
Nov 7, 2023

Conversation

JacksonYao287
Copy link
Contributor

@JacksonYao287 JacksonYao287 commented Oct 9, 2023

In HO , when metaservice restart, we actually want to recover repl-dev before PGinfo and recover PGinfo before Shardinfo.
but metaservice will recover all the metablk only according to blkid and we have to use some pending method to solve this problem.
https://github.com/eBay/HomeObject/blob/dbde1bb6b2eb705fc55f7539be08d9c9fdbbd508/src/lib/homestore_backend/hs_pg_manager.cpp#L126

https://github.com/eBay/HomeObject/blob/dbde1bb6b2eb705fc55f7539be08d9c9fdbbd508/src/lib/homestore_backend/hs_homeobject.cpp#L105

this PR do some change in metaservice recovery logic to make sure all the metablk will be recovered by subtype, and the subtype order is that appear order in the metablk chain.

@JacksonYao287
Copy link
Contributor Author

after this PR is merge , we can change the recovery logic in this patch to a simple way

@codecov-commenter
Copy link

codecov-commenter commented Oct 9, 2023

Codecov Report

Attention: 28 lines in your changes are missing coverage. Please review.

Comparison is base (9dc9130) 68.98% compared to head (c660c6b) 68.91%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #200      +/-   ##
==========================================
- Coverage   68.98%   68.91%   -0.08%     
==========================================
  Files          93       93              
  Lines        7581     7611      +30     
  Branches      969      979      +10     
==========================================
+ Hits         5230     5245      +15     
- Misses       1901     1907       +6     
- Partials      450      459       +9     
Files Coverage Δ
src/include/homestore/meta_service.hpp 100.00% <ø> (ø)
src/lib/blkdata_svc/blk_read_tracker.hpp 100.00% <ø> (ø)
src/lib/meta/meta_sb.hpp 41.66% <ø> (ø)
src/lib/homestore.cpp 84.35% <0.00%> (ø)
src/lib/common/homestore_utils.cpp 75.55% <82.35%> (+4.12%) ⬆️
src/lib/meta/meta_blk_service.cpp 49.37% <48.88%> (+1.56%) ⬆️

... and 7 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

yamingk
yamingk previously requested changes Oct 9, 2023
src/lib/meta/meta_blk_service.cpp Outdated Show resolved Hide resolved
@JacksonYao287
Copy link
Contributor Author

I think it is better to clarify this PR again.

1 what is the current metaservice recovery process?

the current recovery process of metaservice is recovering all the metablks according to BlkID, and know nothing about the upper layer logic. so from the perspective of upper layer , no recovery order guaranteeed.

for the upper layer user, they should adapt the out-of-order of metaservice recovery and implement the upper layer recovery logic carefully by themselves. that is it.

2 what does this PR do?

this PR changes the default recovery order of metaservice from out-of-order to the order of the presence of subsystem in the metablk chain. the presence of subsystem means the presence of the first metablk of the subsystem in the metablk chain. for example , suppose we have 3 subsystems A, B and C , and in each subsystem we have 3 metablk( A1, A2 ,A3 ),(B1,B2,B3),(C1,C2,C3). the matablk chain is as follows:
A1->C1->C2->A2->B1->C3->B2->A3->B3
current recovery order will be ordered by the blkid among all the subsystems, which is not definite.

since A1 comes before C1 and C1 comes before B1, so subsystem A comes before C and C come before B. after the PR, the recovery order will be
(A1,A2,A3) -> (C1,C2,C3) ->(B1,B2,B3)
in each subsystem, it will be ordered by blkid. But among subsystems, it will be ordered by the presence of subsystem

this gives the upper layer a chance to untilize this recovery order to simplify the upper layer recovery logic. well , the upper lay can also ignore this change and do like before. note that , here, the upper layer can not control the order of metaservice recovery, and we also do not want to give the chance to upper layer to control until we really need.

so , in fact , what this PR does is to give a default natural order of metaservice recovery, and no extra performance issue will be involved.

3 is it necessary for the upper layer to control the recover order of metaservice in fine granularity

i don`t think it is necessary for now at least. the reason is :
1 upper layer user will take the responsibility to pass in the correct order.
2 inside homestore, there are already some metaservice user, such as CPmanager. how do we handle the recovery order of these subsystem with those passed in subsystem. although this will not be difficult to choose a policy and implement, but it seems not uniformed that the upper layer can only partially control the recovery order. they can not control the order of those existed subsystem in the homestore.

so my suggestion is we can delay to implement fine-granularity control of the metaservice recovery order until we have a clear scenario . for now, we can just use the newly added default recovery order to simplify HO Implementation

@xiaoxichen
Copy link
Collaborator

xiaoxichen
xiaoxichen previously approved these changes Oct 24, 2023
Copy link
Collaborator

@xiaoxichen xiaoxichen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm in implementation

A few improvement area in UT but the existing UT covered the logic

src/tests/test_meta_blk_mgr.cpp Outdated Show resolved Hide resolved
src/tests/test_meta_blk_mgr.cpp Show resolved Hide resolved
zichanglai
zichanglai previously approved these changes Oct 24, 2023
Copy link
Contributor

@zichanglai zichanglai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be better to use reference & for MetaBlkService::topological_sort parameter

@JacksonYao287 JacksonYao287 modified the milestone: MileStone3.11 Oct 28, 2023
add dependencied for metaservice

Signed-off-by: Jie Yao <jyao3@ebay.com>
xiaoxichen
xiaoxichen previously approved these changes Nov 2, 2023
Copy link
Collaborator

@xiaoxichen xiaoxichen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

src/lib/meta/meta_blk_service.cpp Outdated Show resolved Hide resolved
meta_subtype_vec_t independent_subtypes;
meta_subtype_vec_t ordered_subtypes;

if (hs_utils::topological_sort(m_dep_topo_graph, ordered_subtypes, &independent_subtypes)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this sort out of MetaBlkService::recover and put it into either MetaBlkService::start (before ::recover) or into a new function (e.g. validate_register_dep(...)) to indicate this is a validation of register dependency, since this is not part of recovery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, we use hs_utils::topological_sort to both check circle and get the ordered_subtypes. if we call this in another function out of recovery, we need to pass the ordered_subtypes to MetaBlkService::recover, which need to change the signature of MetaBlkService::recover , and thus we need to do a lot of changes in the whole metaservice UT. can we do it in a seperate PR?

if we do not want to change the signature of MetaBlkService::recover, we can add ordered_subtypes as a member in MetaBlkService so that MetaBlkService::recover can read it, but it is generate from the m_dep_topo_graph and only be used once, i do not think it is a good fashion

Copy link
Collaborator

@xiaoxichen xiaoxichen Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sort in recovery or in validate_register_dep doesnt have any behavior change, more for the code style. It looks to me also seems wired that validate_register_dep returns a order and passed to MetaBlkService::recover

Maybe do the topo-sort twice is a proper solution for the sake of code-style...

Honestly I dont think we need to spend more cycle on this style/preference and blocking the PR (other comments needed to be addressed), as there is significant communication overhead to let other believe this is better compare to having another PR to fix to your prefer one and show the beauty of another implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why we want to put more stuff in a function that is apparent not part of its job, and also would cause some confusion to the reader. The validation simply is NOT part of recovery, it is part of registration, if it fails, it shouldn't start recover process. Another benefit is if some use case want to call ::recover multiple times (either to measure the speed of recovery or test the dep/non-dep order correctness), it shouldn't do the sort multiple times, but just do recover with its clients.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically it is , the point is the cycle-detection is actually a topo-sort, so we either

  1. do the topo-sort twice , one for validation at the end of registration(validate_register_dep) , and one for recover.
  2. let recovery() accept ordered_subtypes and independent_subtypes, which returned from validate_register_dep and passed in here.
  3. Put the validation in recovery()

I think 2) will makes the flow more complicate and other caller of recovery will be more confusing. I am fine with 1 or 3. Though honestly dont see too much benefit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for independent_subtypes, where we can get from m_sub_info.
This is not a major comment. Feel free to merge as it as. The rest looks good to me.
But please expect some refactoring as to this part in next PR.

src/lib/meta/meta_sb.hpp Outdated Show resolved Hide resolved
@JacksonYao287 JacksonYao287 merged commit 3911682 into eBay:master Nov 7, 2023
16 checks passed
@JacksonYao287 JacksonYao287 deleted the metaservice branch November 7, 2023 02:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants