wip-v0.94.6-recovery-optimize #8083

Closed
wants to merge 1 commit

Conversation

sysnote
Contributor

@sysnote sysnote commented Mar 14, 2016

This PR is based on #7325.
Something was wrong in the original PR #7325, and I fixed it on top of the v0.94.6 version.
It also fixes a compatibility problem: the osd_recovery_partial config option controls partial recovery and defaults to false. During an upgrade, objects are recovered whole; once every OSD has been upgraded, set osd_recovery_partial to true and recovery can push partial objects.

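As an illustration only, here is a minimal C++ sketch of what such a config gate could look like on the primary; the structs and function are hypothetical and not taken from the Ceph code:

```cpp
// Hypothetical sketch of the config gate described above; names and types do
// not come from the actual Ceph tree. The idea: only mark a push as partial
// when the operator has enabled osd_recovery_partial AND the primary actually
// tracked which extents were overwritten; otherwise push the whole object,
// which any replica version can apply.
struct RecoveryConfig {
  bool osd_recovery_partial = false;  // default off so upgrades stay safe
};

struct PushPlan {
  bool partial = false;  // true: push only the dirty extents
};

PushPlan plan_push(const RecoveryConfig& conf, bool have_dirty_extents) {
  PushPlan plan;
  plan.partial = conf.osd_recovery_partial && have_dirty_extents;
  return plan;
}
```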
Here is the situation we solved.
Consider an upgrade in which the cluster is being upgraded to this can_recover_partial version:
e.g. pg 3.67 mapped to [0, 1, 2]
1) First we upgrade osd.0 (service ceph restart osd.0); recovery runs normally and everything is fine.
2) A write request (e.g. req1, which writes obj1) is sent to the primary (osd.0), and the pg log records it.
3) Then we upgrade osd.1. Sending req1 to osd.1 fails, but it is still sent to osd.2. While osd.2 is handling the request (in do_request), pg 3.67 starts peering, so osd.2 calls can_discard_request and drops req1.
4) So req1 has only been written successfully on osd.0; because min_size=2 the write cannot complete with a single copy, and osd.0 re-enqueues req1.
5) During peering the primary finds that req1's object obj1 is missing on osd.1 and osd.2, so it recovers the object.
6) Because osd.0 and osd.1 are already upgraded, osd.0 calculates partial data in prep_push_to_replica, and osd.1 handles the partial push correctly.
7) But osd.2 has not been upgraded: in its code path (submit_push_data) it removes the original object first and then writes the partial data from osd.0, so the original data of the object is lost (see the sketch after this list).
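To make step 7 concrete, here is a simplified, hypothetical sketch of the old replica-side behaviour; it is not the real submit_push_data, just the problematic remove-then-write ordering:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified illustration of step 7, NOT the real submit_push_data(): the
// un-upgraded replica first removes the existing object and then rebuilds it
// from whatever the push contains.
struct Extent { uint64_t off; std::string data; };
using ObjectStore = std::map<std::string, std::string>;  // oid -> contents

void old_replica_apply_push(ObjectStore& store,
                            const std::string& oid,
                            const std::vector<Extent>& pushed) {
  store.erase(oid);               // the original object is dropped first
  std::string& obj = store[oid];  // then rebuilt only from the pushed bytes
  for (const auto& e : pushed) {
    if (obj.size() < e.off + e.data.size())
      obj.resize(e.off + e.data.size(), '\0');
    obj.replace(e.off, e.data.size(), e.data);
  }
  // With a full-object push this is lossless; with a partial push from an
  // upgraded primary, everything outside the pushed extents is gone.
}
```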

Overwrite ops can be recovered as partial content; otherwise recovery runs as
the normal process.
Fix compatibility problem: use the osd_recovery_partial config option to control partial
recovery, default false. During an upgrade, recover whole objects; after the upgrade,
change osd_recovery_partial to true and partial recovery is enabled.
@yuyuyu101
Member

For most features, we don't want to implement them in a released version first, and in my opinion we also don't want to backport this feature. So you should consider rebasing on master.

@sysnote
Contributor Author

sysnote commented Mar 14, 2016

Thank you for your reply, I will try to rebase on master first.

@mslovy
Contributor

mslovy commented Mar 15, 2016

I think we can do it based on the compatibility feature bit during the peering process. That is more reasonable than a config option. A config option only tells us whether we want the feature; regardless of whether it is true or false, we should always guarantee a successful upgrade.

@sysnote
Contributor Author

sysnote commented Mar 21, 2016

@mslovy I have submitted a new PR rebased on master (#8229) and removed the config option; it just checks peer_features in do_osd_ops.
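For illustration, a minimal sketch of that kind of check, using a hypothetical feature-bit name rather than the real Ceph definitions:

```cpp
#include <cstdint>

// Hypothetical feature bit; the real Ceph feature bits are defined elsewhere
// in the tree, and this value is only illustrative.
constexpr uint64_t FEATURE_PARTIAL_RECOVERY = 1ull << 40;

// peer_features stands for the intersection of the acting set's feature bits
// gathered during peering. Partial recovery is only attempted when every peer
// advertises support; otherwise the object is recovered whole.
bool can_recover_partial(uint64_t peer_features) {
  return (peer_features & FEATURE_PARTIAL_RECOVERY) != 0;
}
```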

@yuyuyu101 yuyuyu101 closed this Mar 23, 2016