Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: fix index suggest #22798

Closed
wants to merge 3 commits into from
Closed

Conversation

tianshan
Copy link
Contributor

@tianshan tianshan commented Jul 2, 2018

1.fix the test failure http://tracker.ceph.com/issues/24640
2.fix the race issue http://tracker.ceph.com/issues/24744
3.add test cases for races issues

since we do nothing to the not exists index, just continue to
process other indexes and pass the test case

Fixes: http://tracker.ceph.com/issues/24640

Signed-off-by: Tianshan Qu <tianshan@xsky.com>
a special sequence can cause this new situation.
IO sequence:
1.put index prepare
2.list, get stale index
3.check_disk_state, find the head obj not exist
4.write head obj
5.index complete
6.aio_operate dir_suggest_changes CEPH_RGW_REMOVE
step 6 will delete the index

Fixes: http://tracker.ceph.com/issues/24744

Signed-off-by: Tianshan Qu <tianshan@xsky.com>
1.recover index from put crash after complete
2.list raced with put, index_suggest should not delete index
3.list raced with delete, index_suggest should not recover index

Signed-off-by: Tianshan Qu <tianshan@xsky.com>
@tianshan
Copy link
Contributor Author

tianshan commented Jul 2, 2018

@ivancich please help review

@ivancich
Copy link
Member

ivancich commented Jul 2, 2018

@tianshan Will do. Thank you.

@tianshan
Copy link
Contributor Author

tianshan commented Jul 3, 2018

the fix for issue 2, seems miss a situation that delete raced with some other op, but the other op is canceled, so the stale index can never be deleted.
for the countless situation, we have an idea that use the epoch or ver to detect if index is changed after last list, if ver is smaller than cur_disk, just discard the index suggest. this solution maybe more clear to all situations.

@ivancich ivancich self-requested a review July 3, 2018 14:34
@ivancich
Copy link
Member

ivancich commented Jul 5, 2018

@tianshan Do I understand correctly that you'll be updating this PR with a fix that: a) handles the race condition, and b) is clearer? If that's true, I should wait for the update, right?

@ivancich
Copy link
Member

ivancich commented Jul 5, 2018

I verified that this version of the PR does not generate the unit test failure.
I checked for any regressions using the teuthology rgw suite and did not find any.
Here is the output without this fix: http://pulpito.ceph.com/ivancich-2018-07-04_19:14:06-rgw-wip-fix-index-suggest-premaster-distro-basic-smithi/
Here is the output with this fix: http://pulpito.ceph.com/ivancich-2018-07-03_22:14:02-rgw-wip-fix-index-suggest-master-distro-basic-smithi/

@ivancich
Copy link
Member

ivancich commented Jul 6, 2018

I ran the race condition inducing test (https://github.com/ivancich/rgw-race-inducer) for about 20 hours and the race did not take place.

@tianshan I remain unclear as to whether you plan on following up with an update to the pr that addresses the issues you raised above.

@tianshan
Copy link
Contributor Author

tianshan commented Jul 7, 2018

@ivancich the first commit fix the ut issue, and the second race condition is a new issue when I write some new ut cases, so I propose a fix. But after talk with my colleague, we think the fix maybe introduced some new issue, and discussed with a new fix idea.
I think we can separate the first commit to a new pr which address the ut failure only, than we can discussed the race condition step by step.

@tianshan
Copy link
Contributor Author

tianshan commented Jul 8, 2018

@ivancich the ut failure fix seperate to #22937, please help test again, may merge that first

@ivancich
Copy link
Member

This PR has been replaced by #22937. Closing.

@ivancich ivancich closed this Jul 17, 2018
ivancich added a commit to ivancich/ceph-fork that referenced this pull request Jul 24, 2018
which he later pulled. Nonetheless, due to extensive testing by Mark
Kogan <mkogan@redhat.com>, this commit seems necessary to address
customer's issue.

See PR 22798 (ceph#22798).

The message for the original commit is:
====

rgw: fix list op raced with put op maybe cause index delete

a special sequence can cause this new situation.
IO sequence:
1.put index prepare
2.list, get stale index
3.check_disk_state, find the head obj not exist
4.write head obj
5.index complete
6.aio_operate dir_suggest_changes CEPH_RGW_REMOVE

step 6 will delete the index

Fixes: http://tracker.ceph.com/issues/24744

====

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
ivancich added a commit to ivancich/ceph-fork that referenced this pull request Jul 24, 2018
This comes from a commit submitted by Tianshan Qu <tianshan@xsky.com>,
which he later pulled. Nonetheless, due to extensive testing by Mark
Kogan <mkogan@redhat.com>, this commit seems necessary to address
customer's issue.

See PR 22798 (ceph#22798).

The message for the original commit is:
====
a special sequence can cause this new situation.
IO sequence:
1.put index prepare
2.list, get stale index
3.check_disk_state, find the head obj not exist
4.write head obj
5.index complete
6.aio_operate dir_suggest_changes CEPH_RGW_REMOVE

step 6 will delete the index

Fixes: http://tracker.ceph.com/issues/24744

====

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
ivancich added a commit to ivancich/ceph-fork that referenced this pull request Jul 24, 2018
This comes from a commit submitted by Tianshan Qu <tianshan@xsky.com>,
which he later pulled. Nonetheless, due to extensive testing by Mark
Kogan <mkogan@redhat.com>, this commit seems necessary to address
customer's issue.

See PR 22798 (ceph#22798).

The message for the original commit is:
====
a special sequence can cause this new situation.
IO sequence:
1.put index prepare
2.list, get stale index
3.check_disk_state, find the head obj not exist
4.write head obj
5.index complete
6.aio_operate dir_suggest_changes CEPH_RGW_REMOVE

step 6 will delete the index

Fixes: http://tracker.ceph.com/issues/24744

====

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

resolves: rhbz#1584220
jdurgin pushed a commit to jdurgin/ceph that referenced this pull request Nov 20, 2018
This comes from a commit submitted by Tianshan Qu <tianshan@xsky.com>,
which he later pulled. Nonetheless, due to extensive testing by Mark
Kogan <mkogan@redhat.com>, this commit seems necessary to address
customer's issue.

See PR 22798 (ceph#22798).

The message for the original commit is:
====
a special sequence can cause this new situation.
IO sequence:
1.put index prepare
2.list, get stale index
3.check_disk_state, find the head obj not exist
4.write head obj
5.index complete
6.aio_operate dir_suggest_changes CEPH_RGW_REMOVE

step 6 will delete the index

Fixes: http://tracker.ceph.com/issues/24744

====

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

resolves: rhbz#1584220
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants