
GC may stall due to riak_cs_delete_fsm deadlock #949

Merged
merged 3 commits into release/1.5 from bugfix/delete-fsm-stall on Aug 13, 2014

Conversation

kellymclaughlin
Contributor

GC can stall when a riak_cs_delete_fsm worker process encounters a deadlock condition. It is related to the delete_concurrency setting, but only has the potential to affect files that were uploaded using multipart upload.

The problem rests on the fact that we treat the delete_blocks_remaining field of the object manifest as an ordered set except when that set is initially determined for a manifest that contains metadata indicating it was a multipart upload. The initial assignment for delete_blocks_remaining can be seen here. The code that fails to properly order the list of upload parts is here.

The reason this is an issue, and that riak_cs_delete_fsm processes hang around indefinitely, is that the ordsets:del_element call is used to remove elements of the delete_blocks_remaining set as the delete workers (whose count is controlled by delete_concurrency) respond that their assigned blocks have been deleted. If a call to ordsets:del_element is made with an element that does not exist in the set, or, more importantly in this case, is made on an unordered set where the element for deletion sorts less than a key ahead of it in the set, the return value is identical to the input set. This can come into play when you have delete_concurrency greater than 1 and a block delete worker, call it worker A, assigned to delete a block identified by {UUID, BlockNumber}, returns a response to the riak_cs_delete_fsm process before another delete worker, worker B, whose assigned {UUID, BlockNumber} pair sorts greater than that of worker A but appears before the pair of worker A in the set. More concisely: if the block assigned to worker A sorts less than that assigned to worker B, but appears after the block assigned to worker B in the delete_blocks_remaining set in the manifest, and worker A responds before worker B, then the entry for worker A will not be removed from the delete_blocks_remaining set. Since the delete fsm relies on delete_blocks_remaining being empty as a termination condition, the process will hang forever and GC stalls out.
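To make the failure mode concrete, here is a minimal Erlang sketch of that ordsets:del_element behavior (the short binaries are hypothetical stand-ins for the real 16-byte UUIDs):

%% An unordered list used as if it were an ordset.
Unordered = [{<<"uuid-b">>, 0}, {<<"uuid-a">>, 0}],
%% {<<"uuid-a">>,0} sorts less than the head {<<"uuid-b">>,0}, so
%% del_element concludes it cannot be in the (supposedly ordered) set
%% and returns the input unchanged; the block is never marked deleted.
Unordered = ordsets:del_element({<<"uuid-a">>, 0}, Unordered),
%% On a properly ordered set the same call removes the element.
[{<<"uuid-b">>, 0}] = ordsets:del_element({<<"uuid-a">>, 0},
                                          lists:usort(Unordered)).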

Here is a snippet of the return value from a call to riak_cs_lfs_utils:block_sequences_for_manifest for a multipart-uploaded file:

[{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,0},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,1},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,2},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,3},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,4},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,5},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,6},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,7},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,8},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,9},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,10},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,11},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,12},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,13},
{<<65,96,107,167,183,167,76,217,141,81,114,254,144,244,52,133>>,14},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,0},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,1},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,2},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,3},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,4},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,5},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,6},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,7},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,8},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,9},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,10},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,11},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,12},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,13},
{<<59,96,159,218,40,102,71,175,150,203,86,162,121,73,239,151>>,14},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,0},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,1},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,2},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,3},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,4},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,5},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,6},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,7},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,8},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,9},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,10},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,11},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,12},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,13},
{<<65,3,98,125,155,70,68,164,142,30,86,92,118,80,72,124>>,14},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,0},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,1},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,2},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,3},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,4},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,5},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,6},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,7},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,8},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,9},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,10},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,11},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,12},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,13},
{<<55,111,232,89,145,102,68,183,181,224,42,97,177,53,222,22>>,14},
{<<205,40,202,58,57,21,69,232,143,194,10,120,82,3,214,169>>,0},
{<<205,40,202,58,57,21,69,232,143,194,10,120,82,3,214,169>>,1}...
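Note how the leading UUID bytes run 65, 59, 65, 55, 205 rather than ascending, so this list is not a valid ordset. As an illustrative sketch with shortened, hypothetical UUIDs, lists:usort/1 restores the byte-wise ordering that the ordsets functions expect:

lists:usort([{<<65,96>>,0}, {<<59,96>>,0}, {<<65,3>>,0},
             {<<55,111>>,0}, {<<205,40>>,0}]).
%% => [{<<55,111>>,0},{<<59,96>>,0},{<<65,3>>,0},{<<65,96>>,0},{<<205,40>>,0}]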

@kellymclaughlin kellymclaughlin added this to the 1.5.1 milestone Aug 11, 2014
@kellymclaughlin kellymclaughlin self-assigned this Aug 11, 2014
kellymclaughlin and others added 2 commits August 11, 2014 16:32
This commit addresses an issue where the use of delete concurrency for
removal of the blocks of a single object can cause a
riak_cs_delete_fsm worker process to deadlock and prevent the entire
Riak CS garbage collection from making progress. This bug only
manifests for objects that were uploaded using multipart upload.

The problem rests on the fact that the delete_blocks_remaining field
of the object manifest is treated as an ordered set except when that
set is initially determined for a manifest that contains metadata
indicating it was a multipart upload.

The reason this is an issue and that riak_cs_delete_fsm processes can
persist indefinitely is that the ordsets:del_element call is used to
remove elements of the delete_blocks_remaining set as the delete
workers (whose count is controlled by delete_concurrency) respond that
their assigned blocks have been deleted. If a call to
ordsets:del_element is made with an element that does not exist in the
set, or more importantly in this case is made on an unordered set where
the element for deletion sorts less than a key ahead of it in the
set, the return value is identical to the input set. This can come
into play when you have delete_concurrency greater than 1 and a block
delete worker, call it worker A, assigned to delete a block identified
by {UUID, BlockNumber}, returns a response to the riak_cs_delete_fsm
process before another delete worker, worker B, whose assigned {UUID,
BlockNumber} pair sorts greater than that of worker A, but appears
before the pair of worker A in the set. More concisely, if the block
assigned to worker A sorts less than that assigned to worker B, but
appears after the block assigned to worker B in the
delete_blocks_remaining set in the manifest and worker A responds
before worker B, then the entry for worker A will not be removed from
the delete_blocks_remaining set. Since the delete fsm relies on
delete_blocks_remaining being empty as a termination condition, the
process will hang forever and GC stalls out.

This commit resolves the issue by ensuring that
riak_cs_lfs_utils:block_sequences_for_manifest/1 only returns an
ordered set regardless of the input manifest.

Additionally this commit also adds some type specs to other key
functions in the riak_cs_lfs_utils module and refactors a code block
in riak_cs_delete_fsm:blocks_to_delete_from_manifest to not export
variables from within a case statement.
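For readers unfamiliar with the term, "exporting" a variable from a case statement means binding it inside the case clauses and then using it after the statement; the refactor binds the value of the case expression instead. A generic sketch of the idiom (illustrative names, not the actual riak_cs_delete_fsm code):

%% Before: Result is bound inside the clauses and "exported".
case Mode of
    active -> Result = start_delete();
    idle   -> Result = noop
end,
handle(Result).

%% After: the value of the case expression is bound in one place.
Result = case Mode of
             active -> start_delete();
             idle   -> noop
         end,
handle(Result).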
@kellymclaughlin
Contributor Author

Thanks to @bowrocker for providing an EQC property to verify that the changes to the block_sequences_for_manifest function produce properly sorted results.

PartBlocks = initial_blocks(PartManifest?PART_MANIFEST.content_length,
                            SafeBlockSize,
                            PartManifest?PART_MANIFEST.part_id),
lists:usort(Parts ++ PartBlocks)
@kuenishi
Contributor

A very minor nitpick: Parts is usually a longer list than PartBlocks, so PartBlocks ++ Parts would be slightly lighter. And maybe the list doesn't need to be sorted here, because it is sorted inside ordsets:from_list.
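For reference, ordsets:from_list/1 does return a sorted, duplicate-free list, so the question in this nitpick is only where the sorting happens, not whether it happens. A quick shell check with hypothetical data:

ordsets:from_list([{<<"b">>, 0}, {<<"a">>, 1}, {<<"a">>, 0}]).
%% => [{<<"a">>,0},{<<"a">>,1},{<<"b">>,0}]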

@kellymclaughlin
Contributor Author

I am glad you mentioned the sorting here. I did it this way very intentionally. Here is some test data to show my motivation for doing so: https://gist.github.com/kellymclaughlin/3768652d9af269c45897.

I will take your suggestion and push a commit to address the order of the list append operation. Good idea.

@kuenishi
Contributor

  • riak_test
  • code review

Recently, repl_test has been failing non-deterministically, but that's another issue.

@kuenishi
Contributor

+1 d09ed41

Regardless of my nitpick comment: either fixing it or not would be fine.

@kuenishi
Contributor

Note: I've been trying to reproduce the problem in my local env and failing for almost an hour, although I succeeded in reproducing the problem with release/1.5 in minutes.

borshop added a commit that referenced this pull request Aug 13, 2014
GC may stall due to riak_cs_delete_fsm deadlock

Reviewed-by: kuenishi
@kellymclaughlin
Contributor Author

@borshop merge

borshop added a commit that referenced this pull request Aug 13, 2014
GC may stall due to riak_cs_delete_fsm deadlock

Reviewed-by: bowrocker
@borshop borshop merged commit 272b852 into release/1.5 Aug 13, 2014
@kellymclaughlin kellymclaughlin deleted the bugfix/delete-fsm-stall branch August 13, 2014 20:02
@kuenishi kuenishi added the Bug label Sep 27, 2014