
Synchronous deletion of small objects #1174

Merged: 10 commits into develop from feature/active-delete, Jul 7, 2015
Conversation

kuenishi
Contributor

  • Add a configuration item active_delete_threshold (default is 0).
  • Objects (UUIDs) smaller than that threshold are synchronously deleted right after being marked as pending_delete, while larger ones are marked as scheduled_delete (see the sketch below).
  • Blocks of objects smaller than that threshold are synchronously deleted, while their manifests are marked as scheduled_delete and kept in place.
  • The deletion involves invoking riak_cs_delete_fsm and removing the UUID and manifest from that object's history.
  • This does not change the manifest handling semantics for any failure between marking as pending_delete and marking as scheduled_delete.
  • If active deletion fails, the object is handled the next time it is updated, as pending_delete manifests were handled before.
  • This commit introduces a risk of block leaks, in the case where deleted UUIDs are erased on the sink side without all blocks being deleted. If any RTQ drops are detected, the block leak checker should be run against the sink cluster.

Addresses #1138.
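For illustration, the threshold check amounts to a size comparison. A minimal sketch with made-up names (not the actual riak_cs code; the comparison operator and the handling of the default 0 may differ there):

%% Sketch only: illustrative names, not the riak_cs implementation.
%% Decide whether an object's blocks should be deleted synchronously.
-spec delete_mode(non_neg_integer(), non_neg_integer()) -> active | scheduled.
delete_mode(ContentLength, Threshold) when ContentLength < Threshold ->
    %% Small object: delete its blocks right after marking pending_delete.
    %% With the default threshold of 0, nothing matches, i.e. the feature is off.
    active;
delete_mode(_ContentLength, _Threshold) ->
    %% Otherwise leave it to the usual GC path via scheduled_delete.
    scheduled.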

@kuenishi kuenishi added this to the 2.1.0 milestone Jun 29, 2015
@kuenishi kuenishi force-pushed the feature/active-delete branch 3 times, most recently from d6844a3 to 3e3f9a6 on June 29, 2015 07:15
@kuenishi kuenishi changed the title from "[wip] Feature/active delete" to "Synchronous deletion of small objects" on Jun 29, 2015
[{cleanup_manifests, false}]],
{ok, Pid} = riak_cs_delete_fsm_sup:start_delete_fsm(node(), Args),
Ref = erlang:monitor(process, Pid),
receive
Contributor

What happens when delete_fsm dies unexpectedly?

Contributor Author

It will be caught by the last clause of the receive...
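Roughly, the pattern is the following (a self-contained sketch, not the actual riak_cs_gc code):

%% Sketch: an unexpected fsm death shows up as a 'DOWN' message and is
%% handled by the final catch-all clause of the receive.
wait_for_delete_fsm(Pid) ->
    Ref = erlang:monitor(process, Pid),
    receive
        {Pid, {ok, Result}} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Result};
        Other ->
            %% Includes {'DOWN', Ref, process, Pid, Reason} if the fsm
            %% crashed before replying.
            {error, Other}
    end.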

@shino
Contributor

shino commented Jun 30, 2015

Some corner cases:

  1. When the last manifest (UUID) is actively deleted: if there is only one active UUID for a certain key and it is deleted, what happens to the manifest history object?
  2. If a riak node has stopped and is offline, UUIDs will have been written to fallback nodes. If the delete API is then called before handoff finishes and the UUID is completely removed from the manifest history on the primary vnodes, does handoff cause manifest resurrection without blocks?

@kuenishi
Contributor Author

I'd like to cover those corner cases in the next pull request, to keep this work small. Fair?

@shino
Contributor

shino commented Jun 30, 2015

Yup. I must not forget about them 😅

@kuenishi
Contributor Author

Hmm, but do you think covering them in documentation is enough, e.g. mentioning "use this feature with special care if you combine it with MDC replication"?

@shino
Contributor

shino commented Jun 30, 2015

The two corner cases above have nothing to do with MDC. They can occur in a single-cluster environment.

@kuenishi
Contributor Author

Oops, I was confused.

@shino
Contributor

shino commented Jun 30, 2015

I should have been clearer. The reason I worry about manifest resurrection is that it is an end-user-facing issue. For something that merely bothers operators, we may be tolerant (depending on the operators). However, for an end-user-facing issue, such a switch should not be turned on for a public storage service based on Riak CS.

On the other hand, an empty manifest history is NOT end-user-facing, so I consider it minor.

-spec maybe_delete_small_objects([cs_uuid_and_manifest()], riak_client(), non_neg_integer()) ->
          {[cs_uuid_and_manifest()], [cs_uuid()]}.
maybe_delete_small_objects(Manifests, RcPid, Threshold) ->
    {ok, BagId} = riak_cs_riak_client:get_manifest_bag(RcPid),
Contributor Author

Will follow up with multibag code.

@kuenishi
Contributor Author

Besides the issues below, this is ready for review again, as is basho/riak_cs_multibag#26. Let's discuss the 3 options after this gets merged.

How to handle scheduled_delete manifests not in GC bucket

As @shino pointed out, there are two corner cases where deleted manifests could resurrect if we delete UUIDs during active block deletion. Instead, I decided to leave those manifests as scheduled_delete without moving them to the GC bucket. This will be fixed in a following pull request, but I want to leave a sketch here.

Option 1: leave it

GC won't collect them, as riak_cs_delete_fsm deletes only one specific UUID and manifest. The only way they could be deleted is via the max_scheduled_delete_manifests limit after the leeway period.

Option 2: add a new state deleted and a passive collector

Mark manifests as deleted, and collect them the next time the manifest object is updated. Only deleted manifests older than the leeway period would be removed (a sketch follows these options).

Option 3: add a new state deleted and an active collector

Besides GC, add a new daemon process that walks over all manifests and cleans them up, along with cleaning up siblings.
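To make Option 2 slightly more concrete, a purely hypothetical sketch; manifests are modeled here as {UUID, Map} pairs, and none of these names exist in riak_cs:

%% Hypothetical sketch of Option 2 (passive collection of 'deleted' manifests).
%% A manifest map carries its state and the time it was marked deleted.
prune_deleted(UUIDsAndManifests, LeewaySeconds, NowSeconds) ->
    [{UUID, M} || {UUID, M} <- UUIDsAndManifests,
                  keep(M, LeewaySeconds, NowSeconds)].

keep(#{state := deleted, marked_at := MarkedAt}, Leeway, Now)
  when Now - MarkedAt > Leeway ->
    false;  %% older than the leeway period: drop it during this update
keep(_Manifest, _Leeway, _Now) ->
    true.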

{PDManifests0, []}
end,

PDUUIDs = [UUID || {UUID, _} <- ToGC],
Contributor

PDUUIDs is used only in the ok branch below and not in the other. Moving this line into the ok branch would be better.
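In generic terms, the suggestion is something like this (hypothetical names; mark_scheduled_delete/1 is a stub, not the actual riak_cs_gc code):

%% Sketch: build PDUUIDs only in the branch that actually uses it.
after_move(ToGC) ->
    case mark_scheduled_delete(ToGC) of
        ok ->
            PDUUIDs = [UUID || {UUID, _} <- ToGC],
            {ok, PDUUIDs};
        {error, _} = Error ->
            Error
    end.

%% Stub so the sketch is self-contained; the real code talks to Riak here.
mark_scheduled_delete([]) -> {error, empty};
mark_scheduled_delete(_ToGC) -> ok.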

@kuenishi
Contributor Author

kuenishi commented Jul 1, 2015

So we chose Option 1, as the assumption behind it was wrong: all manifests will in fact be collected after the leeway period. I think I've addressed all your comments, @shino. Please take a look again. Also, I'm having a problem with my riak_test installation; I'd be happy if you could run it with multibag enabled.

@kuenishi
Contributor Author

kuenishi commented Jul 3, 2015

All riak_test cases have passed; it's ready for review now, although it's Friday afternoon ...

@shino
Contributor

shino commented Jul 6, 2015

Nice summary of the resurrection vs. lingering-manifests trade-off. I agree with adopting Option 1.

@shino
Contributor

shino commented Jul 6, 2015

Not about the code, but about the wiki page https://github.com/basho/riak_cs/wiki/Active-Deletion-of-Small-Objects,

for the section https://github.com/basho/riak_cs/wiki/Active-Deletion-of-Small-Objects#things-to-be-cared-in-production:

  • "replication" should be "MDC replication"
  • The third bullet point is not restricted to MDC; it is a more general caution

@@ -200,6 +200,16 @@
hidden
]}.

%% @doc Small objects to be synchronously deleted including blocks on
Contributor

After some discussion, only blocks are deleted on the fly. So "including blocks" may lead to misunderstanding.

Contributor

Should we mention lingering manifests here too?
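For reference, the cuttlefish entry being documented would look roughly like the following; the setting name is taken from the PR description, while the doc wording, datatype, and default details shown here are assumptions:

%% Sketch of the schema entry; not necessarily the final wording or datatype.
%% @doc Objects at or below this size have their blocks deleted synchronously
%% on DELETE; their manifests still go through the usual scheduled_delete/GC
%% path. 0 (the default) disables active deletion.
{mapping, "active_delete_threshold", "riak_cs.active_delete_threshold", [
  {default, 0},
  {datatype, integer},
  hidden
]}.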

@shino
Contributor

shino commented Jul 6, 2015

In a multi-object delete, the following errors happened.

2015-07-06 12:00:12.327 [error] <0.489.0>@riak_cs_gc:maybe_delete_small_objects:417 Active deletion failed. Reason: {'DOWN',#Ref<0.0.2.17170>,process,<0.2855.0>,normal}
2015-07-06 12:00:12.342 [error] <0.489.0>@riak_cs_gc:maybe_delete_small_objects:417 Active deletion failed. Reason: {maybe_delete_small_objects,{<0.2859.0>,{ok,{<<"test">>,<<"bbb">>,<<125,204,10,209,70,191,76,214,129,64,7,21,151,5,120,224>>,1,1}}}}
2015-07-06 12:00:12.350 [error] <0.489.0>@riak_cs_gc:maybe_delete_small_objects:417 Active deletion failed. Reason: {maybe_delete_small_objects,{<0.2863.0>,{ok,{<<"test">>,<<"dev1/bin/riak-cs">>,<<42,245,118,34,46,46,77,111,158,185,215,220,3,183,37,74>>,1,1}}}}
2015-07-06 12:00:12.362 [error] <0.489.0>@riak_cs_gc:maybe_delete_small_objects:417 Active deletion failed. Reason: {maybe_delete_small_objects,{<0.2867.0>,{ok,{<<"test">>,<<"dev1/bin/riak-cs-access">>,<<101,169,106,229,77,208,75,182,185,229,250,46,26,167,233,135>>,1,1}}}}
[repeating...]

In riak_cs_gc,

receive
    {maybe_delete_small_objects, {Pid, {ok, _}}} ->
        %% successfully deleted
        erlang:demonitor(Ref),

If the delete fsm dies before the demonitor, this process gets a DOWN message.
Then, at the next delete, that DOWN message is received by the catch-all clause.
At the delete after that, it receives the previous {maybe_delete_small_objects, _}
reply and results in an error... and so on, repeating.
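A common way to avoid this race is to demonitor with the flush option and to match the 'DOWN' message by Ref instead of relying on the catch-all clause; a sketch, not necessarily the fix adopted in this branch:

%% Sketch: [flush] removes a 'DOWN' already queued for this Ref, and the
%% 'DOWN' clause matches by Ref, so it can never be mistaken for a later reply.
wait_for_reply(Pid) ->
    Ref = erlang:monitor(process, Pid),
    receive
        {maybe_delete_small_objects, {Pid, {ok, _} = Result}} ->
            erlang:demonitor(Ref, [flush]),
            Result;
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    end.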

@shino
Contributor

shino commented Jul 6, 2015

Executed basho_bench with the cs2 driver against this branch and found HTTP responses with 400 status.

The gist [1] includes

  • HTTP Req/Res (follow TCP stream output from Wireshark),
  • Packet details (also from Wireshark) and
  • basho_bench config.

Execution with redbug

redbug:start("mochiweb_http:handle_invalid_msg_request -> return;stack",
             [{print_depth, 10}, {msgs, 1000}]).

shows that the DOWN messages are consumed by mochiweb:

15:07:46 <0.2912.0>({mochiweb_acceptor,init,3}) {mochiweb_http,
                                                 handle_invalid_msg_request,
                                                 [{'DOWN',#Ref<0.0.0.222599>,
                                                   process,<0.3186.0>,normal},
                                                  #Port<0.15499>,
                                                  {'GET',{abs_path,...},{...}},
                                                  []]}
  proc_lib:init_p_do_apply/3
  mochiweb_http:handle_invalid_msg_request/2

15:07:46 <0.2912.0>(dead) mochiweb_http:handle_invalid_msg_request/4 -> {exit,
                                                                         normal}

[1] https://gist.github.com/shino/9c17ec6b1eb0caa8a26c

@shino
Contributor

shino commented Jul 6, 2015

It seems like commit b4af3a9 fixed the above HTTP 400 responses too. Great 😄

@shino
Contributor

shino commented Jul 6, 2015

Great work! I will run all riak_test cases and give a +1 after the few remaining comments are addressed 🎆

@kuenishi
Contributor Author

kuenishi commented Jul 7, 2015

@shino updated.

{ok, Pid} = riak_cs_delete_fsm_sup:start_delete_fsm(node(), Args),
Ref = erlang:monitor(process, Pid),
receive
{Pid, {ok, _}} ->
Contributor

Pasting a previous comment that was hidden by another change:


From riak_cs_delete_fsm.erl:

deleting({block_deleted, {ok, BlockID}, DeleterPid},
         State=#state{deleted_blocks=DeletedBlocks}) ->
    UpdState = deleting_state_update(BlockID, DeleterPid, DeletedBlocks+1, State),
    ManifestState = UpdState#state.manifest?MANIFEST.state,
    deleting_state_result(ManifestState, UpdState);
deleting({block_deleted, {error, {unsatisfied_constraint, _, BlockID}}, DeleterPid},
         State=#state{deleted_blocks=DeletedBlocks}) ->
    UpdState = deleting_state_update(BlockID, DeleterPid, DeletedBlocks, State),
    ManifestState = UpdState#state.manifest?MANIFEST.state,
    deleting_state_result(ManifestState, UpdState);

This shows that the fsm does not terminate after unsatisfied_constraint errors,
but continues to work without incrementing DeletedBlocks.

The counterpart GC worker code looks like this:

handle_delete_fsm_reply({ok, {_, _, _, TotalBlocks, TotalBlocks}},
                        ?STATE{current_files=[CurrentManifest | RestManifests],
                               current_fileset=FileSet,
                               block_count=BlockCount} = State) ->
    ok = continue(),
    UpdFileSet = twop_set:del_element(CurrentManifest, FileSet),
    State?STATE{delete_fsm_pid=undefined,
                current_fileset=UpdFileSet,
                current_files=RestManifests,
                block_count=BlockCount+TotalBlocks};
handle_delete_fsm_reply({ok, {_, _, _, NumDeleted, _TotalBlocks}},
                        ?STATE{current_files=[_CurrentManifest | RestManifests],
                               block_count=BlockCount} = State) ->
    ok = continue(),
    State?STATE{delete_fsm_pid=undefined,
                current_files=RestManifests,
                block_count=BlockCount+NumDeleted};

The manifest is removed from the twop_set only if DeletedBlocks equals TotalBlocks.
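Applied to the synchronous path in this PR, the same guard could look roughly like this; a sketch only, with the tuple fields inferred from the error log earlier in this thread and a made-up function name:

%% Sketch: treat active deletion as complete only when the deleted-block
%% count equals the total reported by the delete fsm.
handle_active_delete_reply({ok, {_Bucket, _Key, _UUID, Total, Total}}) ->
    %% Every block was deleted: safe to drop the UUID from the history.
    ok;
handle_active_delete_reply({ok, {_Bucket, _Key, _UUID, _NumDeleted, _Total}}) ->
    %% Some blocks failed (e.g. unsatisfied_constraint): keep the manifest
    %% as scheduled_delete so GC can retry later.
    {error, incomplete};
handle_active_delete_reply(Other) ->
    {error, Other}.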

@shino
Contributor

shino commented Jul 7, 2015

Added one comment (actually just a copy :) ). It will be the last.

@kuenishi kuenishi force-pushed the feature/active-delete branch 2 times, most recently from 0e88aa6 to fdd94bd on July 7, 2015 05:16
@shino
Contributor

shino commented Jul 7, 2015

All riak_test cases passed. A great option for certain users 🎆

borshop added a commit that referenced this pull request Jul 7, 2015
Synchronous deletion of small objects

Reviewed-by: shino
@kuenishi
Contributor Author

kuenishi commented Jul 7, 2015

@borshop merge

@borshop borshop merged commit dae708d into develop Jul 7, 2015
@kuenishi kuenishi deleted the feature/active-delete branch July 7, 2015 07:25