
GC concurrency (rebased) #830

Merged
merged 18 commits into develop from feature/gc-concurrency-rebased on Apr 14, 2014
Conversation

@shino (Contributor) commented Apr 3, 2014

Add concurrency to GC execution.
This PR addresses #716 and is based on @kellymclaughlin's work (h/t).

GC worker processes are introduced.
riak_cs_gc_d executes the 2i requests and passes keys to the workers.

A unit of work is

  • all keys in a single 2i response when paginated, or
  • keys divided into groups of gc_batch_size keys each (see the sketch below).
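A minimal sketch of the second case (hypothetical helper name, not necessarily the exact code in this PR):

%% Hypothetical helper, names assumed: split a key list into
%% chunks of at most BatchSize keys each.
split_into_batches(Keys, BatchSize) when length(Keys) =< BatchSize ->
    [Keys];
split_into_batches(Keys, BatchSize) ->
    {Batch, Rest} = lists:split(BatchSize, Keys),
    [Batch | split_into_batches(Rest, BatchSize)].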

@kuenishi (Contributor) commented Apr 3, 2014

Quick result of make test:

/home/kuenishi/src/riak_cs/rebar eunit skip_deps=true
Building riak-cs-ee
==> rel (eunit)
==> riak_cs (eunit)
Compiled src/riak_cs_access_console.erl
Compiled src/riak_cs_wm_bucket_versioning.erl
src/riak_cs_gc_d.erl:710: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:711: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:712: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:713: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:714: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:716: function eligible_manifest_key_sets/3 undefined
ERROR: eunit failed while processing /home/kuenishi/src/riak_cs: rebar_abort
make: *** [test] Error 1

@shino (Contributor, Author) commented Apr 3, 2014

Oops. I pushed the fix.

@kuenishi added this to the 1.5.0 milestone Apr 7, 2014
%% now this is the easiest thing to do. If we need to manually
%% skip an object to GC, we can change the epoch start
%% time in app.config
link(Pid),
Contributor:

Probably this is outside the scope of the review, but starting the fsm and linking are not atomic, and link/1 fails if Pid does not exist. Then what happens to the RestKeys? Will they be deleted in the next batch?

Contributor Author:

Good question. If the delete fsm finishes normally or dies accidentally before link/1, the worker dies at the link/1 call with noproc, and so does riak_cs_gc_d. Then it will restart.

Contributor:

So the whole batch fails, taking the other delete_fsms with it? That could be a nasty situation, so I think we need some logging for better operations, at least in riak_cs_gc_d.
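Something like this hypothetical log point would do (variable names assumed; lager is what riak_cs already uses):

%% Hypothetical: record which worker died and how many keys were in
%% flight, so a failed batch can be traced from the logs.
_ = lager:warning("GC worker ~p exited; ~b keys deferred to the next batch",
                  [WorkerPid, length(RestKeys)]),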

Contributor:

Maybe this doesn't happen in practice, because starting a delete_fsm is slow enough that link/1 gets in before the delete_fsm terminates. OK.
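For reference, link/1 exits with noproc when the target is already dead. If we ever wanted to guard against the race, a wrapper like this (hypothetical, not part of this PR) would turn the crash into a return value:

%% Hypothetical wrapper: catch the noproc exit from link/1 instead
%% of letting it crash the worker.
safe_link(Pid) ->
    try link(Pid) of
        true -> ok
    catch
        error:noproc -> {error, noproc}
    end.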

@kuenishi (Contributor) commented Apr 7, 2014

Hmm, no riak_test ... that would be another chunk of work.

@kuenishi (Contributor) commented Apr 7, 2014

Does the manual batch (esp. riak_cs_gc_d:manual_batch/1) block until the batch finishes? The behavior has changed from the old one. I don't think the operator wants to wait until the whole batch finishes (= batches empty, 0 workers).

%% finished with this GC run
_ = case Caller of
        undefined -> ok;
        _ -> Caller ! {batch_finished, State}
Contributor:

noproc has to be handled here, because the operator may shut down the console while waiting for the riak-cs-gc batch command to finish. Or at least the following lager message output must come before this line.

Contributor Author:

! does not throw noproc.
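A quick shell check (the pid and message are illustrative): sending to a dead process never fails, the message is silently dropped and the send expression just returns it.

1> Pid = spawn(fun() -> ok end), timer:sleep(100), is_process_alive(Pid).
false
2> Pid ! {batch_finished, state}.
{batch_finished,state}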

Contributor Author:

But I should mention that batch_caller is used only by riak_cs_gc_single_run_eqc (at least currently). I will do that.

@shino (Contributor, Author) commented Apr 7, 2014

Does the manual batch (esp. riak_cs_gc_d:manual_batch/1) block until the batch finishes?

Really? I think the command returns after riak_cs_gc_d's state has changed to fetching_next_batch. If not, it would be a bug.
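A minimal sketch of what I mean (callback and helper names assumed, not the actual riak_cs_gc_d code):

%% gen_fsm sync event: reply as soon as the state transition happens,
%% so manual_batch/1 returns without waiting for the batch to finish.
idle({manual_batch, Options}, _From, State) ->
    NewState = start_batch(Options, State),
    {reply, ok, fetching_next_batch, NewState}.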

@shino (Contributor, Author) commented Apr 7, 2014

I should add one comment: batch_caller was added for testing. It is used in riak_cs_gc_single_run_eqc.

ok_reply(fetching_next_batch, start_manual_batch(
             lists:member(testing, Options),
             Leeway,
             State?STATE{batch_caller=CallerPid}));
Contributor:

I would write it like this, if batch_caller is meant only for testing:

NewState = case lists:member(testing, Options) of
    true -> start_manual_batch(true, Leeway, State?STATE{batch_caller=CallerPid});
    false -> start_manual_batch(false, Leeway, State)
end

@kuenishi (Contributor) commented Apr 7, 2014

One thing I worry about is that we already have the task-queue-and-parallel-workers style in the access archiver. Storage calculation is still single-threaded, but it might be better done in parallel. It might be a good time to stop repeating ourselves, this time or next.

The code looks correct apart from the points mentioned above.

@kuenishi (Contributor) commented Apr 8, 2014

Really? I think the command returns after riak_cs_gc_d's state has changed to fetching_next_batch.

Okay. I didn't find that function.

@kuenishi (Contributor) commented Apr 8, 2014

OK, I'm persuaded. +1 from me. I think we'd better have the original author's review. @kellymclaughlin, do you have a few seconds to take a glance at this?

@kuenishi (Contributor) commented Apr 8, 2014

+1 b6ce119

borshop added a commit that referenced this pull request Apr 8, 2014
GC concurrency (rebased)

Reviewed-by: kuenishi
@shino (Contributor, Author) commented Apr 8, 2014

Remaining: add a comment about batch_caller, fix the dialyzer warning.

@shino (Contributor, Author) commented Apr 9, 2014

The remaining items from the comments above were addressed in bf89876 and 9431489.

@kuenishi (Contributor) commented Apr 9, 2014

+1 9431489

borshop added a commit that referenced this pull request Apr 9, 2014
GC concurrency (rebased)

Reviewed-by: kuenishi
@shino (Contributor, Author) commented Apr 14, 2014

@borshop merge

@borshop merged commit 9431489 into develop Apr 14, 2014
@borshop deleted the feature/gc-concurrency-rebased branch April 14, 2014 06:52
@shino mentioned this pull request Apr 14, 2014