
GC concurrency (rebased) #830

Merged
merged 18 commits into develop from feature/gc-concurrency-rebased on Apr 14, 2014
Conversation

@shino (Contributor) commented Apr 3, 2014

Add concurrency to GC execution.
This PR addresses #716 and is based on @kellymclaughlin's work (h/t).

GC worker processes are introduced.
riak_cs_gc_d executes the 2i requests and passes keys to the workers.

A unit of work is

  • all keys in a single 2i response when paginated, or
  • keys divided into groups of gc_batch_size keys each (see the sketch below).
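A minimal sketch of the second case (hypothetical helper name, not necessarily the exact code in this PR):

%% Hypothetical helper, names assumed: split a key list into
%% chunks of at most BatchSize keys each.
split_into_batches(Keys, BatchSize) when length(Keys) =< BatchSize ->
    [Keys];
split_into_batches(Keys, BatchSize) ->
    {Batch, Rest} = lists:split(BatchSize, Keys),
    [Batch | split_into_batches(Rest, BatchSize)].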

@kuenishi (Contributor) commented Apr 3, 2014

Quick result of make test:

/home/kuenishi/src/riak_cs/rebar eunit skip_deps=true
Building riak-cs-ee
==> rel (eunit)
==> riak_cs (eunit)
Compiled src/riak_cs_access_console.erl
Compiled src/riak_cs_wm_bucket_versioning.erl
src/riak_cs_gc_d.erl:710: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:711: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:712: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:713: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:714: function eligible_manifest_key_sets/3 undefined
src/riak_cs_gc_d.erl:716: function eligible_manifest_key_sets/3 undefined
ERROR: eunit failed while processing /home/kuenishi/src/riak_cs: rebar_abort
make: *** [test] Error 1

@shino (Contributor, Author) commented Apr 3, 2014

Oops. I pushed the fix.

@kuenishi added this to the 1.5.0 milestone Apr 7, 2014
%% now this is the easiest thing to do. If we need to manually
%% skip an object to GC, we can change the epoch start
%% time in app.config
link(Pid),
Contributor:

Probably this is outside the scope of the review, but starting the fsm and linking are not atomic, and link/1 fails if Pid does not exist. Then what happens to the RestKeys? Will they be deleted in the next batch?

Contributor Author:

Good question. If the delete fsm finishes normally or dies accidentally before link/1, the worker dies at the link/1 call with noproc, and so does riak_cs_gc_d. Then it will restart.

Contributor:

So the whole batch fails, taking the other delete_fsms with it? That could be a nasty situation, so I think we need some logging for better operations, at least in riak_cs_gc_d.
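Something like this hypothetical log point would do (variable names assumed; lager is what riak_cs already uses):

%% Hypothetical: record which worker died and how many keys were in
%% flight, so a failed batch can be traced from the logs.
_ = lager:warning("GC worker ~p exited; ~b keys deferred to the next batch",
                  [WorkerPid, length(RestKeys)]),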

Contributor:

Maybe this doesn't happen in practice, because starting a delete_fsm is slow enough that link/1 gets in before the delete_fsm terminates. OK.
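For reference, link/1 exits with noproc when the target is already dead. If we ever wanted to guard against the race, a wrapper like this (hypothetical, not part of this PR) would turn the crash into a return value:

%% Hypothetical wrapper: catch the noproc exit from link/1 instead
%% of letting it crash the worker.
safe_link(Pid) ->
    try link(Pid) of
        true -> ok
    catch
        error:noproc -> {error, noproc}
    end.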

@kuenishi (Contributor) commented Apr 7, 2014

Hmm, no riak_test ... that would be another chunk of work.

@kuenishi (Contributor) commented Apr 7, 2014

Does the manual batch (esp. riak_cs_gc_d:manual_batch/1) block until the batch finishes? The behavior has changed from the old one. I don't think the operator wants to wait until the whole batch finishes (= batches empty, 0 workers).

%% finished with this GC run
_ = case Caller of
        undefined -> ok;
        _ -> Caller ! {batch_finished, State}
Contributor:

noproc has to be handled here, because the operator may shut down the console while waiting for the riak-cs-gc batch command to finish. Or at least the following lager message output must come before this line.

Contributor Author:

! does not throw noproc.
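A quick shell check (the pid and message are illustrative): sending to a dead process never fails, the message is silently dropped and the send expression just returns it.

1> Pid = spawn(fun() -> ok end), timer:sleep(100), is_process_alive(Pid).
false
2> Pid ! {batch_finished, state}.
{batch_finished,state}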

Contributor Author:

But I should mention that batch_caller is used only by riak_cs_gc_single_run_eqc (at least currently). I will do that.

@shino (Contributor, Author) commented Apr 7, 2014

Does the manual batch (esp. riak_cs_gc_d:manual_batch/1) block until the batch finishes?

Really? I think the command returns after riak_cs_gc_d's state has changed to fetching_next_batch. If not, it would be a bug.
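A minimal sketch of what I mean (callback and helper names assumed, not the actual riak_cs_gc_d code):

%% gen_fsm sync event: reply as soon as the state transition happens,
%% so manual_batch/1 returns without waiting for the batch to finish.
idle({manual_batch, Options}, _From, State) ->
    NewState = start_batch(Options, State),
    {reply, ok, fetching_next_batch, NewState}.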

@shino (Contributor, Author) commented Apr 7, 2014

I should add one comment: batch_caller was added for testing. It is used in riak_cs_gc_single_run_eqc.

ok_reply(fetching_next_batch, start_manual_batch(
             lists:member(testing, Options),
             Leeway,
             State?STATE{batch_caller=CallerPid}));
Contributor:

I would write it like this, if batch_caller is meant only for testing:

NewState = case lists:member(testing, Options) of
    true -> start_manual_batch(true, Leeway, State?STATE{batch_caller=CallerPid});
    false -> start_manual_batch(false, Leeway, State)
end

@kuenishi (Contributor) commented Apr 7, 2014

One thing I worry about is that we already have the task-queue-and-parallel-workers style in the access archiver. Storage calculation is still single-threaded, but it might be better done in parallel. It might be a good time to stop repeating ourselves, this time or next.

The code looks correct apart from the points mentioned above.

@kuenishi (Contributor) commented Apr 8, 2014

Really? I think the command returns after riak_cs_gc_d's state has changed to fetching_next_batch.

Okay. I didn't find that function.

@kuenishi (Contributor) commented Apr 8, 2014

OK, I'm persuaded. +1 from me. I think we'd better have the original author's review. @kellymclaughlin, do you have a few seconds to take a glance at this?

@kuenishi (Contributor) commented Apr 8, 2014

+1 b6ce119

borshop added a commit that referenced this pull request Apr 8, 2014
GC concurrency (rebased)

Reviewed-by: kuenishi
@shino (Contributor, Author) commented Apr 8, 2014

Remaining: add a comment about batch_caller, fix the dialyzer warning.

@shino (Contributor, Author) commented Apr 9, 2014

The remaining items from the comments above were addressed in bf89876 and 9431489.

@kuenishi (Contributor) commented Apr 9, 2014

+1 9431489

borshop added a commit that referenced this pull request Apr 9, 2014
GC concurrency (rebased)

Reviewed-by: kuenishi
@shino (Contributor, Author) commented Apr 14, 2014

@borshop merge

@borshop merged commit 9431489 into develop Apr 14, 2014
@borshop deleted the feature/gc-concurrency-rebased branch April 14, 2014 06:52
@shino mentioned this pull request Apr 14, 2014