GC concurrency (rebased) #830
worker. The GC worker resides in the new module `riak_cs_gc_worker`.
Refactor the daemon state machine and finish building out the worker state machine.
Also remove two unused files.
Also add some error returns.
Quick result on
Oops. I pushed the fix.
```erlang
%% now this is the easiest thing to do. If we need to manually
%% skip an object to GC, we can change the epoch start
%% time in app.config
link(Pid),
```
This is probably outside the scope of this review, but starting the FSM and linking to it is not atomic, and `link/1` fails if `Pid` does not exist. What happens to `RestKeys` then? Will they be deleted in the next batch?
Good question. If the delete FSM has already finished normally, or died accidentally, before `link/1` is called, the worker dies in the `link/1` call with `noproc`. So does `riak_cs_gc_d`. Then it will be restarted.
So the whole batch fails, including the other delete FSMs? That could be a bad situation, so I think we need some logging for operators, at least in `riak_cs_gc_d`.
Maybe this doesn't happen in practice, because starting a delete_fsm is slow enough that `link/1` is called before the delete_fsm terminates. OK.
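The start-then-link window discussed above can be contrasted with a monitor. The sketch below only illustrates standard Erlang semantics: calling `erlang:monitor/2` on a process that has already exited does not crash the caller; a `'DOWN'` message with reason `noproc` is delivered immediately. The module and function names are hypothetical, the spawned fun is a stand-in for a delete FSM that finishes quickly, and whether a monitor would suit `riak_cs_gc_worker` is not established by this thread.

```erlang
-module(gc_monitor_sketch).
-export([watch_after_exit/0]).

%% Hypothetical demo: watch a process only after it has already exited.
watch_after_exit() ->
    %% Stand-in for a delete FSM that terminates very quickly.
    Pid = spawn(fun() -> ok end),
    %% First wait until the process is definitely gone.
    Ref0 = erlang:monitor(process, Pid),
    receive
        {'DOWN', Ref0, process, Pid, _} -> ok
    end,
    %% Start watching "too late", as in the start-then-link race.
    %% Unlike the failing link/1 call described above, this does not
    %% take the caller down; a 'DOWN' message arrives instead.
    Ref = erlang:monitor(process, Pid),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```

Calling `gc_monitor_sketch:watch_after_exit()` returns `noproc`, so a worker using this pattern could log the failed delete and continue with the rest of the batch rather than crashing.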
Hmm, no.
Does the manual batch (esp.
```erlang
%% finished with this GC run
_ = case Caller of
        undefined -> ok;
        _ -> Caller ! {batch_finished, State}
```
`noproc` has to be handled here, because the operator may shut down the console while waiting for the `riak-cs-gc batch` command to finish. Or at least the following lager message should be output before this line.
`!` does not throw `noproc`.
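A minimal sketch of the send semantics in question, assuming `Caller` holds a pid rather than a registered name: sending to a pid whose process has exited succeeds silently and the send expression still evaluates to the message, whereas sending to an atom that is not a registered name raises `badarg`. The module and function names are hypothetical, and `dummy_state` stands in for the real `State`.

```erlang
-module(gc_send_sketch).
-export([send_to_dead_pid/0, send_to_unregistered_name/0]).

%% Sending to a dead pid: no exception, the message is simply dropped.
send_to_dead_pid() ->
    %% Stand-in for a console process that has already gone away.
    Caller = spawn(fun() -> ok end),
    Ref = erlang:monitor(process, Caller),
    receive
        {'DOWN', Ref, process, Caller, _} -> ok
    end,
    %% Evaluates to the message itself even though nobody receives it.
    Caller ! {batch_finished, dummy_state}.

%% Sending to an unregistered atom name raises badarg, so the
%% "no crash" behaviour only holds when the destination is a pid.
send_to_unregistered_name() ->
    try gc_send_sketch_no_such_name ! hello of
        _ -> sent
    catch
        error:badarg -> badarg
    end.
```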
But I should mention that `batch_caller` is only used by `riak_cs_gc_single_run_eqc` (at least currently). I will add that.
Really? I think the command returns after
I should add a comment.
```erlang
ok_reply(fetching_next_batch, start_manual_batch(
                                lists:member(testing, Options),
                                Leeway,
                                State?STATE{batch_caller=CallerPid}));
```
I would write it like this, if `batch_caller` is meant for testing:

```erlang
NewState = case lists:member(testing, Options) of
               true -> start_manual_batch(true, Leeway, State?STATE{batch_caller=CallerPid});
               false -> start_manual_batch(false, Leeway, State)
           end
```
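A self-contained sketch of the suggested `case` expression, so its effect can be checked in isolation: the caller pid is recorded only when `testing` is among the options. The `state` record, `start_manual_batch/2` and `caller_for/2` are hypothetical stand-ins; the real daemon state is accessed through `?STATE`, and the real `start_manual_batch/3` also takes `Leeway` and starts the batch, which is omitted here.

```erlang
-module(gc_manual_batch_sketch).
-export([caller_for/2]).

%% Hypothetical stand-in for the daemon state record.
-record(state, {batch_caller = undefined :: pid() | undefined,
                testing = false :: boolean()}).

%% Hypothetical stand-in: only records the testing flag.
start_manual_batch(Testing, State) ->
    State#state{testing = Testing}.

%% Mirrors the reviewer's suggestion: store CallerPid only in testing mode,
%% then return the batch_caller field of the resulting state.
caller_for(Options, CallerPid) ->
    State = #state{},
    NewState = case lists:member(testing, Options) of
                   true  -> start_manual_batch(true, State#state{batch_caller = CallerPid});
                   false -> start_manual_batch(false, State)
               end,
    NewState#state.batch_caller.
```

With this shape, a non-testing manual batch never holds a reference to the console process, so the later `Caller ! {batch_finished, State}` branch is only reached in tests.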
One thing I worry about is that we already have an access archiver built in the task-queue-and-parallel-workers style. Storage calculation is still single-threaded, but it might be better done in parallel. It might be a good time to stop repeating ourselves, either in this PR or the next one. The code looks correct apart from the points mentioned above.
Okay. I didn't find that function.
OK, I'm persuaded. +1 from me. I think we'd better have the original author's review: @kellymclaughlin, do you have a moment to take a glance at this?
+1 b6ce119
GC concurrency (rebased) Reviewed-by: kuenishi
Remaining: add comment about
+1 9431489
GC concurrency (rebased) Reviewed-by: kuenishi
@borshop merge
Add concurrency to GC execution.
This PR addresses #716 and is based on @kellymclaughlin's work (h/t).
GC worker processes are introduced. `riak_cs_gc_d` executes the 2i request and passes the keys to the workers. Each unit of work consists of `gc_batch_size` keys.
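The division of labour described above (the daemon fetches keys, splits them into units of `gc_batch_size` keys, and hands each unit to a worker) can be sketched as follows. This is an illustration under stated assumptions, not the `riak_cs_gc_d` implementation: `split/2`, `run/3` and `DeleteFun` are hypothetical, the key list stands in for the 2i result, and one process is spawned per batch without modelling any limit the real daemon may place on concurrent workers.

```erlang
-module(gc_batch_sketch).
-export([split/2, run/3]).

%% Split Keys into chunks of at most BatchSize keys each
%% (BatchSize plays the role of gc_batch_size).
split([], _BatchSize) ->
    [];
split(Keys, BatchSize) when is_integer(BatchSize), BatchSize > 0,
                            length(Keys) =< BatchSize ->
    [Keys];
split(Keys, BatchSize) when is_integer(BatchSize), BatchSize > 0 ->
    {Batch, Rest} = lists:split(BatchSize, Keys),
    [Batch | split(Rest, BatchSize)].

%% Hand each batch to its own worker process, then collect the
%% per-batch results in batch order. DeleteFun stands in for the
%% real per-key GC work.
run(Keys, BatchSize, DeleteFun) ->
    Parent = self(),
    Refs = [spawn_batch(Parent, Batch, DeleteFun)
            || Batch <- split(Keys, BatchSize)],
    lists:append([receive {Ref, Results} -> Results end || Ref <- Refs]).

%% Start one worker for Batch; it replies to Parent tagged with Ref.
spawn_batch(Parent, Batch, DeleteFun) ->
    Ref = make_ref(),
    _ = spawn(fun() -> Parent ! {Ref, [DeleteFun(K) || K <- Batch]} end),
    Ref.
```

For example, `gc_batch_sketch:split([a, b, c, d, e], 2)` yields `[[a, b], [c, d], [e]]`, so with `gc_batch_size = 2` the daemon in this sketch would hand out three units of work.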