-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: Skip invalid state manifests in GC bucket #964
Conversation
Set milestone 1.5.1. If wrong, please correct me. |
See #827 Logged information includes: - Key in GC bucket - UUID (in manifest) - CS bucket and key (also in manifest) It's noteworthy that if riak_cs_delete_fsm replies ok with the same total blocks and deleted blocks to riak_cs_gc_worker, then the worker attempt to delete manifest entry in twop_set. Doing it before deleting manfiests and blocks completely poses orphan manifests/blocks. In this PR, riak_cs_delete_fsm replies with total blocks = 1 and deleted_blocks = 0.
01cb351
to
7163919
Compare
7163919
to
a0f62cc
Compare
This PR is to branch If patch needed, another branch (cherry-picked and conflicts resolved) |
…ests Because GC buckets are divided among bags and distribution is determined by manifests. Manifests at an certain bag goes to riak-cs-gc bucket in the Riak cluster by delete.
I'm not sure why, but an eqc test didn't work for me: ...riak_cs_gc_single_run_eqc:84: eqc_test_...*failed*
::{function_clause,
[{riak_cs_gc_single_run_eqc,dummy_start_delete_fsm,
[testnode247683@nausicaa,
[<0.27611.1>,
{<<"UUIDno_error11">>,
{lfs_manifest_v3,3,1048576,
{<<"bucket">>,<<"no_e"...>>},
undefined,"2014-09-09T03:50:22.000Z",undefined,...}},
<0.27610.1>,<<"no_error1">>,[]]],
[{file,"test/riak_cs_gc_single_run_eqc.erl"},{line,250}]},
{riak_cs_delete_fsm_sup,start_delete_fsm,
[testnode247683@nausicaa,
[<0.27611.1>,
{<<"UUIDno_error11">>,
{lfs_manifest_v3,3,1048576,
{<<"buck"...>>,<<...>>},
undefined,
[...],...}},
<0.27610.1>,<<"no_error1">>,[]]],
[]},
{riak_cs_gc_worker,initiating_file_delete,2,
[{file,"src/riak_cs_gc_worker.erl"},{line,138}]},
{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]} |
{error, notfound}; | ||
change_state_to_active(Pbc, TargetBKey, [GCKey|Rest]) -> | ||
{ok, Obj0} = riakc_pb_socket:get(Pbc, ?GC_BUCKET, GCKey), | ||
Manifests = twop_set:to_list(binary_to_term(riakc_obj:get_value(Obj0))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assume twop_set
is in the path?
Nice catch 😄 . |
total_blocks=BlockCount}, | ||
start_block_servers(NewState); | ||
{error, invalid_state} -> | ||
_ = lager:warning("Invalid state manifest in GC bucket at ~p: ~p", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking possible operations when this log line found, that might be backing up the object and try deleting it again. Or restore the whole historical manifests via inspector. Anyway I think it'd be more friendly to print out bucket and key name explicitly. UUID and Manifest could follow them or in a debug line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the bucket and key as S3 manner. GCKey should also be in the first log line.
Great work. Code itself looks pretty good to me. |
…-in-gc Bugfix: Skip invalid state manifests in GC bucket Reviewed-by: kuenishi
@borshop merge |
Prepare release notes (#964 is likely to merge) Reviewed-by: shino
Prepare release notes (#964 is likely to merge) Reviewed-by: kuenishi
Addresses #827 partly.
At present, the root cause why active state manifests are entered in GC bucket.
This PR does not process such manifests but just skip and log in order to
GC can proceed to other manifests instead of abort.
It's noteworthy that if
riak_cs_delete_fsm
replies ok with the same total blocks and deleted blocksto
riak_cs_gc_worker
, then the worker attempt to delete manifest entry intwop_set
.Doing it before deleting manfiests and blocks completely poses orphan manifests/blocks.
In this PR,
riak_cs_delete_fsm
replies with total blocks = 1 and deleted_blocks = 0.To address #827 completely, further investigation on root cause needed. Information of logs
by this PR will help.