Transform deserialization errors and CRC check failures into not_founds for 1.4 #638

Merged
merged 4 commits into from Aug 19, 2013

Projects

None yet

2 participants

@evanmcc
Contributor
evanmcc commented Aug 17, 2013

1.4 version of the corruption filtering changes for kv.

riak_test PR here: basho/riak_test#353

required core PR here: basho/riak_core#359

cc @engelsanchez

@engelsanchez engelsanchez commented on an outdated diff Aug 19, 2013
src/riak_kv_vnode.erl
end.
encode_handoff_item({B, K}, V) ->
%% before sending data to another node change binary version
%% to one supported by the cluster. This way we don't send
%% unsupported formats to old nodes
- ObjFmt = riak_core_capability:get({riak_kv, object_format}, v0),
- Value = riak_object:to_binary_version(ObjFmt, B, K, V),
- encode_binary_object(B, K, Value).
+ try
+ ObjFmt = riak_core_capability:get({riak_kv, object_format}, v0),
@engelsanchez
engelsanchez Aug 19, 2013 Contributor

Minor nitpick: leave this capability statement outside of try to avoid masking a non-corruption problem.

@engelsanchez engelsanchez commented on an outdated diff Aug 19, 2013
src/riak_kv_vnode.erl
- {reply, {error, Reason}, State#state{modstate=UpdModState}};
- Err ->
- {reply, {error, Err}, State}
+ try
+ {BKey, Val} = decode_binary_object(BinObj),
+ {B, K} = BKey,
+ case do_diffobj_put(BKey, riak_object:from_binary(B, K, Val),
+ State) of
+ {ok, UpdModState} ->
+ {reply, ok, State#state{modstate=UpdModState}};
+ {error, Reason, UpdModState} ->
+ {reply, {error, Reason}, State#state{modstate=UpdModState}};
+ Err ->
+ {reply, {error, Err}, State}
+ end
+ catch _:_ ->
@engelsanchez
engelsanchez Aug 19, 2013 Contributor

This block of code can fail in many ways that are not at all related to data corruption (decode, from_binary, inside the do_diffobj_put). Consider actually logging the error/reason pair to disambiguate or separating the code. Don't be afraid of logging too much: Lager has a throttle already, data corruption is as serious as it can get and we'll be the ones begging for this information in 6 months while debugging something completely unrelated, in ways you didn't think of today :)

@evanmcc
Contributor
evanmcc commented Aug 19, 2013

updated to add the requested logging.

@engelsanchez
Contributor

Should I be testing this port using the 1.3 riak_test? I've been trying the riak_test branch off 1.4 without success (the pemv-drop-bad-data branch)

@engelsanchez
Contributor

After the hashtree update and with basho/riak_core#359 and riak_test master, verified that corrupted objects are handled correctly and handoffs and get/put operations proceed with warnings in the logs. 👍 💃 ⛵️

@evanmcc evanmcc merged commit c675b3d into 1.4 Aug 19, 2013

1 check failed

default The Travis CI build failed
Details
@engelsanchez engelsanchez deleted the pemv-drop-bad-data branch Aug 20, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment