Handle socket error by catching badmatch #1280
When a client cancels or stops an object upload midway, the TCP socket returns an error when mochiweb tries to read the payload. Until now this raised a badmatch and the request simply failed, leaving behind the already uploaded chunks and manifests in the writing state. This commit adds a try-catch clause that catches the badmatch error raised by mochiweb and calls amendment code that moves that exact manifest to the GC bucket and marks it as scheduled_delete. A test is included in regression_tests.erl, where the socket error is emulated by closing the client-side socket in the middle of an upload. A cancelled upload can also happen within a multipart upload, where a part upload is cancelled and left as it was, but such parts can be collected once the multipart upload is aborted or completed.
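The failure mode and the catch described above can be reproduced in isolation. The following is a minimal sketch, not the code from this PR: the module name, `read_body/2`, `safe_read_body/3`, and the `Cleanup` fun (a stand-in for the manifest amendment code) are all hypothetical. It assumes only the documented behaviour of `gen_tcp:recv/3` on a passive raw socket, which returns `{error, closed}` when the peer closes before the requested number of bytes has arrived.

```erlang
-module(partial_upload_sketch).
-export([run/0]).

%% Strict read: the match raises badmatch if the peer closes early.
read_body(Sock, Len) ->
    {ok, Body} = gen_tcp:recv(Sock, Len, 5000),
    Body.

%% Wrap the strict read. On a badmatch against {error, Reason}, run
%% Cleanup (hypothetical stand-in for the manifest amendment code)
%% and return the error instead of crashing.
safe_read_body(Sock, Len, Cleanup) ->
    try
        {ok, read_body(Sock, Len)}
    catch
        error:{badmatch, {error, Reason}} ->
            Cleanup(Reason),
            {error, Reason}
    end.

run() ->
    %% Local listener on an ephemeral port, so no external server is needed.
    {ok, LSock} = gen_tcp:listen(0, [binary, {active, false}, {packet, raw}]),
    {ok, Port} = inet:port(LSock),
    %% Client: send 1 KiB of a promised 4 KiB body, then close the socket.
    spawn_link(fun() ->
        {ok, C} = gen_tcp:connect("127.0.0.1", Port,
                                  [binary, {active, false}]),
        ok = gen_tcp:send(C, binary:copy(<<"*">>, 1024)),
        ok = gen_tcp:close(C)
    end),
    {ok, Sock} = gen_tcp:accept(LSock, 5000),
    Self = self(),
    Result = safe_read_body(Sock, 4096,
                            fun(Reason) -> Self ! {cleaned_up, Reason} end),
    %% Report whether the cleanup hook actually ran.
    CleanedUp = receive {cleaned_up, _} -> true after 0 -> false end,
    gen_tcp:close(Sock),
    gen_tcp:close(LSock),
    {Result, CleanedUp}.
```

Calling `partial_upload_sketch:run()` should return an `{error, _}` result together with `true`, showing that the cleanup hook ran rather than the process crashing on the badmatch.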
{ok, Sock} = gen_tcp:connect("127.0.0.1", 15018, [{active, false}]),
FirstLine = io_lib:format("PUT /~s HTTP/1.1", [K]),
Binary = binary:copy(<<"*">>, Actual),
ReqHdr = [FirstLine, $\n, "Host: ", Host, $\n, Auth, $\n,
Isn't CR+LF the more common line break in HTTP?
Since this is test code with a dumb client, I didn't care much about that.
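For reference, HTTP/1.1 specifies CRLF as the line terminator, although recipients may accept a bare LF. A CRLF variant of the request header could look like the sketch below. The module and function names are hypothetical, and the `Auth` header from the test client is left out to keep the example self-contained.

```erlang
-module(http_req_sketch).
-export([put_header/3]).

%% Build a PUT request header terminated with CRLF, as HTTP/1.1 specifies.
%% Key and Host are strings; ContentLength is a non-negative integer.
put_header(Key, Host, ContentLength) ->
    CRLF = "\r\n",
    iolist_to_binary(
      [io_lib:format("PUT /~s HTTP/1.1", [Key]), CRLF,
       "Host: ", Host, CRLF,
       "Content-Length: ", integer_to_list(ContentLength), CRLF,
       CRLF]).   %% an empty line ends the header section
```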
@@ -145,6 +147,85 @@ verify_cs512(UserConfig, BucketName) ->
 assert_notfound(UserConfig,BucketName),
 ok.

verify_cs770({UserConfig, {RiakNodes, _, _}}, BucketName) ->
Nice test case 👍
Interrupting a normal upload and a copy upload both work fine 👍 On the client side, execute the command below and stop it with Ctrl-C:
Error log:
Perhaps the error above occurs when the put FSM does not have a manifest FSM in its state.
Now running riak_test.
Two cases failed in multibag flavor, local fix for them:
diff --git a/riak_test/tests/regression_tests.erl b/riak_test/tests/regression_tests.erl
index 9b1dbce..b8a320b 100644
--- a/riak_test/tests/regression_tests.erl
+++ b/riak_test/tests/regression_tests.erl
@@ -174,7 +174,7 @@ verify_cs770({UserConfig, {RiakNodes, _, _}}, BucketName) ->
scheduled_delete =:= Mx?MANIFEST.state
end, 8, 4096),
- Pbc = rt:pbc('dev1@127.0.0.1'),
+ Pbc = rtcs:pbc(RiakNodes, objects, BucketName),
%% verify that object is also stored in latest GC bucket
Ms = all_manifests_in_gc_bucket(Pbc),
diff --git a/riak_test/tests/upgrade_downgrade_test.erl b/riak_test/tests/upgrade_downgrade_test.erl
index 5e82cfa..cff44d5 100644
--- a/riak_test/tests/upgrade_downgrade_test.erl
+++ b/riak_test/tests/upgrade_downgrade_test.erl
@@ -51,6 +51,9 @@ confirm() ->
ok = rt:upgrade(RiakNode, RiakCurrentVsn),
rt:wait_for_service(RiakNode, riak_kv),
ok = rtcs_config:upgrade_cs(N, AdminCreds),
+ rtcs:set_advanced_conf({cs, current, N},
+ [{riak_cs,
+ [{riak_host, {"127.0.0.1", rtcs_config:pb_port(1)}}]}]),
rtcs_exec:start_cs(N, current)
end
|| RiakNode <- RiakNodes],
During lunch, I ran the tests with:
diff --git a/riak_test/tests/regression_tests.erl b/riak_test/tests/regression_tests.erl
index 9b1dbce..aea3d94 100644
--- a/riak_test/tests/regression_tests.erl
+++ b/riak_test/tests/regression_tests.erl
@@ -156,7 +156,7 @@ verify_cs770({UserConfig, {RiakNodes, _, _}}, BucketName) ->
{ok, Socket} = rtcs_object:upload(UserConfig,
{normal_partial, 3*1024*1024, 1024*1024},
BucketName, Key),
-
+timer:sleep(3*1000),
 [[{UUID, M}]] = get_manifests(RiakNodes, BucketName, Key),
But I'm not sure whether this is actually a race.
Great catch! That is a race between the socket close and writing the manifest to, and then reading it from, Riak. I pushed a fix.
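The pushed fix itself is not shown in this thread. One common way to remove this kind of race without a fixed `timer:sleep/1` is to poll until the expected state appears. The sketch below is an assumption about the general pattern, not the actual change; the module and function names are hypothetical.

```erlang
-module(wait_sketch).
-export([wait_until/3]).

%% Call Fun up to Retries times, sleeping DelayMs between attempts.
%% Returns ok as soon as Fun() returns true, otherwise {error, timeout}.
wait_until(_Fun, 0, _DelayMs) ->
    {error, timeout};
wait_until(Fun, Retries, DelayMs) when Retries > 0 ->
    case Fun() of
        true ->
            ok;
        _ ->
            timer:sleep(DelayMs),
            wait_until(Fun, Retries - 1, DelayMs)
    end.
```

In a test, `Fun` would fetch the manifest and check its state, so the assertion only runs once the server side has finished handling the closed socket.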
The diff looks nice and all riak_test tests passed. Really nice fix and a good example of "don't let it crash" 😄
Handle socket error by catching badmatch Reviewed-by: shino
@borshop merge
Partial solution for #770.