-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make all kind of timeouts hitting Riak configurable #1021
Conversation
Can you elaborate on the "easy bug" you mentioned in the desription? |
I thought I had seen so many existing riak_tests failed and believed there should be an easy bug. But today I ran riak_test and then found just one test failing. I apologise about my bad memory if it isn't easy in advance. 10:57:54.007 [warning] gc_tests failed: {{badmatch,["/home/kuenishi/rt/riak_cs/current/dev/dev1/lib/riak_cs-1.5.2-5-g9827939/priv/tools/repair_gc_bucket.erl","/home/kuenishi/rt/riak_cs/current/dev/dev1/lib/riak_cs/priv/tools/repair_gc_bucket.erl"]},[{rtcs,repair_gc_bucket,2,[{file,"riak_test/src/rtcs.erl"},{line,565}]},{gc_tests,repair_gc_bucket,1,[{file,"riak_test/tests/gc_tests.erl"},{line,157}]},{gc_tests,'-confirm/0-lc$^0/1-0-',1,[{file,"riak_test/tests/gc_tests.erl"},{line,52}]},{gc_tests,confirm,0,[{file,"riak_test/tests/gc_tests.erl"},{line,52}]}]}
10:57:54.007 [error]
================ gc_tests failure stack trace =====================
{{badmatch,["/home/kuenishi/rt/riak_cs/current/dev/dev1/lib/riak_cs-1.5.2-5-g9827939/priv/tools/repair_gc_bucket.erl",
"/home/kuenishi/rt/riak_cs/current/dev/dev1/lib/riak_cs/priv/tools/repair_gc_bucket.erl"]},
[{rtcs,repair_gc_bucket,2,[{file,"riak_test/src/rtcs.erl"},{line,565}]},
{gc_tests,repair_gc_bucket,1,
[{file,"riak_test/tests/gc_tests.erl"},{line,157}]},
{gc_tests,'-confirm/0-lc$^0/1-0-',1,
[{file,"riak_test/tests/gc_tests.erl"},{line,52}]},
{gc_tests,confirm,0,[{file,"riak_test/tests/gc_tests.erl"},{line,52}]}]} |
- All configuration items gathered into `riak_cs_config:*_timeout/0` - Defaults varies from 5 to 60 seconds according to old code - New items has defaults by 60 seconds defined in `riak_cs.hrl` - You can set all configurations at once by setting `riakc_timeouts` in `riak_cs` section of `app.config` - Riak client calls that stem from manual operations does not have timeout values configurable #Please enter the commit message for your changes. Lines starting
Ok I think I've squashed all the issues causing test failures. I'll reverify tomorrow and probably push a rebased version of this branch that makes dialyzer happy and addresses the issues. |
e0e3b02
to
40389f2
Compare
All tests are passing so I'll give this one more riak_test run tomorrow to double-check it and then try to get it merged. In the meantime, happy for any feedback on my commits, but not strictly necessary. |
@@ -100,7 +100,8 @@ sum_bucket(BucketName) -> | |||
ok = riak_cs_riak_client:set_bucket_name(RcPid, BucketName), | |||
{ok, ManifestPbc} = riak_cs_riak_client:manifest_pbc(RcPid), | |||
ManifestBucket = riak_cs_utils:to_bucket_name(objects, BucketName), | |||
case riakc_pb_socket:mapred(ManifestPbc, ManifestBucket, Query) of | |||
Timeout = riak_cs_config:storage_calc_timeout(), | |||
case riakc_pb_socket:mapred(ManifestPbc, ManifestBucket, Query, Timeout) of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This very line causes this assertion in the cs743_regression_test
to fail which is actually a good thing because it means the calculation is actually returning instead of timing out. I tried even up to 10000 objects and the test still passed. I will update to the test to expect success now instead of timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#743 and #759 were fix for crash caused by timeout. So the assertion here should be wrong place?assertEqual
to check calculation has successfully detected the timeout without crashing. This timeout setting which is not working any more should be changed to storage_calc_timeout
.
The increase in the default timeout for the mapreduce query used to retrieve storage calculation results causes the previous expectation of timeout in this test to no longer be valid. Update the test assertion to check that the response is not a timeout instead.
CI testing looks good and all the riak_tests are passing for me. I'll defer merging this until I get in tomorrow in case there is any feedback on my commits from today. |
?assertEqual(<<"{error,{timeout,[]}}">>, ErrorStr), | ||
ResultStr -> | ||
?assert(not is_integer(ResultStr)), | ||
?assertNotEqual(<<"{error,{timeout,[]}}">>, ResultStr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#743 and #759 were fix for crash caused by timeout. So the assertion here should be ?assertEqual
to check calculation has successfully detected the timeout without crashing. This timeout setting which is not working any more should be changed to storage_calc_timeout
.
Changes except the one I commented above are all awesome, thanks @kellymclaughlin ! |
@@ -353,7 +353,7 @@ mark_manifests(RiakObject, Bucket, Key, UUIDsToMark, ManiFunction, RcPid) -> | |||
%% with vector clock. This allows us to do a PUT | |||
%% again without having to re-retrieve the object | |||
{ok, ManifestPbc} = riak_cs_riak_client:manifest_pbc(RcPid), | |||
riak_cs_pbc:put(ManifestPbc, UpdObj, [return_body]). | |||
riak_cs_pbc:put(ManifestPbc, UpdObj, [return_body], riak_cs_config:get_gckey_timeout()). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put manifest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, riak_cs_config:put_manifest_timeout()
might be more suitable here.
Only one tiny comment from me. This feature will be great help for production environment! Nice work @kuenishi @kellymclaughlin . |
@@ -493,9 +492,9 @@ make_bkeys(ManifestBucketName, Keys) -> | |||
-spec send_map_reduce_request(riak_client(), list()) -> streaming_req_response(). | |||
send_map_reduce_request(RcPid, BKeyTuples) -> | |||
%% TODO: change this: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this comment any more because TODO has been done here?
Make all kind of timeouts hitting Riak configurable Reviewed-by: kellymclaughlin
@borshop merge |
This branch still has an easy bug, but sending it out to keep tracking. I'd be more than happy if somebody takes this (or it's also fine to omit and start a completely new work).