
Merge CV's list-objects prefix optimization + Add key length limit [JIRA: RCS-275] #1233

Merged: 9 commits from feature/optimize-list-objects into develop on Sep 7, 2015

Conversation


@kuenishi kuenishi commented Sep 2, 2015

@angrycub's awesome branch targets 1.5; this is a rebased version (although I updated some flavor for 2.1). Note that the max key length in S3 is 1024.

This allows for an optimization for single object queries, causing the
fold to only retrieve the item of interest.
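
A minimal sketch of the idea, assuming the 1024-byte S3 key limit mentioned above (names here are illustrative, not necessarily the PR's exact code): the start key for the fold is the prefix itself, and the end key is the prefix padded with 0xFF bytes up to the maximum key length, so the fold never scans keys that cannot match the prefix.

%% Illustration only: derive a tight {StartKey, EndKey} range for listing
%% objects under Prefix, assuming keys are at most ?MAX_S3_KEY_LENGTH bytes.
-define(MAX_S3_KEY_LENGTH, 1024).

list_range(undefined) ->
    list_range(<<>>);
list_range(Prefix) when is_binary(Prefix),
                        byte_size(Prefix) =< ?MAX_S3_KEY_LENGTH ->
    Padding = binary:copy(<<255>>, ?MAX_S3_KEY_LENGTH - byte_size(Prefix)),
    {Prefix, <<Prefix/binary, Padding/binary>>}.
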
@kuenishi kuenishi added this to the 2.1.0 milestone Sep 2, 2015
@Basho-JIRA Basho-JIRA changed the title Merge CV's list-objects optimization Merge CV's list-objects optimization [JIRA: RCS-275] Sep 2, 2015

shino commented Sep 2, 2015

Just a note to self: AWS S3's response for a too-long key:

HTTP/1.1 400 Bad Request
x-amz-request-id: 7A0D868D37BA08C9
x-amz-id-2: 4vRTdtZV6GAms5oAd0W95d08/KkjBjlLPXSus5fIcb7QzpLqlyuUJYqjiCz73WVdi4B5AMhtHVU=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Wed, 02 Sep 2015 05:51:29 GMT
Connection: close
Server: AmazonS3

133
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>KeyTooLongError</Code><Message>Your key is too long</Message><Size>1029</Size><MaxSizeAllowed>1024</MaxSizeAllowed><RequestId>7A0D868D37BA08C9</RequestId><HostId>4vRTdtZV6GAms5oAd0W95d08/KkjBjlLPXSus5fIcb7QzpLqlyuUJYqjiCz73WVdi4B5AMhtHVU=</HostId></Error>
0


shino commented Sep 2, 2015

I got two errors when the prefix is longer than 1024. One is expected (which does not mean it is good):

15:11:09.453 [error] gen_fsm <0.984.0> in state prepare terminated with reason: bad argument in call to binary:copy(<<"ÿ">>, -1) in riak_cs_utils:big_end_key/1 line 447

but the other is unexpected:

15:11:09.454 [error] FORMAT ERROR: [67,82,65,83,72,32,82,69,80,79,82,84,32,80,114,111,99,101,115,115,32,"<0.984.0>",32,119,105,116,104,32,"1",32,110,101,105,103,104,98,111,117,114,115,32,"exited",32,119,105,116,104,32,114,101,97,115,111,110,58,32,["bad argument in call to ",["binary",58,"copy",40,["<<","\"ÿ\"",">>"],44,32,"-1",41]," in ",[["riak_cs_utils",58,"big_end_key",47,"1"],[32,108,105,110,101,32,"447"]]]] []
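
For reference, the list above is the crash report as an unflattened iolist; a quick sketch of recovering its text in an Erlang shell, assuming the pasted term is bound to Report:

%% Illustration only: flatten the logged iolist into readable text.
io:format("~s~n", [Report]).
%% => CRASH REPORT Process <0.984.0> with 1 neighbours exited with reason:
%%    bad argument in call to binary:copy(<<"ÿ">>, -1) in riak_cs_utils:big_end_key/1 line 447
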


kuenishi commented Sep 2, 2015

Looks like I'd better add some guards, both there and in malformed_request/2.
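
A minimal sketch of what such a guard could look like in a webmachine resource (hypothetical, not the PR's actual code; it assumes the object key has already been extracted into riak_cs.hrl's #key_context record as a binary):

%% Hypothetical sketch: returning true from malformed_request/2 makes
%% webmachine answer 400 Bad Request for over-long keys.
-define(MAX_S3_KEY_LENGTH, 1024).

malformed_request(RD, #key_context{key = Key} = Ctx)
  when is_binary(Key), byte_size(Key) > ?MAX_S3_KEY_LENGTH ->
    {true, RD, Ctx};
malformed_request(RD, Ctx) ->
    {false, RD, Ctx}.
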


shino commented Sep 2, 2015

Include the max key length check in this PR? If so, there should be an option max_key_length, like max_buckets_per_user, for compatibility, shouldn't there?


kuenishi commented Sep 2, 2015

No, it's already defined in this pull request as a macro in riak_cs.hrl. Have we supported key lengths longer than 1024? If so, I'll update this pull request accordingly and add safeguards. At least we need this optimization, don't we?
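
For the record, a sketch of the two options being discussed; ?MAX_S3_KEY_LENGTH is the macro from this PR, while the app-env lookup below is only an illustration of a max_buckets_per_user-style configurable limit, not code from this branch:

%% Hard-coded limit, as a macro in riak_cs.hrl:
-define(MAX_S3_KEY_LENGTH, 1024).

%% Configurable limit, in the style of max_buckets_per_user (illustrative):
max_key_length() ->
    application:get_env(riak_cs, max_key_length, ?MAX_S3_KEY_LENGTH).
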


shino commented Sep 2, 2015

I was just caremad because previous versions of Riak CS support keys longer than 1024 bytes, probably because such a limit was simply never considered. In the max_buckets_per_user case, the limit was made configurable for users who don't care about strict AWS S3 compatibility. The question is whether the same should apply here for the max key length. My current feeling is that 1024 is sufficiently long and it does not have to be configurable.

So, if you are going to add validation of the key length, I agree with you. I'm a little afraid it will be scope creep, because almost all WM handlers are affected 🙈


shino commented Sep 2, 2015

P.S. For GET Object, AWS S3 responds with 400 KeyTooLongError

@shino shino changed the title Merge CV's list-objects optimization [JIRA: RCS-275] Merge CV's list-objects prefix optimization [JIRA: RCS-275] Sep 2, 2015
@kuenishi kuenishi changed the title Merge CV's list-objects prefix optimization [JIRA: RCS-275] Merge CV's list-objects prefix optimization + Add key length limit [JIRA: RCS-275] Sep 2, 2015

kuenishi commented Sep 2, 2015

Will add some riak_test cases tomorrow.


shino commented Sep 2, 2015

PR for lager (regarding the FORMAT ERROR above): https://github.com/basho/lager/pull/292 (on top of 2.2.0)


shino commented Sep 2, 2015

It seems that listing objects with a prefix longer than 1024 bytes is still failing as of 7c4aa3b.


shino commented Sep 3, 2015

Memo: AWS S3 GET Bucket responds OK to a prefix longer than 1024 bytes:

https://gist.github.com/shino/4b5400ace3ba1c8f4279


kuenishi commented Sep 7, 2015

I think it's ready again.

-spec big_end_key(Prefix::binary() | undefined) -> binary().
big_end_key(undefined) ->
    big_end_key(<<>>);
big_end_key(Prefix) when byte_size(Prefix) > ?MAX_S3_KEY_LENGTH ->
Contributor

Why isn't the value in application env used here?

Contributor

It's difficult to guess the "right" (or at least not-too-bad) behavior in this situation...
Using the same start key and end key results in at most a single entry. Then, is it better to use <<Prefix/binary, 255>> here?

Contributor Author

Ah, just forgot about app env. Will push an update.

LocalCtx = LocalCtx0#key_context{bucket=Bucket, key=Key},
Ctx#context{bucket=Bucket,
            local_context=LocalCtx}.
case byte_size(unicode:characters_to_binary(Key)) of
Contributor

I think Key is just a sequence of bytes, not a sequence of Unicode codepoints.

Contributor

Counter example: 600 0x81 bytes, fewer than 1024.

% for i in {1..600}; do echo -n %81; done | read LT1k
% S3CURL=.s3curl.15018.alice s3curl.pl --put rebar.config --id cs -- -x 127.0.0.1:15018 -s http://test.s3.amazonaws.com/${LT1k}
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>KeyTooLongError</Code><Message>Your key is too long</Message>
<Size>1200</Size><MaxSizeAllowed>1024</MaxSizeAllowed><RequestId></RequestId></Error>
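
A sketch of why the check above over-counts, assuming Key arrives as a latin-1 byte list:

%% Illustration only: measuring the same 600-byte key two ways.
Key = lists:duplicate(600, 16#81),
byte_size(unicode:characters_to_binary(Key)),  %% => 1200 (each 0x81 becomes two bytes in UTF-8)
byte_size(list_to_binary(Key)).                %% => 600  (raw byte count)

So a 600-byte key is measured as 1200 bytes and rejected, even though it is well under the 1024-byte limit.
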

Contributor Author

I thought I'd fixed it 😢


kuenishi commented Sep 7, 2015

updated

MaxLen when byte_size(Prefix) > MaxLen ->
    <<>>;
MaxLen ->
    binary:copy(<<255>>, MaxLen - byte_size(Prefix))
Contributor

This fails if MaxLen is unlimited
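
To spell out the failure mode (a sketch, assuming max_key_length can be configured as the atom unlimited, as in the release note below): byte_size(Prefix) > unlimited is always false in Erlang's term order, so execution falls through to the arithmetic, which raises badarith.

%% Illustration only, not the PR's final code.
MaxLen = unlimited,
Prefix = <<"0/">>,
binary:copy(<<255>>, MaxLen - byte_size(Prefix)).
%% => badarith: the atom 'unlimited' cannot be used in subtraction.
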

@@ -105,6 +105,13 @@ test(UserConfig) ->
?assert(lists:member([{prefix, "0/"}], CommonPrefixes3)),
verify_object_list(ObjList4, 30),

%% Don't fail even if Prefix is longer than 1024, even if keys are
%% restricted to be shorter than it. That's S3.
Contributor

LoL


shino commented Sep 7, 2015

Cool optimization! Great collaboration work @angrycub & @kuenishi 👍

borshop added a commit that referenced this pull request Sep 7, 2015
Merge CV's list-objects prefix optimization + Add key length limit [JIRA: RCS-275]

Reviewed-by: shino

kuenishi commented Sep 7, 2015

@borshop merge

borshop added a commit that referenced this pull request Sep 7, 2015
Merge CV's list-objects prefix optimization + Add key length limit [JIRA: RCS-275]

Reviewed-by: shino
@Basho-JIRA

Release note: The GET Bucket API (a.k.a. listing objects in a bucket) with a prefix specified had room for optimization: it had been specifying too loose an end key when folding objects in Riak. With this change a tighter end key is specified, so unnecessary folding in vnodes is skipped. A limit on the maximum key length has also been introduced, which defaults to 1024. This means keys longer than 1024 bytes can no longer be PUT, GET, or DELETEd unless max_key_length is explicitly set to more than 1024 in riak-cs.conf. Operators who want to preserve the old behaviour regarding key length may also set max_key_length to unlimited.

_[posted via JIRA by Kota Uenishi]_


kuenishi commented Sep 7, 2015

@borshop: retry

@borshop borshop merged commit 8642741 into develop Sep 7, 2015
@kuenishi kuenishi deleted the feature/optimize-list-objects branch September 7, 2015 09:09
borshop added a commit that referenced this pull request Sep 18, 2015
Merge CV's list-objects prefix optimization + Add key length limit [JIRA: RCS-275]

Reviewed-by: shino

Conflicts:
	src/riak_cs_s3_response.erl
	src/riak_cs_wm_object.erl
	src/riak_cs_wm_object_upload_part.erl
@kuenishi kuenishi mentioned this pull request Sep 18, 2015