object list (Get Bucket) returns isTruncated=false too soon #781

shino · 2014-01-24T09:05:37Z

put 1003 objects
delete first two keys
list objects return only 1000 keys (correct result is 1001 keys)

Confirmed with release/1.4 branch (3de6acf).

$ for i in {0001..1003}; do s3cmd -c .s3cfg.8071.alice put /data/static/9999/${i} s3://test ; done
$ s3cmd -v -c .s3cfg.8071.alice del  s3://test/0001
$ s3cmd -v -c .s3cfg.8071.alice del  s3://test/0002
$ s3cmd -v -c .s3cfg.8071.alice ls s3://test | wc -l
1000

The text was updated successfully, but these errors were encountered:

shino · 2014-01-24T09:08:08Z

Just a quick look (no actual confirmation),

NumKeysRequested < riak_cs_list_objects_utils:manifests_and_prefix_length(ObjectsAndPrefixes)

at https://github.com/basho/riak_cs/blob/develop/src/riak_cs_list_objects_fsm_v2.erl#L263
becomes
1000 < 1002 - 2 then false.

reiddraper · 2014-01-24T16:53:16Z

Is this with fold_objects true or false?

shino · 2014-01-27T01:59:59Z

Is this with fold_objects true or false?

It's set to 'true':

(rcs-dev1@127.0.0.1)1> application:get_env(riak_cs, fold_objects_for_list_keys).
{ok,true}

shino · 2014-01-27T08:28:23Z

One more pattern found: when there is one and only one non-active object after 1001st key.

One example situation:

Active objects with keys from 0001 to 1499
One writing object with key 1500
Active objects with keys from 1501 to 2003
Then s3cmd ls returns 2000 keys (correct results is 2002).

(The specific number 1500 is not important, it's important that there are more than 1000 objects before non-active object)

memo: the key which is the same as marker is removed from fold_objects result:
https://github.com/basho/riak_cs/blob/develop/src/riak_cs_list_objects_fsm_v2.erl#L207

shino · 2014-01-27T09:32:19Z

Quick fix:

diff --git a/src/riak_cs_list_objects_fsm_v2.erl b/src/riak_cs_list_objects_fsm_v2.erl
index 4140a90..771fde6 100644
--- a/src/riak_cs_list_objects_fsm_v2.erl
+++ b/src/riak_cs_list_objects_fsm_v2.erl
@@ -260,7 +260,7 @@ respond(StateData=#state{req=Request},

 -spec truncated(non_neg_integer(), {list(), ordsets:ordset()}) -> boolean().
 truncated(NumKeysRequested, ObjectsAndPrefixes) ->
-    NumKeysRequested < riak_cs_list_objects_utils:manifests_and_prefix_length(ObjectsAndPrefixes) andalso
+    NumKeysRequested < riak_cs_list_objects_utils:manifests_and_prefix_length(ObjectsAndPrefixes) + 1 andalso
     %% this is because (strangely) S3 returns `false' for
     %% `isTruncated' if `max-keys=0', even if there are more keys.

reiddraper · 2014-01-27T17:06:46Z

Bug confirmed.

reiddraper · 2014-01-27T19:51:00Z

The above fix seems to be working, but it has the side-effect of returning isTruncated=True when there are exactly 1000 objects. This means that an extra request will be made by the client, which will then return 0 keys.

reiddraper · 2014-01-27T21:02:42Z

Here's the approach I'm testing now:

diff --git a/src/riak_cs_list_objects_fsm_v2.erl b/src/riak_cs_list_objects_fsm_v2.erl
index 12e7f0d..26bffbd 100644
--- a/src/riak_cs_list_objects_fsm_v2.erl
+++ b/src/riak_cs_list_objects_fsm_v2.erl
@@ -271,7 +271,7 @@ enough_results(#state{req=?LOREQ{max_keys=UserMaxKeys},
                       objects=Objects,
                       common_prefixes=CommonPrefixes}) ->
     riak_cs_list_objects_utils:manifests_and_prefix_length({Objects, CommonPrefixes})
-    >= UserMaxKeys
+    > UserMaxKeys
     orelse EndOfKeyspace.

 response_from_manifests_and_common_prefixes(Request,

reiddraper · 2014-01-27T21:58:29Z

This approach seems to be working well. I'm going to add a riak_test regression test as well.

reiddraper · 2014-01-27T23:41:48Z

Working branch is here. Think it's looking good, but need to rewrite the commit messages once I get some feedback.

shino · 2014-01-28T02:50:20Z

This means that an extra request will be made by the client, which will then return 0 keys.

Yes. I didn't like it. Your approach eliminates unnecessary turn around between client and CS, it seems better :)

riak_test test case looks nice and work as advertised (fails with current release/1.4 and succeeds with your branch) .

Would you mind adding another similar but slightly different scenario mentioned at comment above?
(#781 (comment))

One more pattern found: when there is one and only one non-active object after 1001st key.

shino · 2014-01-28T03:47:56Z

One more thought (rant?) ;)

In the case of

riak_cs_list_objects_utils:manifests_and_prefix_length({Objects, CommonPrefixes})
 =:= UserMaxKeys

the results of another cs_fold_bucket request are not included in
the response for client, because there are UserMaxKeys results before the request.

We already have the information whether riak has more keys or not as ReachedEnd.
The table of isTruncated with ReachedEnd included is as follows:

| ManifestPrefixLen     | ReachedEnd=false     | ReachedEnd=true |
|-----------------------+----------------------+-----------------|
| 0..(UserMaxKeys-1)    | true [more fold req] | false           |
| UserMaxKeys           | true                 | false           |
| (UserMaxKeys+1)..1002 | true                 | true            |

Therefore, isTruncated is true either:

not ReachedEnd or
UserMaxkeys < ManifestPrefixLen

Diff (local change):

    Modified   src/riak_cs_list_objects_fsm_v2.erl
diff --git a/src/riak_cs_list_objects_fsm_v2.erl b/src/riak_cs_list_objects_fsm_v2.erl
index 26bffbd..5a84046 100644
--- a/src/riak_cs_list_objects_fsm_v2.erl
+++ b/src/riak_cs_list_objects_fsm_v2.erl
@@ -215,7 +215,7 @@ handle_done(State=#state{object_buffer=ObjectBuffer,
     riak_cs_list_objects_utils:filter_prefix_keys(ObjectPrefixTuple, Request),
     ReachedEnd = ObjectBufferLength < NumKeysRequested,

-    Truncated = truncated(UserMaxKeys, ObjectPrefixTuple2),
+    Truncated = truncated(UserMaxKeys, ReachedEnd, ObjectPrefixTuple2),
     SlicedTaggedItems =
     riak_cs_list_objects_utils:manifests_and_prefix_slice(ObjectPrefixTuple2,
                                                           UserMaxKeys),
@@ -258,9 +258,13 @@ respond(StateData=#state{req=Request},
             end
     end.

--spec truncated(non_neg_integer(), {list(), ordsets:ordset(term())}) -> boolean().
-truncated(NumKeysRequested, ObjectsAndPrefixes) ->
-    NumKeysRequested < riak_cs_list_objects_utils:manifests_and_prefix_length(ObjectsAndPrefixes) andalso
+-spec truncated(non_neg_integer(), boolean(), {list(), ordsets:ordset(term())}) -> boolean().
+truncated(NumKeysRequested, ReachedEnd, ObjectsAndPrefixes) ->
+    ManifestPrefixLength = riak_cs_list_objects_utils:manifests_and_prefix_length(ObjectsAndPrefixes),
+    ((not ReachedEnd)
+     orelse
+     NumKeysRequested < ManifestPrefixLength)
+    andalso
     %% this is because (strangely) S3 returns `false' for
     %% `isTruncated' if `max-keys=0', even if there are more keys.
     %% The `Ceph' tests were nice to find this.
@@ -271,7 +275,7 @@ enough_results(#state{req=?LOREQ{max_keys=UserMaxKeys},
                       objects=Objects,
                       common_prefixes=CommonPrefixes}) ->
     riak_cs_list_objects_utils:manifests_and_prefix_length({Objects, CommonPrefixes})
-    > UserMaxKeys
+    >= UserMaxKeys
     orelse EndOfKeyspace.

reiddraper · 2014-01-28T17:00:24Z

In my mind, I had wanted to keep truncated ignorant of as much details as possible. Because of this, I wanted to have the 'lazy list' of results populated enough so that just seeing if TotalResults > UserRequested would be enough to tell if the list was truncated. I wanted the rest of the code to be responsible for making sure the ObjectsAndPrefixes was populated enough to do this.

shino · 2014-01-29T02:09:41Z

I had wanted to keep truncated ignorant of as much details as possible

I agree for simplicity.
Listing object API is intrinsically complex so it's good to keep the implementation simple.
Sorry for my "just an idea" comment...

shino · 2014-01-29T06:44:53Z

I'm trying to write my first EQC property at https://github.com/basho/riak_cs/compare/feature;list-objects-eqc-addition?expand=1
(It's unfinished, dirty, refactoring needed... Any comment/advice/review are appriciated 😄 )

That property prop_list_all_active_keys found a counter example:

1002 keys, all active then
listing (s3cmd ls) shows only first 1000 keys.

This is probably because riak_cs_list_objects_utils:manifests_and_prefix_slice/2 takes only 1000 (=UserMaxKeys) keys whether enough_results is true or not.

reiddraper · 2014-01-29T21:31:37Z

Damn. Nice find.

reiddraper · 2014-01-29T21:41:33Z

Strange that it works with 1001 keys, and not 1002.

reiddraper · 2014-01-29T21:46:34Z

1003 also works fine. Of note, there is a 1002 hardcoded in the the CS code for the number of keys to request in the fold objects bit.

reiddraper · 2014-01-29T22:52:52Z

So here's what I think is happening:

Code here:

ReachedEnd = ObjectBufferLength < NumKeysRequested,

If ObjectBufferLength and NumKeysRequested are both 1002 (1002 is the default for NumKeysRequest), then ReachedEnd becomes false. This means we'll do another 2i request. But first, we slice the result-set to 1000 results, and store it in our state. When the next 2i result comes back with 0 results, we only see that our state has 1000 results, so it does not appear to be truncated. Because of this, both 1001 and 1003 results work fine, because they're right on the edges of the condition. That being said, I'm not quite sure yet how to fix this. Maybe we should try and avoid slicing the result-set until the very last minute?

reiddraper · 2014-01-29T23:13:35Z

Reverting the change I made initially fixes the 1002 key issue. Re-running EQC though to see what else (in addition to the original bug) pops up.

reiddraper · 2014-01-29T23:32:24Z

Note: the EQC test can be quite ram-intensive.

shino · 2014-01-30T03:35:41Z

Maybe we should try and avoid slicing the result-set until the very last minute?

Sounds good.

Note: the EQC test can be quite ram-intensive.

Oops. I modified generators and pushed it. RAM usage (RSS) is at most 150MB. Also faster than yesterday's :)

reiddraper · 2014-01-30T04:56:31Z

Thanks. I think I'll try and take a look at this again with fresh eyes in the morning. It's really great to have a more powerful EQC test here, as it was becoming obvious that manual test cases were not going to find all of the edge-cases.

reiddraper · 2014-01-30T05:39:03Z

diff --git a/src/riak_cs_list_objects_fsm_v2.erl b/src/riak_cs_list_objects_fsm_v2.erl
index 4a4ae18..f84a3e4 100644
--- a/src/riak_cs_list_objects_fsm_v2.erl
+++ b/src/riak_cs_list_objects_fsm_v2.erl
@@ -213,16 +213,18 @@ handle_done(State=#state{object_buffer=ObjectBuffer,

     ObjectPrefixTuple2 =
     riak_cs_list_objects_utils:filter_prefix_keys(ObjectPrefixTuple, Request),
+
+    lager:error("ObjectBufferLength is ~p", [ObjectBufferLength]),
+    lager:error("NumKeysRequested is ~p", [NumKeysRequested]),
+
     ReachedEnd = ObjectBufferLength < NumKeysRequested,
+    lager:error("Reached end is ~p", [ReachedEnd]),

     Truncated = truncated(UserMaxKeys, ObjectPrefixTuple2),
-    SlicedTaggedItems =
-    riak_cs_list_objects_utils:manifests_and_prefix_slice(ObjectPrefixTuple2,
-                                                          UserMaxKeys),
+    lager:error("Truncated is ~p", [Truncated]),

-    {NewManis, NewPrefixes} =
-    riak_cs_list_objects_utils:untagged_manifest_and_prefix(SlicedTaggedItems),

+    {NewManis, NewPrefixes} = ObjectPrefixTuple2,
     NewStateData = RangeUpdatedStateData#state{objects=NewManis,
                                                common_prefixes=NewPrefixes,
                                                reached_end_of_keyspace=ReachedEnd,
@@ -238,14 +240,21 @@ update_profiling_and_last_request(State, ObjectBuffer, ObjectBufferLength) ->

 -spec respond(state(), list(), ordsets:ordset(), boolean()) ->
     fsm_state_return().
-respond(StateData=#state{req=Request},
+respond(StateData=#state{req=Request=?LOREQ{max_keys=UserMaxKeys}},
         Manifests, Prefixes, Truncated) ->
     case enough_results(StateData) of
         true ->
+
+            SlicedTaggedItems =
+            riak_cs_list_objects_utils:manifests_and_prefix_slice({Manifests,
+                                                                   Prefixes},
+                                                                  UserMaxKeys),
+            {NewManis, NewPrefixes} =
+            riak_cs_list_objects_utils:untagged_manifest_and_prefix(SlicedTaggedItems),
             Response =
             response_from_manifests_and_common_prefixes(Request,
                                                         Truncated,
-                                                        {Manifests, Prefixes}),
+                                                        {NewManis, NewPrefixes}),
             try_reply({ok, Response}, StateData);
         false ->
             RiakcPid = StateData#state.riakc_pid,
@@ -439,6 +448,7 @@ next_byte(<<Integer:8/integer>>) ->
                 state()) ->
     fsm_state_return().
 try_reply(Response, State) ->
+    lager:error("Try reply"),
     NewStateData = State#state{response=Response},
     reply_or_wait(Response, NewStateData).

Some debugging code in here, but this is seeming to work ok so far. Will try again in the AM too.

shino · 2014-01-30T15:44:00Z

Added delimiter option to EQC property and made PR #784 (squashed branch).

reiddraper · 2014-01-30T18:16:04Z

Added delimiter option to EQC property and made PR #784 (squashed branch).

My 'fixes' now fail with this branch. I'm going to take some time to try and understand the test for a bit now.

shino · 2014-01-30T23:57:39Z

My 'fixes' now fail with this branch.

Oh, I now confirmed a failure in with-delimiter case. I will check the reason of failure too.

reiddraper · 2014-01-31T17:36:51Z

Continuing a bit of the conversation on #784.

shino · 2014-02-03T08:02:37Z

Some comments although the branch is not squashed :)
(the commit I'm looking is c1e873f)

How do you think removing version suffix _v1 from list_objects_request_v1
and list_objects_response_v1?
src/riak_cs_list_objects_fsm_v2.erl L.410: element(2, lists:last(PrevRanges))
can be just Key (a variable at L.404),
(not related to bug fix of this issue) riak_cs_list_objects_fsm_v2:key_is_common_prefix
returns true for keys which includes delimiter, not necessarily ends with delimiter.
For example: Key=<<"a/b">>, Prefix=undefined, Delimiter=<<"/">>.
The similar logic in make_start_key_from_marker/1 seems proper because it checks
that delimiter is suffix.

shino · 2014-02-03T12:33:42Z

(not related to bug fix of this issue) riak_cs_list_objects_fsm_v2:key_is_common_prefix returns true for keys which includes delimiter, not necessarily ends with delimiter. For example: Key=<<"a/b">>, Prefix=undefined, Delimiter=<<"/">>. The similar logic in make_start_key_from_marker/1 seems proper because it checks that delimiter is suffix.

Sorry... This is completely my misunderstanding...
The key passed to riak_cs_list_objects_fsm_v2:key_is_common_prefix is full binary,
not trimmed to common prefix.

shino · 2014-02-05T02:17:54Z

fixed by #788

ghost assigned reiddraper Jan 24, 2014

reiddraper mentioned this issue Feb 3, 2014

List objects FSM v2 bugfix and tests #788

Merged

shino added this to the 1.4.5 milestone Feb 4, 2014

shino closed this as completed Feb 5, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

object list (Get Bucket) returns isTruncated=false too soon #781

object list (Get Bucket) returns isTruncated=false too soon #781

shino commented Jan 24, 2014

shino commented Jan 24, 2014

reiddraper commented Jan 24, 2014

shino commented Jan 27, 2014

shino commented Jan 27, 2014

shino commented Jan 27, 2014

reiddraper commented Jan 27, 2014

reiddraper commented Jan 27, 2014

reiddraper commented Jan 27, 2014

reiddraper commented Jan 27, 2014

reiddraper commented Jan 27, 2014

shino commented Jan 28, 2014

shino commented Jan 28, 2014

reiddraper commented Jan 28, 2014

shino commented Jan 29, 2014

shino commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

shino commented Jan 30, 2014

reiddraper commented Jan 30, 2014

reiddraper commented Jan 30, 2014

shino commented Jan 30, 2014

reiddraper commented Jan 30, 2014

shino commented Jan 30, 2014

reiddraper commented Jan 31, 2014

shino commented Feb 3, 2014

shino commented Feb 3, 2014

shino commented Feb 5, 2014

object list (Get Bucket) returns isTruncated=false too soon #781

object list (Get Bucket) returns isTruncated=false too soon #781

Comments

shino commented Jan 24, 2014

shino commented Jan 24, 2014

reiddraper commented Jan 24, 2014

shino commented Jan 27, 2014

shino commented Jan 27, 2014

shino commented Jan 27, 2014

reiddraper commented Jan 27, 2014

reiddraper commented Jan 27, 2014

reiddraper commented Jan 27, 2014

reiddraper commented Jan 27, 2014

reiddraper commented Jan 27, 2014

shino commented Jan 28, 2014

shino commented Jan 28, 2014

reiddraper commented Jan 28, 2014

shino commented Jan 29, 2014

shino commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

reiddraper commented Jan 29, 2014

shino commented Jan 30, 2014

reiddraper commented Jan 30, 2014

reiddraper commented Jan 30, 2014

shino commented Jan 30, 2014

reiddraper commented Jan 30, 2014

shino commented Jan 30, 2014

reiddraper commented Jan 31, 2014

shino commented Feb 3, 2014

shino commented Feb 3, 2014

shino commented Feb 5, 2014