Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Fix delete resurrection #512 #518

Merged
merged 0 commits into from

3 participants

Andrew J. Stone Kelly McLaughlin Reid Draper
Andrew J. Stone
Owner

cs512 regression test

handle multiple active manifests during delete\n This fixes #512

slight refactoring and documentation

more documentation for the helpers of gc_active_manifests/3

fix typos and formatting

refactoring and code cleanup

more refactoring

Kelly McLaughlin
Collaborator

This is a pre-existing bug, but it is preventing me from running the client tests successfully. If you don't mind let's throw a fix for it in with this pr. The lager statement is just missing a formatting character so it is an easy fix.

Kelly McLaughlin
Collaborator

Now I'm getting 500 errors on object deletes from the python client tests. Need to investigate more to see what's going on.

Kelly McLaughlin
Collaborator

This is also a bug. Even deletes for nonexistent objects should return a 204 so the tuple on that line should start with true instead of false. Hooray for exposing latent bugs!

Kelly McLaughlin
Collaborator

Now I wonder if that lager message on L240 is going to be too spammy. Thoughts?

Kelly McLaughlin
Collaborator

With the fix for the 2nd pre-existing bug previously mentioned patched:

  • Eunit and eqc tests pass
  • Client and integration tests pass
  • Dialyzer output is clean
  • All riak_test tests pass
src/riak_cs_gc.erl
((7 lines not shown))
- case riak_cs_mp_utils:clean_multipart_unused_parts(M, RiakcPid) of
- same ->
- ActiveUUIDs = [M?MANIFEST.uuid],
- GCManiResponse = gc_specific_manifests(ActiveUUIDs,
- RiakObject,
- Bucket, Key,
- RiakcPid),
- return_active_uuids_from_gc_response(GCManiResponse,
- ActiveUUIDs);
- updated ->
- updated
+%% @doc Keep requesting manifests until
+%% 1) There are no more active manifests
+%% 2) All previously active multipart manifests have had their unused parts cleaned
+%% and become active+multipart_clean
+%% 3) All active manifests and active+multipart_clean manifests for multipart ar GC'd
Kelly McLaughlin Collaborator

s/ar/are

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Andrew J. Stone
Owner

I think the lager message for failures, especially delete failures, is handy. As long as logs are rotated, it shouldn't be a problem.

Reid Draper

Small style nit-pick, we tend to put newlines after any -> occurrence, except lambdas. So even small functions like:

foo() -> ok.

would be written as:

foo() ->
    ok.

same for case clauses.

Reid Draper

Please add specs to any newly added functions too. Running dialyzer after adding them will keep you honest too.

Kelly McLaughlin
Collaborator

Found an behavioral difference between CS and S3. To reproduce:

  1. Start a multipart upload to CS to overwrite an existing object
  2. While the multipart upload is progressing, delete the object at that key that the mp upload is writing
  3. Allow the mp upload to complete.

With CS the mp uploaded object is present and valid, but with S3 accessing the key returns 404 Not Found.

Kelly McLaughlin
Collaborator

Also seeing cases where a PUT seems to succeed, but when I attempt to fetch it I get a 404 Not Found. Here's the steps I have used to reproduce that behavior:

  1. Start a multipart upload to a key that does not exist.
  2. While the multipart upload is progressing, write a small object to the same key that the mp upload is writing
  3. While the multipart upload continues, attempt to fetch the object at the key.

I see the following s3cmd results when doing this:

13:09:44:s3tmp $ s3cmd -c ~/.s3cfg-moss put a s3://simple-bucket/a
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
a -> s3://simple-bucket/a  [1 of 1]
 110 of 110   100% in    0s   825.79 B/s  done
13:09:51:s3tmp $ s3cmd -c ~/.s3cfg-moss --force get s3://simple-bucket/a adown
s3://simple-bucket/a -> adown  [1 of 1]
ERROR: S3 error: 404 (Object Not Found):

Once the multipart upload completes the problem from my last comment takes effect and a fetch with s3cmd yields the following:

13:09:56:s3tmp $ s3cmd -c ~/.s3cfg-moss info s3://simple-bucket/a
s3://simple-bucket/a (object):
   File size: 1695547392
   Last mod:  Tue, 02 Apr 2013 19:09:30 GMT
   MIME type: application/octet-stream
   MD5 sum:   69d1cbdeb926414b8462c0a60c0a21ee
   policy: none
   ACL:       test1: FULL_CONTROL
Andrew J. Stone
Owner

@kellymclaughlin I think that first bug is due to not deleting writing manifests as well as active manifests during delete. Are you sure that behavior has changed since 1.3 release? Since it's an inconsistency, I agree it should be fixed, just wondering if it's directly related to my changes.

Kelly McLaughlin
Collaborator

@andrewjstone Excellent point. Let me re-run those tests using the release/1.3 branch and see what happens.

Andrew J. Stone
Owner

@kellymclaughlin I'm not able to reproduce the second error. The reason being, I think, is that the new smaller file becomes the active one since it has the later start time. When I run info it shows me the small file during and after mp upload of the large file.

Kelly McLaughlin
Collaborator

I can reproduce the first error on release/1.3 and I agree it is probably the writing state manifests issue. We can solve that separately. I am not able to reproduce the second error on release/1.3. There are times when I would not see the error so it could be some sort of race condition. Let me try to get more information about it.

Kelly McLaughlin
Collaborator

Looks good, +1 to merge

src/riak_cs_gc.erl
((91 lines not shown))
+
+-spec gc_manifest(M :: lfs_manifest(),
+ RiakObject :: riakc_obj:riakc_obj(),
+ Bucket :: binary(),
+ Key :: binary(),
+ RiakcPid :: pid(),
+ UUIDs :: [binary()]) ->
+ [binary()] | no_return().
+gc_manifest(M, RiakObject, Bucket, Key, RiakcPid, UUIDs) ->
+ UUID = M?MANIFEST.uuid,
+ check(gc_specific_manifests([UUID], RiakObject, Bucket, Key, RiakcPid), [UUID | UUIDs]).
+
+check({ok, _}, Val) ->
+ Val;
+check({error, _}=Error, _Val) ->
+ throw(Error).

why do we need to use exception throwing here? Feels like this and the caller gc_manifests/5 could be pure code otherwise.

Andrew J. Stone Owner

It's a non-local return to escape the fold early in case of error in gc_manifests. The catch returns the error. Otherwise we'd have what I replaced, which is a fold with 2 clauses that must go through the list returning error in the accumulator each time. I wasn't sure which one I liked better, but this one has less code.

Andrew J. Stone Owner

I'm second guessing my change after talking with Reid. I think I'm going to change it to remove the exception.

Andrew J. Stone Owner

On the other hand, this works and has been tested. I still don't have client tests working. Merging as is. Will fix this in the near future.

Works for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Andrew J. Stone andrewjstone merged commit 49cb5d2 into from
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Sorry, commit information is not available for this pull request.

Something went wrong with that request. Please try again.