Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overly specific asserts on S3Client._remove #212

Closed
mcclurem opened this issue Feb 28, 2022 · 3 comments · Fixed by #307
Closed

Overly specific asserts on S3Client._remove #212

mcclurem opened this issue Feb 28, 2022 · 3 comments · Fixed by #307
Labels
good first issue Good for newcomers

Comments

@mcclurem
Copy link

Thanks a bunch for this library, I've been trying to make use of it with an s3-emulating service that is not 100% bit-for-bit compliant and I've run into issues in a couple places.
Specifically I've noticed that in S3Client._remove, you assert for response==204, the clone-service I'm using responds with a 200 when you successfully delete instead.
I'd love to submit a PR to fix this but I need some input.
I think it should be acceptable to check if the response is a 2xx code and by doing so you're more flexible. In addition, asserts outside of unit testing are considered hazardous, instead this should raise an exception of some kind.
Do you have a preferred exception to be raised here instead? or should it be left to boto to raise the client error?

@pjbull
Copy link
Member

pjbull commented Feb 28, 2022

@mcclurem Out of curiosity, which S3-emulating service are you using?

Happy to take a PR on this. Answers to specific questions below.


I think it should be acceptable to check if the response is a 2xx code and by doing so you're more flexible.

Yep, seems reasonable.

In addition, asserts outside of unit testing are considered hazardous, instead this should raise an exception of some kind.

Agreed that those should raise exceptions, not assert

Do you have a preferred exception to be raised here instead?

You can add a custom exception if there is not already one that makes sense here:
https://github.com/drivendataorg/cloudpathlib/blob/master/cloudpathlib/exceptions.py

@pjbull pjbull added the good first issue Good for newcomers label Mar 2, 2022
@aaossa
Copy link
Contributor

aaossa commented Sep 23, 2022

Hi, seems like checking for a 200 or 204 status code (instead of 204 alone) should be enough, since the action performed is a DELETE. About what to do with the exception, seems like any possible error should raise a ClientError (which is already handled) except if the file does not exist, in which case it won't raise an exception (boto3 docs).

def _remove(self, cloud_path: S3Path, missing_ok: bool = True) -> None:
try:
obj = self.s3.Object(cloud_path.bucket, cloud_path.key)
# will throw if not a file
obj.load()
resp = obj.delete()
assert resp.get("ResponseMetadata").get("HTTPStatusCode") == 204

But seems like that case was already covered as mentioned in this comment a couples line further:

# ensure directory deleted; if cloud_path did not exist at all
# resp will be [], so no need to check success
if resp:
assert resp[0].get("ResponseMetadata").get("HTTPStatusCode") == 200
else:
if not missing_ok:
raise FileNotFoundError(f"File does not exist: {cloud_path}")

A possibility would be to remove both assertions and raise FileNotFoundError when necessary (if missing_ok is False) in both sections. Would that be OK?

    def _remove(self, cloud_path: S3Path, missing_ok: bool = True) -> None:
        try:
            obj = self.s3.Object(cloud_path.bucket, cloud_path.key)

            # will throw if not a file
            obj.load()

            resp = obj.delete()
-           assert resp.get("ResponseMetadata").get("HTTPStatusCode") == 204
+           # Assumes that a non-successful code means that the file does not exist
+           if resp.get("ResponseMetadata").get("HTTPStatusCode") not in (200, 204) and not missing_ok:
+               raise FileNotFoundError(f"File does not exist: {cloud_path}")

        except ClientError:
            # try to delete as a direcotry instead
            bucket = self.s3.Bucket(cloud_path.bucket)

            prefix = cloud_path.key
            if prefix and not prefix.endswith("/"):
                prefix += "/"

            resp = bucket.objects.filter(Prefix=prefix).delete()

-            # ensure directory deleted; if cloud_path did not exist at all
-            # resp will be [], so no need to check success
-            if resp:
-                assert resp[0].get("ResponseMetadata").get("HTTPStatusCode") == 200
-            else:
-                if not missing_ok:
-                    raise FileNotFoundError(f"File does not exist: {cloud_path}")
+           # Assumes that a non-successful code means that the file does not exist
+           if not resp and not missing_ok:
+               raise FileNotFoundError(f"File does not exist: {cloud_path}")

Of course, this is just speculation, I would need to test the suggested implementation and confirm which statuses does the response get. This is just a draft ⚠️

By the way, when comparing the _remove method for the S3, Azure and GS clients seems like the S3 version is quite different, so the exception has no equivalent in the other clients.

@pjbull
Copy link
Member

pjbull commented Sep 24, 2022

@aaossa Yep, that implementation looks good to me initially, thanks. Like you said, it will need some testing. Maybe update the message when you raise FileNotFoundError to indicate we were trying to delete that file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants