Dealing with Escaped Characters in S3 Key Names #3584

Closed
2 tasks
hossimo opened this issue Feb 6, 2023 · 9 comments
Labels
feature-request This issue requests a feature. s3

Comments

@hossimo

hossimo commented Feb 6, 2023

Describe the feature

I've asked this over on SO with no reply, and I apologize if this is located somewhere in the Docs and I haven't found it.

Often a file gets uploaded into my buckets whose name contains characters that need to be escaped. I have no control over what my users call their files, since that depends on their workflows. This is all fine and good until you want to process that file and need to unescape the key. This isn't hard, but it seems like there should be somewhere in the API that does this for you safely.

As an example, the code below in the Use Case section works with Pizza-Is-Better-Than-Liver.txt but fails with Pizza Is+AmazeBalls.txt.

Obviously, when Pizza Is+AmazeBalls.txt fails it's because the key returned is actually Pizza+Is%2BAmazeBalls.txt but passing that directly to put_object_tagging fails since that's not the actual key of the object.

If it doesn't exist, I would suggest something like the proposed solution below, packed into some common part of the library where it makes sense, like boto3.client.escape(input: str) (sorry, I don't fully understand the backend; this is just an example).

Use Case

import boto3

def lambda_handler(event, context):
    s3 = boto3.client("s3")

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        objectName = record["s3"]["object"]["key"] 

        tags = []
        
        if "Pizza" in objectName:
            tags.append({"Key" : "Project", "Value" : "Great"})
        if "Hamburger" in objectName:
            tags.append({"Key" : "Project", "Value" : "Good"})
        if "Liver" in objectName:
            tags.append({"Key" : "Project", "Value" : "Yuck"})

        s3.put_object_tagging(
            Bucket=bucket,
            Key=objectName,
            Tagging={
                "TagSet" : tags
            }
        )
    return {
        'statusCode': 200,
    }

Proposed Solution

def format_path(path):
    path = path.replace("+", " ")
    path = path.replace("%2B", "+")
    path = path.replace("%21", "!")
    path = path.replace("%27", "'")
    path = path.replace("%28", "(")
    path = path.replace("%29", ")")
    path = path.replace("%26", "&")
    path = path.replace("%24", "$")
    path = path.replace("%40", "@")
    path = path.replace("%3D", "=")
    path = path.replace("%3B", ";")
    path = path.replace("%3A", ":")
    path = path.replace("%2C", ",")
    path = path.replace("%3F", "?")
    return path

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

SDK version used

through boto

Environment details (OS name and version, etc.)

boto3 1.26.41

@hossimo hossimo added feature-request This issue requests a feature. needs-triage This issue or PR still needs to be triaged. labels Feb 6, 2023
@tim-finnigan tim-finnigan self-assigned this Feb 10, 2023
@tim-finnigan tim-finnigan added investigating This issue is being investigated and/or work is in progress to resolve the issue. s3 and removed needs-triage This issue or PR still needs to be triaged. labels Feb 10, 2023
@tim-finnigan
Contributor

Hi @hossimo thanks for reaching out. Regarding this point:

Obviously, when Pizza Is+AmazeBalls.txt fails it's because the key returned is actually Pizza+Is%2BAmazeBalls.txt but passing that directly to put_object_tagging fails since that's not the actual key of the object.

When I tried to pass a tag value containing a "%" to the put_object_tagging command, it fails with the error: botocore.exceptions.ClientError: An error occurred (InvalidTag) when calling the PutObjectTagging operation: The TagValue you have provided is invalid

So the failure is due to an invalid tag rather than an invalid key. If you passed a valid tag but an invalid key, then you would get an error like: botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the PutObjectTagging operation: The specified key does not exist.

So I think the PutObjectTagging API is behaving as expected. This is the S3 documentation I found on using special characters in key names: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html. I couldn't find anything related to special characters and tagging so maybe that needs to be clarified somewhere in the docs.

@tim-finnigan tim-finnigan added response-requested Waiting on additional information or feedback. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Feb 10, 2023
@hossimo
Author

hossimo commented Feb 10, 2023

Hey Tim, thanks for getting back on this.

Using put_object_tagging was just the trigger of the issue but not the cause.

It's not what was getting passed to the TagSet that was the issue; I have control of that. It's the key that gets supplied to the event parameter that is causing the issue if the file name that the user supplied to S3 gets escaped.

Hopefully a more concise example:

I have S3 passing events to SNS and SNS triggering the Lambda.

The user uploads a file to S3 called this is a test.txt (note the spaces)

This comes to Lambda as an SNS event containing the following message key payload. (I have expanded this from a String to JSON and snipped out all the parts we don't need for this example for clarity, including the SNS layer)

{
    "Records": [
      {
        "s3": {
          "bucket": {
            "name": "bucket"
          },
          "object": {
            "key": "this+is+a+test.txt"
          }
        }
      }
    ]
  }

Now notice that the ["Records"][0]["s3"]["object"]["key"] has been escaped to this+is+a+test.txt. This is fine, understandable and expected.
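For anyone reproducing this setup, here is a minimal sketch of unwrapping the SNS layer that was snipped out of the payload above. It assumes the usual SNS-to-Lambda envelope, in which the S3 notification arrives as a JSON string under Records[].Sns.Message. The helper name iter_s3_records and the hand-built sample event are hypothetical, for illustration only.

```python
import json


def iter_s3_records(sns_event):
    """Yield (bucket, raw_key) pairs from an SNS-wrapped S3 notification.

    Assumes the usual SNS -> Lambda envelope. The key is yielded exactly
    as delivered, so it is still URL-encoded at this point.
    """
    for sns_record in sns_event.get("Records", []):
        # The S3 notification is embedded as a JSON string, not a dict.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        for record in s3_event.get("Records", []):
            yield record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]


# Hand-built envelope for illustration (not a captured event)
inner = {
    "Records": [
        {"s3": {"bucket": {"name": "bucket"},
                "object": {"key": "this+is+a+test.txt"}}}
    ]
}
sample_event = {"Records": [{"Sns": {"Message": json.dumps(inner)}}]}
pairs = list(iter_s3_records(sample_event))
print(pairs)
```

The sketch deliberately leaves the key untouched, so the escaping described in this comment is still visible in what it yields.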

The issue is now, I want to apply some tags to this bucket/key I just received, but in order to do this you have to un-escape it first.

So here is an example using this above simplified SNS/Message block:

import boto3
import json


def lambda_handler(event, context):
    s3 = boto3.client("s3")
    event = json.loads(event)

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        objectName = record["s3"]["object"]["key"] # this+is+a+test.txt

        tags = []
        
        if "Pizza" in objectName:
            tags.append({"Key" : "Project", "Value" : "Great"})
        if "Hamburger" in objectName:
            tags.append({"Key" : "Project", "Value" : "Good"})
        if "Liver" in objectName:
            tags.append({"Key" : "Project", "Value" : "Yuck"})


        ### This will fail
        try:
            s3.put_object_tagging(
                Bucket=bucket,
                Key=objectName, # this+is+a+test.txt
                Tagging={
                    "TagSet" : tags
                }
            )
        except Exception as e:
            print("FAIL")
            print(e)

        ## Un-Escape the Key
        objectName = format_path(objectName)

        ### This will pass
        try:
            s3.put_object_tagging(
                Bucket=bucket,
                Key=objectName, # this is a test.txt
                Tagging={
                    "TagSet" : tags
                }
            )
        except Exception as e:
            # Only reached if the call fails even with the un-escaped key
            print("UNEXPECTED FAIL")
            print(e)



def format_path(path):
    path = path.replace("+", " ")
    path = path.replace("%2B", "+")
    path = path.replace("%21", "!")
    path = path.replace("%27", "'")
    path = path.replace("%28", "(")
    path = path.replace("%29", ")")
    path = path.replace("%26", "&")
    path = path.replace("%24", "$")
    path = path.replace("%40", "@")
    path = path.replace("%3D", "=")
    path = path.replace("%3B", ";")
    path = path.replace("%3A", ":")
    path = path.replace("%2C", ",")
    path = path.replace("%3F", "?")
    return path

if __name__ == "__main__":
    event = """{
    \"Records\": [
      {
        \"s3\": {
          \"bucket\": {
            \"name\": \"bucket\"
          },
          \"object\": {
            \"key\": \"this+is+a+test.txt\"
          }
        }
      }
    ]
  }"""
    lambda_handler (event, None)

So now, as long as you have a bucket called bucket containing a file called this is a test.txt, running this code produces output from the first try/except block but not the second, as it should.

FAIL
An error occurred (NoSuchKey) when calling the PutObjectTagging operation: The specified key does not exist.

This is all expected behaviour. The issue is that I need to build and maintain the format_path() function. It seems this should be included in the API, so that if I missed something (I just did what I thought was right) or the escaping changes for some reason, there is a managed function that anyone could use to (un)escape their returned values.

What if a user uploads a file with a valid Windows file name such as this is á test.txt? The key arrives as this+is+%C3%A1+test.txt (I didn't know that until just now).
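To make the maintenance concern concrete, here is a small sketch that runs an abbreviated copy of the format_path() table above against that key. Only the "+" rule matches, so the percent-encoded UTF-8 bytes for á are left in place. The variable name partially_decoded is just for illustration.

```python
def format_path(path):
    # Abbreviated copy of the hand-maintained replacement table from above
    path = path.replace("+", " ")
    path = path.replace("%2B", "+")
    path = path.replace("%21", "!")
    path = path.replace("%3F", "?")
    return path


# The table has no entries for %C3 or %A1, so they survive unchanged.
partially_decoded = format_path("this+is+%C3%A1+test.txt")
print(partially_decoded)  # this is %C3%A1 test.txt
```

Covering every such case by hand would mean adding an entry for each byte sequence a file name could contain, which is the upkeep this comment is worried about.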

I hope this makes it clearer.

Thanks again for your time.

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Feb 10, 2023
@tim-finnigan
Contributor

Hi @hossimo thanks for following up. Have you tried using the urllib.parse functions from the Python standard library to do the decoding? For example: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.unquote (and unquote_plus for plus signs).
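A minimal sketch of that suggestion, assuming the keys in the event are URL-encoded with spaces written as plus signs, as the examples in this thread indicate. The helper name decode_s3_key is hypothetical; urllib.parse.unquote_plus is the standard-library function doing the work.

```python
from urllib.parse import unquote_plus


def decode_s3_key(raw_key):
    """Decode an object key as it appears in an S3 event notification."""
    # unquote_plus turns '+' into a space and %XX sequences into characters,
    # decoding multi-byte sequences as UTF-8 by default.
    return unquote_plus(raw_key)


# Keys taken from the examples earlier in this thread
raw_keys = [
    "this+is+a+test.txt",
    "Pizza+Is%2BAmazeBalls.txt",
    "this+is+%C3%A1+test.txt",
]
decoded = [decode_s3_key(k) for k in raw_keys]
for raw, key in zip(raw_keys, decoded):
    print(f"{raw!r} -> {key!r}")
```

The decoded value is what would then be passed as Key to put_object_tagging in place of the raw event value.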

@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label Feb 10, 2023
@hossimo
Author

hossimo commented Feb 10, 2023

Ok, well, now I feel a little dumb. I honestly didn't think of URL-encoded strings; in my head I was thinking of some special AWS sauce. Maybe it was the pluses that threw me off.

Anyway, that's a simple fix for my problem.

Thanks.

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Feb 11, 2023
@tim-finnigan
Contributor

Ok I'm glad that fixes the issue — and I'm glad you opened this in case others run into it as well. I don't have much direct experience with S3->SNS->Lambda workflows but it might be worth noting this information somewhere in the service documentation.

@tim-finnigan tim-finnigan added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Feb 11, 2023
@hossimo
Author

hossimo commented Feb 11, 2023

Sorry, just one more thought.

While I agree there is a simple way around this, and it is likely in a manual somewhere, wouldn't this be a safer solution (one where someone is less likely to make a bonehead mistake like mine):

s3.put_object_tagging(
                Bucket=bucket,
                Key=objectName, # this+is+a+test.txt
                Decode=True,
                Tagging={
                    "TagSet" : tags
                }
            )

This way I'm shifting the blame, but it keeps this managed by the API. Sure, on the backend it could just be calling urllib.parse.unquote_plus, but simply reading the docs gets you a quick and easy solution.
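Since Decode is only a suggested parameter here, the same effect can be approximated today with a thin wrapper on the caller's side. The sketch below is one possible shape, not part of boto3: the function name tag_object_from_event_key is hypothetical, and _RecordingClient is a stand-in so the example runs without AWS access. With a real client you would pass boto3.client("s3") instead.

```python
from urllib.parse import unquote_plus


def tag_object_from_event_key(s3_client, bucket, raw_key, tags):
    """Decode an event-supplied key, then call put_object_tagging with it."""
    key = unquote_plus(raw_key)
    return s3_client.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": tags},
    )


class _RecordingClient:
    """Stand-in for an S3 client that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def put_object_tagging(self, **kwargs):
        self.calls.append(kwargs)
        return {}


fake = _RecordingClient()
tag_object_from_event_key(
    fake,
    "bucket",
    "this+is+a+test.txt",
    [{"Key": "Project", "Value": "Great"}],
)
print(fake.calls[0]["Key"])
```

Keeping the decoding in one helper gives a single place to adjust if the event encoding ever turns out to differ from what this thread observed.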

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Feb 11, 2023
@tim-finnigan
Contributor

@hossimo we would have to forward requests for adding API parameters to the appropriate service team (we also recommend doing this directly through AWS Support if you have a support plan.)

I could forward your suggestion for the PutObjectTagging API to the S3 team but maybe instead it should be a request for the SNS team if it is an SNS API that is adding the special characters?

@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label Feb 13, 2023
@hossimo
Author

hossimo commented Feb 13, 2023

Ok, got it. I do have a support plan; I wasn't sure it covered this kind of request as well.

I'll close this up for now.

Thanks again

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Feb 13, 2023
@tim-finnigan
Contributor

Ok thanks for letting me know, if there are any updates from the support ticket that you'd like to share here in the future please feel free to do so. Otherwise I can go ahead and close this.

2 participants