URL decode S3 key name and set ClientRequestToken and JobTag to eTag before invoking Textract #3

Open
wants to merge 2 commits into
base: master

@matwerber1 matwerber1 commented Sep 1, 2019

Issue #, if available:
#2

Description of changes:

  1. Used urllib to decode the S3 key received from the S3 event. The event delivers the key name URL-encoded (for example, spaces are replaced with a plus sign, +); without decoding, the wrong object name is sent when invoking Textract.

  2. Although the S3 key name passed to Textract may contain spaces or plus signs, Textract throws an error if the ClientRequestToken or JobTag contains a space or plus sign. Because the current code bases these two parameter values on the S3 object's name, any object name containing a space or plus sign causes an error. I changed both parameters to derive their value from the S3 object's eTag instead (typically the MD5 digest of the object's contents).
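The two changes can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the function name `textract_params_from_record` and the sample record are hypothetical, and the record layout assumes the standard S3 event notification shape (`s3.bucket.name`, `s3.object.key`, `s3.object.eTag`).

```python
from urllib.parse import unquote_plus


def textract_params_from_record(record):
    """Build Textract request fields from one S3 event record (hypothetical helper)."""
    s3 = record["s3"]
    bucket = s3["bucket"]["name"]
    # S3 events URL-encode the key: "My Test.pdf" arrives as "My+Test.pdf".
    key = unquote_plus(s3["object"]["key"])
    # The eTag is a hex string, so it contains no spaces or plus signs.
    etag = s3["object"]["eTag"]
    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "ClientRequestToken": etag,
        "JobTag": etag,
    }


# Hypothetical event record, trimmed to the fields used above.
sample_record = {
    "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {
            "key": "My+Test.pdf",
            "eTag": "d41d8cd98f00b204e9800998ecf8427e",
        },
    }
}

params = textract_params_from_record(sample_record)
```

The resulting dictionary could then be passed as keyword arguments to a Textract start call; that step is omitted here because it requires AWS credentials.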

I tested with a file named "My Test.pdf" in S3. Before this PR, the Textract invocation failed; with these changes it succeeds.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@matwerber1 (Author)

Just realized that if these changes are accepted, you'll also need to repackage the Lambda zip and upload it to the my-python-packages bucket so that the folks using the deploy button get the updated code.


bhakti-visotrust commented Jan 15, 2021

Hey, I'm still getting the same error for specific characters like "{}()[]/+$#!?<>;" in the file name, and for JobTag I had to manually replace the spaces.
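For anyone who still wants a JobTag derived from the file name, one possible workaround is to replace every disallowed character instead of only spaces. The sketch below is an assumption-laden illustration: `to_job_tag` is a hypothetical helper, and the allowed set (letters, digits, `_`, `.`, `-`, `:`) and the 64-character limit should be checked against the current Textract API reference.

```python
import re

# Assumed allowed character set for JobTag; verify against the Textract docs.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.\-:]")


def to_job_tag(name, max_len=64):
    """Map an arbitrary file name to a tag using only the assumed allowed characters."""
    cleaned = _DISALLOWED.sub("_", name)
    # Truncate to the assumed length limit; fall back to "_" for empty input.
    return cleaned[:max_len] or "_"


tag = to_job_tag("My Test {draft} (v2)+final?.pdf")
```

Different file names can map to the same tag with this approach, so it suits JobTag (a label) better than ClientRequestToken, where a hash or the eTag is the safer choice.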
