URL decode S3 key name and set ClientRequestToken and JobTag to eTag before invoking Textract #3
Issue #, if available:
#2
Description of changes:
Used urllib to decode the value of the S3 Key received from the S3 event. The S3 event sends an encoded key name, e.g. spaces are replaced with plus (+); without decoding, this leads to the wrong object name being sent when invoking Textract.
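To illustrate the decoding step described above, here is a minimal sketch using `urllib.parse.unquote_plus` from the Python standard library. The helper name `decode_s3_key` is hypothetical and only for illustration; the actual code in this PR may be structured differently.

```python
from urllib.parse import unquote_plus


def decode_s3_key(raw_key: str) -> str:
    """Return the real object key for a key taken from an S3 event.

    S3 event notifications URL-encode the key: a space arrives as '+'
    and a literal plus arrives as '%2B'. unquote_plus reverses both.
    (decode_s3_key is a hypothetical helper name, not part of the PR.)
    """
    return unquote_plus(raw_key)


# A space in the object name arrives as '+' in the event payload.
print(decode_s3_key("My+Test.pdf"))   # My Test.pdf
# A literal plus arrives percent-encoded and is restored as '+'.
print(decode_s3_key("a%2Bb.pdf"))     # a+b.pdf
```

Using `unquote_plus` rather than `unquote` matters here, because plain `unquote` would leave the `+` in place and the lookup would still target the wrong object name.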
Although the S3 key name used when invoking Textract may contain spaces or plusses, Textract will throw an error if the `ClientRequestToken` or `JobTag` contains a space or plus. Since the current code bases these two parameter values on the S3 object's name, this leads to errors whenever the object name contains spaces or plusses. So, I changed these two parameters to instead derive their value from the S3 object's `eTag` (which is typically the MD5 digest of the object's content).

I tested with a file named "My Test.pdf" in S3. Prior to this PR, the Textract invocations failed; they now work.
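As a rough sketch of the change described above, the following builds the Textract request parameters from a single S3 event record, using the decoded key for the document location and the `eTag` for both `ClientRequestToken` and `JobTag`. The function name `build_textract_request` and the sample record are assumptions made for illustration; the code in this PR may differ.

```python
from urllib.parse import unquote_plus


def build_textract_request(record: dict) -> dict:
    """Build Textract request parameters from one S3 event record.

    Hypothetical helper for illustration only. The object name keeps
    its spaces or plusses, while the token and tag are derived from
    the eTag, which contains neither.
    """
    s3 = record["s3"]
    bucket = s3["bucket"]["name"]
    key = unquote_plus(s3["object"]["key"])  # decode the event's key
    etag = s3["object"]["eTag"]

    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "ClientRequestToken": etag,
        "JobTag": etag,
    }


# Sample record trimmed to the fields used above (values are made up).
sample_record = {
    "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {
            "key": "My+Test.pdf",
            "eTag": "d41d8cd98f00b204e9800998ecf8427e",
        },
    }
}

request = build_textract_request(sample_record)
print(request["DocumentLocation"]["S3Object"]["Name"])  # My Test.pdf
print(request["ClientRequestToken"])
```

Assuming a boto3 Textract client, the resulting dictionary could be passed along the lines of `client.start_document_text_detection(**request)`, together with any other parameters the existing code already supplies.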
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.