URL decode S3 key name and set ClientRequestToken and JobTag to eTag before invoking Textract #3

matwerber1 · 2019-09-01T00:01:14Z

Issue #, if available:
#2

Description of changes:

Used urllib to decode the S3 key received in the S3 event. The S3 event delivers the key name URL-encoded (for example, spaces are replaced with plus signs), so without decoding, the wrong object name is sent to Textract.
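Here is a minimal sketch of the decoding step. It assumes the standard S3 event notification layout (`Records[0].s3.object.key`), and the helper name `decoded_key` is hypothetical. `urllib.parse.unquote_plus` turns `+` back into a space and decodes `%XX` escapes. That means a literal plus sign in the key, which arrives as `%2B`, is also restored correctly.

```python
from urllib.parse import unquote_plus


def decoded_key(event: dict) -> str:
    """Return the S3 object key from the first event record, URL-decoded.

    S3 event notifications deliver the key form-encoded: spaces arrive as
    '+' and other special characters as %XX escapes.
    """
    raw_key = event["Records"][0]["s3"]["object"]["key"]
    return unquote_plus(raw_key)


# Example: a file uploaded as "My Test.pdf" arrives as "My+Test.pdf".
sample_event = {"Records": [{"s3": {"object": {"key": "My+Test.pdf"}}}]}
print(decoded_key(sample_event))  # My Test.pdf
```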
Although the S3 key name passed to Textract may contain spaces or plus signs, Textract throws an error if the ClientRequestToken or JobTag contains a space or plus sign. Since the current code derives both parameter values from the S3 object's name, any object name containing spaces or plus signs leads to errors. So I changed both parameters to derive their value from the S3 object's eTag instead (which is the object's MD5 digest for single-part uploads not encrypted with SSE-C or SSE-KMS).
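The eTag-based parameters described above can be sketched as follows. This assumes the handler starts an asynchronous job such as `start_document_text_detection` and reads the eTag from the event record (`s3.object.eTag`). The function name `textract_request` and the example bucket and eTag values are hypothetical.

```python
def textract_request(bucket: str, key: str, etag: str) -> dict:
    """Build keyword arguments for an asynchronous Textract call.

    ClientRequestToken and JobTag come from the object's eTag rather than
    its key, because the key may contain spaces or '+' characters that
    Textract rejects in these two fields. The key itself is still used,
    unchanged, as the document name.
    """
    # HeadObject/GetObject return the ETag wrapped in double quotes; the
    # S3 event record does not. Strip them so either source works.
    token = etag.strip('"')
    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "ClientRequestToken": token,
        "JobTag": token,
    }


params = textract_request("my-bucket", "My Test.pdf", "0123456789abcdef0123456789abcdef")
print(params["ClientRequestToken"])  # 0123456789abcdef0123456789abcdef
```

In a handler, the result would be passed as `boto3.client("textract").start_document_text_detection(**params)`. Whether this repository calls that operation or `start_document_analysis` is an assumption here. According to the Textract API reference, reusing a ClientRequestToken returns the same JobId. As a result, two objects with identical content (and therefore the same eTag) may be treated as a single job.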

I tested with a file named "My Test.pdf" in S3. Before this PR, the Textract calls failed; they now succeed.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

matwerber1 · 2019-09-01T19:36:58Z

Just realized that if these changes are accepted, you'll also need to repackage the Lambda zip and upload it to the my-python-packages bucket so that folks using the deploy button get the updated code.
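Here is a minimal sketch of the repackaging step. It assumes the function is a single Python module with no third-party dependencies. The archive path `lambda_function.py` and the helper name `build_lambda_zip` are hypothetical.

```python
import io
import zipfile
from typing import Dict


def build_lambda_zip(files: Dict[str, str]) -> bytes:
    """Return the bytes of a Lambda deployment zip.

    `files` maps archive paths to source text. Lambda looks for the
    handler module at the root of the archive, so paths have no prefix.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for arcname, source in files.items():
            archive.writestr(arcname, source)
    return buffer.getvalue()


zip_bytes = build_lambda_zip(
    {"lambda_function.py": "def lambda_handler(event, context):\n    return None\n"}
)
```

The bytes could then be uploaded with `boto3.client("s3").put_object(Bucket="my-python-packages", Key=..., Body=zip_bytes)`. The object key the deploy button expects is not stated in this thread, so it is left elided.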

bhakti-visotrust · 2021-01-15T19:42:02Z

Hey, I'm still getting the same error for specific characters like "{}()[]/+$#!?<>;" in the file name, and for JobTag I had to replace the spaces manually.
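For code that still derives JobTag from the file name, one workaround is to replace every character outside the allowed set, not only spaces. Here is a minimal sketch of that approach. The allowed sets and the 64-character limit are assumptions based on the Textract API reference (JobTag: letters, digits, `_ . - :`; ClientRequestToken: letters, digits, `_ -`). The helper names are hypothetical.

```python
import re

# Assumed character sets from the Textract API reference; verify against
# the current documentation before relying on them.
_JOB_TAG_DISALLOWED = re.compile(r"[^A-Za-z0-9_.:\-]")
_TOKEN_DISALLOWED = re.compile(r"[^A-Za-z0-9_\-]")
_MAX_LEN = 64


def sanitize(value: str, disallowed: re.Pattern, max_len: int = _MAX_LEN) -> str:
    """Replace each disallowed character with '_' and truncate to max_len."""
    cleaned = disallowed.sub("_", value)[:max_len]
    # Both fields require at least one character.
    return cleaned or "_"


def job_tag_from_key(key: str) -> str:
    return sanitize(key, _JOB_TAG_DISALLOWED)


def token_from_key(key: str) -> str:
    return sanitize(key, _TOKEN_DISALLOWED)


print(job_tag_from_key("My Test {v2}.pdf"))  # My_Test__v2_.pdf
print(token_from_key("My Test {v2}.pdf"))    # My_Test__v2__pdf
```

Distinct file names can collapse to the same sanitized string. That makes this approach a better fit for JobTag, which is only a label echoed back in the completion notification, than for ClientRequestToken. With ClientRequestToken, a collision would make Textract return an existing JobId instead of starting a new job.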

URL decode S3 object's key name received in event

ceffcff

This was referenced Sep 1, 2019

S3 Event sends URL encoded key names, causing Lambda handler to fail on Textract API calls #2 (Open)

Process fails when input file contains spaces #1 (Closed)

matwerber1 force-pushed the master branch 2 times, most recently from 7e037a8 to 86ee87a, on September 1, 2019 03:41

use s3 eTag for Textract request token and jobTag

54d3590

matwerber1 force-pushed the master branch from 86ee87a to 54d3590 on September 1, 2019 03:44

AutomagicalApps mentioned this pull request Dec 27, 2019

Issue with Policy Names #7 (Open)
