Option to disable hive partitioning wild cards #232

niydt · 2021-08-06T05:46:41Z

The Avro files we are trying to load into Redshift are stored in folders with "=" in their names, e.g.

    event_type=users.behaviors.app.FirstSession/.

When loading data from the following S3 prefix,

com.hoopladigital.brazecurrentsstaging/StagingCurrentFull/dataexport.prod-03.S3.integration.60d3692fcab9ca5f83919aab/event_type%3Dusers.behaviors.app.FirstSession

The lambda failed with this error:

            error: No Configuration Found for com.hoopladigital.brazecurrentsstaging/StagingCurrentFull/dataexport.prod-03.S3.integration.60d3692fcab9ca5f83919aab/event_type=*/date=*/399/prod-03

As the error message above shows, the "event_type=*/date=*" portion of the prefix was rewritten on the assumption that we are using Hive partitioning-style wildcards (https://github.com/awslabs/aws-lambda-redshift-loader#hive-partitioning-style-wildcards), so the event_type value was replaced with *.
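To make the rewrite described above concrete, here is a minimal JavaScript sketch of that kind of transform. It is not the loader's actual transformHiveStylePrefix() code: the function name toHiveWildcardPrefix, the shortened key, and the date value are all made up for illustration.

```javascript
// Hypothetical sketch, NOT the loader's real transformHiveStylePrefix().
// Replaces the value of every `name=value` path segment with `*`.
function toHiveWildcardPrefix(prefix) {
  return prefix
    .split('/')
    .map((segment) => {
      const eq = segment.indexOf('=');
      // Segments without '=' are passed through unchanged.
      return eq === -1 ? segment : segment.slice(0, eq) + '=*';
    })
    .join('/');
}

// Illustrative key only; the date value is an assumption.
const key =
  'StagingCurrentFull/event_type=users.behaviors.app.FirstSession/date=2021-08-06/399/prod-03';
console.log(toHiveWildcardPrefix(key));
// StagingCurrentFull/event_type=*/date=*/399/prod-03
```

A configuration stored under the literal event_type=users.behaviors.app.FirstSession prefix would never match the rewritten key, which is consistent with the "No Configuration Found" error.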

We don't want to use this feature; we need the Lambda function to use the exact folder name provided in the prefix. Is there a way to configure it not to use Hive partitioning wildcards?

line 1584 of index.js:
inputInfo.prefix = inputInfo.bucket + '/' + searchKey.transformHiveStylePrefix();

line 78 of index.js:
transformHiveStylePrefix()


IanMeyers · 2021-08-06T09:00:11Z

Understood. Could you live with the ability to turn this on and off at the function level? Meaning for the whole installation of the loader, it would or would not perform hive wildcard xforms? This would be relatively easy to support, while selectively doing it per prefix will require a bit more thinking.

IanMeyers · 2021-08-06T09:17:23Z

Also - how many of these event types do you have? If you could suppress specific prefixes from Hive wildcard transforms, would that be achievable or do you have too many event types to list?

jbrew8 · 2021-08-06T19:05:53Z

I am the user that reported this to AWS support (and they logged this issue on my behalf). Thank you for looking into this.

Disabling this feature at the function level would be fine as we currently do not have plans to use hive wildcards.

Managing a list of prefixes to exclude is also fine. We currently have 13 event types, and while this might increase slightly in the future, it should remain easy to maintain a list.

niydt · 2021-08-06T22:59:54Z

Hi Ian,

Thank you for looking into this issue. The problem was raised on jbrew8's behalf, and we would really appreciate a quick workaround or a fix in the near future.

IanMeyers · 2021-08-07T12:20:40Z

OK. So here's my proposal. Please download version 2.8.0 from https://awslabs-code-us-east-1.s3.amazonaws.com/LambdaRedshiftLoader/AWSLambdaRedshiftLoader-2.8.0.zip, which has not yet been pushed to GitHub. Set the environment variable SuppressWildcardExpansionPrefixList to one of the following values:

Set to * to suppress all Hive wildcard expansions
Set to a comma-separated list of prefixes for which wildcard expansion should be suppressed

Given that you only have 13 prefixes, you could load those directly into the variable, but note that all of the function's environment variables together cannot exceed 4 KB.
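For illustration, the two settings could be interpreted roughly as in the JavaScript sketch below. Only the variable name SuppressWildcardExpansionPrefixList comes from this thread. The helper name shouldSuppressWildcards and the rule of matching each listed entry against the start of the bucket/prefix string are assumptions, not the shipped 2.8.0 logic.

```javascript
// Hypothetical helper; the matching rule (leading-string match) is an assumption.
function shouldSuppressWildcards(prefix, env = process.env) {
  const raw = (env.SuppressWildcardExpansionPrefixList || '').trim();
  if (raw === '') return false; // variable unset: wildcard expansion stays on
  if (raw === '*') return true; // '*' suppresses expansion for every prefix
  return raw
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .some((entry) => prefix.startsWith(entry));
}

// Example with made-up bucket and prefix names.
const exampleEnv = {
  SuppressWildcardExpansionPrefixList:
    'my-bucket/exports/event_type=a,my-bucket/exports/event_type=b',
};
console.log(shouldSuppressWildcards('my-bucket/exports/event_type=a/file.avro', exampleEnv)); // true
console.log(shouldSuppressWildcards('my-bucket/other/event_type=c/file.avro', exampleEnv)); // false
```

Passing the environment as a parameter keeps the check easy to test without touching the real process.env.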

I have tested this within my account and it works great, but would like to validate with you first before shipping.

jbrew8 · 2021-08-10T21:16:37Z

Thanks @IanMeyers. I'll give this a shot and let you know how it works.

jbrew8 · 2021-08-11T17:12:17Z

@IanMeyers I had a chance to try your changes, and it looks like they solve our problem. With the new environment variable set, prefixes that contain an equals sign are treated literally, and the data is loaded into Redshift correctly. Thank you for your quick turnaround on this issue.

IanMeyers · 2021-08-11T17:30:41Z

Wonderful – will push the changes.
