Aborted query notification in Redshift #239

Open
andreys70 opened this issue Feb 17, 2022 · 4 comments

Comments

@andreys70

andreys70 commented Feb 17, 2022

I ran into odd behavior on a Redshift cluster (the support ticket is still open with the service team: Case ID 9386087231): a COPY command sent by the LambdaRedshiftLoader was first aborted, then rewritten and completed successfully by the Redshift engine. The problem is that only the Aborted result is reported back to the Lambda function, so the batch status is marked as 'error'. I verified in Redshift that the data was loaded successfully in this scenario.
This is further complicated by a multi-cluster setup: the data is loaded into two Redshift clusters, and only one of them fails in this way. That leads to a question I can't find an answer to in this repo: if I reprocess the 'error' batch, is the data loaded into both clusters, or is the solution smart enough to reload failed batches only on the clusters that failed?

@IanMeyers
Contributor

Yes, if you reprocess the batch it will use whatever configuration entry is in place when the reprocess occurs, which, if unchanged, would load both clusters. If you want to change the load target, you could create a single-cluster configuration, link it to the prefix, and change the old one.

@andreys70
Author

andreys70 commented Feb 18, 2022

The s3Prefix is the primary key in the config table, so it won't let me create two separate load configurations with different Redshift clusters for the same S3 prefix.
I had to trick the config a bit and created two separate entries, each with a single Redshift cluster config: one for prod and one for dr. I hope this is going to work:
1. bucket/source/schema/table_name/year=*/month=*/day=*/hour=*
2. bucket/source/schema/table_name/year=*/month=*/day=*/*

@IanMeyers
Contributor

I think in that case it will potentially use both, or may just select the first one. I would instead change the prefix to a dummy value for the single cluster, and then flip it back to the dual cluster configuration once the reprocess is done.
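
One possible reading of this suggestion is sketched below: the entry for the real prefix is temporarily pointed at a single cluster, the batch is reprocessed, and the dual-cluster entry is then restored. The dict is a hypothetical in-memory stand-in for the config table, not the loader's own API, and the choice of "dr" as the cluster to reload is an assumption made purely for illustration.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the config table, keyed by s3Prefix.
PREFIX = "bucket/source/schema/table_name/year=*/month=*/day=*/hour=*"


@contextmanager
def temporary_config(table, prefix, temp_entry):
    """Swap the entry stored under `prefix` for `temp_entry`, then restore it."""
    original = table[prefix]
    table[prefix] = temp_entry
    try:
        yield table
    finally:
        # Always flip back to the original (dual-cluster) entry.
        table[prefix] = original


config_table = {PREFIX: {"clusters": ["prod", "dr"]}}

with temporary_config(config_table, PREFIX, {"clusters": ["dr"]}):
    # The failed batch would be reprocessed here; only "dr" is a load target.
    targets_during_reprocess = list(config_table[PREFIX]["clusters"])

targets_after = config_table[PREFIX]["clusters"]
```

The `finally` clause is there so the dual-cluster entry comes back even if the reprocess step raises.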

@andreys70
Author

I used your suggestion, Ian. The approach that I described above did not work, and as you said, it picked up the first matching prefix configuration.
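
The overlap between the two prefix entries tried earlier in the thread can be illustrated with a simplified glob-based model. This is not the loader's actual lookup code: the use of `fnmatchcase`, the helper names, and the sample key are all assumptions chosen only to show that a single incoming prefix satisfies both patterns, so a first-match rule never reaches the second entry.

```python
from fnmatch import fnmatchcase

# The two config entries from the thread, in the order they were listed.
config_prefixes = [
    "bucket/source/schema/table_name/year=*/month=*/day=*/hour=*",
    "bucket/source/schema/table_name/year=*/month=*/day=*/*",
]


def matching_prefixes(key_prefix, patterns):
    """Return every pattern that the incoming key prefix satisfies."""
    return [p for p in patterns if fnmatchcase(key_prefix, p)]


def first_match(key_prefix, patterns):
    """Return only the first satisfied pattern, or None if there is none."""
    return next((p for p in patterns if fnmatchcase(key_prefix, p)), None)


# Hypothetical incoming key prefix for one hourly partition.
key_prefix = "bucket/source/schema/table_name/year=2022/month=02/day=17/hour=05"

all_matches = matching_prefixes(key_prefix, config_prefixes)
chosen = first_match(key_prefix, config_prefixes)
```

Under this model both patterns match the sample prefix, and `chosen` is always the first entry, which is consistent with the behavior reported above.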
