-
Notifications
You must be signed in to change notification settings - Fork 722
Add Redshift overwrite methods #676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
e30e18d to
e2df136
Compare
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
jaidisido
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, I had a couple of questions before shipping
| @@ -1,4 +1,5 @@ | |||
| """Amazon Redshift Module.""" | |||
| # pylint: disable=too-many-lines | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this is applied at the file level instead of the offending method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is actually regarding too many lines in module:
awswrangler/redshift.py:1:0: C0302: Too many lines in module (1507/1500) (too-many-lines)
|
|
||
|
|
||
| def test_to_sql_simple(redshift_table, redshift_con): | ||
| @pytest.mark.parametrize("overwrite_method", [None, "drop", "cascade", "truncate", "delete"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| except redshift_connector.error.ProgrammingError as e: | ||
| # Caught "relation does not exist". | ||
| _logger.debug(str(e)) | ||
| con.rollback() | ||
| _begin_transaction(cursor=cursor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand this part? If the truncate fails because the table/view does not exist, we roll back and then we begin the transaction regardless? What does begin transaction achieves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. TRUNC fails if there is no table, so except catches that, and rolls back the transaction. If TRUNC succeeds, it commits the current transaction so in both cases we have to begin a new one. That is an unfortunate side effect of using TRUNC - it is fast, but table overwrite in this case isn't atomic. A user aware of the consequences still might want to use it over the others though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, it is probably worth to:
- Check the error message for the exact code of
relation ... does not existerror. - Put
_begin_transactioninto thefinallyblock just in case something inexceptgoes south.
Would you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thanks. I agree with catching the exact message and putting the begin transaction in a finally block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I changed my mind on the latter - actually, we don't want to begin transaction if something in except block fails - that would just be a waste of transaction. Late night replies 😅
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
c1b538b to
338a1c8
Compare
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Issue #671:
Description of changes:
Add
overwrite_method = "drop" | "cascade" | "truncate" | "delete"parameter to determine the way the table should be overwritten.drop- plainDROP ...(RESTRICTby default) - drops the table, but will fail if there are any views that depend on it.cascade-DROP ... CASCADE- drops the table, and all views that depend on it.truncate-TRUNCATE ...- truncates the table, but commits the current transaction, hence the overwrite is not atomic.delete-DELETE FROM ...- deletes all rows from the table within the current transaction. Slow relative to the other methods.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.