Add Redshift overwrite methods #676

kukushking · 2021-05-04T22:25:07Z

Issue #671:

Description of changes:
Add an overwrite_method = "drop" | "cascade" | "truncate" | "delete" parameter that determines how the table is overwritten.

drop - plain DROP ... (RESTRICT by default) - drops the table, but will fail if there are any views that depend on it.
cascade - DROP ... CASCADE - drops the table, and all views that depend on it.
truncate - TRUNCATE ... - truncates the table, but commits the current transaction, hence the overwrite is not atomic.
delete - DELETE FROM ... - deletes all rows from the table within the current transaction. Slower than the other methods.
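Each of the four methods comes down to a single SQL statement. The following is a minimal sketch of that mapping. overwrite_sql is a hypothetical helper, not an awswrangler function. It assumes the schema and table names are trusted and only need double-quoting, and it builds the statement text without connecting to Redshift.

```python
# Hypothetical helper for illustration; not the awswrangler implementation.
# Assumes schema/table names are trusted and only need double-quoting.
_OVERWRITE_TEMPLATES = {
    # Plain DROP: RESTRICT is the default, so it fails if a view depends on the table.
    "drop": 'DROP TABLE "{schema}"."{table}"',
    # DROP ... CASCADE: also drops every view that depends on the table.
    "cascade": 'DROP TABLE "{schema}"."{table}" CASCADE',
    # TRUNCATE: fast, but commits the current transaction (not atomic).
    "truncate": 'TRUNCATE TABLE "{schema}"."{table}"',
    # DELETE: stays inside the current transaction, but is slower.
    "delete": 'DELETE FROM "{schema}"."{table}"',
}


def overwrite_sql(schema: str, table: str, method: str) -> str:
    """Return the statement a given overwrite_method would run before reloading."""
    try:
        template = _OVERWRITE_TEMPLATES[method]
    except KeyError:
        raise ValueError(f"Unknown overwrite_method: {method!r}") from None
    return template.format(schema=schema, table=table)


if __name__ == "__main__":
    for m in ("drop", "cascade", "truncate", "delete"):
        print(m, "->", overwrite_sql("public", "my_table", m))
```

Keeping the four variants in one lookup table makes it easy to see that only "truncate" has a different transactional behaviour. The other three differ only in which objects they remove and how quickly.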

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jaidisido · 2021-05-04T22:29:03Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
Commit ID: cb67d66
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

jaidisido · 2021-05-05T12:05:18Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
Commit ID: e30e18d
Result: FAILED
Build Logs (available for 30 days)

jaidisido · 2021-05-05T12:47:22Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
Commit ID: e2df136
Result: SUCCEEDED
Build Logs (available for 30 days)

jaidisido

Looking great, I had a couple of questions before shipping

jaidisido · 2021-05-05T18:27:51Z

awswrangler/redshift.py

@@ -1,4 +1,5 @@
 """Amazon Redshift Module."""
+# pylint: disable=too-many-lines


Any reason why this is applied at the file level instead of the offending method?

Yes, this warning is about the number of lines in the whole module, not in a single method:
awswrangler/redshift.py:1:0: C0302: Too many lines in module (1507/1500) (too-many-lines)

jaidisido · 2021-05-05T18:50:09Z

tests/test_redshift.py



-def test_to_sql_simple(redshift_table, redshift_con):
+@pytest.mark.parametrize("overwrite_method", [None, "drop", "cascade", "truncate", "delete"])


jaidisido · 2021-05-05T18:58:14Z

awswrangler/redshift.py

+            except redshift_connector.error.ProgrammingError as e:
+                # Caught "relation does not exist".
+                _logger.debug(str(e))
+                con.rollback()
+            _begin_transaction(cursor=cursor)


Can you help me understand this part? If the truncate fails because the table/view does not exist, we roll back and then begin the transaction regardless? What does beginning the transaction achieve?

Exactly. TRUNCATE fails if the table does not exist, so the except block catches that and rolls back the transaction. If TRUNCATE succeeds, it commits the current transaction. Either way the transaction has ended, so we have to begin a new one. That is an unfortunate side effect of using TRUNCATE: it is fast, but in this case the table overwrite isn't atomic. A user who is aware of the consequences might still want to use it over the other methods, though.

Now that I think about it, it is probably worth doing the following:

Check the error message for the exact code of the "relation ... does not exist" error.

Put _begin_transaction into a finally block, in case something in the except block goes wrong.

Would you agree?

Understood, thanks. I agree with checking for the exact error and putting _begin_transaction in a finally block.

FYI, I changed my mind on the latter: we don't want to begin a transaction if something in the except block fails, because that transaction would just be wasted. Late night replies 😅
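The following sketch runs the flow agreed on above against a fake connection. Only the "relation does not exist" error is caught, followed by a rollback, and a new transaction begins after either path. Any other error is re-raised without opening a transaction. Everything here is a stand-in: FakeConnection, FakeCursor, truncate_or_skip, begin_transaction and the local ProgrammingError class are hypothetical. The assumption that the driver exposes the SQLSTATE code under args[0]["C"] is ours and is not confirmed by the PR. 42P01 is PostgreSQL's undefined_table code, which we assume Redshift also returns for a missing relation.

```python
import logging

_logger = logging.getLogger(__name__)

UNDEFINED_TABLE = "42P01"  # PostgreSQL SQLSTATE for "relation ... does not exist"


class ProgrammingError(Exception):
    """Stand-in for the driver's ProgrammingError (assumed: args[0] is a dict of error fields)."""


class FakeConnection:
    """Records statements instead of talking to Redshift (hypothetical)."""

    def __init__(self, existing_tables, forced_error=None):
        self.existing = set(existing_tables)
        self.forced_error = forced_error  # optional SQLSTATE to raise on TRUNCATE
        self.log = []

    def rollback(self):
        self.log.append("ROLLBACK")


class FakeCursor:
    def __init__(self, con):
        self.con = con

    def execute(self, sql):
        self.con.log.append(sql)
        if sql.startswith("TRUNCATE"):
            name = sql.split()[-1]
            if self.con.forced_error:
                raise ProgrammingError({"C": self.con.forced_error, "M": "forced failure"})
            if name not in self.con.existing:
                raise ProgrammingError({"C": UNDEFINED_TABLE, "M": f'relation "{name}" does not exist'})
            # Mimic Redshift: TRUNCATE commits the transaction it runs in.
            self.con.log.append("COMMIT (implicit)")


def begin_transaction(cursor):
    cursor.execute("BEGIN TRANSACTION")


def truncate_or_skip(con, cursor, table):
    try:
        cursor.execute(f"TRUNCATE TABLE {table}")
    except ProgrammingError as e:
        if e.args[0].get("C") != UNDEFINED_TABLE:
            raise  # unexpected failure: do not open a new transaction
        _logger.debug(str(e))
        con.rollback()
    # Success (implicit commit) and "missing table" (rollback) both end the
    # transaction, so start a fresh one for the load that follows.
    begin_transaction(cursor)
```

Putting begin_transaction after the try/except instead of in a finally block means that an unexpected error propagates without opening a transaction that would never be used.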

jaidisido · 2021-05-06T11:56:34Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
Commit ID: 1824c32
Result: FAILED
Build Logs (available for 30 days)

jaidisido · 2021-05-06T11:56:49Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
Commit ID: c1b538b
Result: FAILED
Build Logs (available for 30 days)

jaidisido · 2021-05-06T12:19:47Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
Commit ID: 338a1c8
Result: SUCCEEDED
Build Logs (available for 30 days)

kukushking force-pushed the redshift_overwrite_methods branch from e30e18d to e2df136 (May 5, 2021 12:34)

jaidisido self-requested a review May 5, 2021 17:46

jaidisido assigned kukushking May 5, 2021

jaidisido reviewed May 5, 2021

Add Redshift overwrite methods

338a1c8

kukushking force-pushed the redshift_overwrite_methods branch from c1b538b to 338a1c8 (May 6, 2021 12:06)

jaidisido merged commit 2da3134 into aws:main May 6, 2021

2 participants

jaidisido May 5, 2021 •

edited

Loading

kukushking May 5, 2021 •

edited

Loading