Conversation

@kukushking
Contributor

Issue #671:

Description of changes:
Add overwrite_method = "drop" | "cascade" | "truncate" | "delete" parameter to determine the way the table should be overwritten.

drop - plain DROP ... (RESTRICT by default) - drops the table, but will fail if there are any views that depend on it.
cascade - DROP ... CASCADE - drops the table, and all views that depend on it.
truncate - TRUNCATE ... - truncates the table, but commits the current transaction, hence the overwrite is not atomic.
delete - DELETE FROM ... - deletes all rows from the table within the current transaction. Slow relative to the other methods.
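As a rough illustration of the four options above, the sketch below maps each `overwrite_method` value to the statement it describes. The helper name `build_overwrite_sql`, the quoting style, and the use of `IF EXISTS` are assumptions made for this example and are not taken from the PR's implementation.

```python
def build_overwrite_sql(schema: str, table: str, overwrite_method: str = "drop") -> str:
    """Return the SQL statement that clears the target table (hypothetical helper)."""
    target = f'"{schema}"."{table}"'
    statements = {
        # Fails if any view depends on the table.
        "drop": f"DROP TABLE IF EXISTS {target} RESTRICT",
        # Also drops the views that depend on the table.
        "cascade": f"DROP TABLE IF EXISTS {target} CASCADE",
        # Fast, but commits the current transaction, so the overwrite is not atomic.
        "truncate": f"TRUNCATE TABLE {target}",
        # Stays inside the current transaction; slow relative to the others.
        "delete": f"DELETE FROM {target}",
    }
    if overwrite_method not in statements:
        raise ValueError(f"Unsupported overwrite_method: {overwrite_method!r}")
    return statements[overwrite_method]


for method in ("drop", "cascade", "truncate", "delete"):
    print(method, "->", build_overwrite_sql("public", "my_table", method))
```

A lookup table keeps the trade-offs of each method next to the statement it produces, and an unknown value fails loudly instead of silently falling back to a default.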

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
  • Commit ID: cb67d66
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
  • Commit ID: e30e18d
  • Result: FAILED
  • Build Logs (available for 30 days)


@kukushking force-pushed the redshift_overwrite_methods branch from e30e18d to e2df136 on May 5, 2021 12:34
@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
  • Commit ID: e2df136
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


@jaidisido self-requested a review on May 5, 2021 17:46
Contributor

@jaidisido left a comment

Looking great, I had a couple of questions before shipping

@@ -1,4 +1,5 @@
"""Amazon Redshift Module."""
# pylint: disable=too-many-lines
Contributor

Any reason why this is applied at the file level instead of the offending method?

Contributor Author

Yes, this is actually regarding too many lines in module:
awswrangler/redshift.py:1:0: C0302: Too many lines in module (1507/1500) (too-many-lines)



def test_to_sql_simple(redshift_table, redshift_con):
@pytest.mark.parametrize("overwrite_method", [None, "drop", "cascade", "truncate", "delete"])
Contributor

Nice!

Contributor Author

Thanks!

Comment on lines 270 to 276
except redshift_connector.error.ProgrammingError as e:
    # Caught "relation does not exist".
    _logger.debug(str(e))
    con.rollback()
_begin_transaction(cursor=cursor)
Contributor

@jaidisido May 5, 2021

Can you help me understand this part? If the truncate fails because the table/view does not exist, we roll back and then begin a transaction regardless? What does beginning a transaction achieve?

Contributor Author

@kukushking May 5, 2021

Exactly. TRUNCATE fails if there is no table, so the except block catches that and rolls back the transaction. If TRUNCATE succeeds, it commits the current transaction, so in both cases we have to begin a new one. That is an unfortunate side effect of using TRUNCATE: it is fast, but the table overwrite in this case isn't atomic. A user aware of the consequences might still want to use it over the other methods, though.

Contributor Author

Now that I think about it, it would probably be worth doing two things:

  1. Check the error message for the exact code of the relation ... does not exist error.
  2. Put _begin_transaction into the finally block, just in case something in the except block goes south.

Would you agree?

Contributor

Understood, thanks. I agree with catching the exact message and putting the begin transaction in a finally block

Contributor Author

FYI, I changed my mind on the latter. We actually don't want to begin a transaction if something in the except block fails; that would just be a wasted transaction. Late night replies 😅
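For readers following this thread, here is a minimal, self-contained sketch of the control flow it converges on: swallow only the "relation does not exist" failure, roll back, and begin a new transaction outside any finally block. Everything in it is a stand-in invented for illustration (`ProgrammingError`, `FakeConnection`, `FakeCursor`, `truncate_for_overwrite`, the `BEGIN TRANSACTION` statement, and matching on the message text rather than an error code); it is not the code merged in this PR.

```python
class ProgrammingError(Exception):
    """Stand-in for redshift_connector.error.ProgrammingError."""


class FakeConnection:
    """Counts rollbacks so the flow can be inspected without a database."""

    def __init__(self):
        self.rollbacks = 0

    def rollback(self):
        self.rollbacks += 1


class FakeCursor:
    """Records statements; optionally fails on TRUNCATE with a given message."""

    def __init__(self, truncate_error=None):
        self.truncate_error = truncate_error
        self.executed = []

    def execute(self, sql):
        if sql.startswith("TRUNCATE") and self.truncate_error is not None:
            raise ProgrammingError(self.truncate_error)
        self.executed.append(sql)


def truncate_for_overwrite(con, cursor, target):
    try:
        # On success, TRUNCATE commits the current transaction.
        cursor.execute(f"TRUNCATE TABLE {target}")
    except ProgrammingError as exc:
        # Assumption: the missing-table case is detected from the message text.
        if "does not exist" not in str(exc):
            raise
        con.rollback()
    # Reached after a successful TRUNCATE or a handled missing table,
    # but not when the except block itself raises.
    cursor.execute("BEGIN TRANSACTION")


con = FakeConnection()
cursor = FakeCursor(truncate_error='relation "public.my_table" does not exist')
truncate_for_overwrite(con, cursor, '"public"."my_table"')
print(con.rollbacks, cursor.executed)
```

Placing the begin statement after the try/except, rather than in a finally block, means an unexpected error propagates without opening a transaction that nothing will use.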

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
  • Commit ID: 1824c32
  • Result: FAILED
  • Build Logs (available for 30 days)


@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
  • Commit ID: c1b538b
  • Result: FAILED
  • Build Logs (available for 30 days)


@kukushking force-pushed the redshift_overwrite_methods branch from c1b538b to 338a1c8 on May 6, 2021 12:06
@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
  • Commit ID: 338a1c8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


@jaidisido merged commit 2da3134 into aws:main on May 6, 2021
2 participants