
Update S3ToRedshift Operator docs to indicate multiple key functionality #28705

Merged: 1 commit, Jan 6, 2023

Conversation

@RachitSharma2001 (contributor) commented on Jan 3, 2023:

As shown in issue #27957, the current documentation for the S3ToRedshift Operator seems to indicate that only one S3 key can be transferred to Redshift. However, as elaborated in the linked discussion and in the AWS docs, the COPY command from S3 to Redshift automatically looks for all keys that match the given prefix and copies all of them to Redshift.

For this PR, I wanted to update the docs to make this clear to Airflow users. I also added a system test that demonstrates this functionality.
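
To make the documented behavior concrete, here is a minimal sketch of an operator invocation that relies on it; the bucket, table, and connection names below are made up for illustration and are not from the PR:

```python
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

# Any object whose key starts with "data/2023/" (e.g. data/2023/part-0.csv,
# data/2023/part-1.csv) is matched by the underlying COPY command and loaded
# into the target Redshift table. All names here are illustrative.
transfer_s3_prefix_to_redshift = S3ToRedshiftOperator(
    task_id="transfer_s3_prefix_to_redshift",
    redshift_conn_id="redshift_default",
    s3_bucket="my-example-bucket",
    s3_key="data/2023/",  # a prefix, not a single object key
    schema="PUBLIC",
    table="example_table",
    copy_options=["csv"],
)
```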

```diff
@@ -42,7 +42,7 @@ class S3ToRedshiftOperator(BaseOperator):
     :param schema: reference to a specific schema in redshift database
     :param table: reference to a specific table in redshift database
     :param s3_bucket: reference to a specific S3 bucket
-    :param s3_key: reference to a specific S3 key
+    :param s3_key: reference either to a specific S3 key or a set of keys or folders sharing that prefix
```
A member commented on this change:

What does “a set of keys” mean? Does it have to be a Python set, or is the term being used more loosely? If the latter, I think collection is a more common term.

Another contributor replied:

I suppose a better wording would be:

```
:param s3_key: key prefix that selects single or multiple objects from S3
```

```diff
@@ -207,6 +208,18 @@ def delete_security_group(sec_group_id: str, sec_group_name: str):
     )
     # [END howto_transfer_s3_to_redshift]
 
+    # [START howto_transfer_s3_to_redshift_multiple_keys]
+    transfer_s3_to_redshift_multiple = S3ToRedshiftOperator(
```
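
The hunk above is truncated at the operator call. As a hedged reconstruction only (the placeholder values below are not the PR's actual code), the new system test task presumably mirrors the existing single-key task but passes a key prefix:

```python
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

# Placeholder values; the real system test derives these from its setup tasks.
conn_id_name = "redshift_default"
bucket_name = "example-s3-bucket"
S3_KEY_PREFIX = "data"
REDSHIFT_TABLE = "test_table"

transfer_s3_to_redshift_multiple = S3ToRedshiftOperator(
    task_id="transfer_s3_to_redshift_multiple",
    redshift_conn_id=conn_id_name,
    s3_bucket=bucket_name,
    s3_key=S3_KEY_PREFIX,  # a prefix, so COPY picks up every matching object
    schema="PUBLIC",
    table=REDSHIFT_TABLE,
    copy_options=["csv"],
)
```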
A contributor commented on this addition:

Don't forget to add it to the `chain` command.
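
For context, Airflow's system tests wire their tasks together with the `chain` helper from `airflow.models.baseoperator`; a task left out of that call never executes. A sketch of what the reviewer is asking for (all task names other than the new one are placeholders, not the test's actual tasks):

```python
from airflow.models.baseoperator import chain

chain(
    # ... existing setup tasks (create cluster, bucket, table, upload data) ...
    transfer_s3_to_redshift,
    transfer_s3_to_redshift_multiple,  # the newly added multiple-keys task
    # ... existing teardown tasks ...
)
```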

@RachitSharma2001 (author) replied:
Thank you all for the suggestions. I have updated the wording of the documentation in airflow/providers/amazon/aws/transfers/s3_to_redshift.py and added the requested change to the system test. Let me know if any other changes are needed.
