Adding snowflake_to_s3 transfer operator, Updates in s3_to_snowflake #20459
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Why is the SnowflakeOperator alone not enough? The operator logic is quite simple, and I think we can expect the user to simply copy the SQL query from the Snowflake documentation.
@mik-laj We need this for the same reason we have the SnowflakeOperator even though we can use SnowflakeHook directly to run queries. The S3ToSnowflake-related changes are required because that operator is already merged, and currently query_ids and execution_info are not exposed from it.
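For context, here is a minimal sketch of the alternative being discussed: unloading to S3 with the existing SnowflakeOperator and a hand-written COPY INTO statement copied from the Snowflake documentation. The connection id, stage, table, and prefix below are illustrative only, not taken from this PR.

```python
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# Unload a table to an external S3 stage with a plain SQL operator.
# "my_s3_stage", "my_db.my_schema.orders" and the file format are hypothetical.
unload_orders_to_s3 = SnowflakeOperator(
    task_id="unload_orders_to_s3",
    snowflake_conn_id="snowflake_default",
    sql="""
        COPY INTO @my_s3_stage/orders/
        FROM my_db.my_schema.orders
        FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
        OVERWRITE = TRUE
        HEADER = TRUE
    """,
)
```

A dedicated SnowflakeToS3Operator mainly saves the user from writing this statement by hand and can expose query_ids and execution_info after the unload.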
@mik-laj @potiuk @turbaszek any update on this PR?
@aa3pankaj I already tried to merge the SnowflakeToS3Operator, and it seems you would have to create a generic SnowflakeToStorageOperator instead. You can read the comments on my already closed pull request: #14415
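A rough sketch of what such a generic operator could look like: the stage object already encodes the storage backend (S3, GCS, Azure) and its credentials, so the operator itself can stay storage-agnostic. All class and parameter names below are hypothetical and not part of the Airflow codebase; the sketch also assumes SnowflakeHook records query_ids after run(), which is what the S3ToSnowflake changes in this PR rely on.

```python
from airflow.models import BaseOperator
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook


class SnowflakeToStorageOperator(BaseOperator):
    """Hypothetical generic unload operator: COPY INTO whatever external stage is given."""

    template_fields = ("stage", "prefix", "table")

    def __init__(self, *, stage, table, prefix="", file_format="(TYPE = CSV)",
                 snowflake_conn_id="snowflake_default", **kwargs):
        super().__init__(**kwargs)
        self.stage = stage
        self.table = table
        self.prefix = prefix
        self.file_format = file_format
        self.snowflake_conn_id = snowflake_conn_id

    def execute(self, context):
        # The stage determines where the data lands, so no S3/GCS-specific code is needed here.
        sql = (
            f"COPY INTO @{self.stage}/{self.prefix} "
            f"FROM {self.table} FILE_FORMAT = {self.file_format}"
        )
        hook = SnowflakeHook(snowflake_conn_id=self.snowflake_conn_id)
        hook.run(sql, autocommit=True)
        # Assumes the hook keeps the executed query ids so they can be pushed to XCom.
        return hook.query_ids
```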
Reviewed excerpt from the operator:

```python
sql_parts = [
    f"COPY INTO @{self.stage}/{self.prefix or ''}",
```
The stage is an object identifier, so it can contain spaces. We should make sure the identifier is passed safely into the query being built, e.g. that spaces in the identifier do not cause problems. See: https://docs.snowflake.com/en/sql-reference/identifiers-syntax.html
Here we expect the user to pass a valid identifier. Anyway, do you expect SnowflakeToS3Operator to validate the stage name using some regex?
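One possible way to address this without regex validation is to quote the identifier before interpolating it, since Snowflake treats double-quoted identifiers literally (spaces are allowed, and embedded double quotes are escaped by doubling them). The helper below is a hypothetical sketch, not part of this PR:

```python
def quote_identifier(name: str) -> str:
    """Wrap a Snowflake identifier in double quotes, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


# quote_identifier("my stage") -> '"my stage"'
# The operator could then build:
# sql_parts = [f"COPY INTO @{quote_identifier(self.stage)}/{self.prefix or ''}", ...]
```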
| @pytest.mark.parametrize("overwrite", [None, True, False]) | ||
| @pytest.mark.parametrize("single", [None, True, False]) | ||
| @mock.patch("airflow.providers.snowflake.hooks.snowflake.SnowflakeHook.run") | ||
| def test_execute(self, mock_run, schema, prefix, unload_sql, on_error, header, overwrite, single): |
I don't understand this test. It tests almost nothing and is mostly a duplicate of the operator logic. We should copy a few generated statements into the test instead of repeating the statement-generating code.
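A sketch of that suggestion: hard-code the expected COPY INTO text and assert the hook receives it, instead of rebuilding the SQL inside the test. The SnowflakeToS3Operator import path, constructor arguments, and expected statement below are illustrative guesses based on this PR, not its actual signature:

```python
from unittest import mock

from airflow.providers.snowflake.transfers.snowflake_to_s3 import SnowflakeToS3Operator


@mock.patch("airflow.providers.snowflake.hooks.snowflake.SnowflakeHook.run")
def test_execute_builds_expected_copy_statement(mock_run):
    operator = SnowflakeToS3Operator(
        task_id="unload",
        stage="my_stage",
        prefix="data/",
        table="my_table",
        file_format="(TYPE = CSV)",
    )
    operator.execute(context={})

    # The expected statement is written out verbatim rather than regenerated
    # from the operator's own formatting logic.
    expected_sql = "COPY INTO @my_stage/data/ FROM my_table FILE_FORMAT = (TYPE = CSV)"
    mock_run.assert_called_once()
    # Assumes the operator passes the statement as the first positional argument to run().
    assert expected_sql in mock_run.call_args[0][0]
```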
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.