Add validation mode support in COPY INTO in snowflake #1689
Codecov Report

Base: 97.72% // Head: 97.72% // No change to project coverage 👍

Coverage diff, main vs. #1689:

| Metric | main | #1689 |
|----------|--------|--------|
| Coverage | 97.72% | 97.72% |
| Files | 21 | 21 |
| Lines | 835 | 835 |
| Hits | 816 | 816 |
| Misses | 19 | 19 |
@sunank200 Why don't we currently need `ON_ERROR=CONTINUE` for other file types?
@utkarsharma2 This option was already available for all file types. Previously, the user could pass …
@tatiana I added the documentation for `SnowflakeLoadOptions`.
Thanks for addressing all the feedback, @sunank200!
It will be essential to add details about this change in the 1.5 changelogs.
Sure, @tatiana.
# Description

## What is the current behavior?

Currently, the load_file operator can hide errors when doing native Snowflake transfers. We should raise an exception and display the errors to the users, not hide this problem.

As it stands, a user may be loading 10k rows into a table, end up with five rows, and not realise there is an issue.

closes: #581

## What is the new behaviour?

- Add validation mode as part of the `COPY INTO` command. Specify the supported validation mode; `RETURN_n_ROWS` or `RETURN_ERRORS` or `RETURN_ALL_ERRORS`. This instructs the COPY command to validate the data files instead of loading them into the specified table; i.e. the COPY command tests the files for errors but does not load them. Read more at: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#optional-parameters
- Remove the `ON_ERROR=Continue` hardcoded for CSV file types
- Add an option for the user to pass `ON_ERROR=Continue` as part of `copy_options`

Example:

```
aql.load_file(
    input_file=File("s3://astro-sdk/python_sdk/example_dags/data/sample.csv", conn_id="aws_conn"),
    output_table=Table(
        conn_id=SNOWFLAKE_CONN_ID,
    ),
    load_options=[
        SnowflakeLoadOptions(
            file_options={"SKIP_HEADER": 1, "SKIP_BLANK_LINES": True},
            copy_options={"ON_ERROR": "CONTINUE"},
            validation_mode="RETURN_ALL_ERRORS",
        )
    ],
)
```

## Does this introduce a breaking change?

Yes

### Checklist

- [x] Created tests which fail without the change (if possible)
- [x] Extended the README / documentation, if necessary
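To illustrate how the options described in this PR could map onto a `COPY INTO` statement, here is a minimal sketch. It is not the SDK's implementation: `build_copy_into`, `_render`, the table name `my_table`, and the stage name `my_stage` are hypothetical, and option values are emitted verbatim with no quoting or escaping, so it is only suitable for trusted input.

```python
import re
from typing import Mapping, Optional, Union

OptionValue = Union[str, int, bool]

# Validation modes named in the PR description: RETURN_n_ROWS,
# RETURN_ERRORS, RETURN_ALL_ERRORS (n is a positive row count).
_VALIDATION_MODE = re.compile(r"RETURN_(\d+_ROWS|ERRORS|ALL_ERRORS)")


def _render(options: Mapping[str, OptionValue]) -> str:
    """Render a dict as space-separated KEY = VALUE pairs (hypothetical helper)."""
    parts = []
    for key, value in options.items():
        if isinstance(value, bool):
            # Python booleans become SQL boolean keywords.
            value = "TRUE" if value else "FALSE"
        parts.append(f"{key} = {value}")
    return " ".join(parts)


def build_copy_into(
    table: str,
    stage: str,
    file_options: Optional[Mapping[str, OptionValue]] = None,
    copy_options: Optional[Mapping[str, OptionValue]] = None,
    validation_mode: Optional[str] = None,
) -> str:
    """Assemble a COPY INTO statement string (illustrative only)."""
    if validation_mode is not None and not _VALIDATION_MODE.fullmatch(validation_mode):
        raise ValueError(f"Unsupported validation mode: {validation_mode!r}")

    clauses = [f"COPY INTO {table} FROM @{stage}"]
    if file_options:
        clauses.append(f"FILE_FORMAT = ({_render(file_options)})")
    if copy_options:
        # ON_ERROR is no longer hardcoded; it is set only if the caller passes it.
        clauses.append(_render(copy_options))
    if validation_mode:
        clauses.append(f"VALIDATION_MODE = {validation_mode}")
    return " ".join(clauses)


sql = build_copy_into(
    "my_table",
    "my_stage",
    file_options={"TYPE": "CSV", "SKIP_HEADER": 1, "SKIP_BLANK_LINES": True},
    copy_options={"ON_ERROR": "CONTINUE"},
    validation_mode="RETURN_ALL_ERRORS",
)
print(sql)
```

Rejecting an unknown mode before building the statement surfaces a typo as a Python exception at call time rather than as a SQL error at run time.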