Support loading metadata columns from stage into table for Snowflake #2023

Merged
pankajkoti merged 10 commits into main from 1982-support-loading-metadata-columns-from-stage on Aug 30, 2023


pankajkoti
Contributor

Adds support for loading metadata columns such as `METADATA$FILENAME` and
`METADATA$FILE_ROW_NUMBER` from the stage into the target table
when natively loading files into Snowflake tables. Read more at:
https://docs.snowflake.com/en/user-guide/querying-metadata#example-3-loading-metadata-columns-into-a-table

Note that you cannot specify both `validation_mode` and
`metadata_columns` in the Snowflake load options. To load
metadata columns, the `COPY INTO` statement must name them
explicitly in a `SELECT` over the stage, and Snowflake does not
allow `VALIDATION_MODE` on such a transformed statement.
This is a Snowflake limitation.
The transformed SQL resembles the snippet at:
https://docs.snowflake.com/en/user-guide/querying-metadata#example-3-loading-metadata-columns-into-a-table

closes: #1982
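
The mutual-exclusion rule described above could be enforced when the options object is constructed. The following is a minimal sketch, not the SDK's actual code: the class name `LoadOptionsSketch` is hypothetical, and only the two field names `validation_mode` and `metadata_columns` come from this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoadOptionsSketch:
    """Hypothetical stand-in for Snowflake load options (not the SDK class)."""

    validation_mode: Optional[str] = None
    metadata_columns: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Loading metadata columns requires COPY INTO ... FROM (SELECT ...),
        # and Snowflake rejects VALIDATION_MODE on such transformed
        # statements, so fail early instead of at query time.
        if self.validation_mode and self.metadata_columns:
            raise ValueError(
                "Specify either validation_mode or metadata_columns, not both."
            )
```

Raising at construction time surfaces the conflict before any SQL is sent to Snowflake.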

@pankajkoti
Contributor Author

When metadata_columns is specified, the transformed COPY INTO command that gets executed looks like this:

'COPY INTO <TABLE_NAME> FROM (SELECT $1,$2,METADATA$FILENAME,METADATA$FILE_ROW_NUMBER,METADATA$FILE_CONTENT_KEY,METADATA$FILE_LAST_MODIFIED,METADATA$START_SCAN_TIME FROM @SANDBOX.<schema_name>.<stage_name>/sample.csv) '
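
A statement of this shape could be assembled roughly as follows. This is only an illustrative sketch: `build_copy_into` is a hypothetical helper, not a function from the SDK, and it assumes the number of columns in the staged file is known and passed in explicitly.

```python
from typing import Sequence


def build_copy_into(
    table: str,
    stage_path: str,
    num_file_columns: int,
    metadata_columns: Sequence[str],
) -> str:
    """Hypothetical helper: build a COPY INTO that also selects metadata columns."""
    # Positional references to the staged file's own columns: $1, $2, ...
    positional = [f"${i}" for i in range(1, num_file_columns + 1)]
    # Metadata columns are appended after the file columns, in the given order.
    select_list = ",".join([*positional, *metadata_columns])
    return f"COPY INTO {table} FROM (SELECT {select_list} FROM @{stage_path})"


sql = build_copy_into(
    table="MY_TABLE",
    stage_path="SANDBOX.MY_SCHEMA.MY_STAGE/sample.csv",
    num_file_columns=2,
    metadata_columns=["METADATA$FILENAME", "METADATA$FILE_ROW_NUMBER"],
)
print(sql)
```

The table, schema, and stage names in the usage call are placeholders. The target table needs matching columns to receive the metadata values.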

@pankajkoti
Contributor Author

Sample load output with metadata columns

[Screenshot, 2023-08-28 1:09:30 PM]

@codecov

codecov bot commented Aug 28, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.73% 🎉

Comparison is base (4176abf) 89.54% compared to head (09d2b10) 90.28%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2023      +/-   ##
==========================================
+ Coverage   89.54%   90.28%   +0.73%     
==========================================
  Files          75       75              
  Lines        4296     4324      +28     
  Branches      531      537       +6     
==========================================
+ Hits         3847     3904      +57     
+ Misses        354      332      -22     
+ Partials       95       88       -7     
| Flag | Coverage Δ |
|---|---|
| PythonSDK | 90.28% <100.00%> (+0.73%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files Changed | Coverage Δ |
|---|---|
| python-sdk/src/astro/databases/snowflake.py | 85.33% <100.00%> (+1.13%) ⬆️ |
| python-sdk/src/astro/options.py | 97.61% <100.00%> (+0.05%) ⬆️ |

... and 6 files with indirect coverage changes


@pankajkoti pankajkoti force-pushed the 1982-support-loading-metadata-columns-from-stage branch from 34d9023 to 9f0810c Compare August 28, 2023 13:00
pankajkoti and others added 9 commits August 30, 2023 15:02
Co-authored-by: Utkarsh Sharma <utkarsharma2@gmail.com>
@pankajkoti pankajkoti force-pushed the 1982-support-loading-metadata-columns-from-stage branch from 000adc5 to ec9080e Compare August 30, 2023 09:32
@pankajkoti pankajkoti merged commit c4bccf8 into main Aug 30, 2023
34 checks passed
@pankajkoti pankajkoti deleted the 1982-support-loading-metadata-columns-from-stage branch August 30, 2023 10:08
@Andrew-Wichmann

Nice! Thank you!

Development

Successfully merging this pull request may close these issues.

Include Snowflake staged file metadata in LoadFileOperator
4 participants