Improve the logs in case native transfers fallbacks to Pandas #1263

magdagultekin · 2022-11-17T10:30:32Z

Please describe the feature you'd like to see
When loading a CSV file from S3 to Snowflake, a problem occurs and the task fails, but the table still gets created and populated with data. However, the logs do not make this clear; please see below:

[2022-11-16, 11:07:46 EST] {base.py:517} WARNING - Loading files failed with Native Support. Falling back to Pandas-based load
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/astro/databases/base.py", line 508, in load_file_to_table_natively_with_fallback
    self.load_file_to_table_natively(
  File "/usr/local/lib/python3.9/site-packages/astro/databases/snowflake.py", line 623, in load_file_to_table_natively
    self.evaluate_results(rows)
  File "/usr/local/lib/python3.9/site-packages/astro/databases/snowflake.py", line 630, in evaluate_results
    raise DatabaseCustomError(rows)
astro.exceptions.DatabaseCustomError: [{'file': 's3://s3-dev-etldata-001/inbound/test/Out_CM_CU.csv', 'status': 'LOAD_FAILED', 'rows_parsed': 168598, 'rows_loaded': 0, 'error_limit': 168598, 'errors_seen': 168598, 'first_error': 'Numeric value \'"RECORDID"\' is not recognized', 'first_error_line': 1, 'first_error_character': 1, 'first_error_column_name': '"OUT_CM_CU"["RECORDID":1]'}]

The logs report 0 rows loaded, but that is not accurate: the table was populated.
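The rows attached to the exception above appear to describe only the native load attempt, which is why `rows_loaded` is 0 even though the table ends up populated. As a minimal sketch (not the SDK's actual code), a payload of that shape could be condensed into one readable line per file before logging. `summarize_native_results` is a hypothetical helper; the dictionary keys are the ones visible in the traceback.

```python
import logging

logger = logging.getLogger(__name__)


def summarize_native_results(rows):
    """Build one readable line per file from native-load result rows.

    Hypothetical helper: `rows` is assumed to be a list of dicts shaped
    like the payload shown in the DatabaseCustomError above.
    """
    lines = []
    for row in rows:
        line = (
            f"file={row.get('file')} status={row.get('status')} "
            f"rows_parsed={row.get('rows_parsed')} rows_loaded={row.get('rows_loaded')}"
        )
        # Only mention the error when the database reported one.
        if row.get("first_error"):
            line += (
                f" first_error={row['first_error']!r}"
                f" first_error_line={row.get('first_error_line')}"
            )
        lines.append(line)
    return lines


# Sample row copied (abridged) from the traceback above.
rows = [
    {
        "file": "s3://s3-dev-etldata-001/inbound/test/Out_CM_CU.csv",
        "status": "LOAD_FAILED",
        "rows_parsed": 168598,
        "rows_loaded": 0,
        "first_error": "Numeric value '\"RECORDID\"' is not recognized",
        "first_error_line": 1,
    }
]
for line in summarize_native_results(rows):
    logger.warning("Native load result (before any fallback): %s", line)
```

Labelling the line as the native result makes it harder to misread `rows_loaded=0` as the final state of the table.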

Refer to https://astronomer.slack.com/archives/C02B8SPT93K/p1668615614891269
Describe the solution you'd like

I'd love to see the logs improved to reflect what actually happened: they should mention the fallback to Pandas and show how many rows were actually loaded.
The customer came back and said that he used enable_native_fallback=False and the data got loaded nevertheless. That's not the expected behaviour, right?
He also mentioned that the documentation confuses him, particularly this part (it doesn't mention falling back to Python; and by the way, what do you mean by that?):

Additional context
Task that was used:

s3_to_snowflake = aql.load_file(
    task_id="s3_to_snowflake",
    input_file=File(path=f"s3://{S3_BUCKET_NAME}/{S3_FILE_NAME}", filetype=FileType.CSV),
    output_table=Table(
        conn_id=SNOWFLAKE_CONN_ID,
        metadata=Metadata(database=SNOWFLAKE_DATABASE, schema=SNOWFLAKE_SCHEMA),
        name=SNOWFLAKE_TABLE_NAME,
    ),
    if_exists="replace",
)

Acceptance Criteria


# Description

## What is the current behavior?

Currently, we don't have any logging for either native or pandas _load_file_. This makes it harder for users to follow the operations.

related: #1263

## What is the new behavior?

We now have log statements for native and pandas loads, as well as an indication of fallback.

## Does this introduce a breaking change?

No.

### Checklist

- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary
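For illustration only, the kind of logging described in this PR could follow the pattern below. This is a sketch under stated assumptions, not the code merged in #1312: `load_with_fallback`, `native_load` and `pandas_load` are hypothetical names, and the behaviour of `enable_native_fallback=False` (re-raise instead of falling back) is the behaviour the issue expects, not something confirmed here.

```python
import logging

logger = logging.getLogger("load_file_sketch")


def load_with_fallback(native_load, pandas_load, enable_native_fallback=True):
    """Try a native load first and optionally fall back to a Pandas load.

    Both callables are hypothetical stand-ins that return the number of
    rows they loaded. Returns a (method, rows_loaded) tuple.
    """
    logger.info("Loading file using native support")
    try:
        rows_loaded = native_load()
    except Exception as exc:
        if not enable_native_fallback:
            # Assumed behaviour: with fallback disabled the failure propagates.
            logger.error("Native load failed and fallback is disabled: %s", exc)
            raise
        logger.warning("Native load failed (%s). Falling back to Pandas-based load", exc)
    else:
        logger.info("Native load finished: %s rows loaded", rows_loaded)
        return "native", rows_loaded

    rows_loaded = pandas_load()
    logger.info("Pandas-based load finished: %s rows loaded", rows_loaded)
    return "pandas", rows_loaded


def failing_native():
    """Simulate a native load that the database rejects."""
    raise RuntimeError("LOAD_FAILED")


method, rows_loaded = load_with_fallback(failing_native, lambda: 3)
```

Logging the row count on both paths, and naming the path that produced it, addresses the confusion reported in #1263 where the only count in the log came from the failed native attempt.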

phanikumv · 2022-11-29T10:10:17Z

Point 1 is addressed in #1312
Point 2 - @magdagultekin will re-check with customer and update here.

phanikumv · 2022-11-29T10:19:28Z

Please re-open in case point 2 is still an issue.

# Description

## What is the current behavior?

Currently, we don't have any logging for either native or pandas _load_file_. This makes it harder for users to follow the operations.

related: #1263

## What is the new behavior?

We now have log statements for native and pandas loads, as well as an indication of fallback.

## Does this introduce a breaking change?

No.

### Checklist

- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary

(cherry picked from commit 279c7d1)

magdagultekin added the feature New feature or request label Nov 17, 2022

phanikumv added this to the 1.3.0 milestone Nov 17, 2022

phanikumv added the product/python-sdk Label describing products label Nov 17, 2022

sunank200 added the priority/critical Critical priority label Nov 21, 2022

sunank200 self-assigned this Nov 23, 2022

sunank200 added priority/high High priority and removed priority/critical Critical priority labels Nov 28, 2022

phanikumv assigned feluelle Nov 28, 2022

feluelle mentioned this issue Nov 28, 2022

Improve logging for load_file #1312

Merged

2 tasks

phanikumv closed this as completed Nov 29, 2022
