Improve the logs in case native transfers fallbacks to Pandas #1263

Closed
1 of 10 tasks
magdagultekin opened this issue Nov 17, 2022 · 2 comments
Labels: feature, priority/high, product/python-sdk
magdagultekin commented Nov 17, 2022

Please describe the feature you'd like to see
When loading a CSV file from S3 to Snowflake, a problem occurs: the task fails, but the table is created and the data is populated. However, the logs are not clear about this; please see below:

[2022-11-16, 11:07:46 EST] {base.py:517} WARNING - Loading files failed with Native Support. Falling back to Pandas-based load
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/astro/databases/base.py", line 508, in load_file_to_table_natively_with_fallback
    self.load_file_to_table_natively(
  File "/usr/local/lib/python3.9/site-packages/astro/databases/snowflake.py", line 623, in load_file_to_table_natively
    self.evaluate_results(rows)
  File "/usr/local/lib/python3.9/site-packages/astro/databases/snowflake.py", line 630, in evaluate_results
    raise DatabaseCustomError(rows)
astro.exceptions.DatabaseCustomError: [{'file': 's3://s3-dev-etldata-001/inbound/test/Out_CM_CU.csv', 'status': 'LOAD_FAILED', 'rows_parsed': 168598, 'rows_loaded': 0, 'error_limit': 168598, 'errors_seen': 168598, 'first_error': 'Numeric value \'"RECORDID"\' is not recognized', 'first_error_line': 1, 'first_error_character': 1, 'first_error_column_name': '"OUT_CM_CU"["RECORDID":1]'}]

The logs report 0 rows loaded, but that is not true: the table was in fact populated.
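
To make the mismatch easier to see, here is a minimal sketch of how the result rows carried by the exception above could be condensed into one readable log line per file. The helper name `summarize_copy_results` is hypothetical and not part of the SDK; the dictionary keys are the ones visible in the traceback. Note that these numbers describe only the native attempt, so `rows_loaded: 0` says nothing about what a later Pandas-based load wrote.

```python
from typing import Any, Dict, List


def summarize_copy_results(rows: List[Dict[str, Any]]) -> List[str]:
    """Build one summary line per file from native-load result rows.

    Hypothetical helper for illustration; keys mirror the traceback above.
    """
    lines = []
    for row in rows:
        line = (
            f"{row.get('file')}: status={row.get('status')}, "
            f"rows_parsed={row.get('rows_parsed')}, "
            f"rows_loaded={row.get('rows_loaded')}"
        )
        # Only mention the first error when the row actually reports one.
        if row.get("first_error"):
            line += (
                f", first_error={row['first_error']!r} "
                f"(line {row.get('first_error_line')})"
            )
        lines.append(line)
    return lines


# Sample row copied (abridged) from the traceback in this issue.
rows = [
    {
        "file": "s3://s3-dev-etldata-001/inbound/test/Out_CM_CU.csv",
        "status": "LOAD_FAILED",
        "rows_parsed": 168598,
        "rows_loaded": 0,
        "first_error": "Numeric value '\"RECORDID\"' is not recognized",
        "first_error_line": 1,
    }
]

for line in summarize_copy_results(rows):
    print("Native load result (before any Pandas fallback):", line)
```

Labelling the line as the native result makes it explicit that a subsequent fallback may still have populated the table.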

Refer to https://astronomer.slack.com/archives/C02B8SPT93K/p1668615614891269
Describe the solution you'd like

  1. I'd love to see the logs improved to reflect what actually happened: mention the fallback to Pandas and show how many rows were actually loaded.

  2. The customer came back and said that he used enable_native_fallback=False and the data got loaded nevertheless. That's not the expected behaviour, right?
    He also mentioned that the documentation confuses him, particularly this part (it doesn't mention falling back to Python; and by the way, what do you mean by that?):

Additional context
Task that was used:

s3_to_snowflake = aql.load_file(
    task_id="s3_to_snowflake",
    input_file=File(path=f"s3://{S3_BUCKET_NAME}/{S3_FILE_NAME}", filetype=FileType.CSV),
    output_table=Table(
        conn_id=SNOWFLAKE_CONN_ID,
        metadata=Metadata(database=SNOWFLAKE_DATABASE, schema=SNOWFLAKE_SCHEMA),
        name=SNOWFLAKE_TABLE_NAME,
    ),
    if_exists="replace",
)

Acceptance Criteria

  • Test if enable_native_fallback=False works as expected.
  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each of methods, classes, functions and module-level attributes (including Example DAG on how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How to use Guide for the feature (example)

magdagultekin added the feature label Nov 17, 2022
phanikumv added this to the 1.3.0 milestone Nov 17, 2022
phanikumv added the product/python-sdk label Nov 17, 2022
sunank200 added the priority/critical label Nov 21, 2022
sunank200 self-assigned this Nov 23, 2022
sunank200 added the priority/high label and removed the priority/critical label Nov 28, 2022
phanikumv pushed a commit that referenced this issue Nov 29, 2022
# Description

## What is the current behavior?

Currently, we have no logging for either the native or the Pandas _load_file_ path.
This makes it harder for users to follow the operations.

related: #1263

## What is the new behavior?

We now have log statements for the native path and the Pandas path, as well as
an indication when a fallback occurs.
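
As an illustration only, the kind of logging described here could look roughly like the sketch below. It is not the SDK's implementation: `load_with_fallback`, `native_load`, and `pandas_load` are hypothetical names, and only the `enable_native_fallback` flag is taken from this issue. The sketch logs each path, reports the row count of whichever path actually wrote data, and re-raises when fallback is disabled.

```python
import logging
from typing import Callable

logger = logging.getLogger("load_file_sketch")


def load_with_fallback(
    native_load: Callable[[], int],
    pandas_load: Callable[[], int],
    enable_native_fallback: bool = True,
) -> int:
    """Try the native path first; optionally fall back to Pandas.

    Both callables are hypothetical stand-ins that return the rows loaded.
    """
    logger.info("Loading file(s) with native support...")
    try:
        rows = native_load()
        logger.info("Native load completed: %d rows loaded.", rows)
        return rows
    except Exception as exc:
        if not enable_native_fallback:
            # Fallback disabled: surface the failure instead of loading anyway.
            logger.error("Native load failed and fallback is disabled: %s", exc)
            raise
        logger.warning(
            "Native load failed (%s). Falling back to Pandas-based load.", exc
        )
    rows = pandas_load()
    logger.info("Pandas-based load completed: %d rows loaded.", rows)
    return rows


def failing_native() -> int:
    """Stand-in for a native load that fails, as in the reported traceback."""
    raise RuntimeError("LOAD_FAILED")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # The lambda stands in for a Pandas-based load that wrote 3 rows.
    print(load_with_fallback(failing_native, lambda: 3))
```

With the flag set to `False`, the same call raises instead of loading, which is the behaviour the acceptance criteria ask to verify.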

## Does this introduce a breaking change?

No.

### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary

phanikumv (Collaborator) commented

Point 1 is addressed in #1312
Point 2 - @magdagultekin will re-check with customer and update here.

phanikumv (Collaborator) commented

Please re-open in case point 2 is still an issue.

sunank200 pushed a commit that referenced this issue Dec 1, 2022
(cherry picked from commit 279c7d1)