Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix run_raw_sql() operartor #1700

Merged
merged 3 commits into from
Feb 3, 2023
Merged

Fix run_raw_sql() operartor #1700

merged 3 commits into from
Feb 3, 2023

Conversation

utkarsharma2
Copy link
Collaborator

@utkarsharma2 utkarsharma2 commented Feb 2, 2023

Description

What is the current behavior?

When we pass multiple dataframes to run_raw_sql() the expected behavior is that the dataframes should be loaded to different temp tables. Currently, that's not happening, and all the data frames are loaded to the same temp table. Later dataframe overrides a former one.

closes: #1687

What is the new behavior?

Every dataframe is loaded to a different temp table.

Does this introduce a breaking change?

Nope

Checklist

  • Created tests which fail without the change (if possible)
  • Extended the README / documentation, if necessary

@codecov
Copy link

codecov bot commented Feb 2, 2023

Codecov Report

Base: 93.63% // Head: 93.10% // Decreases project coverage by -0.53% ⚠️

Coverage data is based on head (709e024) compared to base (009de42).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1700      +/-   ##
==========================================
- Coverage   93.63%   93.10%   -0.53%     
==========================================
  Files          90       90              
  Lines        4712     4714       +2     
  Branches      468      468              
==========================================
- Hits         4412     4389      -23     
- Misses        213      234      +21     
- Partials       87       91       +4     
Impacted Files Coverage Δ
...thon-sdk/src/astro/sql/operators/base_decorator.py 94.77% <100.00%> (+0.07%) ⬆️
python-sdk/src/astro/databases/snowflake.py 86.10% <0.00%> (-6.65%) ⬇️
python-sdk/src/astro/databases/base.py 90.30% <0.00%> (-3.09%) ⬇️
python-sdk/src/astro/sql/operators/load_file.py 93.20% <0.00%> (+1.94%) ⬆️
python-sdk/src/astro/utils/compat/functools.py 80.00% <0.00%> (+20.00%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utkarsharma2 how does it will behave after this change if I'll pass multiple Dataframe and a non-temp table? Does it will accept a list of output tables??

@utkarsharma2
Copy link
Collaborator Author

@pankajastro there is no impact for output_table it will still be the same. This change addresses an issue due to which if multiple dataframes were passed to function, we were not putting them in separate temp tables, instead, we were overriding/replacing the same temp table and populating it with a new data frame.

Example:

df_1 = Dataframe({"col1": [1, 2], "col2": [3, 4]})
df_2 = Dataframe({"col3": [1, 2], "col3": [3, 4]})

@aql.run_raw_sql()
def run(tbl_1, tbl_2):
   return "select col1 from {{tbl_1}}"   # <--- this will fail

Because the data of df_2 is loaded to temp_table dropping the data of df_1 so tbl_1 and tbl_2 are the same table.

@feluelle
Copy link
Member

feluelle commented Feb 4, 2023

Good catch! 🚀

cc @tatiana: I think this is the PR you had meant yesterday 👍

utkarsharma2 added a commit that referenced this pull request Feb 8, 2023
# Description
## What is the current behavior?
When we pass multiple dataframes to run_raw_sql() the expected behavior
is that the dataframes should be loaded to different temp tables.
Currently, that's not happening, and all the data frames are loaded to
the same temp table. Later dataframe overrides a former one.

closes: #1687


## What is the new behavior?
Every dataframe is loaded to a different temp table.

## Does this introduce a breaking change?
Nope

### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

run_raw_sql() not creating different tables when passed dataframes
4 participants