Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interface to get results in standard formats from run_raw_sql #1583

Closed
9 tasks
jlaneve opened this issue Jan 13, 2023 · 1 comment · Fixed by #1584
Closed
9 tasks

Add interface to get results in standard formats from run_raw_sql #1583

jlaneve opened this issue Jan 13, 2023 · 1 comment · Fixed by #1584
Assignees
Labels
feature New feature or request priority/high High priority product/python-sdk Label describing products
Milestone

Comments

@jlaneve
Copy link
Collaborator

jlaneve commented Jan 13, 2023

Please describe the feature you'd like to see

When I use aql.run_raw_sql, I find myself writing handlers like:

@aql.run_raw_sql(..., handler = lambda x: x.fetchall())
@aql.run_raw_sql(..., handler = pd.DataFrame(x.fetchall(), columns = x.keys()))

It seems like there are a few natural results formats (list, dataframe) that you'd want to get results in, but the user is responsible for making these handlers. Additionally, I can't just use the same handler for both SELECT and CREATE TABLE functions because sqlalchemy's ResultProxy errors if there are no results and you call x.fetchall()

Describe the solution you'd like
Add two fields to the decorator/operator:

  • results_format: let the user specify a common format they want their results (list, dataframe)
  • fail_on_empty: let the user decide what the behavior should be if there are no results

Are there any alternatives to this feature?

We could also consider exporting common handlers instead of forcing the user to define them.

Additional context
Add any other context about the feature request here.

I'm considering this from the perspective of the Astro Cloud IDE, in which our requirements are:

  • get the results as a DataFrame
  • use the same decorator/code for all run_raw_sql's, regardless of wether it's a SELECT or INSERT
    • this is because we're generating a DAG and parsing the SQL code to do this dynamically is unreliable

Acceptance Criteria

  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each of methods, classes, functions and module-level attributes (including Example DAG on how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How to use Guide for the feature (example)
@jlaneve jlaneve added the feature New feature or request label Jan 13, 2023
@pankajastro pankajastro added the product/python-sdk Label describing products label Jan 14, 2023
@phanikumv phanikumv added the priority/high High priority label Jan 16, 2023
@utkarsharma2 utkarsharma2 added this to the 1.4.1 milestone Jan 17, 2023
@utkarsharma2
Copy link
Collaborator

We need to update/add documentation for these new params. Will take care in a separate PR.

@pankajastro pankajastro modified the milestones: 1.4.1, 1.5.0 Jan 18, 2023
@kaxil kaxil linked a pull request Jan 19, 2023 that will close this issue
2 tasks
utkarsharma2 added a commit that referenced this issue Jan 23, 2023
# Description

related: #1583

This PR extends the `run_raw_sql` to make it easier to get results.
Previously, you'd have to write a `handler` yourself. The idea behind
this change is to abstract some of the common handlers (get the results
as a list, dataframe).

<!--
Issues are required for both bug fixes and features.
Reference it using one of the following:

closes: #ISSUE
related: #ISSUE
-->

Specifically, it:
- introduces two new fields to the run_raw_sql operator:
- `results_format`: let the user specify a common format they want their
results (list, dataframe)
- `fail_on_empty`: let the user decide what the behavior should be if
there are no results

## Does this introduce a breaking change?

No, everything is incremental


### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: utkarsh sharma <utkarsharma2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request priority/high High priority product/python-sdk Label describing products
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants