Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add response_size to run_raw_sql and warn about db thrashing #815

Merged
merged 2 commits into from
Sep 9, 2022

Conversation

tatiana
Copy link
Collaborator

@tatiana tatiana commented Sep 9, 2022

Fix #791

@tatiana tatiana added the product/python-sdk Label describing products label Sep 9, 2022
@tatiana tatiana force-pushed the issue-791 branch 2 times, most recently from ac137bd to a04ef59 Compare September 9, 2022 11:33
@codecov
Copy link

codecov bot commented Sep 9, 2022

Codecov Report

Merging #815 (24b5520) into main (af802fd) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #815      +/-   ##
==========================================
+ Coverage   93.17%   93.27%   +0.10%     
==========================================
  Files          46       46              
  Lines        1933     1962      +29     
  Branches      242      247       +5     
==========================================
+ Hits         1801     1830      +29     
  Misses        103      103              
  Partials       29       29              
Impacted Files Coverage Δ
python-sdk/src/astro/settings.py 100.00% <100.00%> (ø)
...thon-sdk/src/astro/sql/operators/base_decorator.py 94.28% <100.00%> (+0.11%) ⬆️
python-sdk/src/astro/sql/operators/raw_sql.py 89.28% <100.00%> (+2.92%) ⬆️
python-sdk/src/astro/sql/operators/merge.py 100.00% <0.00%> (ø)
python-sdk/src/astro/sql/operators/append.py 100.00% <0.00%> (ø)
python-sdk/src/astro/sql/operators/load_file.py 97.18% <0.00%> (+0.04%) ⬆️
python-sdk/src/astro/sql/operators/dataframe.py 93.33% <0.00%> (+0.09%) ⬆️
python-sdk/src/astro/sql/operators/export_file.py 94.59% <0.00%> (+0.65%) ⬆️
python-sdk/src/astro/airflow/datasets.py 94.44% <0.00%> (+11.11%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.


#: Reduce responses sizes returned by aql.run_raw_sql to avoid trashing the Airflow DB if the BaseXCom is used.
RAW_SQL_MAX_RESPONSE_SIZE = conf.getint(
section="astro_sdk", key="run_raw_sql_response_size", fallback=-1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
section="astro_sdk", key="run_raw_sql_response_size", fallback=-1
section="astro_sdk", key="run_raw_sql_response_size", fallback=0

WDYT of making it consistent with response_limit i.e. setting it to 0?

Copy link
Collaborator Author

@tatiana tatiana Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feluelle I find the proposed change more confusing to users. It seems more intuitive that response_size set to 0 means to return zero results, as opposed to disabling response_size checking. The -1 gives clarity: that we are not checking response_size.

But I'm all for consistency, so if we agree to keep as it was, we can change response_limit to behave in the same way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why would you want to set it to 0 ever (to return zero results)? There is no use-case for that or?

We can also use int | None if this is better? 😅

Copy link
Collaborator Author

@tatiana tatiana Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM, I can't think of an obvious situation when the user would like to set response_size=0, but I believe it is a fair assumption that if a user did this, they'd expect the decorator to return zero results, as opposed to not limit the response size.

I'd be happy with using None to represent no response size. However, this would make the code more complex, including:

RAW_SQL_MAX_RESPONSE_SIZE = conf.getint(
section="astro_sdk", key="run_raw_sql_response_size", fallback=-1
)

Do you think this change is worth the additional complexity?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tatiana @feluelle For me, -1 is better than explicitly saying to return all results and 0 means don't send any response back.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made response_limit consistent with the proposed implementation for response_size:

  • -1 means disabled
  • 0 means it will raise an exception if the response size is 0 or higher

if self.handler:
response = self.handler(result)
if self.response_limit and len(response) > self.response_limit:
raise IllegalLoadToDatabaseException() # pragma: no cover
return response
if self.response_size >= 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..then changing this to:

Suggested change
if self.response_size >= 0:
if self.response_size > 0:

Copy link
Collaborator Author

@tatiana tatiana Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(discussion in the previous thread)

Behaviour before:
- 0 meant the user did not set a response_limit
- user was not able to raise an exception if the response was 0

Behaviour now:
- 0 means the user expects response_limit to raise an exception if the response is 0 or more
- -1 is used to disable response_limit

This change is being done so we are consistent with the behaviour of response_size
@@ -27,7 +27,8 @@ def __init__(
handler: Function | None = None,
database: str | None = None,
schema: str | None = None,
response_limit: int = 0,
response_limit: int = -1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tatiana Probably too late for the question, but what's the difference between response_limit and response_size?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't we assign default values from RAW_SQL_MAX_RESPONSE_SIZE?

Copy link
Collaborator Author

@tatiana tatiana Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utkarsharma2:

  • response_limit: raises an exception if the response limit is passed, used by the get_value_list dynamic task template. It was introduced by Add get_value_list for Dynamic Task  #673, to avoid passing the limit of allowed parallel tasks of Airflow itself. It is a hard limit which is not user-defined. If there are more values than this limit, the DTT will fail.
  • response_size: "cuts" or "trims" the number of items in the response of the run_raw_sql decorator. This was done as a follow up from @kaxil's comment 5b93df8#r83305025. This PR avoids introducing a breaking change, so we log a warning if the user uses the default backend and does not set this size. If the user sets this size, we cut the response without raising an exception.

Since we do not want to introduce a breaking change, I couldn't just reuse response_limit.

Both response_limit and response_size are defined in the operators which use them.

We're already using RAW_SQL_MAX_RESPONSE_SIZE as a default for the run_raw_sql operator, so I didn't see the benefit of setting the same default in both places. But I can do this if you feel strong about it

@tatiana tatiana changed the title Trim responses of run_raw_sql to desired size Add response_size to run_raw_sql and warn about db thrashing Sep 9, 2022
Copy link
Collaborator

@utkarsharma2 utkarsharma2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@tatiana tatiana merged commit 97500ed into main Sep 9, 2022
@tatiana tatiana deleted the issue-791 branch September 9, 2022 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product/python-sdk Label describing products
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set a configurable limit to run_raw_sql rows output
3 participants