-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add response_size to run_raw_sql and warn about db thrashing #815
Conversation
ac137bd
to
a04ef59
Compare
Codecov Report
@@ Coverage Diff @@
## main #815 +/- ##
==========================================
+ Coverage 93.17% 93.27% +0.10%
==========================================
Files 46 46
Lines 1933 1962 +29
Branches 242 247 +5
==========================================
+ Hits 1801 1830 +29
Misses 103 103
Partials 29 29
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
|
||
#: Reduce responses sizes returned by aql.run_raw_sql to avoid trashing the Airflow DB if the BaseXCom is used. | ||
RAW_SQL_MAX_RESPONSE_SIZE = conf.getint( | ||
section="astro_sdk", key="run_raw_sql_response_size", fallback=-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
section="astro_sdk", key="run_raw_sql_response_size", fallback=-1 | |
section="astro_sdk", key="run_raw_sql_response_size", fallback=0 |
WDYT of making it consistent with response_limit
i.e. setting it to 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@feluelle I find the proposed change more confusing to users. It seems more intuitive that response_size
set to 0 means to return zero results, as opposed to disabling response_size
checking. The -1 gives clarity: that we are not checking response_size
.
But I'm all for consistency, so if we agree to keep as it was, we can change response_limit
to behave in the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why would you want to set it to 0 ever (to return zero results)? There is no use-case for that or?
We can also use int | None
if this is better? 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM, I can't think of an obvious situation when the user would like to set response_size=0
, but I believe it is a fair assumption that if a user did this, they'd expect the decorator to return zero results, as opposed to not limit the response size.
I'd be happy with using None to represent no response size. However, this would make the code more complex, including:
astro-sdk/python-sdk/src/astro/settings.py
Lines 37 to 39 in e5bc9bb
RAW_SQL_MAX_RESPONSE_SIZE = conf.getint( | |
section="astro_sdk", key="run_raw_sql_response_size", fallback=-1 | |
) |
Do you think this change is worth the additional complexity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just made response_limit
consistent with the proposed implementation for response_size
:
- -1 means disabled
- 0 means it will raise an exception if the response size is 0 or higher
if self.handler: | ||
response = self.handler(result) | ||
if self.response_limit and len(response) > self.response_limit: | ||
raise IllegalLoadToDatabaseException() # pragma: no cover | ||
return response | ||
if self.response_size >= 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..then changing this to:
if self.response_size >= 0: | |
if self.response_size > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(discussion in the previous thread)
Behaviour before: - 0 meant the user did not set a response_limit - user was not able to raise an exception if the response was 0 Behaviour now: - 0 means the user expects response_limit to raise an exception if the response is 0 or more - -1 is used to disable response_limit This change is being done so we are consistent with the behaviour of response_size
@@ -27,7 +27,8 @@ def __init__( | |||
handler: Function | None = None, | |||
database: str | None = None, | |||
schema: str | None = None, | |||
response_limit: int = 0, | |||
response_limit: int = -1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tatiana Probably too late for the question, but what's the difference between response_limit and response_size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, shouldn't we assign default values from RAW_SQL_MAX_RESPONSE_SIZE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
response_limit
: raises an exception if the response limit is passed, used by theget_value_list
dynamic task template. It was introduced by Addget_value_list
for Dynamic Task #673, to avoid passing the limit of allowed parallel tasks of Airflow itself. It is a hard limit which is not user-defined. If there are more values than this limit, the DTT will fail.response_size
: "cuts" or "trims" the number of items in the response of therun_raw_sql
decorator. This was done as a follow up from @kaxil's comment 5b93df8#r83305025. This PR avoids introducing a breaking change, so we log a warning if the user uses the default backend and does not set this size. If the user sets this size, we cut the response without raising an exception.
Since we do not want to introduce a breaking change, I couldn't just reuse response_limit
.
Both response_limit
and response_size
are defined in the operators which use them.
We're already using RAW_SQL_MAX_RESPONSE_SIZE as a default for the run_raw_sql
operator, so I didn't see the benefit of setting the same default in both places. But I can do this if you feel strong about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Fix #791