
Reduce direct memory OOM chances on broker with a per server query response size budget #11710

Merged: 5 commits merged into apache:master on Oct 18, 2023

Conversation

@vvivekiyer (Contributor) commented Sep 29, 2023

When a server responds to a query with a large response, the broker can potentially crash with a direct memory OOM.

In PR #11496, a fix was added to restart the Netty channel in such scenarios. This results in all active queries failing with an error. It is a good safeguard to protect brokers in this extreme case.

However, to reduce the probability of such events and contain the impact on other queries, this PR introduces a threshold at the server. If the serialized query response at a server exceeds the threshold, the query is failed. The overall threshold for a query is set using a config, and this budget is divided across all servers processing the query.
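
To make the budget mechanism concrete, here is a minimal sketch of the idea (all class and method names below are hypothetical illustrations, not the actual classes touched by this PR):

// Illustrative sketch only; names are hypothetical, not the PR's actual implementation.
public final class ResponseSizeBudget {

  /**
   * The broker takes the overall response size budget configured for a query and
   * divides it across the servers the query is routed to, e.g. attaching the
   * resulting per-server value to the query as a query option.
   */
  static long perServerBudgetBytes(long maxQueryResponseSizeBytes, int numServersQueried) {
    if (maxQueryResponseSizeBytes <= 0 || numServersQueried <= 0) {
      return Long.MAX_VALUE; // feature disabled -> effectively unlimited
    }
    return Math.max(1L, maxQueryResponseSizeBytes / numServersQueried);
  }

  /**
   * On the server side, the serialized response is checked against the budget
   * before being sent back; exceeding it fails the query instead of risking a
   * direct memory OOM on the broker.
   */
  static void checkResponseSize(long serializedResponseBytes, long perServerBudgetBytes) {
    if (serializedResponseBytes > perServerBudgetBytes) {
      throw new IllegalStateException("Serialized response size " + serializedResponseBytes
          + " bytes exceeds the per-server limit of " + perServerBudgetBytes + " bytes");
    }
  }
}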

cc: @siddharthteotia @jasperjiaguo @gortiz @ege-st @dinoocch

@vvivekiyer changed the title from "Throw exception if response size exceeds thresholds" to "Fix Direct Memory OOM on broker - Part 2" on Sep 29, 2023
@vvivekiyer changed the title from "Fix Direct Memory OOM on broker - Part 2" to "Fix Direct Memory OOM on broker by limiting query response size" on Sep 29, 2023
@vvivekiyer vvivekiyer marked this pull request as ready for review September 29, 2023 06:13
@gortiz (Contributor) commented Sep 29, 2023

I've added some notes, but I think the PR is in good shape.

@codecov-commenter commented Sep 29, 2023

Codecov Report

Merging #11710 (53bf6d2) into master (c3cb5c0) will increase coverage by 0.06%.
Report is 50 commits behind head on master.
The diff coverage is 27.63%.

@@             Coverage Diff              @@
##             master   #11710      +/-   ##
============================================
+ Coverage     63.08%   63.15%   +0.06%     
- Complexity      207     1141     +934     
============================================
  Files          2342     2343       +1     
  Lines        125883   126503     +620     
  Branches      19357    19460     +103     
============================================
+ Hits          79410    79889     +479     
- Misses        40822    40930     +108     
- Partials       5651     5684      +33     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (?)
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 63.10% <27.63%> (+0.04%) ⬆️
java-17 63.02% <27.63%> (+13.01%) ⬆️
java-20 62.99% <27.63%> (-4.09%) ⬇️
temurin 63.15% <27.63%> (+0.06%) ⬆️
unittests 63.14% <27.63%> (+0.06%) ⬆️
unittests1 67.33% <33.33%> (+0.08%) ⬆️
unittests2 14.39% <13.15%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
...a/org/apache/pinot/common/metrics/ServerMeter.java 97.67% <100.00%> (+0.02%) ⬆️
...va/org/apache/pinot/spi/utils/CommonConstants.java 28.00% <ø> (ø)
...org/apache/pinot/spi/config/table/QueryConfig.java 82.35% <71.42%> (-8.56%) ⬇️
...e/pinot/common/utils/config/QueryOptionsUtils.java 60.52% <16.66%> (-8.23%) ⬇️
...che/pinot/core/query/scheduler/QueryScheduler.java 66.45% <16.66%> (-4.02%) ⬇️
...roker/requesthandler/BaseBrokerRequestHandler.java 46.09% <23.25%> (-0.30%) ⬇️

... and 155 files with indirect coverage changes


@ege-st (Contributor) commented Sep 29, 2023

Cool, thanks! I think reducing the likelihood of DM OOMs is the logical next step.

How will a user determine what the threshold should be? A critical requirement for a feature like this is to introduce it such that it does not suddenly start breaking queries that were working before the threshold was enabled, so a guide on how to determine the threshold is, I think, critical for safe adoption.

@vvivekiyer (Contributor, Author) commented Sep 29, 2023

How will a user determine what the threshold should be?

@ege-st
Good question. We did think about this. The ideal answer is that the value is use-case dependent.
However, a safe starting point is to set the threshold to the max direct memory size on the broker host. This value is conservative enough not to break existing queries while still reducing the probability of the broker OOMing. For the use case we saw in our production environment, this value works.
Also note that this setting is disabled by default in this PR. It can be enabled on a case-by-case basis.
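
For illustration, since Pinot already depends on Netty, one way to derive that conservative starting value is to read the JVM's direct memory ceiling. This is only a sketch of the suggestion above, with a hypothetical class name, not code from this PR:

import io.netty.util.internal.PlatformDependent;

// Sketch: pick a conservative overall response size budget equal to the broker's
// max direct memory, as suggested above. Class and method names are hypothetical.
public final class BudgetStartingPoint {

  static long conservativeMaxQueryResponseSizeBytes() {
    long maxDirectMemory = PlatformDependent.maxDirectMemory();
    // Fall back to "unlimited" if the direct memory ceiling cannot be determined.
    return maxDirectMemory > 0 ? maxDirectMemory : Long.MAX_VALUE;
  }

  public static void main(String[] args) {
    System.out.println("Suggested starting budget (bytes): " + conservativeMaxQueryResponseSizeBytes());
  }
}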

@jasperjiaguo (Contributor) left a comment


LGTM overall. I think the behavior of having both a broker config and a query option to set this is pretty much like the global threshold + query option we use for timeouts, especially since the direct memory threshold can be conservative and does not account for data skew. My only concern is that users need to be instructed not to abuse the option and kill the cluster, or we could have a switch to turn it off centrally (for security).

@siddharthteotia changed the title from "Fix Direct Memory OOM on broker by limiting query response size" to "Reduce direct memory OOM chances on broker with a per server query response size budget" on Sep 29, 2023
@Jackie-Jiang added the feature, release-notes, and Configuration labels on Sep 30, 2023
@Jackie-Jiang (Contributor) left a comment


You may also consider adding a table-level override for this config.

@siddharthteotia (Contributor) commented

I am still not convinced of how we have wired the config aspect.

At a high level, I suggest we do the same thing as the timeout config, since that is exactly what we seem to be doing here with an instance-level config and then a query-level config (in the form of a query option), IIUC. So maybe we should also leverage queryConfig at the table level and have the precedence as:

queryOption (query level) > table level > instance level

Also, this may be a nit, but the following 2 configs in CommonConstants are not super intuitive to me.

public static final long DEFAULT_MAX_QUERY_RESPONSE_SIZE_BYTES = Long.MAX_VALUE;

public static final String MAX_SERVER_RESPONSE_SIZE_BYTES = "maxServerResponseSizeBytes";

One problem I see is that (1) is an instance config but applies to the whole query, so we internally compute the per-server threshold/budget during routing depending on how many servers we are routing to.

However, (2) seems to be the query option way of doing things, yet here we are having the user directly specify the server-level budget instead of the overall one. Why this difference?

@vvivekiyer (Contributor, Author) commented

@siddharthteotia Added tableConfig and overriding sequence.

The reasoning behind adding a broker-level instance config was that the broker ultimately should decide how large a response it can accept for each query (depending on its direct memory limits). If this broker instance config is set, we use it to set the query option that limits the response size.

I'm open to suggestions if the broker instance config doesn't make sense.

@Jackie-Jiang (Contributor) commented

I understand the intention of tracking broker-side total memory usage, but evenly splitting it into a server-side memory limit might cause inconsistent behavior, because the actual limit depends on the fanout and it assumes an even distribution of results from each server.
Since the direct control is the size limit per server, we should at least allow configuring it directly. If it is not configured, we can fall back to the broker total memory limit.

@vvivekiyer (Contributor, Author) commented Oct 6, 2023

@Jackie-Jiang got it. Based on the suggestions, I've added a server side limit in the table config. Would you prefer that we add an instance config for the server in addition to the table config setting?

@Jackie-Jiang (Contributor) commented

@vvivekiyer I'm thinking of 2 configs: perServer & perQuery. Both of them can be set in 3 places: query option > table config > broker instance config (the perServer one can also be configured on the broker). perServer takes precedence, and if it is not configured, we fall back to perQuery. So overall the fallback order should be: perServer query option > perQuery query option > perServer table config > perQuery table config > perServer instance config > perQuery instance config

We need some documentation to clearly describe how these 2 configs are used.
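
A minimal sketch of that fallback order (illustrative only; the class, method, and parameter names are hypothetical, not the resolution code added in this PR):

// Illustrative resolution of the per-server response size limit following the
// precedence described above; a null value means "not configured" at that level.
final class ResponseSizeConfigResolver {

  static long resolvePerServerLimitBytes(Long perServerQueryOption, Long perQueryQueryOption,
      Long perServerTableConfig, Long perQueryTableConfig,
      Long perServerInstanceConfig, Long perQueryInstanceConfig, int numServersQueried) {
    if (perServerQueryOption != null) return perServerQueryOption;
    if (perQueryQueryOption != null) return split(perQueryQueryOption, numServersQueried);
    if (perServerTableConfig != null) return perServerTableConfig;
    if (perQueryTableConfig != null) return split(perQueryTableConfig, numServersQueried);
    if (perServerInstanceConfig != null) return perServerInstanceConfig;
    if (perQueryInstanceConfig != null) return split(perQueryInstanceConfig, numServersQueried);
    return Long.MAX_VALUE; // nothing configured: the check is effectively disabled
  }

  // A perQuery budget is divided across the servers the query is routed to.
  private static long split(long perQueryLimitBytes, int numServersQueried) {
    return Math.max(1L, perQueryLimitBytes / Math.max(1, numServersQueried));
  }
}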

@vvivekiyer (Contributor, Author) commented Oct 13, 2023

@Jackie-Jiang added perServer and perQuery configs. Please take a look.

I'll update the pinot docs with this behavior once this code is merged.

@Jackie-Jiang (Contributor) left a comment


LGTM otherwise

@siddharthteotia siddharthteotia merged commit d1222e7 into apache:master Oct 18, 2023
19 checks passed
@Jackie-Jiang (Contributor) commented

@vvivekiyer Have you added these configs to the pinot doc?

@vvivekiyer (Contributor, Author) commented

@Jackie-Jiang Added documentation for the broker configs, query option, and table configs.

@vvivekiyer vvivekiyer deleted the nettyDirectOOM branch February 23, 2024 19:13
Labels: Configuration, documentation, feature, release-notes

7 participants