
Reduce direct memory OOM chances on broker with a per server query response size budget #11710

Merged: 5 commits merged into apache:master on Oct 18, 2023

Conversation

@vvivekiyer (Contributor) commented Sep 29, 2023

When a server responds to a query with a large response, the broker can potentially crash with a direct memory OOM.

In PR #11496, a fix was added to restart the Netty channel in such scenarios. This results in all active queries failing with an error. It is a good safeguard to protect brokers in this extreme case.

However, to reduce the probability of such events and contain the impact on other queries, this PR introduces a threshold at the server. If the serialized query response at a server exceeds the threshold, the query is failed. The overall threshold for a query is set using a config, and this budget is divided across all servers processing the query.
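
To make the budget mechanism concrete, here is a minimal sketch of the idea (all class and method names below are hypothetical illustrations, not the actual classes touched by this PR):

// Illustrative sketch only; names are hypothetical, not the PR's actual implementation.
public final class ResponseSizeBudget {

  /**
   * The broker takes the overall response size budget configured for a query and
   * divides it across the servers the query is routed to, e.g. attaching the
   * resulting per-server value to the query as a query option.
   */
  static long perServerBudgetBytes(long maxQueryResponseSizeBytes, int numServersQueried) {
    if (maxQueryResponseSizeBytes <= 0 || numServersQueried <= 0) {
      return Long.MAX_VALUE; // feature disabled -> effectively unlimited
    }
    return Math.max(1L, maxQueryResponseSizeBytes / numServersQueried);
  }

  /**
   * On the server side, the serialized response is checked against the budget
   * before being sent back; exceeding it fails the query instead of risking a
   * direct memory OOM on the broker.
   */
  static void checkResponseSize(long serializedResponseBytes, long perServerBudgetBytes) {
    if (serializedResponseBytes > perServerBudgetBytes) {
      throw new IllegalStateException("Serialized response size " + serializedResponseBytes
          + " bytes exceeds the per-server limit of " + perServerBudgetBytes + " bytes");
    }
  }
}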

cc: @siddharthteotia @jasperjiaguo @gortiz @ege-st @dinoocch

@vvivekiyer changed the title from "Throw exception if response size exceeds thresholds" to "Fix Direct Memory OOM on broker - Part 2" on Sep 29, 2023
@vvivekiyer changed the title from "Fix Direct Memory OOM on broker - Part 2" to "Fix Direct Memory OOM on broker by limiting query response size" on Sep 29, 2023
@vvivekiyer vvivekiyer marked this pull request as ready for review September 29, 2023 06:13
@gortiz (Contributor) commented Sep 29, 2023

I've added some notes, but I think the PR is in good shape.

@codecov-commenter commented Sep 29, 2023

Codecov Report

Merging #11710 (53bf6d2) into master (c3cb5c0) will increase coverage by 0.06%.
Report is 50 commits behind head on master.
The diff coverage is 27.63%.

@@             Coverage Diff              @@
##             master   #11710      +/-   ##
============================================
+ Coverage     63.08%   63.15%   +0.06%     
- Complexity      207     1141     +934     
============================================
  Files          2342     2343       +1     
  Lines        125883   126503     +620     
  Branches      19357    19460     +103     
============================================
+ Hits          79410    79889     +479     
- Misses        40822    40930     +108     
- Partials       5651     5684      +33     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (?)
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 63.10% <27.63%> (+0.04%) ⬆️
java-17 63.02% <27.63%> (+13.01%) ⬆️
java-20 62.99% <27.63%> (-4.09%) ⬇️
temurin 63.15% <27.63%> (+0.06%) ⬆️
unittests 63.14% <27.63%> (+0.06%) ⬆️
unittests1 67.33% <33.33%> (+0.08%) ⬆️
unittests2 14.39% <13.15%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
...a/org/apache/pinot/common/metrics/ServerMeter.java 97.67% <100.00%> (+0.02%) ⬆️
...va/org/apache/pinot/spi/utils/CommonConstants.java 28.00% <ø> (ø)
...org/apache/pinot/spi/config/table/QueryConfig.java 82.35% <71.42%> (-8.56%) ⬇️
...e/pinot/common/utils/config/QueryOptionsUtils.java 60.52% <16.66%> (-8.23%) ⬇️
...che/pinot/core/query/scheduler/QueryScheduler.java 66.45% <16.66%> (-4.02%) ⬇️
...roker/requesthandler/BaseBrokerRequestHandler.java 46.09% <23.25%> (-0.30%) ⬇️

... and 155 files with indirect coverage changes


@ege-st (Contributor) commented Sep 29, 2023

Cool, thanks! I think reducing the likelihood of DM OOMs is the logical next step.

How will a user determine what the threshold should be? A critical requirement for a feature like this is to introduce it such that it does not suddenly start breaking queries that were working before the threshold was enabled, so a guide on how to determine the threshold is, I think, critical for safe adoption.

@vvivekiyer (Contributor, Author) commented Sep 29, 2023

How will a user determine what the threshold should be?

@ege-st
Good question. We did think about this. The ideal answer is that the value is use-case dependent.
However, a safe starting point is to set the threshold to the max direct memory size on the broker host. This value is conservative enough not to break existing queries while still reducing the probability of the broker OOMing. For the use case we saw in our production environment, this value works.
Also note that this setting is disabled by default in this PR. It can be enabled on a case-by-case basis.
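
For illustration, since Pinot already depends on Netty, one way to derive that conservative starting value is to read the JVM's direct memory ceiling. This is only a sketch of the suggestion above, with a hypothetical class name, not code from this PR:

import io.netty.util.internal.PlatformDependent;

// Sketch: pick a conservative overall response size budget equal to the broker's
// max direct memory, as suggested above. Class and method names are hypothetical.
public final class BudgetStartingPoint {

  static long conservativeMaxQueryResponseSizeBytes() {
    long maxDirectMemory = PlatformDependent.maxDirectMemory();
    // Fall back to "unlimited" if the direct memory ceiling cannot be determined.
    return maxDirectMemory > 0 ? maxDirectMemory : Long.MAX_VALUE;
  }

  public static void main(String[] args) {
    System.out.println("Suggested starting budget (bytes): " + conservativeMaxQueryResponseSizeBytes());
  }
}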

@jasperjiaguo (Contributor) left a comment


LGTM overall. I think the behavior of having both a broker config and a query option to set this is pretty much like the global threshold + query option we use for timeouts, especially since the direct memory threshold can be conservative and does not account for data skew. My only concern is that users need to be instructed not to abuse the option and kill the cluster, or we could have a switch to turn it off centrally (for security).

@siddharthteotia changed the title from "Fix Direct Memory OOM on broker by limiting query response size" to "Reduce direct memory OOM chances on broker with a per server query response size budget" on Sep 29, 2023
@Jackie-Jiang added the feature, release-notes, and Configuration labels on Sep 30, 2023
@Jackie-Jiang (Contributor) left a comment


You may also consider adding a table-level override for this config.

@siddharthteotia (Contributor) commented

I am still not convinced of how we have wired the config aspect.

At a high level, I suggest we do the same thing as the timeout config, since that is exactly what we seem to be doing here with an instance-level config and then a query-level config (in the form of a query option), IIUC. So maybe we should also leverage queryConfig at the table level and have the precedence as:

queryOption (query level) > table level > instance level

Also, this may be a nit, but the following 2 configs in CommonConstants are not super intuitive to me.

public static final long DEFAULT_MAX_QUERY_RESPONSE_SIZE_BYTES = Long.MAX_VALUE;

public static final String MAX_SERVER_RESPONSE_SIZE_BYTES = "maxServerResponseSizeBytes";

One problem I see is that (1) is an instance config but applies to the whole query, so we internally compute the per-server threshold/budget during routing depending on how many servers we are routing to.

However, (2) seems to be the query option way of doing things, yet here we are having the user directly specify the server-level budget instead of the overall one. Why this difference?

@vvivekiyer (Contributor, Author) commented

@siddharthteotia Added tableConfig and overriding sequence.

The reasoning behind adding a broker-level instance config was that the broker ultimately should decide how large a response it can accept for each query (depending on its direct memory limits). If this broker instance config is set, we use it to set the query option that limits the response size.

I'm open to suggestions if the broker instance config doesn't make sense.

@Jackie-Jiang (Contributor) commented

I understand the intention of tracking broker-side total memory usage, but evenly splitting it into a server-side memory limit might cause inconsistent behavior, because the actual limit depends on the fanout and it assumes an even distribution of results from each server.
Since the direct control is the size limit per server, we should at least allow configuring it directly. If it is not configured, we can fall back to the broker total memory limit.

@vvivekiyer (Contributor, Author) commented Oct 6, 2023

@Jackie-Jiang got it. Based on the suggestions, I've added a server side limit in the table config. Would you prefer that we add an instance config for the server in addition to the table config setting?

@Jackie-Jiang (Contributor) commented

@vvivekiyer I'm thinking of 2 configs: perServer & perQuery. Both of them can be set in 3 places: query option > table config > broker instance config (the perServer one can also be configured on the broker). perServer takes precedence, and if it is not configured, we fall back to perQuery. So overall the fallback order should be: perServer query option > perQuery query option > perServer table config > perQuery table config > perServer instance config > perQuery instance config

We need some documentation to clearly describe how these 2 configs are used.
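
A minimal sketch of that fallback order (illustrative only; the class, method, and parameter names are hypothetical, not the resolution code added in this PR):

// Illustrative resolution of the per-server response size limit following the
// precedence described above; a null value means "not configured" at that level.
final class ResponseSizeConfigResolver {

  static long resolvePerServerLimitBytes(Long perServerQueryOption, Long perQueryQueryOption,
      Long perServerTableConfig, Long perQueryTableConfig,
      Long perServerInstanceConfig, Long perQueryInstanceConfig, int numServersQueried) {
    if (perServerQueryOption != null) return perServerQueryOption;
    if (perQueryQueryOption != null) return split(perQueryQueryOption, numServersQueried);
    if (perServerTableConfig != null) return perServerTableConfig;
    if (perQueryTableConfig != null) return split(perQueryTableConfig, numServersQueried);
    if (perServerInstanceConfig != null) return perServerInstanceConfig;
    if (perQueryInstanceConfig != null) return split(perQueryInstanceConfig, numServersQueried);
    return Long.MAX_VALUE; // nothing configured: the check is effectively disabled
  }

  // A perQuery budget is divided across the servers the query is routed to.
  private static long split(long perQueryLimitBytes, int numServersQueried) {
    return Math.max(1L, perQueryLimitBytes / Math.max(1, numServersQueried));
  }
}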

@vvivekiyer (Contributor, Author) commented Oct 13, 2023

@Jackie-Jiang added perServer and perQuery configs. Please take a look.

I'll update the pinot docs with this behavior once this code is merged.

@Jackie-Jiang (Contributor) left a comment


LGTM otherwise

@siddharthteotia siddharthteotia merged commit d1222e7 into apache:master Oct 18, 2023
19 checks passed
@Jackie-Jiang (Contributor) commented

@vvivekiyer Have you added these configs to the pinot doc?

@vvivekiyer (Contributor, Author) commented

@Jackie-Jiang Added documentation for the broker configs, query option, and table configs.

@vvivekiyer vvivekiyer deleted the nettyDirectOOM branch February 23, 2024 19:13
Labels: Configuration, documentation, feature, release-notes

7 participants