Query pipeline refactoring pushing down type conversion to reduce memory consumption / GC pressure #27440
Conversation
LGTM, thanks a lot Fabian, great findings and improvements 👍
Looks great @FabianMeiswinkel - thanks for the optimization.
One comment on public API surface change, please take a look. Thanks!
Description
Summary
This PR is another iteration on reducing the memory footprint / JVM GC pressure when running queries with large record counts (as in many OLTP Spark scenarios). In version 4.6.1 of the Spark connector (and earlier) we had an issue because pre-fetching/buffering was too aggressive (or even unbounded), which would cause extremely high memory consumption and GC pressure when the consumer was slower than the rate at which data was retrieved from the Cosmos backend. With version 4.6.2 pre-fetching has been changed so that data is only minimally buffered, which resulted in stable memory consumption / GC pressure when the consumer is slower than data flows in from the backend.
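The effect of bounding the pre-fetch can be illustrated with a small Reactor sketch (the SDK is Reactor-based). The names below (BoundedPrefetchSketch, PAGE_PREFETCH_LIMIT) are made up for illustration and are not actual SDK identifiers.

```java
// Illustrative only: a minimal Reactor sketch of bounded pre-fetching, assuming a
// Flux<byte[]> of serialized response pages. PAGE_PREFETCH_LIMIT is hypothetical.
import reactor.core.publisher.Flux;

public final class BoundedPrefetchSketch {

    // Keep only a small, fixed number of pages requested ahead of the consumer.
    private static final int PAGE_PREFETCH_LIMIT = 4;

    public static Flux<byte[]> boundedPipeline(Flux<byte[]> pagesFromBackend) {
        // limitRate caps how many pages are requested from upstream at a time, so a
        // slow consumer no longer causes an unbounded buffer of pages to pile up.
        return pagesFromBackend.limitRate(PAGE_PREFETCH_LIMIT);
    }
}
```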
This PR further improves the memory footprint / GC pressure by pushing the conversion from ObjectNode/JSON into the targeted type down into the query pipeline (and allowing an ObjectNode/JSON --> Spark Row transformation). This helps reduce memory pressure because - slightly simplified - an ObjectNode holds data + schema, while the strongly typed representation (Spark Row, POJO) doesn't need to encode the schema info for every record. Because the conversion is done earlier, the JVM GC can clean up the ObjectNode/JSON representation sooner - without any thread switches - which reduces the risk that ObjectNode instances get promoted to long-lived GC generations (which are more expensive to clean up).
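A minimal sketch of the push-down idea, assuming a hypothetical itemFactory function (the actual SDK uses different internal types and factory methods):

```java
// Sketch: convert each ObjectNode into the target type T as soon as the page is
// deserialized, so the ObjectNode becomes garbage within the same method/thread
// and can still be collected from a young GC generation.
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public final class ConversionPushDownSketch {

    public static <T> List<T> convertPage(
        List<ObjectNode> rawPage,
        Function<ObjectNode, T> itemFactory) { // hypothetical conversion function

        List<T> converted = new ArrayList<>(rawPage.size());
        for (ObjectNode node : rawPage) {
            // Convert immediately - the ObjectNode (data + schema) is no longer
            // referenced after this iteration.
            converted.add(itemFactory.apply(node));
        }
        return converted;
    }
}
```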
The main improvement in this trivial use case (SELECT * FROM c, followed by a count()) is the reduction in time spent in GC.
A run against the customer's repro notebook shows a similar pattern - the improvements in latency, memory footprint and CPU are slightly better because client resources are under more pressure. But there as well, the significant reduction in time spent in GC is the main win.
Important design change
Before this change the query pipeline was assembled via PipelinedDocumentQueryExecutionContext.createAsync - the pipeline created multiple pipeline components (for example for aggregates, order by, group by etc.) that would process the query. All pipeline components would operate on Document (a Resource -> JsonSerializable subclass). For some of the pipeline components, processing is currently only possible on Document because they modify the JSON payload etc.
So, to allow the push-down of the type conversion on the hot path but still support those query constructs requiring a pipeline component that operates on Document, I have split the query pipeline into a fast path with transformation push-down and a slow path that keeps operating on Document and only transforms into T at the end.
Pipeline construction happens in PipelinedQueryExecutionContextBase.createAsync. PipelinedDocumentQueryExecutionContext runs all pipeline components on Document (slow path) - PipelinedQueryExecutionContext is the fast path, where all allowed pipeline components can handle any T. A simplified sketch of the path selection is shown below.
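The fast-/slow-path selection can be summarized roughly as follows; the predicate name requiresDocumentPipeline is a simplified stand-in for the real query-plan inspection, not the actual SDK API:

```java
// Hedged sketch of the fast-path / slow-path split described above. The class names
// mirror the ones mentioned in the description; the selection logic is simplified.
final class PipelineSelectionSketch {

    static String selectPipeline(boolean requiresDocumentPipeline) {
        if (requiresDocumentPipeline) {
            // Slow path: components such as group-by still need to rewrite the JSON
            // payload, so the whole pipeline operates on Document and the conversion
            // into T only happens at the very end.
            return "PipelinedDocumentQueryExecutionContext";
        }
        // Fast path: conversion into T is pushed down to where the backend response
        // is drained, and all remaining components handle T directly.
        return "PipelinedQueryExecutionContext";
    }
}
```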
High-level design
Reducing the overly aggressive pre-fetching stabilizes the memory footprint / GC pressure.
Pushing down the type conversion into the query pipeline - doing it immediately after retrieving the response from the backend - reduces pressure on the GC, but requires fairly extensive refactoring of the query pipeline.
Perf test
Baseline (4.6.2)
Results
Executors (Time in GC)
Relatively high GC pressure (about 10% time spent in GC)
Ganglia metrics
Heap stats (snapshot of 1 executor)
Shows relatively high memory allocation due to ObjectNodes and its children
This PR
Results
Executors (Time in GC)
Significantly lower GC pressure - saves nearly 70% of time spent in GC
Ganglia metrics
Heap stats (snapshot of 1 executor)
Lower memory allocation for ObjectNodes (and their children) - instead, SparkRowItem, the Spark row representation, is allocated. The memory footprint is lower because SparkRowItem doesn't need to encode schema info for every single row.
Client telemetry comparison (CPU)
About 3 percent lower CPU usage overall
Client telemetry comparison (Memory)
Also a 3-5 percent smaller memory footprint
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines