Perf-improvement avoiding extra-buffer copy for query and point operations #38072
API change check: APIView has identified API-level changes in this PR and created the following API reviews.
…to users/fabianm/SparkQuryProfiling
/azp run java - cosmos - tests
/azp run java - cosmos - spark
Azure Pipelines successfully started running 1 pipeline(s).
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - tests
It would be good to run a few tests with Netty ByteBuf leak detection enabled to make sure there are no obvious memory leaks.
Great work @FabianMeiswinkel, thank you!
I have used that to fix some of the initial issues I had. The memory leak detection errors showed up in local tests under the debugger, and I have validated that they were gone with the query and Cosmos Item tests.
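For readers who want to repeat this kind of check, here is a minimal sketch of one way to turn on Netty's most aggressive leak detection. It is not necessarily how the tests above were run. The class and method names are hypothetical; the property name `io.netty.leakDetection.level` and the value `paranoid` are Netty's own. Netty reads the property when its leak detector class is first loaded, so it has to be set before any Netty class is touched.

```java
// Hypothetical helper; only the property name and value come from Netty.
final class LeakDetectionSetup {

    // System property that Netty's ResourceLeakDetector reads at class-load time.
    static final String LEAK_LEVEL_PROPERTY = "io.netty.leakDetection.level";

    // Must run before the first Netty class is loaded, for example at the
    // top of a test bootstrap, otherwise the setting is ignored.
    static void enableParanoidLeakDetection() {
        System.setProperty(LEAK_LEVEL_PROPERTY, "paranoid");
    }

    public static void main(String[] args) {
        enableParanoidLeakDetection();
        System.out.println(LEAK_LEVEL_PROPERTY + "=" + System.getProperty(LEAK_LEVEL_PROPERTY));
    }
}
```

Passing `-Dio.netty.leakDetection.level=paranoid` on the JVM command line has the same effect and avoids the ordering concern.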
/azp run java - cosmos - spark
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Azure Pipelines successfully started running 1 pipeline(s).
...s/src/main/java/com/azure/cosmos/implementation/directconnectivity/JsonNodeStorePayload.java
...e-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResponse.java
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosItemResponse.java
...os/azure-cosmos/src/main/java/com/azure/cosmos/implementation/batch/BatchResponseParser.java
...os/azure-cosmos/src/main/java/com/azure/cosmos/implementation/batch/BatchResponseParser.java
...s/src/main/java/com/azure/cosmos/implementation/directconnectivity/JsonNodeStorePayload.java
...e-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ResponseUtils.java
LGTM, thanks
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - spark
Azure Pipelines successfully started running 1 pipeline(s).
LGTM, thanks @FabianMeiswinkel
…tions (Azure#38072)
* Perf-improvement avoiding extra-buffer copy for query and point operations
* Fixing JDK8 build errors
* Update StoreResponseBuilder.java
* Fixing test issues
* Update BatchResponseParser.java
* Update CosmosItemResponse.java
* Update ResponseUtils.java
* Fixing test issues
* Test changes
* Fixing test failures
* Fixing encryption test failures
* Update UtilsTest.java
* Update ClientTelemetryTest.java
* Fixing some encryption relates test failures
* Update CosmosItemTest.java
* Update CosmosItemResponse.java
* Update CosmosItemResponse.java
* Avoided double-deserialization in Gateway mode
* Removing unnecessary imports
* Update RxGatewayStoreModel.java
* Addressing code review feedback
@FabianMeiswinkel @kushagraThapar As we upgraded from
I found #9802, opened a couple of years ago, which describes the same error. Could you please look into it?
@varenyavv thanks for providing the exception details. Can you also tell us which operation type you saw this error on? I would like to know your workload pattern and throughput. Are you using direct or gateway mode? To help us investigate, please create a separate GitHub issue on this repo with all the necessary details and tag us.
@kushagraThapar Thank you for your reply! We are using gateway mode. I created #39252 with all the necessary details.
Sounds good, let's investigate it.
Description
This PR refactors the transport layer to avoid one buffer copy. Before this PR, the flow of a request on the direct/RNTBD hot path was, slightly simplified:
Netty ByteBuf -> Copy into byte[] in RntbdResponse.toStoreResponse -> ObjectMapper deserialization from the byte[]
The goal of this PR is to avoid that additional buffer copy by deserializing directly from the Netty ByteBuf.
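The difference between the two flows can be sketched with JDK classes only. This is an illustration under stated assumptions, not the SDK's code: `java.nio.ByteBuffer` stands in for the Netty ByteBuf, a UTF-8 decoder stands in for ObjectMapper deserialization, and every class and method name below is hypothetical. With Netty and Jackson on the classpath, one plausible pairing for the direct path would be `io.netty.buffer.ByteBufInputStream` together with `ObjectMapper.readTree(InputStream)`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ByteBuffer stands in for the Netty ByteBuf.
final class BufferCopySketch {

    // InputStream view over a ByteBuffer. It reads through a duplicate, so the
    // payload bytes are shared and the caller's position is left untouched.
    static final class ByteBufferInputStream extends InputStream {
        private final ByteBuffer view;

        ByteBufferInputStream(ByteBuffer source) {
            this.view = source.duplicate();
        }

        @Override
        public int read() {
            return view.hasRemaining() ? (view.get() & 0xFF) : -1;
        }

        @Override
        public int read(byte[] dst, int off, int len) {
            if (len == 0) {
                return 0;
            }
            if (!view.hasRemaining()) {
                return -1;
            }
            int n = Math.min(len, view.remaining());
            view.get(dst, off, n);
            return n;
        }
    }

    // Stand-in for the deserializer: consumes a stream and decodes it as UTF-8.
    static String parse(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Old flow: materialize the whole payload as a byte[] first, then parse it.
    static String parseWithCopy(ByteBuffer payload) {
        byte[] copy = new byte[payload.remaining()];
        payload.duplicate().get(copy);
        return parse(new ByteArrayInputStream(copy));
    }

    // New flow: hand the parser a stream view of the buffer, skipping the byte[].
    static String parseDirect(ByteBuffer payload) {
        return parse(new ByteBufferInputStream(payload));
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap("{\"id\":\"1\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseWithCopy(payload).equals(parseDirect(payload)));
    }
}
```

Both methods return the same result; the direct variant simply never allocates a payload-sized byte[], which matters most for large documents under GC pressure.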
The perf results below show a good improvement (5-10 percent), but to be clear, only when GC pressure is high. To simulate this in the benchmark, I set -Xmx to 2 GB (restricting the max heap to 2 GB) and used a point-read workload reading documents of at least 1.5 MB with a concurrency of 35.
Kusto query against https://fabianm-perf-results.westeurope.kusto.windows.net/fabianm-java-perf-results
High-level design
Cosmos DB NoSQL always requires documents to be stored as JSON. This PR uses that fact to parse the Netty ByteBuf into a JsonNode directly and to normalize on it. A few code paths need special-casing (queries extract documents from a JSON array, stored procedures have a special response payload, etc.), but on the hot path for point operations this always avoids one buffer copy. The refactoring will also help with the Binary implementation, because there is now a single place where parsing Binary into a JsonNode needs to be supported.