Improved diagnostics with new models for StoreResponse, StoreResult and CosmosException #28620

Merged

Member

@kushagraThapar kushagraThapar commented May 3, 2022

  • Created StoreResultDiagnostics, which contains StoreResponseDiagnostics (diagnostics for the store response, covering both the response and the exception case).
  • Added exceptionMessage and exceptionResponseHeaders to StoreResponseDiagnostics.
  • Added error-handling logic for java.lang.Error.
  • Improved the performance of header handling in StoreResponse. Previously all header operations were O(n); with this change they are O(1).
  • queryPlanCache was previously a thread-safe LRU cache backed by a SynchronizedMap with a fixed size of 1000. That locks the whole map on every read and write, and reads happen on every query call. To remove the whole-map locking and improve performance, queryPlanCache is now a ConcurrentHashMap capped at 1000 entries. If customers have more than 1000 different queries, the cache is cleared once it reaches 1000 entries.
  • The reasoning behind this change is to keep queryPlanCache a very simple cache, without locking or performance overhead.
  • The reasoning behind the size of 1000 is to make sure customers use query specs and query parameters. If the size still needs to increase, that can be done in the future without a breaking change.
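For readers who want a concrete picture of the clear-when-full cache described in the list above, here is a minimal sketch. The class name BoundedQueryPlanCache and its methods are hypothetical and are not the SDK's actual implementation; the sketch only assumes a ConcurrentHashMap with a size cap that is emptied once the cap is reached.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the SDK's class: a bounded cache that is cleared
// wholesale once it reaches its limit, instead of evicting entries in LRU order.
final class BoundedQueryPlanCache<V> {
    private final int maxSize;
    private final ConcurrentHashMap<String, V> plans = new ConcurrentHashMap<>();

    BoundedQueryPlanCache(int maxSize) {
        this.maxSize = maxSize;
    }

    // Reads do not take a lock on the whole map.
    V get(String queryText) {
        return plans.get(queryText);
    }

    // The size check and the clear are not atomic, so under contention the map
    // can briefly exceed maxSize. This sketch also clears when the key being
    // written is already cached; both are acceptable for a best-effort cache.
    void put(String queryText, V plan) {
        if (plans.size() >= maxSize) {
            plans.clear();
        }
        plans.put(queryText, plan);
    }

    int size() {
        return plans.size();
    }
}
```

With a cap of 3, the fourth distinct query empties the cache before it is stored, so only that fourth entry remains afterwards.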

@azure-sdk
Collaborator

azure-sdk commented May 3, 2022

API change check for com.azure:azure-cosmos

API changes have been detected in com.azure:azure-cosmos. You can review API changes here

API changes

-         @Warning public static void setGatewayRequestTimelineOnDiagnostics(CosmosDiagnostics cosmosDiagnostics, RequestTimeline requestTimeline) 
-         @Warning public static void recordGatewayResponse(CosmosDiagnostics cosmosDiagnostics, RxDocumentServiceRequest rxDocumentServiceRequest, StoreResponse storeResponse, CosmosException exception, GlobalEndpointManager globalEndpointManager) 
+         @Warning public static void recordGatewayResponse(CosmosDiagnostics cosmosDiagnostics, RxDocumentServiceRequest rxDocumentServiceRequest, StoreResponse storeResponse, GlobalEndpointManager globalEndpointManager) 
+         @Warning public static void recordGatewayResponse(CosmosDiagnostics cosmosDiagnostics, RxDocumentServiceRequest rxDocumentServiceRequest, CosmosException cosmosException, GlobalEndpointManager globalEndpointManager) 

@kushagraThapar
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

if (responseHeaders != null) {
- for (Map.Entry<String, String> entry: responseHeaders.entrySet()) {
+ for (Map.Entry<String, String> entry : responseHeaders.entrySet()) {
Member

NIT: shouldn't that extra space be reverted?

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Love it - much cleaner.

…rors. Also added code for throwing any java.lang.Error
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Thanks!

@kushagraThapar
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

private String content;

public static StoreResponseBuilder create() {
return new StoreResponseBuilder();
}

public StoreResponseBuilder() {
- headerEntries = new ArrayList<>();
+ headers = new HashMap<>();
Member

Was there anything previously depending on header insertion ordering? If so, a LinkedHashMap may be a better option (though less performant)

Member Author

Nothing was depending on the ordering of the headers.
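As a small illustration of the ordering question raised in this thread: a LinkedHashMap iterates in insertion order, while a plain HashMap makes no ordering guarantee. The sketch below is only a demonstration; the class HeaderOrderDemo and the sample header values are made up for illustration and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: shows what "depending on insertion order" would mean.
final class HeaderOrderDemo {
    // Returns the header names in whatever order the map iterates them.
    static List<String> namesInIterationOrder(Map<String, String> headers) {
        return new ArrayList<>(headers.keySet());
    }

    // LinkedHashMap keeps insertion order; a HashMap would not promise this.
    static Map<String, String> insertionOrderedHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("x-ms-request-charge", "1.0"); // sample values only
        headers.put("etag", "abc");
        headers.put("x-ms-activity-id", "id");
        return headers;
    }
}
```

If nothing reads the headers by position, as stated in the reply above, the cheaper HashMap is sufficient.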

@@ -507,7 +506,8 @@ public Mono<RxDocumentServiceResponse> processMessage(RxDocumentServiceRequest r
}

if (Exceptions.isThroughputControlRequestRateTooLargeException(dce)) {
- BridgeInternal.recordGatewayResponse(request.requestContext.cosmosDiagnostics, request, null, dce, globalEndpointManager);
+ StoreResponseDiagnostics storeResponseDiagnostics = StoreResponseDiagnostics.createStoreResponseDiagnostics(dce);
+ BridgeInternal.recordGatewayResponse(request.requestContext.cosmosDiagnostics, request, storeResponseDiagnostics, globalEndpointManager);
Member

do we need to check whether request.requestContext.cosmosDiagnostics != null? (similar to line 410 - 416)

Member Author

Good catch, worth adding a null check here to avoid any NPEs.
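A minimal sketch of the kind of guard being discussed, using stand-in types. Diagnostics, RequestContext, and recordIfPresent are hypothetical names invented for this example; the real SDK types and the final form of the check in the PR may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the SDK types; only the guard pattern matters here.
final class DiagnosticsGuardSketch {
    static final class Diagnostics {
        final List<String> recorded = new ArrayList<>();
    }

    static final class RequestContext {
        Diagnostics cosmosDiagnostics; // may legitimately be null
    }

    // Records the entry only when diagnostics are attached to the request.
    // Returns true if something was recorded, false if the guard skipped it.
    static boolean recordIfPresent(RequestContext context, String entry) {
        if (context == null || context.cosmosDiagnostics == null) {
            return false; // nothing to record into, and no NullPointerException
        }
        context.cosmosDiagnostics.recorded.add(entry);
        return true;
    }
}
```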

@@ -664,7 +667,8 @@ StoreResult createAndRecordStoreResult(
StoreResult storeResult = this.createStoreResult(storeResponse, responseException, requiresValidLsn, useLocalLSNBasedHeaders, storePhysicalAddress);

try {
- BridgeInternal.recordResponse(request.requestContext.cosmosDiagnostics, request, storeResult, transportClient.getGlobalEndpointManager());
+ StoreResultDiagnostics storeResultDiagnostics = StoreResultDiagnostics.createStoreResultDiagnostics(storeResult);
+ BridgeInternal.recordResponse(request.requestContext.cosmosDiagnostics, request, storeResultDiagnostics, transportClient.getGlobalEndpointManager());
Member

@xinlian12 xinlian12 May 16, 2022

It kind of feels like createStoreResultDiagnostics could be part of the logic in recordResponse.

Member Author

Valid point, will be much cleaner, will do that.


return null;
public String getHeaderValue(String attribute) {
return responseHeaders.get(attribute);
Member

@xinlian12 xinlian12 May 16, 2022

The logic here seems to have changed a little compared to the original. Do we need to allow matching with ignoreCase?

if (responseHeaderNames[i].equalsIgnoreCase(attribute)) {
                 return responseHeaderValues[i];
             }

Member Author

Valid point. I think it is worth checking both cases to make the lookup case-insensitive.
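One possible way to keep O(1) lookups while matching header names case-insensitively is to normalize names to lower case on both write and read. This is only a sketch of that idea, not necessarily what the PR ended up doing; the class CaseInsensitiveHeaders and its put method are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: header names are stored lower-cased so that lookups
// stay O(1) on average and ignore the caller's casing.
final class CaseInsensitiveHeaders {
    private final Map<String, String> headers = new HashMap<>();

    void put(String name, String value) {
        // Locale.ROOT avoids locale-specific case mappings (e.g. Turkish 'I').
        headers.put(name.toLowerCase(Locale.ROOT), value);
    }

    // Returns the value for the header regardless of casing, or null if absent.
    String getHeaderValue(String name) {
        return name == null ? null : headers.get(name.toLowerCase(Locale.ROOT));
    }
}
```

The trade-off is that the original casing of the header name is not preserved, which matters only if the names are later written back out verbatim.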

this.requestTimeline = storeResponse.getRequestTimeline();
this.channelAcquisitionTimeline = storeResponse.getChannelAcquisitionTimeline();
this.rntbdChannelTaskQueueSize = storeResponse.getRntbdChannelTaskQueueSize();
this.rntbdEndpointStatistics = storeResponse.getEndpointStsts();
Member

@xinlian12 xinlian12 May 16, 2022

Is this a typo or an abbreviation? storeResponse.getEndpointStsts

Member Author

It looks like a typo, I will fix this API name.

cosmosException.getResponseHeaders().put(HttpConstants.HttpHeaders.REQUEST_CHARGE, totalRequestChargeString);
} else {
// Set total charge as final charge for the response.
response.getResponseHeaders().put(HttpConstants.HttpHeaders.REQUEST_CHARGE, totalRequestChargeString);
Member

@xinlian12 xinlian12 May 16, 2022

The logic here also seems a little different, again because of the ignoreCase check in the original logic.
Do we ever have mixed uppercase/lowercase usage of the header names?

SerializerProvider serializerProvider) throws IOException {
StoreResponseDiagnostics storeResponseDiagnostics = storeResultDiagnostics.getStoreResponseDiagnostics();
jsonGenerator.writeStartObject();
jsonGenerator.writeObjectField("storePhysicalAddress", storeResultDiagnostics.storePhysicalAddressAsString);
Member

should this be writeStringField?

Member Author

Valid point, I will change it.

jsonGenerator.writeNumberField("requestCharge", storeResponseDiagnostics.getRequestCharge());
jsonGenerator.writeNumberField("itemLSN", storeResultDiagnostics.itemLSN);
jsonGenerator.writeStringField("sessionToken", storeResponseDiagnostics.getSessionTokenAsString());
jsonGenerator.writeObjectField("backendLatencyInMs", storeResultDiagnostics.backendLatencyInMs);
Member

should this be writeNumberField?

Member Author

I don't want to change how any of the existing fields are written, but I will make sure to fix the newly added string fields.


this.writeNonNullObjectField(jsonGenerator,"transportRequestChannelAcquisitionContext", storeResponseDiagnostics.getChannelAcquisitionTimeline());

jsonGenerator.writeObjectField("rntbdRequestLengthInBytes", storeResponseDiagnostics.getRntbdRequestLength());
Member

@xinlian12 xinlian12 May 16, 2022

same question here: should we use writeNumberField for line 176 - 181?

Member Author

I don't want to change how any of the existing fields are written, but I will make sure to fix the newly added string fields.

Member

@xinlian12 xinlian12 left a comment

LGTM, so many improvements 👍

@xinlian12
Member

Please also update change log

@check-enforcer

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaluation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected, you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them. To bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run java - [service] - ci

@kushagraThapar
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

…shagraThapar/azure-sdk-for-java into azure_cosmos_diagnostics_improvements
5 participants