Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storePhysicalAddress is null - V2 Java Async SDK #353

Open
kushagraThapar opened this issue Aug 27, 2020 · 2 comments
Open

storePhysicalAddress is null - V2 Java Async SDK #353

kushagraThapar opened this issue Aug 27, 2020 · 2 comments
Assignees

Comments

@kushagraThapar
Copy link
Member

  • storePhysicalAddress is null - V2 Java Async SDK
  • Example diagnostics from one of the customers:
RequestStartTime: "24 Aug 2020 13:09:05.284", RequestEndTime: "24 Aug 2020 13:09:06.299", Duration: 1015 ms, Number of regions attempted: 1
StoreResponseStatistics{requestResponseTime="24 Aug 2020 13:09:06.286", storeResult=storePhysicalAddress: null, lsn: -1, globalCommittedLsn: -1, partitionKeyRangeId: null, isValid: false, statusCode: 410, subStatusCode: 0, isGone: true, isNotFound: false, isInvalidPartition: false, requestCharge: 0.0, itemLSN: -1, sessionToken: null, exception: failed to establish connection to cdb-ms-prod-westus1-fd4.documents.azure.com:14001: connection timed out: cdb-ms-prod-westus1-fd4.documents.azure.com/40.112.249.60:14001, requestResourceType=Document, requestOperationType=Read}
StoreResponseStatistics{requestResponseTime="24 Aug 2020 13:09:06.299", storeResult=storePhysicalAddress: rntbd://cdb-ms-prod-westus1-fd4.documents.azure.com:14350/apps/28ad6635-acc0-4a33-8cbf-513f2a7ecff0/services/44c8c0ce-5154-49ca-918a-3160e5be949d/partitions/1316e464-6646-4890-8e75-1f564230ce92/replicas/132401551806471291p/, lsn: 3024626, globalCommittedLsn: 3438427, partitionKeyRangeId: 251, isValid: true, statusCode: 200, subStatusCode: 0, isGone: false, isNotFound: false, isInvalidPartition: false, requestCharge: 1.14, itemLSN: 644636, sessionToken: 18#3438428#5=-1#7=1595052, exception: null, requestResourceType=Document, requestOperationType=Read}
AddressResolutionStatistics{startTime="24 Aug 2020 13:09:06.299", endTime="31 Dec +999999999 23:59:59.999", targetEndpoint='https://prod-membership-core-westus.documents.azure.com:443/addresses/?$resolveFor=dbs%2FLs1aAA%3D%3D%2Fcolls%2FLs1aAJPsrkg%3D%2Fdocs&$filter=protocol%20eq%20rntbd&$partitionKeyRangeIds=251'} 

@henrikp-visma
Copy link

henrikp-visma commented Nov 26, 2020

Hi, we are experiencing similar issues in a production environment. Not sure if this is the same, so if you think I should open a separate issue, I can do that.

For the connection we are using the direct connection mode and have no replication for the data (using only one region). In addition to the store response stats, we have a lot of these in the logs:

...
io.netty.handler.timeout.ReadTimeoutException: null 
[ERROR] c.m.a.c.i.d.GatewayAddressCache - Network failure 
com.microsoft.azure.cosmosdb.DocumentClientException: null
c.m.a.c.r.i.ClientRetryPolicy - Gateway endpoint not reachable. Will refresh cache and retry.

The issue here is that, even if the client eventually recovers, once we hit this issue, it causes > 1 minute hiccups. This is obviously experienced by the users.

The weird part is, we have the same setup in some staging/test environments, and none of these errors are manifesting there. One guess is that the load on the application could be heavier for the production, so some timeouts would result. But as far as I can see it seems it is not caused be CPU resource shortage. This on the other hand could imply memory pressure or some other resource contention causing a lot of CPU stalls. Have not had time to look closer.

--
As a side note, we are still using the V2 Java Async SDK, as last time we were about to transition to using V4 some of the necessary features were not yet implemented. Currently investigating if we could do the transition now, and could it fix the issues we are having.

@henrikp-visma
Copy link

Here is and excerpt from the production application logs:

2020-11-26-log-excerpt.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants