AsyncCacheNonBlocking - Improve availability during replica movement #29322

Merged
merged 16 commits into from
Aug 24, 2022

@aayush3011 aayush3011 commented Jun 9, 2022

Description

On a force refresh, the current AsyncCache removes the cached value. It then creates a new value, and all incoming requests are stuck waiting for the new task to complete.

Issue
If a single replica moves, only 1 of the 4 replica URIs is stale. The current async cache nevertheless blocks all requests until the value is updated, even though the 3 other replicas could be used to complete requests successfully.

Solution
AsyncCacheNonBlocking was created. This async cache does not block requests on refresh. Instead, it returns the stale value until the refresh completes, and the refreshed value then replaces the cached one. Any request with forceRefresh set to true waits on the same refresh task.
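
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `StaleWhileRefreshCache`, the `get` signature, and the use of `CompletableFuture` (chosen only to keep the example dependency-free) are all assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch only; the class and method names are hypothetical,
// not the SDK's AsyncCacheNonBlocking API.
final class StaleWhileRefreshCache<K, V> {

    private static final class Entry<T> {
        // Last known value; it may be stale while a refresh is running.
        volatile CompletableFuture<T> value;
        // Most recent refresh task; only accessed while holding the entry's lock.
        CompletableFuture<T> refresh;

        Entry(CompletableFuture<T> initial) {
            this.value = initial;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();

    // loader receives the stale value (null on the first load) and returns the new value.
    CompletableFuture<V> get(K key, Function<V, CompletableFuture<V>> loader, boolean forceRefresh) {
        Entry<V> entry = entries.computeIfAbsent(key, k -> new Entry<>(loader.apply(null)));
        if (!forceRefresh) {
            // Never blocked by a refresh: callers get the current (possibly stale) value.
            return entry.value;
        }
        synchronized (entry) {
            if (!entry.value.isDone()) {
                // The initial load is still running; it doubles as the refresh.
                return entry.value;
            }
            if (entry.refresh != null && !entry.refresh.isDone()) {
                // A refresh is already in flight; every forceRefresh caller shares it.
                return entry.refresh;
            }
            V stale = entry.value.isCompletedExceptionally() ? null : entry.value.join();
            entry.refresh = loader.apply(stale).whenComplete((fresh, error) -> {
                // Swap in the new value only on success; on failure the stale value stays.
                if (error == null) {
                    entry.value = CompletableFuture.completedFuture(fresh);
                }
            });
            return entry.refresh;
        }
    }
}
```

In this sketch a caller that passes `forceRefresh = false` while a refresh is running still receives the previous value immediately, which corresponds to the case where 3 of the 4 replica URIs remain usable. A production implementation would also need to decide how to evict entries whose initial load failed, which this sketch leaves out.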

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.
  • Benchmark Tests:

Benchmark Workload in Prod on West US 2 during upgrades

Workload Test RU: 100,000, Documents: 100,000, Time Duration: 60 hrs, Region: West US 2

Throughput - Mean Rate: [image: mean_rate]
Throughput - m5 Rate: [image: m5_rate]
Latency - p95: [image: p95]
Latency - p99: [image: p99]
Latency - p999: [image: p999]
Latency - max: [image: max]

Benchmark Test RU: 100,000, Documents: 100,000, Region: South Central US

Throughput: [image]
P95 Latency: [image]
P99 Latency: [image]
P999 Latency: [image]
Max Latency: [image]

@aayush3011

/azp run java - cosmos - tests

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM, except for a few nits around logging and the isDebugEnabled check
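
The review does not spell out the nits. Presumably they concern the common pattern of guarding debug logging with a level check so that expensive messages are only built when they will be emitted. A minimal sketch of that pattern follows, using `java.util.logging` and `isLoggable(Level.FINE)` as a dependency-free stand-in for a logger's `isDebugEnabled()`. The class `GuardedLoggingSketch` and its methods are hypothetical and not taken from this PR.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example; java.util.logging stands in for the SDK's logger.
final class GuardedLoggingSketch {
    private static final Logger LOGGER = Logger.getLogger(GuardedLoggingSketch.class.getName());

    // Counts how often the (pretend) expensive message is built.
    static final AtomicInteger describeCalls = new AtomicInteger();

    // Stand-in for a costly diagnostic string.
    static String describe(String key) {
        describeCalls.incrementAndGet();
        return "refreshing cache entry for key=" + key;
    }

    static void logRefresh(String key) {
        // Build the message only when debug-level logging is enabled.
        if (LOGGER.isLoggable(Level.FINE)) {
            LOGGER.fine(describe(key));
        }
    }
}
```

With the guard in place, `describe` is not invoked at all while the logger is configured above the debug level.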

@check-enforcer
This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaluation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected, you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them. To bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run java - [service] - ci

@aayush3011 aayush3011 merged commit 7ea4aae into Azure:main Aug 24, 2022
Harshan01 pushed a commit to Harshan01/azure-sdk-for-java that referenced this pull request Aug 30, 2022
…zure#29322)

* Async Non Blocking Cache

* Async Non Blocking Cache

* Resolving comments

* Resolving spotbugs

* Fixing spot bugs

* Resolving comments

* Fixing spot bugs

* Resolving comments

* Fixing CI build

* Resolving comments
vcolin7 pushed a commit to vcolin7/azure-sdk-for-java that referenced this pull request Sep 9, 2022
4 participants