-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speculative processing #34686
Speculative processing #34686
Conversation
…led when the subscription gets cancelled. - Cleanup and refactoring
Adding more tests
…e-processing # Conflicts: # sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/EndToEndTimeOutValidationTests.java # sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/RxDocumentClientImplTest.java # sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/caches/AsyncCacheNonBlockingTest.java # sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/BarrierRequestHelperTest.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncClient.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosClientBuilder.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/AsyncDocumentClient.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RMResources.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RequestOptions.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/caches/AsyncCacheNonBlocking.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ReplicatedResourceClient.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosItemRequestOptions.java # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosQueryRequestOptions.java
...mos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentServiceRequest.java
Outdated
Show resolved
Hide resolved
...c/main/java/com/azure/cosmos/implementation/directconnectivity/ReplicatedResourceClient.java
Outdated
Show resolved
Hide resolved
...mplementation/directconnectivity/speculativeprocessors/ThompsonSamplingBasedSpeculation.java
Outdated
Show resolved
Hide resolved
...mplementation/directconnectivity/speculativeprocessors/ThompsonSamplingBasedSpeculation.java
Outdated
Show resolved
Hide resolved
...mplementation/directconnectivity/speculativeprocessors/ThompsonSamplingBasedSpeculation.java
Outdated
Show resolved
Hide resolved
...mplementation/directconnectivity/speculativeprocessors/ThompsonSamplingBasedSpeculation.java
Outdated
Show resolved
Hide resolved
...mplementation/directconnectivity/speculativeprocessors/ThompsonSamplingBasedSpeculation.java
Outdated
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosQueryRequestOptions.java
Outdated
Show resolved
Hide resolved
Test cleanup Adding only ThresholdBasedSpeculation
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @mbhaskar
@@ -33,7 +33,7 @@ | |||
*/ | |||
public final class CosmosDiagnostics { | |||
private static final Logger LOGGER = LoggerFactory.getLogger(CosmosDiagnostics.class); | |||
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper(); | |||
static final ObjectMapper OBJECT_MAPPER = new ObjectMapper(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not private anymore? Are we re-using this anywhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was using in tests to validate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But could revert.
} | ||
|
||
public static int speculationThreshold() { | ||
return getJVMConfigAsInt(SPECULATION_THRESHOLD, 500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we should define these default values in separate constant variables above like other methods.
|
||
@Override | ||
public void onResponseReceived(URI region, Duration latency) { | ||
// unused |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is unused, may be throw error if it gets called, something like not implemented exception. Can be done in a follow up PR.
@@ -28,7 +28,7 @@ | |||
requires java.logging; | |||
requires HdrHistogram; | |||
|
|||
// public API surface area | |||
// public API surface area |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can revert this change,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will revert in follow up PR
@@ -300,6 +311,18 @@ public static int getSessionTokenMismatchMaximumBackoffTimeInMs() { | |||
DEFAULT_SESSION_TOKEN_MISMATCH_MAXIMUM_BACKOFF_TIME_IN_MILLISECONDS); | |||
} | |||
|
|||
public static int getSpeculationType() { | |||
return getJVMConfigAsInt(SPECULATION_TYPE, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this by default be 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No speculation by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
...zure-cosmos/src/main/java/com/azure/cosmos/implementation/DocumentServiceRequestContext.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - Thanks
/check-enforcer override |
1 similar comment
/check-enforcer override |
/check-enforcer override |
2 similar comments
/check-enforcer override |
/check-enforcer override |
/check-enforcer evaluate |
/check-enforcer reset |
/check-enforcer override |
* end-end timeout implementation * cleanup * cleanup * Adding missing comments * Cleanup and fixing test * Adding more tests * - Moving asyncCache to background runnable so that it wont get cancelled when the subscription gets cancelled. - Cleanup and refactoring * Refactoring and clean up * Refactoring and clean up * Adding timeouts for writes. Adding more tests * Adding speculative execution support for read operations * Adding config options to speculation * Fixing issue with query where request is not cloning property * Shading apache math3 * Adding Portions license Fixing lint issues * refactoring * Test fix * Cleanup and refactoring Test cleanup Adding only ThresholdBasedSpeculation * changing default value * Removing unused test
Description
This PR adds support for speculative processing on top of end-to-end timeout to improve availability within timeout
Introduces Threshold based speculative execution
Only for Reads / Queries
If we don't get a successful response (either delayed response or error response) within the threshold value, we go to all preferred regions in parallel.
- We can try one region at a time, with some linear timeline.
- Whichever responds first, we will return, and cancel all the other ongoing requests.
Benefits:
1. Better chances of getting a response from either of those regions.
2. This provides Maximum availability at RUs / compute tradeoff (increased cost)
If speculative retries are enabled, based on the probability threshold Speculative retries may increase RU costs
When we have sparse data, the algo adds more variance around mean and this may increase the exploration even though the lower mean value is selected until the perf guarantee is increased. (The algo has proofs that this keeps cost under consideration and does only necessary exploration)
Current algo is tuned for 10sample requests. It can be tuned based on time/ number of requests and can be adjusted accordingly
Initial test resuls
Tests:
Read document 100 times
30s of delay faults on the local region
How to enable:
Following environment variables enable speculation when end to end timeout policy is specified
Integer value specifying the speculation type
0 - No speculation
1 - Threshold based speculation
COSMOS_SPECULATION_TYPE
The threshold int representing milli seconds
COSMOS_SPECULATION_THRESHOLD
The threshold step int representing milli seconds. This sets when speculation on subsequent regions is triggered
COSMOS_SPECULATION_THRESHOLD_STEP
example: Three regions r1, r2,r3 in preferred regions
Speculation threshold: 500
Specialization step: 100
At time t1 request to r1 is made
If there is no response in 500ms, request to r2 is made
If there is no response at 500+100ms, request to r3 is made
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines