Change feed mode switch error handling #38740

Merged
merged 26 commits into from
May 15, 2024

@jeet1995 jeet1995 commented Feb 11, 2024

Objective

The objective of this PR is to detect scenarios where a customer starts an AllVersionsAndDeletes change feed processor (CFP) instance after stopping a LatestVersion CFP instance that used an already initialized lease container (and vice versa). This prevents AllVersionsAndDeletes from picking up a continuation from a lease that falls outside its retention period, or a lease with StartFrom set to BEGINNING, which does not apply to the AllVersionsAndDeletes change-feed mode.

Implementation

Transition 1: LatestVersion using EPK-range based lease to AllVersionsAndDeletes:

  1. Complete the lease store initialization flow within BootstrapperImpl.
  2. Upon lease store initialization completion, query for EPK-range based lease documents whose id starts with a given lease prefix (or an empty prefix). This is done to see if the CFP instance in the bootstrapping phase will reuse leases or not.
  3. If any leases exist, extract the Mode property from the ContinuationToken of the lease document and compare it with the ChangeFeedMode value of the CFP instance in the bootstrapping phase.
  4. Throw an IllegalStateException if there is a mismatch in the Mode value from the fetched lease and the ChangeFeedMode value of the CFP instance in bootstrapping phase.

Transition 2: AllVersionsAndDeletes to LatestVersion using EPK-range based lease:

  1. Complete the lease store initialization flow within PkRangeIdVersionBootstrapperImpl.
  2. Upon lease store initialization completion, query for EPK-range based lease documents whose id starts with a given lease prefix (or an empty prefix). This is done to see if the CFP instance in the bootstrapping phase will reuse leases or not.
  3. If any leases exist, extract the Mode property from the ContinuationToken of the lease document and compare it with the ChangeFeedMode value of the CFP instance in the bootstrapping phase.
  4. Throw an IllegalStateException if there is a mismatch in the Mode value from the fetched lease and the ChangeFeedMode value of the CFP instance in bootstrapping phase.
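
Steps 3 and 4 of Transitions 1 and 2 boil down to a single comparison. The following is a minimal sketch of that comparison, not the code in this PR: the `Mode` enum and the `validate` method are hypothetical stand-ins for the SDK's ChangeFeedMode and the bootstrapper logic.

```java
final class ChangeFeedModeCheck {
    // Hypothetical stand-in for the SDK's change-feed mode; not the real type.
    enum Mode { LATEST_VERSION, ALL_VERSIONS_AND_DELETES }

    // Throws when the mode recorded in an existing lease differs from the
    // mode of the processor that is bootstrapping (steps 3 and 4 above).
    static void validate(Mode leaseMode, Mode processorMode) {
        if (leaseMode != processorMode) {
            throw new IllegalStateException(
                "Change feed mode in the existing lease is " + leaseMode
                    + " but the processor was started with " + processorMode);
        }
    }

    public static void main(String[] args) {
        // Same mode on both sides: the lease can be reused.
        validate(Mode.LATEST_VERSION, Mode.LATEST_VERSION);

        // Mode switch against an initialized lease container: rejected.
        try {
            validate(Mode.LATEST_VERSION, Mode.ALL_VERSIONS_AND_DELETES);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing at bootstrap keeps the error close to its cause, instead of surfacing later as a rejected continuation during change-feed reads.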

Transition 3: AllVersionsAndDeletes to LatestVersion using Pk-range based lease:

  • In this scenario, when the LatestVersion CFP bootstraps, it can't pick up an EPK-range based lease, because the CFP instance fetches leases / .info documents whose id embeds the name-based id of the database and feed container. An EPK-range based lease and its respective .info document have ids that embed the system-generated resource id of the database and feed container.
  • Hence, in this scenario, the CFP instance will either bootstrap from pre-existing lease documents that are also Pk-range based leases (provided the lease prefix in the id of the lease document matches that of the CFP instance) or create its own Pk-range based leases.

Transition 4: LatestVersion using Pk-range based lease to AllVersionsAndDeletes:

  • In this scenario, when the AllVersionsAndDeletes CFP bootstraps, it can't pick up a Pk-range based lease, because the CFP instance fetches leases / .info documents whose id embeds the resource id of the database and feed container. A Pk-range based lease and its respective .info document have ids that embed the user-provided, name-based id of the database and feed container.
  • Hence, in this scenario, the CFP instance will either bootstrap from pre-existing lease documents that are also EPK-range based leases (provided the lease prefix in the id of the lease document matches that of the CFP instance) or create its own EPK-range based leases.
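
To illustrate why Transitions 3 and 4 need no explicit mode check, here is a rough sketch of how a prefix-based lookup keeps the two lease families apart. The id layout (`prefix + host + "_" + database + "_" + container`), the helper names, and all sample ids are assumptions made for illustration; the real lease id format in the SDK may differ.

```java
final class LeasePrefixSketch {
    // Hypothetical id layout; the SDK's actual lease id format may differ.
    static String leaseIdPrefix(String userPrefix, String host, String databaseId, String containerId) {
        return userPrefix + host + "_" + databaseId + "_" + containerId;
    }

    // A processor only sees lease documents whose id starts with its own prefix.
    static boolean visibleTo(String leaseDocumentId, String processorPrefix) {
        return leaseDocumentId.startsWith(processorPrefix);
    }

    public static void main(String[] args) {
        // Pk-range based leases embed the user-provided names (sample values).
        String nameBased = leaseIdPrefix("", "hostA", "ordersDb", "orders");
        // EPK-range based leases embed system-generated resource ids (made-up values).
        String ridBased = leaseIdPrefix("", "hostA", "xY1zAA==", "xY1zAKx3pQ8=");

        String epkLeaseId = ridBased + "..0";
        System.out.println(visibleTo(epkLeaseId, nameBased)); // false
        System.out.println(visibleTo(epkLeaseId, ridBased));  // true
    }
}
```

Because the two prefixes never overlap, a processor using one lease type cannot accidentally resume from a lease written by the other.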

Design decisions

  1. When validating change-feed modes, only one lease document needs to be fetched for a given lease prefix. The assumption is that all lease documents sharing a lease prefix pertain to the same feed and are therefore homogeneous in their Mode value. This limits RU usage on the lease container.
  2. If the lease left behind by a previously running CFP instance has a null continuation token, a change-feed mode switch should not raise an error, because the subsequently running CFP instance can still use the lease with its own ContinuationToken.
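
Both design decisions can be sketched together as a small probe that inspects at most one lease and treats a missing continuation token as "no mode recorded". This is an illustrative sketch under stated assumptions: `Lease`, `Mode`, and `probeMode` are hypothetical names, and the in-memory list stands in for the result of the lease container query.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class LeaseModeProbe {
    // Hypothetical stand-in for the SDK's change-feed mode; not the real type.
    enum Mode { LATEST_VERSION, ALL_VERSIONS_AND_DELETES }

    // Hypothetical lease model: continuationMode is null when the lease
    // has no continuation token yet.
    static final class Lease {
        final String id;
        final Mode continuationMode;

        Lease(String id, Mode continuationMode) {
            this.id = id;
            this.continuationMode = continuationMode;
        }
    }

    // Inspects only the first lease for the prefix (decision 1) and returns
    // empty when there is no lease or no continuation token (decision 2).
    static Optional<Mode> probeMode(List<Lease> leasesWithPrefix) {
        if (leasesWithPrefix.isEmpty()) {
            return Optional.empty();
        }
        return Optional.ofNullable(leasesWithPrefix.get(0).continuationMode);
    }

    public static void main(String[] args) {
        List<Lease> none = Collections.emptyList();
        List<Lease> fresh = Arrays.asList(new Lease("lease-0", null));
        List<Lease> used = Arrays.asList(
            new Lease("lease-0", Mode.LATEST_VERSION),
            new Lease("lease-1", Mode.LATEST_VERSION));

        System.out.println(probeMode(none).isPresent());  // false
        System.out.println(probeMode(fresh).isPresent()); // false
        System.out.println(probeMode(used).get());        // LATEST_VERSION
    }
}
```

An empty result means there is nothing to compare against, so the bootstrapping processor would proceed without raising a mode-switch error.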

@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

@jeet1995 jeet1995 changed the title [DRAFT / NO REVIEW]: Change feed mode switch error handling [DRAFT]: Change feed mode switch error handling Feb 12, 2024
@jeet1995
Member Author

jeet1995 commented Mar 4, 2024

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995 jeet1995 changed the title [DRAFT]: Change feed mode switch error handling Change feed mode switch error handling Mar 26, 2024
@jeet1995 jeet1995 marked this pull request as ready for review March 26, 2024 14:52
@jeet1995 jeet1995 requested a review from Pilchie as a code owner March 26, 2024 14:52
Member

@kushagraThapar kushagraThapar left a comment

Missing changelog.

@jeet1995
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Member Author

jeet1995 commented May 9, 2024

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Member Author

jeet1995 commented May 9, 2024

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

Member

@xinlian12 xinlian12 left a comment

LGTM, thanks

@jeet1995
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Member Author

jeet1995 commented May 15, 2024

Several split-based tests appear to have regressed and are not finishing in time; the tests pass locally. This change is unrelated to splits, so merging as is.

@jeet1995
Member Author

/check-enforcer override

@jeet1995 jeet1995 merged commit e2e226b into Azure:main May 15, 2024
70 of 77 checks passed
@jeet1995
Member Author

Closes #38577

4 participants