Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue where Watch in Postgres was looping endlessly #1278

Merged
merged 1 commit into from
Apr 27, 2023

Conversation

josephschorr
Copy link
Member

This occurred if there were two exactly overlapping transactions, which caused the watch code to repeatedly "move backwards" when looking up the next revision

The fix is to ensure that all found transactions are accounted in the overall transaction revision passed to the next iteration of the watch loop

Fixes #1272

Thanks to @aberkovsky for the bug report and detailed repro information!

@josephschorr josephschorr requested a review from a team as a code owner April 20, 2023 20:30
@github-actions github-actions bot added area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) labels Apr 20, 2023
@josephschorr josephschorr requested review from ecordell and removed request for jakedt and vroldanbet April 26, 2023 15:52
// In order to make progress, we need to ensure that any seen transactions here are
// marked as done in the revision given back to Postgres on the next iteration. We pick
// the *last* transaction to start, as it should encompass all completed transactions
// except those running concurrenctly, which is handled by calling markComplete on the other
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// except those running concurrenctly, which is handled by calling markComplete on the other
// except those running concurrently, which is handled by calling markComplete on the other

// the *last* transaction to start, as it should encompass all completed transactions
// except those running concurrenctly, which is handled by calling markComplete on the other
// transactions.
currentTxn = newTxns[len(newTxns)-1].postgresRevision
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a big difference, but you could move this to the end of the block and avoid the computation if there are errors in the watch.

}

_, err = tx.Exec(ctx, fmt.Sprintf(
`INSERT INTO %s ("xid", "snapshot") VALUES ('%d', '%d:%d:')`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same exact query above templated the column names, would be nice to be consistent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As its a test, I think its fine

colCaveatContext,
colCreatedXid,
nextx+1,
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the fix works by bumping the xid in the query forward, wouldn't it make sense to include a third write here that adds a _non_overlapping row at nextx+2 to ensure it doesn't get skipped (It should be fine because the query uses >=, but just to be defensive)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted the test to exactly reproduce what the user reported; I could add another test if you like?

require.True(pds.watchEnabled)

prev := rev.(postgresRevision)
nextx := prev.snapshot.xmax + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit, feel free to ignore:

Suggested change
nextx := prev.snapshot.xmax + 1
nexttx := prev.snapshot.xmax + 1

Copy link
Contributor

@ecordell ecordell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This occurred if there were two exactly overlapping transactions, which caused the watch code to repeatedly "move backwards" when looking up the next revision

The fix is to ensure that all found transactions are accounted in the overall transaction revision passed to the next iteration of the watch loop

Fixes authzed#1272

Thanks to @aberkovsky for the bug report and detailed repro information!
@josephschorr josephschorr merged commit 6a00d19 into authzed:main Apr 27, 2023
24 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/datastore Affects the storage system area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Endless loop in watch stream
2 participants