
Buffer transaction statements to disk to prevent replay deadlock #505

Closed
wants to merge 1 commit

Conversation

@arajkumar (Contributor) commented Oct 20, 2023

This commit buffers transactions being replayed into a separate file until
the entire transaction body is received. It addresses potential deadlocks
that can occur when the BEGIN message arrives before the creation of the
transaction metadata file by the transform process.

Signed-off-by: Arunprasad Rajkumar ar.arunprasad@gmail.com
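As an illustration of the approach described in the commit message (a sketch only, not the actual pgcopydb code; the TxnSpill type and helper names below are hypothetical):

/*
 * Hypothetical sketch of the spill-to-disk buffering: statements are
 * appended to a per-transaction file and only handed to the apply
 * logic once the whole transaction body has been received.
 */
#include <stdio.h>
#include <stdbool.h>

typedef struct TxnSpill
{
    FILE *file;          /* open spill file, NULL when not buffering */
    char path[1024];     /* e.g. <workdir>/<xid>.sql */
} TxnSpill;

/* BEGIN: start buffering into a file named after the transaction id */
static bool
txn_spill_begin(TxnSpill *spill, const char *workdir, unsigned int xid)
{
    snprintf(spill->path, sizeof(spill->path), "%s/%u.sql", workdir, xid);
    spill->file = fopen(spill->path, "w");
    return spill->file != NULL;
}

/* statements inside the transaction: append, don't apply yet */
static bool
txn_spill_add(TxnSpill *spill, const char *line)
{
    return fprintf(spill->file, "%s\n", line) >= 0;
}

/* COMMIT: close the buffer; the caller now replays the whole file */
static bool
txn_spill_commit(TxnSpill *spill)
{
    bool ok = (fclose(spill->file) == 0);
    spill->file = NULL;
    return ok;
}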

@arajkumar (Contributor, Author) commented:

@dimitri The latest changes use a spill-to-disk approach to buffer transactions in replay mode.

@arajkumar arajkumar force-pushed the buffer-replay-statements branch 2 times, most recently from a369b17 to e609c67 on October 22, 2023 13:56
@dimitri (Owner) commented Oct 23, 2023

Could we store the current transaction to a file named ${xid}.sql, and maybe also add a static filename as you did, but this time as a symlink to the current transaction file? I believe that would make things more obvious when debugging live situations.
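For illustration, a minimal sketch of that suggestion (the helper name and the static link name are assumptions, not code from this PR):

#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep a stable symlink pointing at the current
 * transaction's spill file, so a live debugging session can always
 * watch the same path while the target rotates per-xid.
 */
static bool
update_current_txn_symlink(const char *workdir, unsigned int xid)
{
    char target[1024];
    char linkpath[1024];

    snprintf(target, sizeof(target), "%s/%u.sql", workdir, xid);
    snprintf(linkpath, sizeof(linkpath),
             "%s/current-transaction.sql", workdir);

    /* symlink(2) fails with EEXIST if the link is present: remove it first */
    (void) unlink(linkpath);

    return symlink(target, linkpath) == 0;
}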

@arajkumar arajkumar changed the title Buffer replay statements to avoid deadlock Buffer transaction statements to disk to prevent replay deadlock Oct 25, 2023
@arajkumar arajkumar marked this pull request as ready for review October 25, 2023 10:35
@arajkumar (Contributor, Author) commented:

@dimitri I've added a symlink to the latest transaction buffer, as you suggested. Additionally, KEEPALIVE, SWITCHWAL, and ENDPOS messages, which occur outside of a transaction, are replayed without buffering them.
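A sketch of that dispatch (assuming the action constants follow pgcopydb's StreamAction naming; the helper functions are hypothetical):

/* Hypothetical dispatch: out-of-transaction messages skip the buffer */
switch (metadata->action)
{
    case STREAM_ACTION_KEEPALIVE:
    case STREAM_ACTION_SWITCH:
    case STREAM_ACTION_ENDPOS:
        /* these arrive outside BEGIN/COMMIT: replay immediately */
        return replay_statement(context, line);

    default:
        /* everything else belongs to the current transaction buffer */
        return txn_spill_add(&(context->spill), line);
}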

@dimitri (Owner) left a comment

Thanks for your work, I believe we're close to being able to merge this improvement! Please see my review and consider the changes I'm asking for.

src/bin/pgcopydb/copydb.c: review comment (outdated, resolved)
src/bin/pgcopydb/ld_replay.c: review comment (outdated, resolved)
Comment on lines 270 to 336
/*
 * If there is already a replay buffer open, it indicates a partial
 * transaction. Close it and start a new one.
 */
if (replayCtx->txnReplayBuffer != NULL)
{
    fclose(replayCtx->txnReplayBuffer);
}
dimitri (Owner) commented:

I'm not sure I follow how this is not an ERROR (or a WARN) situation? When does that happen in a normal execution?

@arajkumar (Contributor, Author) commented Oct 25, 2023

The idea is to use a similar fix for prefetch mode as well; during a resume, we might encounter partially written transactions. But that doesn't apply to replay, so I would rather add an error message here.

This happens while switching from prefetch to replay mode: the transform process invokes stream_transform_resume, which replays the partial file created by prefetch, and that file might contain a partial transaction.

In one of my tests, 00000004000001590000007E.sql was partially written by prefetch. While switching to replay, transform applied the partial file, leading to an unexpected state!

2023-10-25 13:42:04 600127 INFO follow.c:355 Restarting logical decoding follower in replay mode
2023-10-25 13:42:04 600135 NOTICE follow.c:814 Starting the prefetch sub-process
2023-10-25 13:42:04 600136 NOTICE follow.c:814 Starting the transform sub-process
2023-10-25 13:42:04 600136 SQL pgsql.c:508 Connecting to [source] "postgres://tsdbadmin@tsdb-22e8519b-internal-90d0.a.timescaledb.io:26479/defaultdb?sslmode=require&keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
2023-10-25 13:42:04 600137 NOTICE follow.c:814 Starting the catchup sub-process
2023-10-25 13:42:04 600135 INFO ld_stream.c:1987 Resuming streaming from latest file "/home/ubuntu/.local/share/pgcopydb/00000004000001590000007E.json"
2023-10-25 13:42:04 600136 NOTICE ld_transform.c:234 Transforming from 159/7EE2F578 in "/home/ubuntu/.local/share/pgcopydb/00000004000001590000007E.sql"
2023-10-25 13:42:04 600136 NOTICE ld_transform.c:794 Transforming JSON file "/home/ubuntu/.local/share/pgcopydb/00000004000001590000007E.json" into SQL file "/home/ubuntu/.local/share/pgcopydb/00000004000001590000007E.sql"
...
2023-10-25 13:42:04 600137 INFO ld_apply.c:254 Replaying changes from LSN 159/7EE24298
2023-10-25 13:42:04 600137 ERROR ld_replay.c:363 BUG: Received COMMIT; -- {"xid":7789156,"lsn":"159/7E001210","timestamp":"2023-10-25 13:41:56.473524+0000"} when transaction is not in buffering mode

00000004000001590000007E.sql:

COMMIT; -- {"xid":7789156,"lsn":"159/7E001210","timestamp":"2023-10-25 13:41:56.473524+0000"}
BEGIN; -- {"xid":7789155,"lsn":"159/7DFD6498","timestamp":"2023-10-25 13:41:56.511105+0000","commit_lsn":"159/7E008390"}
PREPARE 40ba08bf AS INSERT INTO public.readings
...
...

@dimitri, Is there a better way to deal with this?

dimitri (Owner) commented:

Even in prefetch mode we would only encounter a single partial transaction, I believe: the last one we received. Cases where we would deal with a partial transaction and then another transaction would certainly indicate a bug?

@arajkumar (Contributor, Author) commented:

> Even in prefetch mode we would only encounter a single partial transaction I believe, the last one we received.

This is what is happening. We have to ignore the partial transactions (i.e., either a COMMIT without a BEGIN or a BEGIN without a COMMIT).
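For illustration, the ignore rule could be a guard like this (a sketch; is_commit is a hypothetical helper):

/*
 * Hypothetical guard while draining a possibly partial .sql file:
 * a COMMIT with no buffered BEGIN is the tail of a partial
 * transaction and can only be skipped, never applied.
 */
if (is_commit(line) && replayCtx->txnReplayBuffer == NULL)
{
    log_debug("Ignoring COMMIT without a matching BEGIN: %s", line);
    return true;
}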

@arajkumar (Contributor, Author) commented:

I addressed all the review comments except this one.

dimitri (Owner) commented:

I think I still have some concerns with the code (the comments are good at explaining the situation, thanks). The trouble is that we don't protect against applying this approach even in the middle of our operations, when it looks like it's only needed before we reach our own “consistent point”, never after that. Can we make sure that as soon as we start replaying transactions, we raise the situation from a DEBUG message to an actual ERROR? (Well, assuming my analysis is correct here.)

src/bin/pgcopydb/ld_replay.c: 6 review comments (outdated, resolved)
@dimitri (Owner) left a comment

Getting there!

src/bin/pgcopydb/ld_replay.c: 5 review comments (resolved, 4 outdated)
Comment on lines +352 to +354
* After switching to replay mode, logical decoding will resume
* from the consistent point, which again starts with a valid
* transaction block.
dimitri (Owner) commented:

Do we have the LSN value for that consistent point available in the context here? If yes, I believe we should be using it to make sure we are skipping transactions as intended. If no, then we should add to the comment why not, so that another PR later can go and fix the situation if we feel like it should be fixed.
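If the consistent-point LSN were available in the context, the check could look like this sketch (the field names are assumptions, not the actual pgcopydb structs):

/*
 * Hypothetical check: partial transactions are only expected before
 * the consistent point; after it, they indicate a bug.
 */
if (metadata->lsn <= context->consistentPointLSN)
{
    log_debug("Skipping partial transaction before the "
              "consistent point: %s", line);
    return true;
}
else
{
    log_error("BUG: partial transaction found after the "
              "consistent point: %s", line);
    return false;
}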

@arajkumar (Contributor, Author) commented:

@dimitri This is a great point! I think the current ignore logic implemented in this PR might cause an inconsistency: it might ignore the partial transaction but apply the next transaction because it has a valid BEGIN/COMMIT block.

I need to take another deep look now.

@arajkumar (Contributor, Author) commented:

@dimitri I was wrong in my previous comment.

While switching to replay mode, follow ensures that the pending LSNs in the transform queue have been processed and applied to the target. This ensures everything except the last partial transaction has been successfully committed to the target. After switching to replay mode, stream_transform_resume again streams the last partial file (which was already applied by the previous step) to the replay process, which leads to this scenario. I think streaming the last partial .sql from stream_transform_resume is redundant and should be removed; this shouldn't cause any issues. WDYT?

@arajkumar (Contributor, Author) commented Oct 28, 2023

I'm wondering whether PRs #348 and #277 are trying to do the same thing in different ways?

I no longer see a partial transaction when I remove the part of the code in stream_transform_resume that streams the partial SQL file. I didn't see any data loss either.

dimitri (Owner) commented:

IIRC the transform parts need to parse/load the latest file to build up the context in case we stopped in the middle of a transaction (we then need to have the start of the transaction in memory); but I agree we don't need to send that to the replay process, which is only going to skip it again.

That said, the skip logic is sound, and I wonder why your PR can't benefit from it?

Comment on lines +375 to +376
log_debug("Received %s when transaction is already "
"in buffering mode, ignoring.", line);
dimitri (Owner) commented:

Please rewrite the message so that it ends with ": %s", line. This makes it easier to process the logs later.
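For example, the suggested form would read:

log_debug("Received a statement when transaction is already "
          "in buffering mode, ignoring: %s", line);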

Comment on lines +458 to +459
log_debug("Received %s when transaction is not "
"in buffering mode, ignoring.", line);
dimitri (Owner) commented:

Please rewrite the message so that it ends with ": %s", line. This makes it easier to process the logs later.

Comment on lines +550 to +551
log_debug("Received %s when transaction is not "
"in buffering mode", line);
dimitri (Owner) commented:

Please rewrite the message so that it ends with ": %s", line. This makes it easier to process the logs later.

@dimitri dimitri added this to the v0.14 milestone Oct 26, 2023
@arajkumar (Contributor, Author) commented:

#525 fixes this by eliminating the need for the transaction metadata file.

@arajkumar arajkumar closed this Nov 1, 2023