-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Buffer transaction statements to disk to prevent replay deadlock #505
Conversation
eb5f0c6
to
f3c8bdd
Compare
@dimitri The latest changes uses spills to disk approach to buffer transactions in replay mode. |
a369b17
to
e609c67
Compare
Could we store the current transaction to a file named |
e609c67
to
c17fd8e
Compare
c17fd8e
to
985144c
Compare
@dimitri I've added a symlink to the latest transaction buffer as suggested by you. Additionally KEEPALIVE, SWITCHWAL and ENDPOS messages which comes out of transaction will be replayed without buffering those. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work, I believe we're close to being able to merge this improvement! Please see my review and consider the changes I'm asking for.
src/bin/pgcopydb/ld_replay.c
Outdated
/* If there is already a replay buffer open, it indicates a partial | ||
* transaction. Close it and start a new one. | ||
*/ | ||
if (replayCtx->txnReplayBuffer != NULL) | ||
{ | ||
fclose(replayCtx->txnReplayBuffer); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow how this is not an ERROR (or a WARN) situation? When does that happen in a normal execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to use similar fix for prefetch mode as well, during the resume, we might encounter partially written txns. But that doesn't apply for replay. I would rather add an error message here.
This happens while switching from prefetch to replay mode, the transform process invokes stream_transform_resume which leads to playback of partial file created by prefetch and it might have partial transaction.
In one of my test, 00000004000001590000007E.sql was partially written by the prefetch. While switching to replay, transform applies the partial file leading to the unexpected state!
2023-10-25 13:42:04 600127 INFO follow.c:355 Restarting logical decoding follower in replay mode
2023-10-25 13:42:04 600135 NOTICE follow.c:814 Starting the prefetch sub-process
2023-10-25 13:42:04 600136 NOTICE follow.c:814 Starting the transform sub-process
2023-10-25 13:42:04 600136 SQL pgsql.c:508 Connecting to [source] "postgres://tsdbadmin@tsdb-22e8519b-internal-90d0.a.timescaledb.io:26479/defaultdb?sslmode=require&keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
2023-10-25 13:42:04 600137 NOTICE follow.c:814 Starting the catchup sub-process
2023-10-25 13:42:04 600135 INFO ld_stream.c:1987 Resuming streaming from latest file "/home/ubuntu/.local/share/pgcopydb/00000004000001590000007E.json"
2023-10-25 13:42:04 600136 NOTICE ld_transform.c:234 Transforming from 159/7EE2F578 in "/home/ubuntu/.local/share/pgcopydb/00000004000001590000007E.sql"
2023-10-25 13:42:04 600136 NOTICE ld_transform.c:794 Transforming JSON file "/home/ubuntu/.local/share/pgcopydb/00000004000001590000007E.json" into SQL file "/home/ubuntu/.local/share/pgcopydb/00000004000001590000007E.sql"
...
2023-10-25 13:42:04 600137 INFO ld_apply.c:254 Replaying changes from LSN 159/7EE24298
2023-10-25 13:42:04 600137 ERROR ld_replay.c:363 BUG: Received COMMIT; -- {"xid":7789156,"lsn":"159/7E001210","timestamp":"2023-10-25 13:41:56.473524+0000"} when transaction is not in buffering mode
00000004000001590000007E.sql
COMMIT; -- {"xid":7789156,"lsn":"159/7E001210","timestamp":"2023-10-25 13:41:56.473524+0000"}
BEGIN; -- {"xid":7789155,"lsn":"159/7DFD6498","timestamp":"2023-10-25 13:41:56.511105+0000","commit_lsn":"159/7E008390"}
PREPARE 40ba08bf AS INSERT INTO public.readings
...
...
@dimitri, Is there a better way to deal with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in prefetch mode we would only encounter a single partial transaction I believe, the last one we received. Cases where we would deal with a partial transaction and then another transaction certainly indicate a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in prefetch mode we would only encounter a single partial transaction I believe, the last one we received.
This is what is happening. We have to ignore the partial transactions (i.e. either a COMMIT without a BEGIN or BEGIN without COMMIT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed all the review comments except this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I still have some with the code (the comments are good at explaining the situation, thanks). The trouble is that we don't protect against applying this approach even in the middle of our operations, when it looks like it's only needed before we reach our own “consistent point”, never after that. Can we make sure that as soon as we start replaying transactions, then we raise the situation from a DEBUG information to an actual ERROR? (well assuming my analysis is correct here).
985144c
to
d35ca1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting there!
d35ca1a
to
26e70a5
Compare
This commit buffers transactions being replayed into a separate file until the entire transaction body is received. It addresses potential deadlocks that can occur when the BEGIN message arrives before the creation of the transaction metadata file by the transform process. Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
26e70a5
to
21463c7
Compare
* After switching to the replay mode, logical decoding will resume | ||
* from consistent point which again starts with a valid transcation | ||
* block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have the LSN value for that consistent point available in the context here? If yes, I believe we should be using it to make sure we are skipping transactions as intended. If no, then we should add to the comment why not, so that another PR later can go and fix the situation if we feel like it should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitri This is a great point ! I think the current ignore logic implemented in this PR might cause inconsistency where it might ignore the partial transaction, but applies the next transaction because it has valid BEGIN/COMMIT block.
I need to have a another deep look now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitri I was wrong in my previous comment.
While switching to replay mode, follow ensures that the pending LSNs in transform queue has been processed and applied to the target. This ensures everything except the last the partial transaction has been successfully committed into the target. After switching to replay mode, stream_transform_resume again streams the last partial file(which is already applied by the previous step) to the replay process which leads to this scenario. I think streaming last partial .sql from stream_transform_resume is redundant and shall be removed and this shouldn't cause any issue. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering are these PRs #348, #277 trying to do the same in different ways?
I no longer see partial txn when I remove part of the code from stream_transform_resume which streams partial sql., I didn't see any data loss either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the transform parts need to parse/load the latest file to build up the context in case we stopped in the middle of a transaction (we then need to have the start of the transaction in-memory); but I agree we don't need to send that to the replay process, which is only going to skip again.
That said the skip logic is sound and I wonder why in your PR we can't benefit from it?
log_debug("Received %s when transaction is already " | ||
"in buffering mode, ignoring.", line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rewrite the message so that it ends with ": %s", line
. This makes it easier to process the logs later.
log_debug("Received %s when transaction is not " | ||
"in buffering mode, ignoring.", line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rewrite the message so that it ends with ": %s", line
. This makes it easier to process the logs later.
log_debug("Received %s when transaction is not " | ||
"in buffering mode", line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rewrite the message so that it ends with ": %s", line
. This makes it easier to process the logs later.
#525 fixes by eliminating the need of txn metadata file. |
This commit buffers transactions being replayed into a separate file until
the entire transaction body is received. It addresses potential deadlocks
that can occur when the BEGIN message arrives before the creation of the
transaction metadata file by the transform process.
Signed-off-by: Arunprasad Rajkumar ar.arunprasad@gmail.com