Durable slot #62

Merged
merged 4 commits into from
May 16, 2024
DaemonSnake (Contributor) commented May 16, 2024

Adds support for non-temporary (durable) replication slots.

  • adds the optional config toggle durable_slot
  • when enabled it will:
    • check whether the slot already exists
    • try to create it if it doesn't
    • start it
  • the startup will fail if:
    • the slot already exists and is still active
    • multiple processes/nodes try to:
      • create the same durable slot around the same time (only one will succeed)
      • start the durable slot at the same time
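The startup flow above can be modeled in a few lines. The Python sketch below is illustrative only, not WalEx's Elixir implementation: a hypothetical in-memory `SlotRegistry` stands in for the server's slot catalog (Postgres exposes this as the `pg_replication_slots` view, whose `active` column flags slots in use), and `SlotError` and `start_durable_slot` are made-up names. The lock models the server serializing slot creation and acquisition, which is what lets only one of several concurrent callers succeed.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


class SlotError(Exception):
    """Raised when a durable slot cannot be created or started (hypothetical)."""


@dataclass
class Slot:
    name: str
    temporary: bool = False  # durable slots are created with temporary: false
    active: bool = False     # True while a connection is streaming from the slot


class SlotRegistry:
    """Hypothetical stand-in for the server-side slot catalog."""

    def __init__(self) -> None:
        self._slots: Dict[str, Slot] = {}
        self._lock = threading.Lock()  # models the server serializing slot operations

    def get(self, name: str) -> Optional[Slot]:
        with self._lock:
            return self._slots.get(name)

    def create(self, name: str) -> Slot:
        with self._lock:
            if name in self._slots:
                # Another process created it first: only one creator succeeds.
                raise SlotError(f"slot {name!r} already exists")
            slot = Slot(name=name, temporary=False)
            self._slots[name] = slot
            return slot

    def acquire(self, name: str) -> Slot:
        with self._lock:
            slot = self._slots[name]
            if slot.active:
                # Another connection is already streaming from this slot.
                raise SlotError(f"slot {name!r} is already active")
            slot.active = True
            return slot


def start_durable_slot(registry: SlotRegistry, name: str) -> Slot:
    """Check whether the slot exists, create it if missing, then start it."""
    existing = registry.get(name)
    if existing is not None and existing.active:
        raise SlotError(f"slot {name!r} is already active")
    if existing is None:
        registry.create(name)      # raises if a concurrent process won the race
    return registry.acquire(name)  # raises if a concurrent process started it first
```

If two processes race, both may see the slot as missing, but only one `create` succeeds and the other raises; likewise only one `acquire` wins when both find an existing, inactive slot.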

When the slot is restarted, replication resumes from the last wal_end+1 that WalEx returned to Postgres in its keep-alive reply.
Transactions processed between that last keep-alive and the interruption of WalEx will therefore be replayed.

If replayed events are a problem, the end user must implement a way to mark LSNs/transactions as processed so they are not processed twice.
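One way to meet that requirement is a processed-LSN watermark: remember the commit LSN of the last transaction whose side effects completed, and skip anything at or below it when the stream replays. The sketch below assumes each transaction arrives as a hypothetical `(lsn, payload)` pair with LSNs increasing in stream order, and keeps the watermark in memory; a real consumer would have to persist it durably, ideally atomically with its side effects. `IdempotentConsumer` is an illustrative name, not a WalEx API.

```python
from typing import Callable, Iterable, List, Tuple

Event = Tuple[int, str]  # (commit LSN, payload): hypothetical event shape


class IdempotentConsumer:
    """Skips transactions whose commit LSN has already been processed."""

    def __init__(self, handler: Callable[[str], None], last_processed_lsn: int = 0) -> None:
        self.handler = handler
        # Assumption: in a real system this watermark is stored durably.
        self.last_processed_lsn = last_processed_lsn

    def consume(self, events: Iterable[Event]) -> List[int]:
        """Process events in order and return the LSNs that were actually handled."""
        handled = []
        for lsn, payload in events:
            if lsn <= self.last_processed_lsn:
                continue  # replayed after a restart: already processed
            self.handler(payload)
            self.last_processed_lsn = lsn  # advance only after the handler succeeds
            handled.append(lsn)
        return handled
```

After a restart, feeding the replayed stream `[(200, "b"), (300, "c")]` to a consumer that already handled `[(100, "a"), (200, "b")]` processes only LSN 300.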

Event loss is also currently possible in the following scenario:

  • WalEx receives a transaction
  • starts processing it asynchronously
  • receives a keep-alive request
  • replies with the received wal_end+1 (which is greater than the position of the transaction still being processed)
  • WalEx crashes or is interrupted before it finishes processing the transaction
  • it restarts at position wal_end+1, so the event is lost
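The failure sequence above can be reproduced with a small simulation. This is a hypothetical Python model, not WalEx code: LSNs are plain integers, `Consumer` and `lost_after_crash` are made-up names, and the server is simplified to resend only transactions whose wal_end is at or past the confirmed position. `ack_received` replies with the received wal_end+1 as this PR describes; `ack_processed` (reply with the position after the last fully processed transaction) is an alternative assumed here for contrast and is not necessarily how the follow-up PR addresses the issue.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Consumer:
    """Hypothetical consumer that processes transactions asynchronously."""
    received_end: int = 0   # wal_end of the last transaction received
    processed_end: int = 0  # wal_end of the last transaction fully processed
    processed: List[int] = field(default_factory=list)

    def receive(self, wal_end: int) -> None:
        self.received_end = wal_end  # processing starts but has not finished yet

    def finish(self, wal_end: int) -> None:
        self.processed.append(wal_end)
        self.processed_end = wal_end

    def keepalive_reply(self, policy: str) -> int:
        if policy == "ack_received":
            return self.received_end + 1   # behaviour described in this PR
        if policy == "ack_processed":
            return self.processed_end + 1  # assumed alternative, for comparison
        raise ValueError(f"unknown policy {policy!r}")


def lost_after_crash(policy: str, stream: List[int]) -> List[int]:
    """Receive every transaction, finish all but the last, answer a keep-alive,
    then crash. Return transactions that were neither processed nor resent."""
    c = Consumer()
    for wal_end in stream:
        c.receive(wal_end)
    for wal_end in stream[:-1]:
        c.finish(wal_end)
    restart_lsn = c.keepalive_reply(policy)  # position confirmed to the server
    # Crash here: the last transaction never finished processing.
    resent = [w for w in stream if w >= restart_lsn]
    return [w for w in stream if w not in c.processed and w not in resent]
```

With `ack_received`, `lost_after_crash("ack_received", [100, 200, 300])` reports 300 as lost; with `ack_processed` the unfinished transaction is resent instead, turning loss into a replay that the idempotency handling described earlier can absorb.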

We have another PR that addresses this and will submit it shortly.

- add the config option durable_slot (bool)
- if enabled, check whether the slot already exists / is still active
- if either is true => raise
- if multiple processes try to create a slot with the same name, creation errors; raise in that case
- otherwise create the durable slot and start it

- enabled with the option durable_slot: true
- cannot be started by more than one ReplicationConnection at the same time
- is actually durable (survives a server crash, marked as temporary: false)
- can be resumed
cpursley (Owner) commented May 16, 2024

This is an awesome and very welcome improvement @DaemonSnake!

Let me know if you have other PRs planned - I need to cut a new release soon.

cpursley merged commit d3530ff into cpursley:master on May 16, 2024
1 check passed
DaemonSnake deleted the durable_slot branch on May 16, 2024 17:18
cpursley (Owner) commented:
@DaemonSnake did you have any more changes you wanted to get in before I create a new release?

DaemonSnake (Contributor, Author) commented:
@cpursley oh yes, will submit it today. Sorry for the late reply.
Thank you so much for the quick review and merge, by the way 😁
