Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: raft can pull large amounts of entries from Storage #104402

Closed
pav-kv opened this issue Jun 6, 2023 · 3 comments
Closed

kvserver: raft can pull large amounts of entries from Storage #104402

pav-kv opened this issue Jun 6, 2023 · 3 comments
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv-replication KV Replication Team

Comments

@pav-kv
Copy link
Collaborator

pav-kv commented Jun 6, 2023

During election, a raft instance checks whether the log has any unapplied committed config changes. To do so, it currently pulls the entire log between the applied and committed index to memory. There is no limit to how much it can read into memory. If a raft follower has large amounts of unapplied entries, and it campaigns, this may lead to out-of-memory situations.

We should never have unlimited amounts of logs pulled into memory. Here we should paginate the log scan, so that it fetches only up to a certain size of entries at a time.

This would be an upstream change in etcd-io/raft.

Jira issue: CRDB-28509

@pav-kv pav-kv added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. labels Jun 6, 2023
@pav-kv pav-kv self-assigned this Jun 6, 2023
@blathers-crl blathers-crl bot added the T-kv-replication KV Replication Team label Jun 6, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jun 6, 2023

cc @cockroachdb/replication

@tbg tbg changed the title raft can pull large amounts of entries from Storage kvserver: raft can pull large amounts of entries from Storage Jun 6, 2023
pav-kv added a commit to pav-kv/cockroach that referenced this issue Jun 7, 2023
This includes etcd-io/raft#32 which fixes an
out-of-memory scenario during election with large unapplied Raft logs.

Part of cockroachdb#104402
Epic: none
Release note (bug fix): Fixed a bug in upstream etcd-io/raft which could result
in pulling unlimited amount of log into memory, and lead to out-of-memory
situations. With the fix, the log scan has a limited memory footprint.
craig bot pushed a commit that referenced this issue Jun 8, 2023
104483: go.mod: upgrade etcd-io/raft to 515b142 r=tbg a=pavelkalinnikov

This includes etcd-io/raft#32 which fixes an out-of-memory scenario during election with large unapplied Raft logs.

Part of #104402
Epic: none
Release note (bug fix): Fixed a bug in upstream etcd-io/raft which could result in pulling unlimited amount of log into memory, and lead to out-of-memory situations. With the fix, the log scan has a limited memory footprint.

Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
@tbg
Copy link
Member

tbg commented Jun 13, 2023

master is done, waiting for backports to 23.1 and 22.2. The latter is a bit tricky since the raft repo was extracted from its surrouding original repo in the meantime, so the backport requires some careful grafting.

pav-kv added a commit to pav-kv/cockroach that referenced this issue Jun 15, 2023
This dependency upgrade includes a patched PR 32 from etcd-io/raft which fixes
an out-of-memory scenario during election with large unapplied Raft logs.

Part of cockroachdb#104402
Epic: none
Release note (bug fix): Fixed a bug in upstream etcd/raft which could result in
pulling unlimited amount of log into memory, and lead to out-of-memory
situations. With the fix, the log scan has a limited memory footprint.
pav-kv added a commit to pav-kv/cockroach that referenced this issue Jun 15, 2023
This dependency upgrade includes a patched PR 32 from etcd-io/raft which fixes
an out-of-memory scenario during election with large unapplied Raft logs.

Part of cockroachdb#104402
Epic: none
Release note (bug fix): Fixed a bug in upstream etcd-io/raft which could result
in pulling unlimited amount of log into memory, and lead to out-of-memory
situations. With the fix, the log scan has a limited memory footprint.
Release justification: fixing a bug after a customer escalation
@pav-kv
Copy link
Collaborator Author

pav-kv commented Jun 15, 2023

Fixed on master and backported to release-22.2 and release-23.1.

@pav-kv pav-kv closed this as completed Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv-replication KV Replication Team
Projects
None yet
Development

No branches or pull requests

2 participants