Skip to content
This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

recv_append_entries: Fix memory leak #167

Conversation

MathieuBordere
Copy link
Contributor

@MathieuBordere MathieuBordere commented Feb 2, 2021

AppendEntries RPC's are ignored during installation of a snapshot.
Free the memory associated to the ignored RPC.

Should help with canonical/dqlite#276

Test script output looks like:

=> Spawning database
=> Spawning test loops
=> Monitor the result
==> 20210202-1609: db1=80812 db2=81472 db3=162832 leader=127.0.0.1:9001
mykey1|foo=1 loop=1
mykey2|foo=1 loop=8074
==> 20210202-1610: db1=100788 db2=94684 db3=121800 leader=127.0.0.1:9001
mykey1|foo=1 loop=1
mykey2|foo=6 loop=67
==> 20210202-1611: db1=101184 db2=95952 db3=130304 leader=127.0.0.1:9001
mykey1|foo=1 loop=1
mykey2|foo=10 loop=3050
==> 20210202-1612: db1=100912 db2=96548 db3=137808 leader=127.0.0.1:9001
mykey1|foo=1 loop=1
mykey2|foo=14 loop=6934
...
==> 20210202-1648: db1=104232 db2=98924 db3=140100 leader=127.0.0.1:9001
mykey1|foo=100 loop=9999
mykey2|foo=97 loop=2581
Loop mykey2 exited
==> 20210202-1649: db1=104100 db2=99816 db3=140100 leader=127.0.0.1:9001
mykey1|foo=100 loop=9999
mykey2|foo=100 loop=9999
==> 20210202-1650: db1=104100 db2=99952 db3=140360 leader=127.0.0.1:9001
mykey1|foo=100 loop=9999
mykey2|foo=100 loop=9999
==> 20210202-1651: db1=104100 db2=99952 db3=139916 leader=127.0.0.1:9001
mykey1|foo=100 loop=9999
mykey2|foo=100 loop=9999

Have seen a run where db3 memory usage spikes to +- 770MB, but goes down again quickly, should be further investigated.

==> 20210202-1803: db1=103812 db2=100856 db3=169220 leader=127.0.0.1:9001
mykey1|foo=58 loop=7233
mykey2|foo=51 loop=3964
==> 20210202-1804: db1=104192 db2=100964 db3=211648 leader=127.0.0.1:9001
mykey1|foo=58 loop=7233
mykey2|foo=55 loop=3296
==> 20210202-1805: db1=104264 db2=100800 db3=211648 leader=127.0.0.1:9001
mykey1|foo=58 loop=7233
mykey2|foo=59 loop=4699
==> 20210202-1806: db1=104192 db2=101168 db3=772276 leader=127.0.0.1:9001
mykey1|foo=58 loop=7233
mykey2|foo=63 loop=3533
==> 20210202-1807: db1=104928 db2=101004 db3=150536 leader=127.0.0.1:9001
mykey1|foo=58 loop=7233
mykey2|foo=67 loop=4411
==> 20210202-1808: db1=104528 db2=101108 db3=149344 leader=127.0.0.1:9001
mykey1|foo=58 loop=7233
mykey2|foo=71 loop=5290
==> 20210202-1809: db1=104476 db2=101532 db3=149344 leader=127.0.0.1:9001
mykey2|foo=74 loop=9900
mykey1|foo=69 loop=7238

AppendEntries RPC's are ignored during installation of a snapshot.
Free the memory associated to the ignored RPC.
@freeekanayaka
Copy link
Contributor

One quick question, why are AppendEntries RPCs being sent at all, to the node which is installing the snapshot? IIRC the logic in progress.c/replication.c should not send further AppendEntries after sending a snapshot, until the target node replies that the snapshot was installed.

Regardless of that, I guess your fix still makes sense in general, I just don't get why this should be a common situation, perhaps there's another fix to be made to prevent concurrent AppendEntries RPCs while an InstallSnapshot is in progress.

@MathieuBordere
Copy link
Contributor Author

I don't know yet, but it looks like a code path that is frequently hit at the moment.
I will for sure investigate further.

@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Feb 3, 2021

It looks like the leak would also be triggered when we are taking a snapshot ourselves, see takeSnapshot in replication.c.
https://github.com/MathieuBordere/raft/blob/e117cac3bd47c33c9ba8e8cfc0d7baf2d447e4ac/src/replication.c#L1439

The comment in the code isn't complete, it's during snapshot install and taking a snapshot that AppendEntries are ignored.

@freeekanayaka
Copy link
Contributor

It looks like the leak would also be triggered when we are taking a snapshot ourselves, see takeSnapshot in replication.c.
https://github.com/MathieuBordere/raft/blob/e117cac3bd47c33c9ba8e8cfc0d7baf2d447e4ac/src/replication.c#L1439

The comment in the code isn't complete, it's during snapshot install and taking a snapshot that AppendEntries are ignored.

I guess you're right. It makes more sense now that the leak is happening.

@freeekanayaka freeekanayaka merged commit 329e3d8 into canonical:master Feb 3, 2021
@MathieuBordere MathieuBordere deleted the mbordere/memory_leak_during_snapshot_install branch February 3, 2021 13:22
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants