
Slow memory leak in v22.0? #24542

Closed
sangaman opened this issue Mar 12, 2022 · 25 comments · Fixed by #30094

Comments

@sangaman
Contributor

I've been noticing gradually increasing memory usage from Bitcoin Core v22.0. I've been running this node for a couple of years with the same hardware and general setup and have only noticed this in the past few months, around the time I upgraded to v22, which makes me suspect the two are related.

I'm seeing bitcoind memory usage slowly climb over weeks/months of uptime. Right now it's at 3.4 GB memory usage and as I recall I've seen it over 5 GB before in an incident where it caused the machine to run out of RAM, which is when I first noticed this. If I restart bitcoind, RAM usage goes back to a normal level and stays there for a while.

I do recall seeing a lot of `BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications` messages in the logs when RAM usage was over 5 GB and my node was barely responsive to RPC calls, but I'm not sure if this has anything to do with the issue.

I'm using dbcache=1024 which would explain some of the RAM usage but not all of it.

Other config settings that may be relevant:

txindex=1
peerblockfilters=1
blockfilterindex=1

maxconnections=100

rpcthreads=64
rpcworkqueue=256
rpcservertimeout=120

zmqpubrawblockhwm=100000
zmqpubrawtxhwm=100000

Otherwise, I'm using this node to run an lnd node (with zmq block/tx notifications) and electrumx server. There's a medium-sized watch-only wallet that's loaded. There's also a small script I run with blocknotify.

This is all running on an RPI4 with 8GB RAM, using raspbian buster w/ the OS and block indexes on an SSD.

I'd be interested if others have seen anything similar out of their nodes or if perhaps this RAM usage is normal given how I'm using it, but I am pretty sure this wasn't happening a year ago with similar usage patterns.

@ajtowns
Contributor

ajtowns commented Mar 15, 2022

I think I've seen the same thing, but mostly for me it just looks like "bitcoind has a bunch of memory in swap and it takes a while to deallocate when shutting down", so I haven't been able to get any insight into what's actually going on. I think the same memory leak may show up in Liquid/Elements, but with higher severity (due to the higher block rate).

EDIT: maybe worth noting that the "takes a while to deallocate" occurs after Shutdown() has finished

@maflcko
Member

maflcko commented Mar 15, 2022

Maybe valgrind --tool=massif bitcoind could help here?
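For reference, a typical massif invocation might look like the following. This is only a sketch: the output filename and the `-printtoconsole` flag are illustrative choices, and running under valgrind slows bitcoind down considerably.

```shell
# Run bitcoind under massif to record heap-usage snapshots over time.
# --massif-out-file is optional; the default is massif.out.<pid>.
valgrind --tool=massif --massif-out-file=massif.out.bitcoind bitcoind -printtoconsole

# After shutting the node down, summarize the peak heap and its call tree:
ms_print massif.out.bitcoind | less
```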

@sangaman
Contributor Author

I went to check on my node today expecting higher RAM usage than before, but it was actually down to 1.9 GB. Puzzled, I checked and it looks like the node actually shut down 2 days ago (and systemd restarted it automatically). Looking at the logs, there are tons of these `BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications` messages. Here's what my logs look like:

2022-03-13T04:48:01Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T04:48:01Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T04:53:05Z UpdateTip: new best=00000000000000000009e50a3c45c26a64cfe507b5abee9432460ed3d85c6855 height=727101 version=0x20200000 log2_work=93.398160 tx=717198865 date='2022-03-13T04:51:03Z' progress=1.000000 cache=32.2MiB(147906txo)
2022-03-13T04:53:05Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T04:53:05Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T04:54:48Z UpdateTip: new best=000000000000000000058b74712cac983dd446c6b2c5e078db9975c6638e86fd height=727102 version=0x20000000 log2_work=93.398173 tx=717199047 date='2022-03-13T04:52:40Z' progress=1.000000 cache=32.3MiB(148339txo)
2022-03-13T04:54:48Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T04:54:48Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:03:47Z UpdateTip: new best=0000000000000000000516fd3c3ecfb22e47ba8ed8c29163602230aee526d147 height=727103 version=0x20200000 log2_work=93.398186 tx=717199511 date='2022-03-13T04:57:22Z' progress=0.999999 cache=32.5MiB(150501txo)
2022-03-13T05:03:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:03:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:03:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:03:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:03:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:03:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:10:59Z UpdateTip: new best=00000000000000000009554288443c39f85f3e531e9e60c48db2d01a22585b81 height=727104 version=0x2000e004 log2_work=93.398199 tx=717200040 date='2022-03-13T05:02:51Z' progress=0.999998 cache=32.8MiB(152399txo)
2022-03-13T05:10:59Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:10:59Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:14Z UpdateTip: new best=0000000000000000000358f170fd5d78721254c9f3cad519155d36eaef8c1c63 height=727105 version=0x20004000 log2_work=93.398212 tx=717200804 date='2022-03-13T05:10:56Z' progress=0.999998 cache=33.4MiB(157443txo)
2022-03-13T05:19:15Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:15Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:15Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:15Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:15Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:15Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:15Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:15Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:19:41Z UpdateTip: new best=00000000000000000002bc5d2f34d9bb7c92d2e2c6f2a578ed704f7498cf4d53 height=727106 version=0x3fffe000 log2_work=93.398226 tx=717200853 date='2022-03-13T05:12:02Z' progress=0.999998 cache=33.4MiB(157649txo)
2022-03-13T05:19:42Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:22:21Z UpdateTip: new best=0000000000000000000066ff7813973ee5068a0ff061f42901360871086a5025 height=727107 version=0x3fffe000 log2_work=93.398239 tx=717201297 date='2022-03-13T05:16:01Z' progress=0.999999 cache=33.6MiB(158708txo)
2022-03-13T05:22:22Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:22:22Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:31:47Z UpdateTip: new best=000000000000000000025ce8ab1c48e53f8cb2d5000807cf45ec18e543773f84 height=727108 version=0x2cbd0000 log2_work=93.398252 tx=717202203 date='2022-03-13T05:24:43Z' progress=0.999999 cache=33.9MiB(161187txo)
2022-03-13T05:31:49Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:31:49Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:31:49Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:31:49Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:31:49Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:31:49Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:37:23Z UpdateTip: new best=00000000000000000003be53d87aaf7489dc53d3c36fc36ea7b42d541a19c45f height=727109 version=0x22c18000 log2_work=93.398265 tx=717202876 date='2022-03-13T05:31:32Z' progress=0.999999 cache=34.1MiB(163020txo)
2022-03-13T05:37:23Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:37:23Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:37:23Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:37:23Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:37:23Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:37:23Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:37:41Z UpdateTip: new best=00000000000000000005464d4bd8530348a5b08eb2362f0f621213ca04873852 height=727110 version=0x27ffe004 log2_work=93.398278 tx=717202909 date='2022-03-13T05:31:53Z' progress=0.999999 cache=34.1MiB(163122txo)
2022-03-13T05:37:42Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:41:20Z UpdateTip: new best=000000000000000000045ec5d2b3999a4336dd48724fed813ae8682868bf94b6 height=727111 version=0x20c00000 log2_work=93.398291 tx=717203383 date='2022-03-13T05:36:18Z' progress=0.999999 cache=34.3MiB(164597txo)
2022-03-13T05:41:21Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:41:21Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:41:21Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:41:25Z New outbound peer connected: version: 70015, blocks=727111, peer=656929 (outbound-full-relay)
2022-03-13T05:41:48Z New outbound peer connected: version: 70016, blocks=727111, peer=657055 (outbound-full-relay)
2022-03-13T05:42:44Z New outbound peer connected: version: 70016, blocks=727111, peer=657066 (outbound-full-relay)
2022-03-13T05:58:47Z UpdateTip: new best=00000000000000000008ec489c496cca28028dfe29fec7c1cbf3c43dc7156312 height=727112 version=0x20c00004 log2_work=93.398304 tx=717204079 date='2022-03-13T05:42:48Z' progress=0.999997 cache=34.6MiB(167004txo)
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:47Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:48Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:48Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:48Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:58:48Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T05:59:30Z New outbound peer connected: version: 70016, blocks=727112, peer=657227 (outbound-full-relay)
2022-03-13T06:00:03Z New outbound peer connected: version: 70016, blocks=727112, peer=657238 (outbound-full-relay)
2022-03-13T06:00:05Z New outbound peer connected: version: 70016, blocks=727112, peer=657239 (outbound-full-relay)
2022-03-13T06:00:13Z New outbound peer connected: version: 70015, blocks=727112, peer=657240 (outbound-full-relay)
2022-03-13T06:08:20Z New outbound peer connected: version: 70016, blocks=727112, peer=657341 (outbound-full-relay)
2022-03-13T06:41:11Z UpdateTip: new best=00000000000000000005fda11e6f284235a2c2cfcf74303e9610a40b5e489664 height=727113 version=0x345ca000 log2_work=93.398317 tx=717206214 date='2022-03-13T06:16:35Z' progress=0.999995 cache=36.0MiB(178695txo)
2022-03-13T06:41:25Z Potential stale tip detected, will try using extra outbound peer (last tip update: 2549 seconds ago)
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:25Z BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications
2022-03-13T06:41:26Z New outbound peer connected: version: 70015, blocks=727116, peer=657444 (outbound-full-relay)
2022-03-13T06:41:47Z New outbound peer connected: version: 70016, blocks=727116, peer=657578 (outbound-full-relay)
2022-03-13T06:41:47Z New outbound peer connected: version: 70016, blocks=727116, peer=657579 (outbound-full-relay)
2022-03-13T06:42:04Z New outbound peer connected: version: 70015, blocks=727116, peer=657584 (outbound-full-relay)
2022-03-13T06:42:25Z New outbound peer connected: version: 70015, blocks=727117, peer=657591 (outbound-full-relay)
2022-03-13T06:42:41Z New outbound peer connected: version: 70016, blocks=727117, peer=657596 (outbound-full-relay)
2022-03-13T06:43:00Z New outbound peer connected: version: 70015, blocks=727117, peer=657598 (outbound-full-relay)
2022-03-13T06:43:04Z New outbound peer connected: version: 70016, blocks=727117, peer=657599 (outbound-full-relay)
2022-03-13T06:43:40Z New outbound peer connected: version: 70016, blocks=727117, peer=657607 (outbound-full-relay)
2022-03-13T06:44:36Z New outbound peer connected: version: 70016, blocks=727117, peer=657614 (outbound-full-relay)
2022-03-13T06:44:37Z New outbound peer connected: version: 70016, blocks=727117, peer=657615 (outbound-full-relay)
2022-03-13T06:44:38Z New outbound peer connected: version: 70015, blocks=727117, peer=657616 (outbound-full-relay)
2022-03-13T06:45:07Z New outbound peer connected: version: 70016, blocks=727117, peer=657618 (block-relay-only)
2022-03-13T06:45:43Z New outbound peer connected: version: 70015, blocks=727117, peer=657630 (outbound-full-relay)

That's the last log message before it switches to startup logs. It looks like peers started dropping my node, I'm guessing because it was unresponsive, so it was making a lot of outbound connections, and block syncing slowed to a crawl.

I'll try to keep tabs on things and keep this issue updated if I see anything else interesting.

@mzumsande
Contributor

mzumsande commented Mar 15, 2022

Looking at the logs, there are tons of these `BlockUntilSyncedToCurrentChain: txindex is catching up on block notifications` messages.

These are likely caused by multiple parallel getrawtransaction RPC calls (probably from the lnd node) - if the txindex is enabled but hasn't managed to process the newest block yet (as a result of a general slowness / unresponsiveness due to memory shortage), it will log this message.
So I think that these are probably not the cause of the memory problem, but appear as a result of it.

@sangaman
Contributor Author

These are likely caused by multiple parallel getrawtransaction RPC calls (probably from the lnd node) - if the txindex is enabled but hasn't managed to process the newest block yet (as a result of a general slowness / unresponsiveness due to memory shortage), it will log this message. So I think that these are probably not the cause of the memory problem, but appear as a result of it.

Thanks. IIRC, one of the last times this happened, when I tried making RPC calls via bitcoin-cli the command would hang while I saw those `BlockUntilSyncedToCurrentChain` messages, so that makes sense.

@cshintov

cshintov commented May 4, 2023

Hi, I'm facing a similar issue. I'm running an RPC node, and its memory usage grows beyond 36 GB. I only see this problem when setting the rpcthreads parameter; without rpcthreads, usage doesn't grow beyond 4 GB.

Did you figure out the cause of the leak?

@mzumsande
Contributor

Hi, I'm facing a similar issue. I'm running an RPC node.

Also 22.0? Which value of rpcthreads are you using?

@sangaman
Contributor Author

sangaman commented May 4, 2023

FYI, I am now using v24.0.1 with rpcthreads=64 as mentioned above, and I still see this issue, although it remains slow and unpredictable. I just checked and bitcoind is currently using close to 4 GB RAM, which is a lot higher than it would be if I had restarted the node within the past few hours or even days.

I increased the rpcthreads setting to make sure bitcoind plays nice with lnd, but I don't expect that I actually need it that high. I can switch to the default rpcthreads value and see if this issue goes away, I'll report back later.

@willcl-ark
Member

Thanks @sangaman. Additionally, do you have any estimate of how many RPC calls you are making to the node over a given period?

@sangaman
Contributor Author

sangaman commented May 4, 2023

Hmm, I'm not exactly sure. This bitcoind node primarily serves an lnd node as well as an electrum server and a mempool.space block explorer, although the latter two get very light usage aside from staying in sync with the chain. If you think it'd be helpful, I can turn on debug logs and see how many RPC calls are made over a certain period of time.

@cshintov

cshintov commented May 4, 2023

I'm running 24.0.1 and I tried with various rpcthreads values 50, 100, 150.

I was able to solve the issue by setting the environment variable below.

By default, since glibc 2.10, the C library will create up to two heap arenas per core. This is known to cause excessive memory usage in some scenarios. To avoid this we have to set

MALLOC_ARENA_MAX=1

Without this, usage grows beyond 36 GB; with it, usage normally stays under 2 GB.

ref: https://github.com/bitcoin/bitcoin/blob/master/doc/reduce-memory.md#linux-specific
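For anyone wanting to try the same workaround, the variable just needs to be in bitcoind's environment at startup; a minimal sketch (the `-daemon` flag is illustrative):

```shell
# glibc-specific workaround: cap malloc at a single arena, so the
# allocator does not grow up to two heap arenas per core when many
# RPC threads allocate concurrently.
export MALLOC_ARENA_MAX=1
bitcoind -daemon
```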

@sangaman
Contributor Author

sangaman commented May 4, 2023

Thank you @cshintov ! I'll try this env var out and see if it solves my issue without having to change my config.

@sangaman
Contributor Author

So far my results with MALLOC_ARENA_MAX=1 seem to be good; I've been comfortably under 2 GB RAM every time I've checked. I added it as an env var in my systemd configuration, which I copied and modified from the one in this repo, and I figure the repo's service file ought to have that env var as well. I'll open a PR to add it.
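For a systemd-managed node, a drop-in override is one way to set the variable without editing the packaged unit file. The path and unit name below are assumptions; adjust them to your setup.

```ini
# /etc/systemd/system/bitcoind.service.d/memory.conf
# Apply with: systemctl daemon-reload && systemctl restart bitcoind
[Service]
Environment=MALLOC_ARENA_MAX=1
```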

sangaman added a commit to sangaman/bitcoin that referenced this issue May 12, 2023
This adds the `MALLOC_ARENA_MAX=1` environment variable as suggested in
https://github.com/bitcoin/bitcoin/blob/master/doc/reduce-memory.md#linux-specific
to the systemd service file definition.

Without this env var, memory usage can grow significantly especially when
`rpcthreads` is increased above its default value.

Closes bitcoin#24542.
@willcl-ark
Member

I think setting this to 1 will give significantly worse performance for most users though?

Seems like it would be better to add it to the documentation if anything...

@sangaman
Contributor Author

sangaman commented May 12, 2023

Seems like it would be better to add it to the documentation if anything...

It is in the documentation already at https://github.com/bitcoin/bitcoin/blob/master/doc/reduce-memory.md.

I think setting this to 1 will give significantly worse performance for most users though?

Per the documentation above, that shouldn't be the case, although I can't say for sure. I certainly didn't notice any decreased performance after setting this env var.

The behavior was introduced to increase CPU locality of allocated memory and performance with concurrent allocation, so this setting could in theory reduce performance. However, in Bitcoin Core very little parallel allocation happens, so the impact is expected to be small or absent.

@willcl-ark
Member

Is this still a problem with v26.0 (or later)?

@maflcko
Member

maflcko commented May 8, 2024

Not sure what can be done here. It would be good to use a tool to debug this, e.g. #24542 (comment), if it still happens.

@willcl-ark
Member

willcl-ark commented May 10, 2024

OK, I think I have pretty much identified the root cause here.

Calling blockToJSON on a maximum-sized block allocates 70-80MB on the heap during runtime.

There is then an additional 15-20MB allocated for strJSON.

After these allocations are made they then seem to be re-used when possible in the future, so long as subsequent blocks are small enough to fit. This totals up to the ~100MB RSS increase observed sometimes when getting blocks via REST.

In the event that blocks are fetched in series in increasing sizes, repeated allocations (of increasing sizes) may be made, causing even higher apparent resource usage.

The cause seems to be two-fold:

  1. UniValues are not a particularly compact representation (we can't do much about this, except perhaps write to a string directly for this method).
  2. The UniValue created inside blockToJSON is ~40MB worst-case, but it is then copied into a second UniValue, doubling the required heap size.

It appears that this has been investigated previously (see also here) in search of better speed, but having move semantics for UniValue would also shave off half of this function's allocation requirement.

Thanks @stickies-v for helping me investigate

@stickies-v
Contributor

but it is then copied into a second UniValue, doubling the required heap size.

And if I understand correctly, it is then copied again here since UniValue doesn't have a move constructor. So improving UniValue move semantics seems to make sense here indeed.

@willcl-ark
Member

Discovered another attempt to avoid copies from @maflcko #25429 while researching further.

Ref my previous test in #30052, I have a branch with some improvements (heavily "inspired" by @martinus's commit jgarzik/univalue@e9109e2) which limit allocations to about 60-70MB:

[image: allocation measurements for the branch]

This is about what I'd expect to be the minimum size based on the allocation for the JSON string, and the UniValue Object.

@maflcko
Member

maflcko commented May 13, 2024

And if I understand correctly, it is then copied again here since UniValue doesn't have a move constructor. So improving UniValue move semantics seems to make sense here indeed.

Can you explain this? It has a move constructor, at least the following compiles for me:

diff --git a/src/univalue/lib/univalue.cpp b/src/univalue/lib/univalue.cpp
index 656d2e8203..b3a33f36ac 100644
--- a/src/univalue/lib/univalue.cpp
+++ b/src/univalue/lib/univalue.cpp
@@ -125,6 +125,7 @@ void UniValue::pushKVEnd(std::string key, UniValue val)
 
 void UniValue::pushKV(std::string key, UniValue val)
 {
+static_assert(std::is_move_constructible_v<UniValue>);
     checkType(VOBJ);
 
     size_t idx;

The UniValue created inside blockToJSON is ~40MB worst-case, but it is then copied into a second UniValue, doubling the required heap size.

I presume this can be fixed by adding a std::move, but I haven't tested it.

@martinus
Contributor

The UniValue created inside blockToJSON is ~40MB worst-case, but it is then copied into a second UniValue, doubling the required heap size.

I had a PR somewhere where I removed the copy constructor and made it mandatory to call .copy(). Then you'd get a compile error wherever a copy happens, and can either use std::move or really use .copy() if a copy is needed.

That's nice because the compiler finds all problems automatically, but it requires touching quite a lot of places.

@willcl-ark
Member

Can you explain this? It has a move constructor, ...

I think you are correct here: the compiler implicitly generates move constructors, which are enough to fix this issue if combined with an appropriate std::move. I did not think that was the case, but I was using a non-reproducible test. I made my test reproducible and tested:

  1. v27.0
  2. addition of std::move only
  3. addition of std::move + explicit move constructor(s)
  4. addition of std::move + explicit move constructor(s) + LD_PRELOAD=mimalloc (just for my own interest)

[image: rest-comparison, RSS over time for the four configurations]

The test script requests every 5,000th block from height 0 to 844,000 and measures the RSS after each call.
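A rough reconstruction of that kind of test script, using Core's REST endpoints (`/rest/blockhashbyheight` and `/rest/block`, which require starting bitcoind with `-rest`) and Linux's /proc for RSS. The port, step size, and function names are assumptions, not the actual script:

```python
import urllib.request


def sample_heights(tip: int, step: int = 5000) -> list[int]:
    """Every `step`-th block height from genesis up to `tip`."""
    return list(range(0, tip + 1, step))


def fetch_block_json(height: int, base: str = "http://127.0.0.1:8332") -> bytes:
    """Fetch a block as JSON via bitcoind's REST interface."""
    with urllib.request.urlopen(f"{base}/rest/blockhashbyheight/{height}.hex") as r:
        blockhash = r.read().decode().strip()
    with urllib.request.urlopen(f"{base}/rest/block/{blockhash}.json") as r:
        return r.read()


def rss_kib(pid: int) -> int:
    """Resident set size of `pid` in KiB, read from /proc (Linux-only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")
```

Usage would be a loop over `sample_heights(844_000)`, calling `fetch_block_json(h)` and then recording `rss_kib(bitcoind_pid)` after each request.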

@maflcko
Member

maflcko commented May 13, 2024

either use std::move or really use .copy() if a copy is needed.

Style-wise I found it a bit unfortunate, because it forces code to either specify move or copy explicitly. (There is some beauty to rust, in that it implicitly moves)

@whitslack
Contributor

I had a PR somewhere where I removed the copy constructor and made it mandatory to call a .copy(). Then you'd get a compile error wherever a copy happens, and then can either use std::move or really use .copy() if a copy is needed.

Don't do that. You'll break guaranteed copy elision, which requires the compiler to skip the construction of a temporary object in certain expressions and statements, but which can only happen if an accessible copy constructor exists (even though it is not called). By deleting the copy constructor, you deny all opportunities for copy elision and force the compiler to always construct temporaries and then move from them.
