
Memory usage of librdkafka #3343

Closed
4 of 7 tasks
YarinLowe opened this issue Apr 11, 2021 · 12 comments

@YarinLowe

YarinLowe commented Apr 11, 2021

Description

Hi,
I'm running several librdkafka instances on several machines, with each client connecting to one cluster.
I found that librdkafka clients (both producer and consumer) use a large amount of memory:
For an instance connecting to a cluster of 3 brokers (thus running 4-5 threads), memory usage is 400-800MB - which is really problematic when I need to run several instances on a single machine (e.g. in different applications).
The memory usage is high even before connecting to the cluster (verified by running the client with dummy, non-existent bootstrap servers).
I tried to 'play' with the configuration (e.g. lowering queue.buffering.max.kbytes) but nothing seemed to help, or even to have any effect.
I also configured some dummy bootstrap servers and found that each server added to the list (even if it does not exist) adds 80-150MB of memory (and another broker thread, of course).
Connecting to different clusters results in varied memory usage: some instances use 400MB, others 800MB. There's no significant difference between them - they are all "idle" producers, each sending a message every 3 seconds via 2 topics only - the main difference is that they connect to different clusters.

To sum up the questions:

  1. Is this the expected amount of memory for an instance connecting to a cluster of 3 brokers?
  2. What affects the memory usage? Is it the configuration (apart from the bootstrap list)? Does the server [cluster] have any effect (e.g. number of topics, even when not intentionally used locally)?
  3. Is there any way to reduce the memory usage?

Thanks!

How to reproduce

Simply use a default configuration object and initialize a producer/consumer.
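For illustration, a minimal reproduction could look like the following. This is a sketch against librdkafka's C producer API; the bootstrap address is a dummy (as described above), so no connection ever succeeds, yet the address-space growth is already visible:

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

int main(void) {
    char errstr[512];
    rd_kafka_conf_t *conf = rd_kafka_conf_new();

    /* Dummy, non-existent bootstrap server: the memory usage is high
     * even before any broker connection succeeds. */
    rd_kafka_conf_set(conf, "bootstrap.servers", "127.0.0.1:9999",
                      errstr, sizeof(errstr));

    /* rd_kafka_new() takes ownership of conf on success. */
    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                  errstr, sizeof(errstr));
    if (!rk) {
        fprintf(stderr, "failed to create producer: %s\n", errstr);
        return 1;
    }

    /* Pause here and inspect the process with `pmap`/`top`
     * to observe the VSZ growth. */
    getchar();

    rd_kafka_destroy(rk);
    return 0;
}
```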

Checklist

Please provide the following information:

  • librdkafka version: v1.5.2
  • Apache Kafka version: 2.7.0
  • librdkafka client configuration: default
  • Operating system: Ubuntu 16.04
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
@edenhill
Contributor

No, an idle librdkafka instance should not consume much memory - guessing less than a meg.

Is this a producer instance? If so, are you producing messages that could correlate to the memory size?

Or is it a consumer? If so, it could be the pre-fetch queues that are filling up.

@edenhill
Contributor

You could try running your application with valgrind, and when the memory size is large, kill the application to get a leak report (which should then show all active allocations).

@YarinLowe
Author

@edenhill, most of my checks were with a producer instance - and it was using a lot of memory even before connecting to the cluster and before any messages were produced, so I can't find any correlation with anything.
I also tried running with valgrind but encountered some issues - I will try to run it again.

@YarinLowe
Author

YarinLowe commented Apr 12, 2021

I found that only the VSZ (virtual size) of the process rises sharply when creating a producer.
The physical memory usage (RSS, resident set size) grew, as you say, by only ~1MB (and it seems to stay there).
So the problem, I think, is much less critical than I thought - but is it really fine (and expected) for librdkafka to have a VSZ of a few hundred MB? (Configuration changes didn't have any effect.)

@zhangwen-network

@YarinLowe I have the same problem as you. Running the test producer on my ARM-based device, the VSZ is 220M. That's unacceptable for my device, so I also want to figure out why, and how to reduce it.

@edenhill
Contributor

Running an idle kafkacat producer connected to a cluster with 3 known brokers, we look at the process memory map:

$ pmap $(pidof kafkacat)  | cut -d ' ' -f 2- | sort -nr | head -20
 65404K -----   [ anon ]
 65404K -----   [ anon ]
 65404K -----   [ anon ]
 65404K -----   [ anon ]
 65404K -----   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  1644K r-x-- libcrypto.so.1.1
  1504K r-x-- libc-2.31.so
  1288K r-x-- libdb-5.3.so
  1244K r---- libunistring.so.2.1.0
  1228K r-x-- librdkafka.so.1
  1160K r-x-- libgnutls.so.30.27.0
 ...

That's 8 anonymous 8MB allocations and 5 anonymous 64MB allocations.

Let's get the thread count:

$ top -b -n 1 -H -p $(pidof kafkacat) 
top - 09:02:19 up 1 day, 2 min,  1 user,  load average: 0.92, 0.61, 0.52
Threads:   9 total,   0 running,   9 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.5 us,  0.8 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  32075.9 total,   9159.0 free,   9362.2 used,  13554.6 buff/cache
MiB Swap:  30864.0 total,  30864.0 free,      0.0 used,  21889.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM      TIME+ COMMAND
 115032 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.00 kafkacat
 115033 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.62 rdk:main
 115034 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.01 rdk:broker-1
 115035 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.01 rdk:broker-1
 115036 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.02 rdk:broker-1
 115037 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.01 rdk:broker-1
 115038 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.00 rdk:broker5
 115039 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.01 rdk:broker3
 115040 maglun    20   0  421372  11828  10180 S   0.0   0.0   0:00.00 rdk:broker4

8 threads. (see https://github.com/edenhill/librdkafka/wiki/FAQ#number-of-internal-threads)

Now let's try that again with kafkacat just knowing about one broker:

$ pmap $(pidof kafkacat)  | cut -d ' ' -f 2- | sort -nr | head -20
 65404K -----   [ anon ]
 65404K -----   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  8192K rw---   [ anon ]
  1644K r-x-- libcrypto.so.1.1
  1504K r-x-- libc-2.31.so
  1288K r-x-- libdb-5.3.so
  1244K r---- libunistring.so.2.1.0
  1228K r-x-- librdkafka.so.1

3 x 8MB, 2 x 64 MB.

$ top -b -n 1 -H -p $(pidof kafkacat) 
top - 09:03:38 up 1 day, 4 min,  1 user,  load average: 0.50, 0.56, 0.51
Threads:   4 total,   0 running,   4 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.0 us,  0.0 sy,  0.0 ni, 95.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  32075.9 total,   9157.3 free,   9363.9 used,  13554.7 buff/cache
MiB Swap:  30864.0 total,  30864.0 free,      0.0 used,  21887.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM      TIME+ COMMAND
 116456 maglun    20   0  183784  11980  10336 S   0.0   0.0   0:00.01 kafkacat
 116457 maglun    20   0  183784  11980  10336 S   0.0   0.0   0:00.00 rdk:main
 116458 maglun    20   0  183784  11980  10336 S   0.0   0.0   0:00.00 rdk:broker-1
 116459 maglun    20   0  183784  11980  10336 S   0.0   0.0   0:00.00 rdk:broker-1

for 3 librdkafka threads.

The default per thread stack size on Linux is 8 MB, so the 8MB allocations are most likely per-thread stacks.

The 64 MB allocations, on the other hand, are proportional to (but not equal to) the number of threads; what we could be seeing
is on-demand per-thread heap (malloc) space.

@edenhill
Contributor

edenhill commented Apr 13, 2021

Calling malloc_info(3) might strengthen this assumption. This is from the run with 8 threads and five 64 MB chunks; see the <heap nr=..> tags:

<malloc version="1">
<heap nr="0">
<sizes>
  <unsorted from="145" to="145" total="145" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="2" size="107777"/>
<system type="current" size="417792"/>
<system type="max" size="417792"/>
<aspace type="total" size="417792"/>
<aspace type="mprotect" size="417792"/>
</heap>
<heap nr="1">
<sizes>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="1" size="124832"/>
<system type="current" size="135168"/>
<system type="max" size="135168"/>
<aspace type="total" size="135168"/>
<aspace type="mprotect" size="135168"/>
<aspace type="subheaps" size="1"/>
</heap>
<heap nr="2">
<sizes>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="1" size="132272"/>
<system type="current" size="135168"/>
<system type="max" size="135168"/>
<aspace type="total" size="135168"/>
<aspace type="mprotect" size="135168"/>
<aspace type="subheaps" size="1"/>
</heap>
<heap nr="3">
<sizes>
  <unsorted from="1569" to="1569" total="1569" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="2" size="122641"/>
<system type="current" size="135168"/>
<system type="max" size="135168"/>
<aspace type="total" size="135168"/>
<aspace type="mprotect" size="135168"/>
<aspace type="subheaps" size="1"/>
</heap>
<heap nr="4">
<sizes>
  <unsorted from="1121" to="1121" total="1121" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="2" size="34177"/>
<system type="current" size="217088"/>
<system type="max" size="217088"/>
<aspace type="total" size="217088"/>
<aspace type="mprotect" size="217088"/>
<aspace type="subheaps" size="1"/>
</heap>
<heap nr="5">
<sizes>
  <unsorted from="1569" to="1569" total="1569" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="2" size="126625"/>
<system type="current" size="135168"/>
<system type="max" size="135168"/>
<aspace type="total" size="135168"/>
<aspace type="mprotect" size="135168"/>
<aspace type="subheaps" size="1"/>
</heap>
<total type="fast" count="0" size="0"/>
<total type="rest" count="10" size="648324"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="1175552"/>
<system type="max" size="1175552"/>
<aspace type="total" size="1175552"/>
<aspace type="mprotect" size="1175552"/>
</malloc>

@edenhill
Contributor

There is not much we can do about this in librdkafka, short of a redesign of the threading model, so I suggest you see if you can reuse the same producer instance (which is recommended unless you need different configs), or use another allocator, e.g. tcmalloc.

@zhangwen-network

@edenhill Firstly, thank you very much for your answers on this issue. I have a question about the threads below: why are two of them named "broker-1" and "broker0"? Does running only a producer also start broker features? I only need producer functionality.
29533 root 20 0 165060 4092 3648 S 0.0 0.0 0:00.00 producer
29534 root 20 0 165060 4092 3648 S 0.0 0.0 0:03.51 rdk:main
29535 root 20 0 165060 4092 3648 S 0.0 0.0 0:00.10 rdk:broker-1
29536 root 20 0 165060 4092 3648 S 0.0 0.0 0:00.12 rdk:broker0

@edenhill
Contributor

broker -1 == bootstrap broker

@rolandyoung

A "broker" thread is created for each broker the producer connects to. See https://github.com/edenhill/librdkafka/wiki/FAQ#number-of-broker-tcp-connections for an explanation of why there may be two broker connection threads even if there is only one broker in the cluster.

@zhangwen-network

After linking tcmalloc into my test application, the VSZ dropped from 220M to 44M.

@edenhill edenhill closed this as completed Apr 8, 2022