dnsdist: addAction/rmRule/newServer/rmServer leak memory #9372

Closed
yiyuandao opened this issue Aug 3, 2020 · 6 comments

yiyuandao commented Aug 3, 2020

  • Program: dnsdist
  • Issue type: Bug report

Short description

We have been testing dnsdist as a DNS load balancer, and we frequently add and delete rules or servers.
We found that frequently adding and deleting rules or servers leads to a memory leak.

Environment

  • Operating system: Ubuntu 16.04.5 LTS
  • Software version: 1.4.0-1pdns.xenial, 1.5.0
  • Software source: 1.4.0(PowerDNS repository), 1.5.0(compiled myself)
 dnsdist -V
dnsdist 1.4.0 (Lua 5.1.4 [LuaJIT 2.0.4])
Enabled features: cdb dns-over-tls(openssl) dnscrypt ebpf ipcipher libsodium protobuf re2 recvmmsg/sendmmsg systemd
dnsdist 1.5.0 (Lua 5.1.4 [LuaJIT 2.0.4])
Enabled features: ebpf fstrm ipcipher libsodium protobuf recvmmsg/sendmmsg systemd

Steps to reproduce

  1. dnsdist server config
-- listen for console connection with the given secret key
controlSocket("127.0.0.1")
addConsoleACL("127.0.0.1")
setKey("xxxx")

addLocal("127.0.0.1:53", {reusePort=true})
newServer("8.8.8.8")
newServer("1.1.1.1")

pc = newPacketCache(5000000)
getPool(""):setCache(pc)

-- test rule
addAction(makeRule({"10.254.0.60/32", "10.254.0.71/32", "10.254.0.72/32", "10.254.0.73"}), PoolAction(""))
  2. server.sh and makerule_test.sh

makerule_test.sh: add and delete the same rule 4000 times
server.sh: add and delete the same server 200 times.

server.lua: add new server and delete it

newServer("2400:3200::1")
rmServer(2)
collectgarbage()

v1.4 server.sh:

mem_start=$(ps -eo rss,args | grep bin/dnsdis[t]|awk '{print $1}')
for x in $(seq 1 $1); do dnsdist -c -C ./dnsdist.conf < server.lua >/dev/null; done
mem_end=$(ps -eo rss,args | grep bin/dnsdis[t]|awk '{print $1}')
echo "start: $mem_start end: $mem_end"
echo "($mem_end-$mem_start)/1024"|bc

v1.5 server.sh:

mem_start=$(ps -eo rss,args | grep 1.5.0/dnsdis[t]|awk '{print $1}')
for x in $(seq 1 $1); do dnsdist -c -C ./dnsdist.conf < server.lua >/dev/null; done
mem_end=$(ps -eo rss,args | grep 1.5.0/dnsdis[t]|awk '{print $1}')
echo "start: $mem_start end: $mem_end"
echo "($mem_end-$mem_start)/1024"|bc

makerule_test.sh:

generate_ip.sh: create an IP list configuration

#!/bin/bash
# Print the first $1 addresses of the form 10.10.i.j (at most 255*255 of them).

count="$1"
for i in $(seq 1 255); do
  for j in $(seq 1 255); do
      echo "10.10.$i.$j"
      count=$((count - 1))
      if [ "$count" -eq 0 ]; then
        exit 0
      fi
  done
done

bash generate_ip.sh 15000 >ip.conf

dist_rule.py: the argument is an IP list file; the script generates a single rule with 15000 IPs

import sys

def only_make_rule():
    # rule example: addAction(makeRule({"10.254.0.60/32", "10.254.0.71/32", "10.254.0.72/32", "10.254.0.73"}), PoolAction(""))
    rule_template = "addAction(makeRule({0}), PoolAction(''))"

    # Read one IP per line from the file given as the first argument.
    with open(sys.argv[1]) as f:
        lines = f.readlines()

    # Quote each address for the Lua table, dropping the trailing newline.
    acl_rule = ['"{0}"'.format(line.rstrip("\n")) for line in lines]
    acl_rule_final = ','.join(acl_rule)

    # Add the rule, remove it again (index 1, after the test rule), then force a Lua GC.
    print(rule_template.format("{{{0}}}".format(acl_rule_final)))
    print('rmRule(1)')
    print('collectgarbage()')

only_make_rule()

python3 dist_rule.py ip.conf > makerule_15000.lua

makerule_test.sh: add and delete the same rule x times

mem_start=$(ps -eo rss,args | grep bin/dnsdis[t]|awk '{print $1}')
for x in $(seq 1 $1); do dnsdist -c -C ./dnsdist.conf < makerule_15000.lua >/dev/null; done
mem_end=$(ps -eo rss,args | grep bin/dnsdis[t]|awk '{print $1}')
echo "start: $mem_start end: $mem_end"
echo "($mem_end-$mem_start)/1024"|bc
  3. add and delete the same rule
bash makerule_test.sh 4000
  4. stop and restart dnsdist.
  5. run newServer/rmServer two hundred times
bash server.sh 200
  6. add new rules and delete the old rule.
    Step 3 (adding and deleting the same rule) leads to a memory leak.
    This step instead adds more IPs to a single rule and then deletes the old rule.

example: addAction(A, PoolAction(""))
a) first rule: old IP set A (15000 IPs)
b) new rule: new IP set B (A plus two more IPs)
c) delete the first rule
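
The replacement pattern in a) to c) could be generated in the same style as dist_rule.py. The sketch below is only an illustration: the helper names, the sample addresses and the rule index passed to rmRule are assumptions, not the script that was used in the test.

```python
def make_rule(ips):
    """Build one addAction(makeRule({...}), PoolAction('')) line for the given IPs."""
    quoted = ",".join('"{0}"'.format(ip) for ip in ips)
    return "addAction(makeRule({{{0}}}), PoolAction(''))".format(quoted)


def replace_rule_script(old_ips, extra_ips, old_index):
    """Return console commands that add rule B (A plus extras), then drop rule A.

    old_index is the position of the old rule in the rule list; it is an
    assumption of this sketch and depends on what else is configured.
    """
    new_ips = list(old_ips) + list(extra_ips)
    return "\n".join([
        make_rule(new_ips),                  # b) new rule with IP set B
        "rmRule({0})".format(old_index),     # c) delete the first rule
        "collectgarbage()",                  # same GC call as in the test scripts
    ])


# Tiny hypothetical IP sets; the report uses 15000 addresses for set A.
set_a = ["10.10.1.1", "10.10.1.2"]
script = replace_rule_script(set_a, ["10.10.2.1", "10.10.2.2"], old_index=1)
print(script)
```

The output can be fed to the console in the same way as makerule_15000.lua.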

Expected behaviour

Adding and deleting the same rule should not increase memory usage.
Adding a new rule and deleting the old one should increase memory usage only slightly.
Adding and deleting the same server should not increase memory usage.

Actual behaviour

step 3: add and delete the same rule
$ bash makerule_test.sh 4000
output:
start: 97356 end: 949800
832

result: memory usage increased by 832 MB!

step 5: add and delete the same server
$ bash server.sh 200
output:
start: 97156 end: 3889156
3703

result: memory usage increased by 3703 MB!
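
For reference, the figures printed by the scripts follow from the start and end RSS values, which ps reports in KiB. A minimal sketch of that arithmetic (the function name is hypothetical; the numbers are the ones reported above):

```python
def rss_growth_mib(start_kib, end_kib):
    """Return RSS growth in whole MiB.

    Integer division mirrors the scripts' `bc` call, which truncates
    because bc's default scale is 0 (growth is assumed non-negative).
    """
    return (end_kib - start_kib) // 1024


print(rss_growth_mib(97356, 949800))    # step 3: prints 832
print(rss_growth_mib(97156, 3889156))   # step 5: prints 3703
```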

step 6:
this also leads to a memory leak

Other information

issues/8530: dnsdist: addAction/rmRule consumes incrementally more memory

dnsdist -c -C ./dnsdist.conf < makerule_15000.lua appends to ~/.dnsdist_history; during the test, we cleared the history file every second.

Author

yiyuandao commented Aug 8, 2020

@Habbie Regarding the newServer memory leak: I followed the call trace and found that when a new server is configured with newServer, dnsdist adds a response thread, but after rmServer(x) that thread keeps running. Each such thread leaks 10 MB of memory.
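
As a rough cross-check, the growth per add/remove cycle can be derived from the measurements in the report. This is only a sketch: the function name is hypothetical, and RSS growth per cycle is a coarse proxy that need not match the per-thread figure exactly.

```python
def growth_per_cycle_kib(start_kib, end_kib, iterations):
    """Average RSS growth in KiB per add/remove cycle."""
    return (end_kib - start_kib) / iterations


# Figures from "Actual behaviour": 200 server cycles, 4000 rule cycles.
per_server = growth_per_cycle_kib(97156, 3889156, 200)
per_rule = growth_per_cycle_kib(97356, 949800, 4000)

print("per newServer/rmServer cycle: {0:.1f} MiB".format(per_server / 1024))
print("per addAction/rmRule cycle: {0:.1f} KiB".format(per_rule))
```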

Member

rgacogne commented Aug 10, 2020

I see two different things going on, that might be related:

  • the "responder" thread is not stopped when a backend is removed;
  • each thread keeps its own copy of several configuration items, including the rules, for performance reasons (that's how our state holder works, in an RCU-like fashion). So it might take a while before all threads update their version of the ruleset, and during that time several versions might exist in memory.

We should first make sure that the responder thread eventually stops after the corresponding backend has been removed. This might also solve the issue with the rules since that thread might never update to a newer version once the backend has been removed.
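
The effect described above can be illustrated with a toy model. This is not dnsdist's state holder: every class and variable name is hypothetical, and the sketch relies on CPython's reference counting to show when an old version is freed. It only demonstrates that a per-thread copy which is never refreshed keeps an old ruleset version alive.

```python
import gc
import weakref


class Ruleset:
    """Stand-in for one immutable version of the rule list."""
    def __init__(self, rules):
        self.rules = tuple(rules)


class GlobalState:
    """Holds the current version; a writer publishes a whole new version."""
    def __init__(self, value):
        self.generation = 0
        self.value = value

    def publish(self, value):
        self.value = value
        self.generation += 1


class LocalState:
    """Per-thread view that only catches up when refresh() is called."""
    def __init__(self, parent):
        self.parent = parent
        self.generation = parent.generation
        self.value = parent.value

    def refresh(self):
        if self.generation != self.parent.generation:
            self.value = self.parent.value
            self.generation = self.parent.generation
        return self.value


state = GlobalState(Ruleset(["rule A"]))
active = LocalState(state)   # a thread that keeps refreshing its view
stale = LocalState(state)    # a thread that never refreshes again

old_version = weakref.ref(state.value)
state.publish(Ruleset(["rule A", "rule B"]))
active.refresh()
gc.collect()
# The stale view still references the old ruleset, so it cannot be freed.
pinned_by_stale = old_version() is not None

stale.refresh()
gc.collect()
# Once every view has moved on, the old version is released.
freed_after_refresh = old_version() is None

print(pinned_by_stale, freed_after_refresh)
```

In this model, a thread that keeps running but never calls refresh() plays the role of the responder thread left behind after its backend is removed.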

Contributor

phonedph1 commented Aug 10, 2020

Were your results for both 1.4.0 and 1.5.0? For our use case in #8530 (adding/removing a bunch of rules which match based on netmask group) it seems 1.5.0 has been working very well by doing the collectgarbage() and - possibly #8538?

Member

rgacogne commented Aug 10, 2020

So:

Author

yiyuandao commented Aug 11, 2020

@rgacogne Thanks for your help.

I will re-test #9378 and #9379

test result:
#9378 fixes the newServer/rmServer issue
#9379 fixes the addAction/rmRule memory leak issue

Member

rgacogne commented Aug 12, 2020

Thanks for reporting the issue and for testing the fixes, much appreciated!
