Description
Hello. I am filing a report on behalf of our Ruby developers. Their production jobs have been failing with EMFILE.
dnsruby appears to leak file descriptors when both of the following factors are present.
1. dnsruby must not be allowed to idle for more than ~one second
2. a query over UDP must result in a truncated response (tc); the query is retried over TCP
So long as condition (1) holds, every (2) appears to leak a file descriptor (fd).
A repro can be found at the bottom of this post.
Too many open files @ rb_sysopen - / (Errno::EMFILE)
When talking exclusively on UDP, dnsruby will immediately close the fd at the conclusion of the exchange. The bug is not visible here.
If the UDP response is truncated (because it does not fit into a UDP datagram), however, dnsruby will fail to immediately close the fd belonging to its UDP socket. Instead, this fd goes into a pool of fds to be closed 'later'. The problem is that 'later' may never come around.
https://github.com/alexdalitz/dnsruby/blob/v1.72.4/lib/dnsruby/select_thread.rb#L228-L235
AFAICS, this branch is only evaluated if no queries are currently in-flight and if no I/O events were observed in a while. If we push even one query through dnsruby every ~half-second -- because we are busy processing many queued jobs -- it will never close any of the fds that it had set aside. This behaviour presents like an fd leak.
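To make the starvation easier to see, here is a minimal simulation of the pattern as I understand it. It is a sketch, not dnsruby's code: `DeferredClosePool`, `simulate`, the tick-based clock (one tick = 0.1 s) and the threshold of 10 ticks are all hypothetical names and values chosen for illustration.

```ruby
# Hypothetical model of a "close later" pool. This is NOT dnsruby's
# implementation; it only mimics the behaviour described above.
class DeferredClosePool
  attr_reader :pending

  def initialize(idle_threshold:)
    @idle_threshold = idle_threshold # quiet ticks required before draining
    @pending = []                    # sockets parked to be closed "later"
    @last_activity = 0
  end

  # A query at tick `now`. A truncated UDP response parks the UDP socket
  # instead of closing it right away.
  def query(now, truncated:)
    @last_activity = now
    @pending << now if truncated
  end

  # One housekeeping pass: the pool is drained only after enough idle time.
  def tick(now)
    @pending.clear if now - @last_activity >= @idle_threshold
  end
end

# Run `ticks` ticks with one truncated query every `query_every` ticks.
def simulate(query_every:, idle_threshold: 10, ticks: 100)
  pool = DeferredClosePool.new(idle_threshold: idle_threshold)
  peak = 0
  ticks.times do |now|
    pool.query(now, truncated: true) if (now % query_every).zero?
    pool.tick(now)
    peak = [peak, pool.pending.size].max
  end
  { pending: pool.pending.size, peak: peak }
end

busy  = simulate(query_every: 2)   # a query every 0.2 s, as in the repro
quiet = simulate(query_every: 20)  # a query every 2 s
puts "busy:  pending=#{busy[:pending]} peak=#{busy[:peak]}"
puts "quiet: pending=#{quiet[:pending]} peak=#{quiet[:peak]}"
```

In the busy run the idle condition is never met, so every parked socket is still pending at the end. In the quiet run the pool is drained between queries and never holds more than one socket.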
The example below assumes a Linux host environment. Comment out the call to system if you are on another platform; it is only there to monitor fd growth.
#!/usr/bin/env ruby
# frozen_string_literal: true
require 'dnsruby'
require 'thread'
#Dnsruby.log.level = Logger::DEBUG
# speed up the repro
NFILES = 30
nfiles, _ = Process.getrlimit(Process::RLIMIT_NOFILE)
Process.setrlimit(Process::RLIMIT_NOFILE, NFILES) if nfiles > NFILES
NAMESERVERS = ["192.31.80.30"]
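# Background thread: one query every 0.2s, so dnsruby never goes idle.
Thread.new {
  res = Dnsruby::Resolver.new(nameserver: NAMESERVERS, do_caching: false, query_timeout: 5)
  loop do
    begin
      # The UDP response is truncated, so dnsruby retries over TCP.
      res.query("blahblahblah.com.edgekey.net", "CNAME")
    rescue Dnsruby::ResolvError
    end
    sleep 0.2
  end
}
# Main thread: list the open fds once per second (Linux only), then try to
# open a file. The open fails with EMFILE once the fd limit is reached.
loop do
  system("ls -l /proc/#{Process.pid}/fd/")
  File.open("/") { |_| }
  sleep 1
end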
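If shelling out to ls is not an option, fd growth could instead be sampled in-process. The helper below is a hypothetical addition, not part of the repro as run: `open_fd_count` and `FD_DIRS` are names made up here, and it assumes the host exposes per-process fds under /proc/self/fd (Linux) or /dev/fd (macOS and the BSDs).

```ruby
# Candidate per-process fd directories. Which one exists, if any, is an
# assumption about the host platform.
FD_DIRS = ["/proc/self/fd", "/dev/fd"].freeze

# Returns the number of entries in the first fd directory found, or nil if
# none exists or the directory cannot be read (for example because the
# process has already run out of fds). The count includes the descriptor
# that is briefly opened to read the directory itself.
def open_fd_count
  dir = FD_DIRS.find { |d| File.directory?(d) }
  return nil unless dir

  Dir.children(dir).size
rescue SystemCallError
  nil
end

puts "open fds: #{open_fd_count.inspect}"
```

In the repro, the puts line could take the place of the system call in the main loop; a count that climbs steadily while the query thread runs points at the same leak.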