file descriptor leak on UDP truncate and TCP retry when under continuous throughput #209

@saj

Hello. I am filing a report on behalf of our Ruby developers. Their production jobs have been failing with EMFILE.

dnsruby appears to leak file descriptors when both of the following factors are present.

  1. dnsruby must not be allowed to idle for more than ~one second
  2. a query over UDP must result in a truncated response (tc); the query is retried over TCP

So long as condition (1) holds, every (2) appears to leak a file descriptor (fd).

A repro can be found at the bottom of this post.

Too many open files @ rb_sysopen - / (Errno::EMFILE)

When talking exclusively on UDP, dnsruby will immediately close the fd at the conclusion of the exchange. The bug is not visible here.

If the UDP response is truncated (because it does not fit into a UDP datagram), however, dnsruby will fail to immediately close the fd belonging to its UDP socket. Instead, this fd goes into a pool of fds to be closed 'later'. The problem is that 'later' may never come around.

https://github.com/alexdalitz/dnsruby/blob/v1.72.4/lib/dnsruby/select_thread.rb#L228-L235

AFAICS, this branch is only evaluated if no queries are currently in flight and no I/O events have been observed for a while (roughly one second, per condition 1). If we push even one query through dnsruby every ~half-second, because we are busy processing many queued jobs, it will never close any of the fds that it had set aside. This behaviour presents as an fd leak.
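To make the failure mode concrete, here is a toy model of "close later, but only when idle". It is not dnsruby's code: the class, method names and the one-second threshold are all hypothetical, chosen only to mirror the behaviour described above.

```ruby
# Toy model only: this is NOT dnsruby's code. All names are hypothetical.
class DeferredClosePool
  attr_reader :pending

  def initialize(idle_threshold:)
    @idle_threshold = idle_threshold # seconds of quiet required before cleanup
    @pending = []                    # fds set aside to be closed "later"
    @last_activity = 0.0
  end

  # A truncated UDP exchange sets its socket aside instead of closing it.
  def defer_close(fd, now:)
    @pending << fd
    @last_activity = now
  end

  # One pass of the select loop: cleanup runs only after an idle period.
  def tick(now:)
    return if now - @last_activity < @idle_threshold

    @pending.clear
  end
end

# Simulated clock: one truncated query every `query_interval` seconds.
def simulate(query_interval:, queries:, idle_threshold: 1.0)
  pool = DeferredClosePool.new(idle_threshold: idle_threshold)
  queries.times do |i|
    now = i * query_interval
    pool.tick(now: now)
    pool.defer_close(i, now: now)
  end
  pool.pending.size
end

busy = simulate(query_interval: 0.2, queries: 50)
slow = simulate(query_interval: 2.0, queries: 50)
puts "busy: #{busy} pending, slow: #{slow} pending"
```

With a query every 0.2 s the idle branch never fires and every deferred fd stays pending; with a query every 2 s the pool is drained before each new query, so at most one fd is outstanding.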


The example below assumes a Linux host environment. Comment out the call to system if you are on another platform; it is only there to monitor fd growth.

#!/usr/bin/env ruby

# frozen_string_literal: true

require 'dnsruby'
require 'thread'

#Dnsruby.log.level = Logger::DEBUG

# Speed up the repro: lower the fd limit so that the leak exhausts it quickly.
NFILES = 30
nfiles, _ = Process.getrlimit(Process::RLIMIT_NOFILE)
Process.setrlimit(Process::RLIMIT_NOFILE, NFILES) if nfiles > NFILES

NAMESERVERS = ["192.31.80.30"]

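# Resolver thread: issue a query every 0.2 s so that dnsruby never idles
# long enough to close the fds it has set aside.
Thread.new {
  res = Dnsruby::Resolver.new(nameserver: NAMESERVERS, do_caching: false, query_timeout: 5)
  loop do
    begin
      # The response is expected to be truncated over UDP and retried over TCP.
      res.query("blahblahblah.com.edgekey.net", "CNAME")
    rescue Dnsruby::ResolvError
      # Resolution failures are irrelevant here; only fd growth matters.
    end
    sleep 0.2
  end
}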

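# Main thread: list the open fds once a second. File.open raises
# Errno::EMFILE once the leaked fds have exhausted the limit.
loop do
  system("ls -l /proc/#{Process.pid}/fd/")
  File.open("/") { |_| }
  sleep 1
end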
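
If the `ls` output is too noisy, the fd count can also be read from within the process. This is a minimal sketch, again assuming Linux with `/proc` mounted; `open_fd_count` is a hypothetical helper and not part of dnsruby.

```ruby
# Hypothetical helper (not part of dnsruby): count the fds open in a process
# by listing /proc/<pid>/fd. Linux only; returns nil where /proc is missing.
def open_fd_count(pid = Process.pid)
  dir = "/proc/#{pid}/fd"
  return nil unless Dir.exist?(dir)

  # Listing the directory briefly opens an fd of its own, so the result
  # includes that one extra entry on every call.
  Dir.children(dir).size
end

puts "open fds: #{open_fd_count.inspect}"
```

Note that listing the directory needs a free fd itself, so this helper will also raise Errno::EMFILE once the limit has been reached.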
