Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket activation & unix domain socket #71

Closed
nikolay-turpitko opened this issue Jan 22, 2015 · 8 comments
Closed

Socket activation & unix domain socket #71

nikolay-turpitko opened this issue Jan 22, 2015 · 8 comments

Comments

@nikolay-turpitko
Copy link

In coreos/go-systemd/activation/listeners.go, line 34 method net.FileListener() is used to create net.Listener from file. For unix domain sockets it returns UnixListener, whose method Close() unlinks it, which leads to deletion of the file in the file system. Here is method Close() from unixsock_posix.go:

// Close stops listening on the Unix address.  Already accepted
// connections are not closed.
func (l *UnixListener) Close() error {
    if l == nil || l.fd == nil {
        return syscall.EINVAL
    }

    // The operating system doesn't clean up
    // the file that announcing created, so
    // we have to clean it up ourselves.
    // There's a race here--we can't know for
    // sure whether someone else has come along
    // and replaced our socket name already--
    // but this sequence (remove then close)
    // is at least compatible with the auto-remove
    // sequence in ListenUnix.  It's only non-Go
    // programs that can mess us up.
    if l.path[0] != '@' {
            syscall.Unlink(l.path)
    }
    return l.fd.Close()
}

I think this behavior conflicts with socket activation, because systemd doesn't anticipate this and doesn't recreate file or replace socket. So, when I kill/reload a socket activated service (for test), systemd restarts it with the same socket. but without correspondent file, which leads to immediate crash of service with "file not found" error. In my opinion, this disable important use case of socket activation.

I tried to use abstract names (starting with '@') and they work perfectly (see 'if' block in the code above). Unfortunately, server, which I use as a proxy, doesn't support them.

I am about to implement somewhat cumbersome solution in my own code, but may be you have a better idea how to deal with this issue (may be in your lib, or in it's client code). My idea is to wrap net.Listener, returned from activation.Listeners() with my implementation of net.Listener interface and replace method Close() so, that for UnixListener it do nothing, but with preceding invocation of activation.Files() I am going to obtain and remember file descriptors to be able to close them before exit. A bit better, I can store file descriptor in the wrapper as well and close it in my Close() method.

@nikolay-turpitko
Copy link
Author

After some pondering I decided to reopen this issue. Sure, it's an issue of the standard library, not entirely yours, but I think you guys have more expertise to rise and express it to library developers.

The problem is that I need to be able to close unix domain socket, provided by systemd, without unlinking it from a file system. And using net.UnixListener API I cannot.

From http://www.freedesktop.org/software/systemd/man/systemd.socket.html: "A daemon listening on an AF_UNIX socket may, but does not need to, call close(2) on the received socket before exiting. However, it must not unlink the socket from a file system".

So, I have an option not to close socket, but in this case program cannot return from Server.Serve() or Listener.Accept() when receives signal to terminate and hangs till systemd forcibly kills it (not giving chance to final cleanup).

Alas, my previously offered "solution" doesn't address this issue. It works in a simple test as a no-op Listener.Close() and leaves socket opened, because file descriptor get duplicated in net.FileListener(f) and though I close my descriptor, there is still an opened descriptor in the UnixListener internal field.

@nikolay-turpitko
Copy link
Author

I re-wrote test to better illustrate the issue, see https://github.com/nikolay-turpitko/go-systemd/commit/578612f6e0c4a1610641271f90169026e1887fd7

Notice constants in listen.go at line 43 and in listener_test.go at line 43. They allow to try different behavior - with one or several forks of child process, with or without method Close() invocation.

The issue is, that with unix domain sockets second fork of the child process terminates with error "dial unix /tmp/activation-test-1.sk: no such file or directory". And without Close() process just hangs up.

I think, that it hangs up in the https://golang.org/src/net/fd_unix.go, line 404, method func (fd *netFD) accept() (netfd *netFD, err error). It seems, that accept() at line 416 should invoke accept4 system call with my kernel, so the problem is to somehow break out of that endless for loop.

@nikolay-turpitko
Copy link
Author

Here is my attempt to handle termination signal: https://github.com/nikolay-turpitko/go-systemd/commit/5bdb401973017d432adb9f5104eca62c580de769

We cannot just kill process or exit from program in the signal handler, since in the real program we want to execute some additional cleanup and graceful-shutdown logic. We cannot close unix domain socket, but could we cause it to return EINTR, as described in http://linux.die.net/man/7/signal ?

@nikolay-turpitko
Copy link
Author

I tried to prevent file deletion without code modification, but it didn't lead to desired result in my app (and I don't know how to provide simple test for this attempt). My idea was to put socket file into folder with restricted access, so that only systemd could manipulate files in that folder. Code in UnixListener.Close() doesn't check the result or error of syscall.Unlink(), so I assumed, that it will not be able to delete file and will go on, close file descriptor and method http.Serve() will exit with error, so that rest of program will be able to perform cleanup logic and exit. But my service hung up at termination nevertheless along with socket unit. Don't know, whether I messed something with privileges or it just not work as I suppose it should.

So, as a last resort, I tried to install golang from sources and added this method to UnixListener (the same code as in Close(), but with syscall.Unlink() completely removed):

func (l *UnixListener) CloseFD() error {
    if l == nil || l.fd == nil {
        return syscall.EINVAL
    }
    return l.fd.Close()
}

and invoke it in my test as this:

func (l *noCloseListener) Close() error {
    switch l.Listener.(type) {
    case *net.UnixListener:
            {
                return l.Listener.(*net.UnixListener).CloseFD()
            }
    }
    return l.Listener.Close()
}

With this fix tests seems to work as expected.

Now, could someone help me by reviewing my tests and support in discussion of this issue with guys from golang team?

@nikolay-turpitko
Copy link
Author

I started a discussion here: https://groups.google.com/forum/#!topic/golang-nuts/UtBR4IfgaEw

@nikolay-turpitko
Copy link
Author

I was advised to reorganize my code to leave sockets opened and goroutines unfinished, which I managed to do. Nevertheless, I have a strong filling that code would be more accurate and straightforward if it could be possible to call something like listener.(*net.UnixListener).SetUnlinkOnClose(false) so that it would be safe to call UnixListener.Close() and reuse existing code which works for tcp listeners.

@freeekanayaka
Copy link

Following up this old issue since I stumbled into it while working on LXD and reading @stgraber's workaround (referenced just above here).

@nikolay-turpitko I think your report and analysis were quite clear and compelling, and I'm surprised that Ian Lance Taylor was so dismissive initially, when replying to your post on golang-nuts ("This means making the API more complex, which is something we only
want to do where necessary", and well in this case it is necessary).

Anyway, it seems at some point the Go upstream authors have actually acknowledged the issue and also went for a fix which is basically the same proposed here (SetUnlinkOnClose):

https://golang.org/pkg/net/#UnixListener.SetUnlinkOnClose

This was introduced in Go 1.8.

@freeekanayaka
Copy link

Actually I just realized that, although the SetUnlinkOnClose API is available only since Go 1.8, actually the fix for the bug was merged already in Go 1.6, after an issue identical to this one was reported by a Docker folk:

golang/go#11826

The relevant commit to master that closes the issue is dated 5 December 2015:

golang/go@a4fd325

So anyone using >=1.6 will be safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants