Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad abstract socket name? #85

Closed
CendioOssman opened this issue Nov 28, 2018 · 14 comments
Closed

Bad abstract socket name? #85

CendioOssman opened this issue Nov 28, 2018 · 14 comments

Comments

@CendioOssman
Copy link

I assume this (from 19c25dd) was a mistake:

memset(&addr, 0, sizeof(struct sockaddr_un));

It causes irqbalance to bind to an abstract socket consisting of just \0s. E.g.:

$ sudo ss --unix -l -p | grep irq
u_str             LISTEN              0                    1                    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 227776                                                * 0                        users:(("irqbalance",pid=15106,fd=5))      

(accidentally discovered because we had a program that also screwed up its UNIX socket address and tried to connect to it)

@ppwaskie
Copy link
Contributor

Actually, I don't think it is a mistake. Originally irqbalance was only using an abstract socket, which was causing issues (see issue #72 ). So we opted to try making a file-based socket first in tmpfs, and if that failed, fall back to an abstract socket.

I'd look into why irqbalance failed to create a file-based socket on your system first. It will try to make it in /var/run (assuming you're using systemd).

An abstract socket is where the first byte is null. The rest of the string is meaningless more or less.

@nhorman
Copy link
Member

nhorman commented Nov 28, 2018

I can't really say if it was intentional or not. @ppwaskie, you're absolutely right, we should almost always be binding to a path based socket. As for the use of sun_path if we fall into the abstract socket case, the man page says this about it:

  • abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0'). The socket's address in this namespace is given by the addi‐
    tional bytes in sun_path that are covered by the specified length of the address structure. (Null bytes in the name have no special significance.) The name has no connection with filesystem
    pathnames. When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the
    first (addrlen - sizeof(sa_family_t)) bytes of sun_path

From that I read that bytes 1-107 of sun_path define an arbitrary name for the socket, and the implication is that 106 '\0' is perfectly valid, if a little vauge. I'm not sure if that was intentional, but it should work just fine.

I agree that the first course of action here should be figuring out why the path based socket bind failed.

@CendioOssman
Copy link
Author

From that I read that bytes 1-107 of sun_path define an arbitrary name for the socket, and the implication is that 106 '\0' is perfectly valid, if a little vauge. I'm not sure if that was intentional, but it should work just fine.

Well, it is the abstract socket equivalent of "/var/run/sock" so it is very generic. :)
Something specific to irqbalance would be good form IMO. The previous one was "\0irqbalance<pid>.sock", which was a good, unique name.

(it also does clutter up ss a bit with such a long name, see above)

I agree that the first course of action here should be figuring out why the path based socket bind failed.

This was more a courtesy issue rather than me having a practical problem that needed to be solved. But I can tell you that this happens on every Debian Buster system we've seen (we've had two customer reports, and seen it on our own test install here).

It also only happens when systemd starts the daemon, not when started by hand. Which makes it difficult to strace things unfortunately...

@ppwaskie
Copy link
Contributor

Appreciate the report. I'll go ahead and get a patch together to name it something more sane. I agree having a meaningful name in the abstract path would still be good.

When systemd starts it, can you get anything out of journalctl or even systemctl on the daemon itself why it failed? There is debugging output when it fails to bind to the filesystem socket, so maybe there is something there if you can look. Otherwise I can try to get a VM going to attempt to reproduce (I'm mostly Gentoo and a sprinkle of Fedora).

@CendioOssman
Copy link
Author

The default logging unfortunately produces nothing of value. This is it:

Nov 29 09:32:31 dhcp-253-139 systemd[1]: Started irqbalance daemon.

I added --debugto the command line, which gave me a lot of output, but unfortunately all I got about this issue was:

Nov 29 10:02:11 dhcp-253-139 irqbalance[17526]: Daemon couldn't be bound to the file-based socket.

@nhorman
Copy link
Member

nhorman commented Nov 29, 2018

@ppwaskie perhaps you could also add to your patch a change to that log message above which reports what the exact errno value is (or converts it via strerror), so we can have a bit more debug visibility in the log? Thanks!

@ppwaskie
Copy link
Contributor

ppwaskie commented Dec 3, 2018

@nhorman yep, sorry I missed this (got pushed down my email stack). Working on the patch now to include both the name fix and better errno logging.

@nhorman
Copy link
Member

nhorman commented Dec 3, 2018

@ppwaskie Thank you! Appreciate the extra effort!

@bigon
Copy link
Contributor

bigon commented Jan 12, 2019

In debian we have this bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=915834

It seems that the file socket has not been created because the systemd .service file set /run as read only, see: https://github.com/Irqbalance/irqbalance/blob/master/misc/irqbalance.service#L15

Edit: Removing the "ro" fixes allows the file socket to be created, but it seems to break irqbalance-ui

@nhorman
Copy link
Member

nhorman commented Jan 14, 2019

How does it break irqbalance-ui break? It seems to work fine for me

@andhe
Copy link

andhe commented Jan 14, 2019

How does it break irqbalance-ui break? It seems to work fine for me

If you use "TemporaryFileSystem=/run" in the service file AIUI systemd will mount a new tmpfs in the services namespace/cgroup. (This will also make file-based sockets work, and thus be used instead of abstract socket.)

You then likely run irqbalance-ui in your user session (separate namespace/cgroup) which sees a different tmpfs on /run. The -ui process will thus not be able to locate the file that the daemon created since they both use the same path, but are still looking in two different tmpfs'.

The (temporary) solution I used in the debian package for now was to completely remove the "TemporaryFileSystem=..." line and append /var/run to the ReadWritePaths.
This exposes the systems /run to the daemon and the irqbalance-ui can thus find the socket, but comes with the cost that it's not as restricted as before so in an event of a security issue the daemon has full access to all of /run.

@bigon
Copy link
Contributor

bigon commented Jan 14, 2019

IMHO, you could use RuntimeDirectory= set to /run/irqbalance in the .service file and put the socket file there with a well-known name /run/irqbalance/irqbalance.socket (?) and then the -ui will be able to find it and connect to it.

RuntimeDirectory= will make sure the directory exists when the service is started and the directory will be deleted when stopped

@nhorman
Copy link
Member

nhorman commented Jan 14, 2019

I agree with @bigon. The purpose of the TemporaryFileSystem directive was to isolate the unix socket from other processes so that it couldn't be exploited, but that obviously doesn't work because at least one other process (the irqbalance-ui command), needs to be able to access it, and is by definition in a separate namespace. The other option here is to have irqbalance-ui read the daemons pidfile (if it created one), and do a setns on its mount namespace so that it could access the needed socket, but thats feeling a little to rube goldbergish to work reliably. I think, since we only write the socket to this temporary file we can leave it in the common namespace. I'll make that change shortly.

@nhorman
Copy link
Member

nhorman commented Jan 14, 2019

fixed with commit f5ca2eb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants