VIP 17: Enable Unix domain sockets for listen and backend addresses

Nils Goroll edited this page Nov 19, 2018 · 19 revisions

Closed, implemented by and large

Some minor details of the implementation differ from this proposal (e.g. the peer credentials info functions went into vmod unix instead of std).

Synopsis

Allow Unix Domain Sockets (UDS) as listen addresses for Varnish (-a, -T and -M options) and as addresses for backends. Ideally also obtain credentials of the peer process connected on a UDS, such as uid and gid, for use in VCL.

Named listen addresses

This is not directly related to UDS, but this change would solve some of the problems and mitigate some of the complexity raised by the original draft. Because this change has already been accepted, there is no VIP to link to, and no documentation to refer to until it is implemented. For convenience it is described here.

Influence

This feature is similar to how storage backends are exposed in VCL: they have a name that can then be used in VCL, and when a name is omitted, generic names are assigned (s0, s1, sN etc).

Example: varnishd -s malloc,10G -s video=malloc,100G [...]

You end up with 3 storage backends called s0, video and Transient, and as such have access in VCL to the following symbols and their respective fields:

  • storage.s0
  • storage.video
  • storage.Transient
  • (and storage.<name>.*, see man vcl)

You can then have this kind of logic in VCL:

sub vcl_backend_response {
    if (beresp.http.content-type ~ "video") {
        set beresp.storage = storage.video;
    } else {
        set beresp.storage = storage.s0;
    }
}

The advantage of beresp.storage over beresp.storage_hint is the strong typing guaranteeing that VCL won't compile if there is a typo in the storage name.

Implementation

Named listen addresses will work like storage backends in that regard (generic names being a0, a1, aN etc).

Example: varnishd -a public_http=:80 -a public_https=:8443,PROXY -a admin=:1234 [...]

You can then use the logical names in your VCL:

sub vcl_recv {
    if (local.address == listen_address.public_http) {
        # do an https redirect for example
    }
    if (req.method == "PURGE") {
        if (local.address != listen_address.admin) {
            return (synth(405));
        }
        return (purge);
    }
}

The actual names of the variables used to access this information in VCL haven't been decided yet.

The benefit is the ability to reuse the same VCL even when the varnishd instances in a cluster cannot all provide consistent listen interfaces or port numbers.

String conversion

Objects of type listen_address could be used where strings are expected and be converted to the address part of the -a option (that is, excluding the parameters).

Example: varnishd -a public_http=:80 -a public_https=:8443,PROXY -a admin=:1234 [...]

sub vcl_deliver {
    set resp.http.Address = local.address;
}

A non-synthetic response may contain one of the following headers:

  • Address: :80
  • Address: :8443
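To make the split between name, address and parameters concrete, here is a minimal sketch in C of how an -a argument could be broken down. The struct listen_arg type and the parse_listen_arg function are hypothetical names used for illustration, not Varnish code, and the rule that a name is whatever precedes an = in the first comma-separated field is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: not part of Varnish. */
struct listen_arg {
    char name[64];   /* explicit name, or generated a0, a1, ... */
    char addr[256];  /* address part only, parameters stripped */
};

/*
 * Split "[name=]address[,params...]". idx is the position of the -a
 * option on the command line and is only used for the generic name.
 * Returns 0 on success, -1 if a field is empty or too long.
 */
static int
parse_listen_arg(const char *arg, unsigned idx, struct listen_arg *out)
{
    const char *eq, *comma, *addr;
    size_t nlen, alen;

    assert(arg != NULL && out != NULL);
    eq = strchr(arg, '=');
    comma = strchr(arg, ',');
    if (eq != NULL && (comma == NULL || eq < comma)) {
        /* Assumption: '=' in the first field introduces a name. */
        nlen = (size_t)(eq - arg);
        if (nlen == 0 || nlen >= sizeof out->name)
            return (-1);
        memcpy(out->name, arg, nlen);
        out->name[nlen] = '\0';
        addr = eq + 1;
    } else {
        /* No name given: fall back to the generic aN name. */
        snprintf(out->name, sizeof out->name, "a%u", idx);
        addr = arg;
    }
    /* The string form is the address up to the first parameter. */
    alen = strcspn(addr, ",");
    if (alen == 0 || alen >= sizeof out->addr)
        return (-1);
    memcpy(out->addr, addr, alen);
    out->addr[alen] = '\0';
    return (0);
}
```

With this sketch, public_https=:8443,PROXY yields the name public_https and the string :8443, while a bare :80 given as the first -a option yields a0 and :80.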

In the case of unix domain sockets, automatic conversion to a string could be used for regular expression matching of the paths for example:

sub vcl_recv {
    if (req.method == "PURGE") {
        # there may be more than one admin UDS
        if (local.address !~ "admin\.sock$") {
            return (synth(405));
        }
        return (purge);
    }

}

Security concerns

This is not a security feature, despite what the examples above may suggest. Using this as a security measure assumes that the network is actually secured before traffic hits Varnish on the admin listen address, for example (firewalls and all that jazz).

phk: I don't agree entirely, the root@ may want to restrict the paths to backends.

dridi: I'm not sure I understand, this is not about UDS yet, only named listen addresses in general.

Testing

We can expose additional macros for listen addresses. For example with a v1 varnish instance:

  • v1_addr: the first listen address
  • v1_port: the first listen port
  • v1_sock: the first listen address+port
  • v1_addr_a0: a0's listen address
  • v1_port_a0: a0's listen port
  • v1_sock_a0: a0's listen address+port

Benefits

Once again, strong typing: port numbers in VCL and on the varnishd command line may get out of sync without being noticed, whereas a typo in a listen address name prevents the VCL from compiling. It's also a transport-independent alternative to ACLs, as shown in the purge example above.

Being transport-independent, it can also accommodate future transports, such as the Unix domain sockets described below.

Why?

The main reason to use a UDS is that it works like TCP sockets (reliable bidirectional byte stream behind a file descriptor) and would likely not be too intrusive in the existing code base.

Other noteworthy reasons:

  • Eliminate the overhead of TCP/loopback for connections with peers that are co-located on a host with Varnish
  • The possibility to query the peer process credentials and restrict access using regular filesystem permissions

A common case for co-locating Varnish with a peer is the need for a TLS proxy for HTTPS. On both client and backend sides, a UDS should work seamlessly with the PROXY protocol.

How?

Listen address notation

On the listen side, expecting an absolute path would prevent ambiguity with IP addresses or ports:

varnishd -a /path/to/http.sock -T /path/to/cli.sock [...]

As is common with other varnishd options, we can pass additional parameters:

varnishd -T /path/to/cli.sock,uid=varnish,gid=varnish

However this introduces an ambiguity for PROXY protocol in the -a option. The syntax can be changed to:

varnishd -a /path/to/http.sock,proto=<proto>,uid=varnish,mode=0600 [...]

The -M option being of the connect persuasion, it wouldn't take additional parameters to the absolute path.

Backend address notation

On the backend side we can avoid ambiguity by introducing a new .path field:

backend local {
    .path = "/path/to/backend.sock";
    # or maybe .unix or .uds instead?
}

The .path field would be enough in itself to declare a backend (like .host) and would be mutually exclusive with .host and .port.

By adding a parameter (for example uds_path), akin to vcl_path and vmod_path, that maintains a search path for sockets, we could allow relative paths on the backend side.

Peer credentials

Getting the peer credentials is not portable, and the least common denominator seems to be the euid and egid. We probably want to extract them both as names and numbers. See Geoff's draft for the technical details.

VCL/VRT

The backend notation was already described above, but filed under the "notation" category rather than VCL. This section is more about the VCL changes in the context of a transaction.

IP addresses

The obvious implication of a UDS listen address is the lack of values for the *.ip variables (same on the backend side for beresp.backend.ip).

This could be solved by making all uses of VCL_IP gracefully fail in the presence of a NULL IP address. So an ACL match '~' would always fail and a negative match '!~' would always succeed.
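The rule for a NULL address can be sketched in C as follows. This is only an illustration of the proposed semantics: struct acl_entry, acl_match and acl_no_match are hypothetical names, and the IPv4-only prefix match stands in for whatever the real ACL code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ACL entry: an IPv4 network in host byte order. */
struct acl_entry {
    uint32_t net;
    unsigned prefix_len;  /* 0..32 */
};

/* '~': a NULL address (for example a UDS peer) never matches. */
static bool
acl_match(const uint32_t *ip, const struct acl_entry *acl, size_t n)
{
    if (ip == NULL)
        return (false);
    for (size_t i = 0; i < n; i++) {
        uint32_t mask;

        assert(acl[i].prefix_len <= 32);
        /* A shift by 32 would be undefined, so handle /0 separately. */
        mask = acl[i].prefix_len == 0 ? 0 :
            UINT32_MAX << (32 - acl[i].prefix_len);
        if ((*ip & mask) == (acl[i].net & mask))
            return (true);
    }
    return (false);
}

/* '!~' is the plain negation, so it always succeeds for NULL. */
static bool
acl_no_match(const uint32_t *ip, const struct acl_entry *acl, size_t n)
{
    return (!acl_match(ip, acl, n));
}
```

Keeping '!~' as the strict negation of '~' means VCL such as "if (client.ip !~ purgers) { return (synth(405)); }" still rejects a request whose address is NULL.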

What happens when a UDS gets IP addresses from a PROXY header? One solution could be to set server.ip and client.ip accordingly and leave the local.ip and remote.ip variables NULL. It would preserve this pattern:

sub vcl_recv {
    if (local.ip != server.ip) {
        # PROXY protocol detected
        set req.http.X-Forwarded-Proto = "https"; # for instance
    }
}

port, euid, egid

Much like we may access port numbers via *.ip variables, we want to access credentials of a UDS peer. We can do that using the std VMOD.

In the case of std.port, it could fail gracefully like ACLs when a NULL IP address is submitted by returning -1.

The std VMOD could then learn new functions:

  • std.uid
  • std.gid
  • std.uid_name
  • std.gid_name

Example:

import std;

sub vcl_recv {
    std.log("euid: " + std.uid(local.address));
}

If local.address is not a UDS, numeric variants could also return -1 and name variants could return NULL. The functions could also take fallback parameters, possibly defaulting to the values suggested (-1 and NULL).

The consensus seems to lean towards naming functions by omitting the "effective" e from e[ug]id.

beresp.backend.ip

This variable should obviously be NULL in the case of a UDS backend if we follow the rules described above. However it is already possible to write a backend implementation not based on TCP/IP (see fsbackend for example) and NULL seems to already be the way to go.

The question here is more whether we need something like beresp.backend.path in addition to the ip field. Same question for peer credentials, they probably don't make sense for backends (and that would keep the new std functions limited to the listen addresses type).

local.address == listen_address.<name>

For std.uid to provide anything useful, we need a peer that a static listen_address.<name> has no reason to have. To enable strong typing, the == operator should be backed by a VRT function that checks for equivalence except for the peer. The structs behind listen_address.<name> could have a negative file descriptor for the peer for example.
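A minimal sketch in C of such an equivalence check, assuming a made-up struct layout (struct listen_addr and listen_addr_equal are hypothetical and do not reflect Varnish's actual data structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, for illustration only. */
struct listen_addr {
    const char *name;      /* -a name, e.g. "admin" */
    const char *endpoint;  /* address part of -a, e.g. ":1234" or a path */
    int peer_fd;           /* connected peer, or -1 for the static object */
};

/*
 * Equivalence "except for the peer": two objects are equal when they
 * describe the same -a option, whatever peer_fd says.
 */
static bool
listen_addr_equal(const struct listen_addr *a, const struct listen_addr *b)
{
    assert(a != NULL && b != NULL);
    return (strcmp(a->name, b->name) == 0 &&
        strcmp(a->endpoint, b->endpoint) == 0);
}
```

Here the per-transaction local.address would carry a valid peer_fd while the static listen_address.<name> keeps -1, and the comparison deliberately ignores that field.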

Another possible useful VRT function would be to find the corresponding listen_address.<name> of a local.address for VMODs looking for a safe pointer outliving a transaction.

Needs further discussions

  1. phk: What happens to struct suckaddr? We added that to avoid lugging around sockaddr_storage all over the place, and it shaves something like 4x96 bytes off the size of a session.

dridi: In the case of a UDS, we can keep track of the sockaddr_un with the rest of the -a parameters and use a pointer to that "pseudo-static" struct in the suckaddr union. That shouldn't increase the overall size.

  2. phk: On the VCL side, what happens if in the future a jail performs a chroot? Users would have similar problems with today's std.fileread.

dridi: That would indeed be a problem for backends.

  3. phk: During the first planning session for Varnish 6 we agreed that UDS addresses would be kept separate from suckaddr. (How?)

dridi: See question 1, then we can figure out what to do in code branching on the suckaddr type.

  4. dridi: Is the question of naming from the original draft still relevant?

  5. phk: What happens if the VCL asks for remote.ip.port()?

dridi: I'm supposed to answer that in the VCL/VRT section but I haven't yet. I need to browse the planning session logs because I think we agreed that, with the lack of an IP address, *.ip should be NULL and IP-related facilities (e.g. ACLs, std.port...) should gracefully fail if they encounter NULL.

  6. phk: What happens if the VCL asks for remote.ip.uid() on an IPv4/6 socket?

dridi: Same as question 5, although with subtle differences. In both questions the syntax is wrong anyway.

  7. dridi: the section on beresp.backend.ip needs further discussion too.

Implementation

Pull Request 2371 is a partial implementation of ideas from this VIP, intended to get iterations going toward support for these features in Varnish.

A few proposals in the VIP are not implemented in the PR. The code has helped to give a clearer picture of how they could be done, so the rest of this section describes possible implementations more concretely.

This will not cover named listen addresses, which Dridi will be working on separately.

Unix domain sockets for the CLI/admin interface (-T and -M args)

The -T arg will have to add the capability to define permissions on the created path, as with the -a arg, otherwise clients may not be able to connect. And as with -a, the Varnish admin may choose to restrict access to limit who can connect.

-T /path/to/uds[,[user=<user>,][group=<group>,][mode=<mode>]]

So -T will need the same sub-args as -a except for the PROXY specification.

In the PR, the code to interpret the -a arg and its sub-args is in mgt_acceptor.c. So for -T:

  • Pull the code to parse the sub-args out from mgt_acceptor.c into a utility function, probably in mgt_utils.c.

  • If the -T arg begins with /, then interpret it as a UDS, and call the utility function to parse the sub-args, and bind as for a UDS (mostly following what mgt_acceptor.c does). This would happen in mgt_cli.c
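As a rough idea of what the shared utility function could look like, here is a sketch in C that parses the path and the user, group and mode sub-args. The names struct uds_args, parse_uds_arg, copy_field and parse_mode are hypothetical, the buffer sizes are arbitrary, and resolving the user and group names to ids is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical result type; sizes are arbitrary for this sketch. */
struct uds_args {
    char path[104];
    char user[64];
    char group[64];
    mode_t mode;
    int has_mode;
};

static int
copy_field(char *dst, size_t dstlen, const char *src, size_t len)
{
    if (len == 0 || len >= dstlen)
        return (-1);
    memcpy(dst, src, len);
    dst[len] = '\0';
    return (0);
}

/* Accept 1 to 4 octal digits, permission bits only. */
static int
parse_mode(const char *s, size_t len, mode_t *mode)
{
    unsigned m = 0;

    if (len == 0 || len > 4)
        return (-1);
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '7')
            return (-1);
        m = m * 8 + (unsigned)(s[i] - '0');
    }
    if (m > 0777)
        return (-1);
    *mode = (mode_t)m;
    return (0);
}

/* Parse "/path[,user=U][,group=G][,mode=M]"; 0 on success, -1 on error. */
static int
parse_uds_arg(const char *arg, struct uds_args *out)
{
    const char *p;
    size_t len;

    assert(arg != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (arg[0] != '/')          /* only absolute paths denote a UDS */
        return (-1);
    len = strcspn(arg, ",");
    if (copy_field(out->path, sizeof out->path, arg, len) != 0)
        return (-1);
    p = arg + len;
    while (*p == ',') {
        p++;
        len = strcspn(p, ",");
        if (len > 5 && strncmp(p, "user=", 5) == 0) {
            if (copy_field(out->user, sizeof out->user, p + 5, len - 5))
                return (-1);
        } else if (len > 6 && strncmp(p, "group=", 6) == 0) {
            if (copy_field(out->group, sizeof out->group, p + 6, len - 6))
                return (-1);
        } else if (len > 5 && strncmp(p, "mode=", 5) == 0) {
            if (parse_mode(p + 5, len - 5, &out->mode))
                return (-1);
            out->has_mode = 1;
        } else
            return (-1);        /* unknown or empty sub-arg */
        p += len;
    }
    return (0);
}
```

Since this sketch rejects unknown sub-args, a PROXY specification passed to it fails, which matches the -T case; the -a caller would have to handle its protocol sub-arg before or around such a function.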

For -M: If the argument begins with /, use VSS_unix instead of VSS_resolve to create the VSA for UDS, and connect as before. We don't need any special handling in this case -- for example, we wouldn't need to do the checking VCC does in the PR for the .path field of a backend definition (stat the path, and check if it is a socket). If the -M arg does not exist or is not a socket, the connect will fail and Varnish will exit soon enough.
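The -M behaviour can be illustrated with plain POSIX calls. This sketch does not use VSS_unix or the VSA code; connect_uds is a hypothetical helper that only shows why no extra checking is needed, since connect(2) itself fails when the path is missing or is not a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Connect to the UDS at an absolute path.
 * Returns a connected file descriptor, or -1 on any failure.
 */
static int
connect_uds(const char *path)
{
    struct sockaddr_un sun;
    int fd;

    assert(path != NULL);
    /* A leading '/' is what marks the argument as a UDS. */
    if (path[0] != '/' || strlen(path) >= sizeof sun.sun_path)
        return (-1);
    memset(&sun, 0, sizeof sun);
    sun.sun_family = AF_UNIX;
    strcpy(sun.sun_path, path);   /* length checked above */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return (-1);
    /* No stat(2) beforehand: a bad path simply makes connect fail. */
    if (connect(fd, (const struct sockaddr *)&sun, sizeof sun) != 0) {
        (void)close(fd);
        return (-1);
    }
    return (fd);
}
```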

Additions to VMOD std

I suggest these new functions:

BOOL std.is_unix(ENUM {LISTEN, BACKEND})
INT std.peer_uid(ENUM {LISTEN, BACKEND})
INT std.peer_gid(ENUM {LISTEN, BACKEND})
STRING std.peer_user(ENUM {LISTEN, BACKEND})
STRING std.peer_group(ENUM {LISTEN, BACKEND})

For the ENUM, choose LISTEN to ask about the -a address at which the current client request was received, and BACKEND for the backend address to which we are currently connected. LISTEN can be used in any VCL sub (by looking up the LOCAL address session attribute of ctx->sp). BACKEND MUST be called in vcl_backend_response only; invoke VCL failure if not.

I think that VCL authors will need is_unix() to clarify some potentially ambiguous situations that have arisen, for example the fact that variables like client.ip can be NULL (in which case they would use the new variable local.path instead). is_unix() would help VCL logic know which is which; it returns true if the address in question is a UDS (VSA_Get_Proto() == PF_UNIX).

To implement the functions to return peer credentials:

  • autoconf checks if getpeereid is supported (that should cover FreeBSD, Darwin and the other *BSDs).

  • otherwise check if the socket option SO_PEERCRED works, the same way we check if other socket options work (covers Linux)

  • otherwise check if getpeerucred is supported (covers Solaris & descendants)

  • if none of the above, then peer credentials are not supported

An internal #ifdef compat function gets uid & gid by attempting these functions in that order of preference: getpeereid, SO_PEERCRED, getpeerucred. If called for a socket that is not UDS, or if peer credentials are not supported, then set uid & gid to -1.
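A sketch of what that compat function might look like in C. The HAVE_GETPEEREID and HAVE_GETPEERUCRED macros stand in for the autoconf results and are derived here from platform macros purely so the example builds on its own; peer_creds is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* glibc only exposes struct ucred with this */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stand-ins for autoconf checks (assumption, for this sketch only). */
#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#define HAVE_GETPEEREID 1
#elif defined(__sun)
#define HAVE_GETPEERUCRED 1
#include <ucred.h>
#endif

/*
 * Fetch the peer's effective uid and gid for a connected UDS.
 * Returns 0 on success, -1 otherwise; on any failure uid and gid
 * are left at -1.
 */
static int
peer_creds(int fd, uid_t *uid, gid_t *gid)
{
    assert(uid != NULL && gid != NULL);
    *uid = (uid_t)-1;
    *gid = (gid_t)-1;
#if defined(HAVE_GETPEEREID)
    if (getpeereid(fd, uid, gid) != 0) {
        *uid = (uid_t)-1;
        *gid = (gid_t)-1;
        return (-1);
    }
    return (0);
#elif defined(SO_PEERCRED)
    struct ucred cr;
    socklen_t len = sizeof cr;

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cr, &len) != 0)
        return (-1);
    *uid = cr.uid;
    *gid = cr.gid;
    return (0);
#elif defined(HAVE_GETPEERUCRED)
    ucred_t *uc = NULL;

    if (getpeerucred(fd, &uc) != 0)
        return (-1);
    *uid = ucred_geteuid(uc);
    *gid = ucred_getegid(uc);
    ucred_free(uc);
    return (0);
#else
    (void)fd;           /* peer credentials not supported */
    return (-1);
#endif
}
```

A quick way to try it is a socketpair(2) of type AF_UNIX, where both ends belong to the calling process and the function should report its own effective uid and gid.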

The std.peer_uid() and std.peer_gid() functions return that numeric value. std.peer_user() and std.peer_group() return the name found by getpwuid/getgrgid, or NULL if the id resulted as -1.
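Mapping a numeric id back to a name goes through getpwuid(3) and getgrgid(3). The following C sketch shows the NULL-on-failure behaviour described above; peer_user_name and peer_group_name are hypothetical helpers, and a threaded worker would more likely want the reentrant getpwuid_r and getgrgid_r variants, since the functions used here return pointers into static storage.

```c
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* User name for uid, or NULL if uid is -1 or has no passwd entry. */
static const char *
peer_user_name(uid_t uid)
{
    struct passwd *pw;

    if (uid == (uid_t)-1)
        return (NULL);
    pw = getpwuid(uid);     /* static storage, not thread-safe */
    return (pw == NULL ? NULL : pw->pw_name);
}

/* Group name for gid, or NULL if gid is -1 or has no group entry. */
static const char *
peer_group_name(gid_t gid)
{
    struct group *gr;

    if (gid == (gid_t)-1)
        return (NULL);
    gr = getgrgid(gid);     /* static storage, not thread-safe */
    return (gr == NULL ? NULL : gr->gr_name);
}
```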

-p param uds_path

The .path field in a backend definition unambiguously identifies the address as a Unix domain socket, so we don't need the leading / to determine that. So for .path, it would be possible to allow relative paths, and search for the address using a uds_path parameter.

In the PR, the VSA for a backend UDS address is created in VRT_new_backend(), so this is where the search could take place:

  • If the string in .path begins with /, take that as the path for a Unix domain socket.

  • Otherwise, search for the path relative to each path prefix in uds_path, in the order given in uds_path.

  • Take the first such path for which: stat(2) succeeds (the path exists and is accessible), and stat tells us that the path is a socket. That will be taken as the configured backend address.
