server: on macOS, you can start two processes with the same --port as long as one specifies --listen-addr #40708

Open
andreimatei opened this issue Sep 12, 2019 · 14 comments
Labels
A-cli-server CLI commands that pertain to CockroachDB server processes A-server-networking Pertains to network addressing,routing,initialization B-os-macos Issues specific to macOS. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. T-server-and-security DB Server & Security

Comments

@andreimatei
Contributor

andreimatei commented Sep 12, 2019

This has bitten me several times already: roachprod local starts the first node as

cockroach start --insecure [...] --listen-addr=127.0.0.1 --port=26257 --http-port=26258

This does not prevent you from starting another node as

./cockroach start --insecure --logtostderr --port=26257

It does, however, prevent you from starting a node as

./cockroach start --insecure --logtostderr --port=26257 --listen-addr=127.0.0.1

This is most unfortunate, because on several occasions I had a roachprod local cluster running in the background, but then naively started another node for unrelated work, and tried to connect to it, only to connect to the wrong cluster.

I'm assuming node startup doesn't fail as long as there are any interfaces on which the port is available. But I think it should fail if the port is taken on any interface, and force you to specify --listen-addr if you really want to do something funky.

The code around this is very simple; we just call net.Listen(), which doesn't fail when I'd like it to fail.

ln, err := net.Listen("tcp", *addr)
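A minimal Go sketch for reproducing the overlap around the `net.Listen` call above. `probeOverlap` is a hypothetical helper, not CockroachDB code: it binds `firstHost` on an OS-assigned port, then tries to bind `secondHost` on that same port and reports whether the second bind succeeded. An empty host means "all interfaces", as when `--listen-addr` is unset. Going by the reports in this thread, `probeOverlap("127.0.0.1", "")` would report success on macOS and failure on Linux; that is an observation from the thread, not something the sketch guarantees.

```go
package main

import (
	"fmt"
	"net"
)

// probeOverlap is a hypothetical helper (not part of CockroachDB) that
// mimics the scenario in this issue: a first server bound to firstHost,
// then a second server asking for the same port on secondHost.
// An empty host means "all interfaces".
func probeOverlap(firstHost, secondHost string) (port int, secondOK bool, err error) {
	// Let the OS pick a free port for the first listener.
	first, err := net.Listen("tcp", net.JoinHostPort(firstHost, "0"))
	if err != nil {
		return 0, false, err
	}
	defer first.Close()
	port = first.Addr().(*net.TCPAddr).Port

	// Ask for the same port again. Any bind error is treated as
	// "port unavailable", which is the outcome we want to observe.
	second, err := net.Listen("tcp", net.JoinHostPort(secondHost, fmt.Sprint(port)))
	if err != nil {
		return port, false, nil
	}
	second.Close()
	return port, true, nil
}
```

Binding the identical address twice fails on every platform, so the interesting comparisons are between a specific host and the empty (wildcard) host.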

@bdarnell, do you happen to have any suggestions here? I'd be happy to implement it.

Epic: CRDB-549

Jira issue: CRDB-5519

@andreimatei andreimatei added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. A-kv-server Relating to the KV-level RPC server labels Sep 12, 2019
@andreimatei andreimatei added this to Incoming in KV via automation Sep 12, 2019
@bdarnell
Member

I think this may be macOS-specific and may be related to ipv6: https://stackoverflow.com/questions/51071020/golang-net-listen-binds-to-port-thats-already-in-use

It looks like what may be happening is that it binds to ipv6 by default. IPv4 connections seem to be allowed (localhost-specific magic?) if there's nothing else bound to the ipv4 port, but it doesn't count as a conflict if there's something there.

This seems like it's mainly suboptimal behavior in the kernel, but one thing we could do to work around it is to call net.Listen twice, once with "tcp4" and once with "tcp6", and abort if either one fails (unless we're binding to port 0, in which case we must only bind once).

@bdarnell bdarnell removed their assignment Oct 30, 2019
@knz knz added this to To do in DB Server & Security via automation May 7, 2020
@knz knz moved this from To do to Linked issues (see roadmap) in DB Server & Security May 8, 2020
@lunevalex lunevalex moved this from Incoming to Server/CLI in KV Jul 27, 2020
@lunevalex lunevalex removed this from Server/CLI in KV Apr 23, 2021
@github-actions

github-actions bot commented Jun 4, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz
Contributor

knz commented Jun 5, 2021

yes this is working as intended - having two processes, one listening on 0.0.0.0:26257 and the other on 127.0.0.1:26257, is valid.

@knz knz closed this as completed Jun 5, 2021
DB Server & Security automation moved this from Linked issues (from the roadmap columns on the right) to Done 21.2 Jun 5, 2021
@andreimatei
Contributor Author

> yes this is working as intended - having two processes, one listening on 0.0.0.0:26257 and the other on 127.0.0.1:26257, is valid.

It's a bit dismissive when two people who are not clueless are discussing an issue and a third person closes it by simply stating that it's working "as intended", because that raises the question: intended by whom? Also, you say it's intended and valid, but that's not how the system works on Linux: I've just tried, and the second instance refuses to use the port. As Ben said, this seems to be Mac-only behavior.

@andreimatei andreimatei reopened this Jun 7, 2021
DB Server & Security automation moved this from Done 21.2 to To do Jun 7, 2021
@knz
Contributor

knz commented Jun 7, 2021

My point remains. It's a macOS thing that is intended by Apple.

You would get the same behavior with nc or even a plain postgres server.

What do you think crdb should do about it?

@andreimatei
Contributor Author

> My point remains. It's a macOS thing that is intended by Apple.

Well, some low-level behavior of some system call being "intended by Apple" does not generally translate into high-level CRDB behavior...

> You would get the same behavior with nc or even a plain postgres server.

I've just tried, and I don't believe this to be true for Postgres. I've configured one pg server to bind to all interfaces, and another one to bind to localhost. If they're both trying to use the same port, the second one doesn't start. It says that another Postmaster is already using socket file /tmp/.s.PGSQL.5432, and then in another part of the log it refers to that file as a "lock file".

> What do you think crdb should do about it?

One way or another, I would like CRDB on macOS to behave like it behaves on Linux (or at least how it appears to me to behave on Linux): if I ask it to bind to all interfaces on port x, but it can't bind to some of them, I want it to fail.
Ben had a suggestion above about splitting the binding calls between the IPv6 and IPv4 interfaces. I think we should try that, although it's not obvious to me that this is an IPv4 vs. IPv6 thing, as opposed to more generally a single-interface vs. multiple-interfaces thing.
If that doesn't work, we could try some lock files like pg apparently uses. PG does not appear to let you run two servers using the same port even if they're configured to bind on different interfaces. As far as I'm concerned, I could go either way on that.

@knz
Contributor

knz commented Jun 7, 2021

The "lock file" behavior already works with CRDB if you let it configure a Unix socket. I remember Ben was opposed to enabling the Unix socket by default, but maybe this would be a good use case for it.

@bdarnell
Member

bdarnell commented Jun 8, 2021

I agree with Andrei that I don't think we can call this "working as intended". Regardless of apple's intentions here, our intention is that if you ask us to bind on all interfaces, you should get an error unless we were able to bind the requested port on all interfaces. But it's a really weird edge case - how likely are you to hit this situation without roachprod setting --listen-addr for you? I don't think it's worth changing the way we bind our ports to get a better error message in this case. There's a non-trivial risk of breaking something by doing that (I called out the issues around port 0, but there may be other things I'm not thinking of at the moment).

@knz
Copy link
Contributor

knz commented Jun 8, 2021

Ben, how do you feel about setting up the Unix socket by default?

@bdarnell
Member

bdarnell commented Jun 8, 2021

I don't understand how a unix socket helps here. Note that the invocations here must be running in different working directories because we're not running into store-level lock files, so where would we put the unix socket to do something useful here?

@knz
Copy link
Contributor

knz commented Jun 8, 2021

The unix socket is typically created in /tmp or /var/run regardless of the directory you run your server in.
Also the unix socket is called .s.PGSQL.NNNN where NNNN is the port number.

This is exactly the functionality that makes PostgreSQL (and presumably MySQL and other DB servers) behave the way Andrei wants on macOS.

@bdarnell
Copy link
Member

bdarnell commented Jun 8, 2021

Unix sockets aren't great as lockfiles because they are often left behind if a process exits uncleanly. It's very common to blindly delete them (which is allowed, even if they are in use). If we turned on this socket automatically it would cause other headaches and we'd need to figure out a safe way to clean up. We shouldn't turn on unix sockets just because we want a lockfile.

@knz knz moved this from To do to Cold storage in DB Server & Security Jun 9, 2021
@knz
Contributor

knz commented Jun 9, 2021

OK, I'm going to leave this issue open, but I won't touch it until we have a concrete, specific solution in mind, with a clear analysis showing that it won't break existing behavior (e.g. the handling of auto-allocated port numbers).

@knz knz added B-os-macos Issues specific to macOS. A-cli-server CLI commands that pertain to CockroachDB server processes and removed A-kv-server Relating to the KV-level RPC server labels Jun 9, 2021
@knz knz changed the title server: you can start two processes with the same --port as long as one specifies --listen-addr server: on macOS, you can start two processes with the same --port as long as one specifies --listen-addr Jun 9, 2021
@jlinder jlinder added the T-server-and-security DB Server & Security label Jun 16, 2021
@knz knz added the A-server-networking Pertains to network addressing,routing,initialization label Jul 29, 2021
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!
