Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bind client-side UDP sockets. #551

Merged
merged 2 commits into from Apr 10, 2020
Merged

Conversation

danielballan
Copy link
Collaborator

@danielballan danielballan commented Jan 18, 2020

This fixes the critical Windows bug introduced in v0.4.0. Tested
interactively on a Windows VM.

Closes #540
Closes #514

This takes the approach first proposed by @ke-zhang-rd in #543 commit a3fb3d5 and applies it uniformly to all clients. Everywhere that we use caproto._utils.bcast_socket() in a client, we then immediately bind to ('', 0).

I intend to backport this to v0.4.x.

This fixes the critical Windows bug introduces in v0.4.0.

Closes caproto#540
Closes caproto#514

This takes the approach first proposed by @ke-zhang-rd
in caproto#543 commit a3fb3d5
and applies it uniformly to all clients. Also, we bind
immediately after creating the socket.
@danielballan
Copy link
Collaborator Author

A little more explanation:

My understanding is that udp_sock is implicitly bound the first time we call send_to on it (source). The problem arises when we call udp_sock.getsockname() before we have called send_to for the first time. On Linux and BSD/Darwin, we get back ('0.0.0.0', 0) at this stage but on Windows an exception is raised.

I don't think there is any harm in explicitly binding earlier so that the code that follows can assume that the socket is definitely already bound and safe to call getsockname() on. This is less complex than other approach I can think of, such as (1) ensuring that we never hit a logging code path until it is bound; (2) adding state to explicitly track whether it is bound; or (3) wrapping getsocketname() in a fault-tolerant "safe" wrapper.

@klauer
Copy link
Member

klauer commented Jan 18, 2020

I'm not entirely sure this is the best or most correct solution (consider the rationale presented in https://laforge.gnumonks.org/blog/20171020-local_ip_unbound_udp/), as specifically it may not give you the address you expect. Try it yourself with sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); sock.connect((addr, 5)) to two different subnets on your local network and you should get different results.

I'd also consider the best place for this bind() call would be in bcast_socket itself, and only for platforms where it's required.

@danielballan
Copy link
Collaborator Author

danielballan commented Jan 18, 2020

My first thought was to put it in bcast_socket as well, but we only want it for client sockets. (Servers bind to CA_SERVER_PORT specifically.) But perhaps it would be better to add a kwarg like bcast_socket(client=True).

I'd like to settle this as correctly as we can. I don't think I fully understand the issue yet. Isn't true that sock.sendto binds the socket anyway, and that sock.bind(('', 0)) achieves the same outcome---just explicitly and, for our purposes, conveniently early?

In [21]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                              

In [22]: sock.getsockname()                                                                                                                                                                   
Out[22]: ('0.0.0.0', 0)

In [23]: sock.sendto(b'asdf', ('127.0.0.1', 5))  # implicitly binds?                                                                                                                                             
Out[23]: 4

In [24]: sock.getsockname()                                                                                                                                                                   
Out[24]: ('0.0.0.0', 35984)

In [25]: sock.sendto(b'asdf', ('172.17.0.1', 5))                                                                                                                                              
Out[25]: 4

In [26]: sock.getsockname()                                                                                                                                                                   
Out[26]: ('0.0.0.0', 35984)

@danielballan
Copy link
Collaborator Author

danielballan commented Jan 18, 2020

Try it yourself with sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); sock.connect((addr, 5)) to two different subnets on your local network and you should get different results.

It seems that connect resolves a specific interface:

In [32]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                              

In [33]: sock.connect(('172.17.0.1', 5000))                                                                                                                                                   

In [34]: sock.getsockname()                                                                                                                                                                   
Out[34]: ('172.17.0.1', 60155)

In [35]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                              

In [36]: sock.connect(('127.0.0.1', 5000))                                                                                                                                                    

In [37]: sock.getsockname()                                                                                                                                                                   
Out[37]: ('127.0.0.1', 47564)

In [38]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                              

In [39]: sock.connect(('', 5000))                                                                                                                                                             

In [40]: sock.getsockname()                                                                                                                                                                   
Out[40]: ('127.0.0.1', 47173)

but bind does not:

In [41]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                              

In [42]: sock.bind(('', 5000))                                                                                                                                                                

In [43]: sock.getsockname()                                                                                                                                                                   
Out[43]: ('0.0.0.0', 5000)

@klauer
Copy link
Member

klauer commented Jan 18, 2020

Let's imagine we have differing EPICS_CA_ADDRESS_LISTs:

  1. Unset, with AUTO=yes; this would mean broadcast to 255.255.255.255 and bind to any/all interfaces; what does getsockname return here? Seems like it picks the first interface on my machine.
  2. With address A. If .connect((A, ...)) indicates that the socket would go through interface 0 with local address IP0, getsockname() should return IP0 as the packet would be routed through that interface, and the destination likely would see it coming from IP0.
  3. With addresses A, B, where A is the same as above, and B going through interface 1 with address IP1.

What should happen for case (3)? getsockname should depend on the destination. Calling .connect when sending to each of those addresses would make getsockname return the appropriate value, or keeping one UDP socket per destination would do the same. .bind((, 0)) would keep it at 0.0.0.0 in my testing.

How about a separate bcast_client_socket function with a target arg? client=True could work, too, I suppose...

@danielballan
Copy link
Collaborator Author

danielballan commented Jan 18, 2020

How about a separate bcast_client_socket function with a target arg?

That sounds good to me. Would we then change self.udp_sock to self.udp_socks, a dict mapping targets [I mean remote addresses] to sockets connected to that target?

@danielballan
Copy link
Collaborator Author

I guess this also tells us that there is no single "client address". Thus the Broadcaster.client_address just added in #544 is not well-defined and should be removed before we issue a release.

@danielballan
Copy link
Collaborator Author

(P.S. Thanks for taking the trouble to walk me through this, @klauer.)

@klauer
Copy link
Member

klauer commented Jan 18, 2020

(If I'm being honest, I was walking myself through it as well ;) )

UDP socket per interface (multiple addresses might map to the same interface, after all) seems the most correct. That would likely be a rather large refactor...

@danielballan
Copy link
Collaborator Author

Socket per interface sounds correct. I'm not clear on how to get that from the target though. If I want to sendto address X, how can I tell which interface the system will use to route the message?

@klauer
Copy link
Member

klauer commented Jan 18, 2020

I was thinking something along the lines of this, if not using the auto address list:

test_socket = socket.socket(...)
sockets_by_interface = {}
sockets_by_ca_addr = {}
for ca_addr in CA_ADDR_LIST:
    test_socket.connect((ca_addr, arbitrary_port))
    interface_addr = test_socket.getsockname()[0]
    if interface_addr not in sockets_by_interface:
        sockets_by_interface[interface_addr] = s = socket.socket(...)
        # s.connect? s.bind?
    sockets_by_ca_addr[ca_addr] = sockets_by_interface[interface_addr]

@danielballan
Copy link
Collaborator Author

danielballan commented Jan 18, 2020

Right...I was thinking that sockets_by_ca_addr could grow without bound in a long-running process if servers come and go, but in fact the number of broadcast addresses is constant for the lifetime of the process, however many actual servers might come and go. This seems workable.

I think this refactor isn't prohibitively hard to execute. We'll see....

@klauer
Copy link
Member

klauer commented Jan 19, 2020

Just a thought, what if instead of the additional sockets there was just a mapping such as destination_addr_to_local_addr = {ca_address_list_address: local_addr_found_by_connect, ...}? I suppose that could go stale if network topology changes, but maybe it's good enough...

@danielballan
Copy link
Collaborator Author

One socket would be better if it were possible. But it looks like after the first connection, the name sticks. Am I missing a way to do this with just one socket?

In [2]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                               

In [3]: sock.connect(('172.17.0.1', 5000))                                                                                                                                                    

In [4]: sock.getsockname()                                                                                                                                                                    
Out[4]: ('172.17.0.1', 53756)

In [5]: sock.connect(('127.0.0.1', 5000))                                                                                                                                                     

In [6]: sock.getsockname()                                                                                                                                                                    
Out[6]: ('172.17.0.1', 53756)

In [7]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                               

In [8]: sock.connect(('127.0.0.1', 5000))                                                                                                                                                     

In [9]: sock.getsockname()                                                                                                                                                                    
Out[9]: ('127.0.0.1', 53302)

In [10]: sock.connect(('172.17.0.1', 5000))                                                                                                                                                   

In [11]: sock.getsockname()                                                                                                                                                                   
Out[11]: ('127.0.0.1', 53302)

@danielballan
Copy link
Collaborator Author

danielballan commented Jan 19, 2020

Oh, perhaps you mean we should map out the network topology using several short-lived sockets (connect/getsockname/close) at the start and then use just one socket for all further communication during the lifecycle of the Broadcaster. 👍

@danielballan
Copy link
Collaborator Author

We could get the IPs this way but I don't think we would know the ports.

@klauer
Copy link
Member

klauer commented Jan 19, 2020

Hmm, yeah... just a half-baked thought...

@danielballan
Copy link
Collaborator Author

danielballan commented Jan 20, 2020

Another half-baked thought:

If I understand correctly, getsockname() tells us which interface the socket can receive from. Currently, we have just one UDP socket so it better be listening to all interfaces. Seem like it is:

In [38]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                              

In [39]: sock.sendto(b'', ('127.0.0.1', 5000))                                                                                                                                                
Out[39]: 0

In [40]: sock.getsockname()                                                                                                                                                                   
Out[40]: ('0.0.0.0', 42078)

But if move to an explicit connect it appears the socket will only receive from one interface:

In [41]: sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)                                                                                                                              

In [42]: sock.connect(('127.0.0.1', 5000))                                                                                                                                                    

In [43]: sock.getsockname()                                                                                                                                                                   
Out[43]: ('127.0.0.1', 39819)

Then it seems we have two options:

  1. Make a UDP socket per interface. For each ca_addr, use the appropriate socket to send messages. Add all of these sockets as "listeners" in the SharedBroadcaster.
  2. Keep just the one socket. Record its address as 0.0.0.0 which seems accurate in this case because it is sending and listening over multiple interfaces.

If this is right, then (2) is a lazier approach that lets the system take care of more things for us but leaves us ignorant about some details that it could hypothetically be useful to know and expose through logs. But it might not be a wrong approach.

@ke-zhang-rd
Copy link
Contributor

Then it seems we have two options:

  1. Make a UDP socket per interface. For each ca_addr, use the appropriate socket to send messages. Add all of these sockets as "listeners" in the SharedBroadcaster.
  2. Keep just the one socket. Record its address as 0.0.0.0 which seems accurate in this case because it is sending and listening over multiple interfaces.

If this is right, then (2) is a lazier approach that lets the system take care of more things for us but leaves us ignorant about some details that it could hypothetically be useful to know and expose through logs. But it might not be a wrong approach.

I agree with this summary.
Are you in the middle of writing a separate bcast_client_socket function with a target arg? @danielballan If not, I could help.

@danielballan
Copy link
Collaborator Author

Thanks, @ke-zhang-rd. I haven't started writing any code for this yet. I'd like to feel a little more confident that we understand our options and their trade-offs before we proceed. I'm soliciting more opinions...

@danielballan
Copy link
Collaborator Author

The first post in epics-modules/asyn#11 seems to reinforce our current understanding. If we add a connect or bind, the UDP socket will not be able to receive all server responses, so we could need multiple sockets.

@klauer
Copy link
Member

klauer commented Apr 9, 2020

Just a thought: I wonder if tracing/comparing system calls for caproto vs epics-base would yield something worthwhile:

https://gist.github.com/klauer/b5ff0a892126501d38efa39a7872b52c

Copy link
Member

@klauer klauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielballan
Copy link
Collaborator Author

Nice. It's good to see this verified at the lowest level. Thanks for handling the merge as well.

@klauer klauer merged commit faf0979 into caproto:master Apr 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants