Binding with ReuseAddress not working with UdpClient on Linux #27274

Closed
jkotas opened this issue Aug 30, 2018 · 40 comments
Labels
area-System.Net.Sockets os-linux Linux OS (any supported distro) tenet-compatibility Incompatibility with previous versions or .NET Framework

@jkotas
Member

jkotas commented Aug 30, 2018

From @olijf on August 30, 2018 11:13

Binding multiple clients on Linux in .NET Core 2.1 does not work as expected

I am binding a socket with SO_REUSEADDR (MultiCastClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);), but on Linux this throws an "Address already in use" exception.
In .NET Core 2.0 this worked fine.

General

I have the following relevant piece of code:

...
MultiCastClient = new UdpClient();

MultiCastClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
var EndPoint = new IPEndPoint(IPAddress.Any, _listenPort);
MultiCastClient.JoinMulticastGroup(IPAddress.Parse(multicastAddress));

MultiCastClient.Client.Bind(EndPoint); // <--- this is where the bind exception happens.

try
{
	MultiCastClient.BeginReceive(RecieveCallBack, null);
}
...

I have a project targeting netcoreapp2.0

When I am running this with dotnet-hosting-2.0.8 everything is fine. However when I am running this with the newer CLR aspnetcore-runtime-2.1 (all on Debian 9) I am getting a bind exception:

Application startup exception: System.Net.Sockets.SocketException (98): Address already in use
   at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
   at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.Sockets.Socket.Bind(EndPoint localEP)
   at UDPNMEAMessageReciever.UdpMessageProcessor.Start()
...

I haven't looked into it much further, but I would like to be able to use the newer CLR.
Thanks for helping me out here, I really appreciate your efforts.

Copied from original issue: dotnet/coreclr#19765

@tmds
Member

tmds commented Aug 30, 2018

This may be caused by changes in dotnet/corefx#24809.

With 2.0 ReuseAddress did:

[pid  5814] setsockopt(23, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid  5814] setsockopt(23, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0

With 2.1 it does:

[pid  5921] setsockopt(24, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0

I'm trying to reproduce this, but I'm missing something. I can run two instances of this program concurrently:

static void Main(string[] args)
{
    UdpClient MultiCastClient = new UdpClient();
    MultiCastClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
    var EndPoint = new IPEndPoint(IPAddress.Any, 5000);
    MultiCastClient.JoinMulticastGroup(IPAddress.Parse("239.0.0.1"));
    Console.Read();
}
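
The difference visible in the strace output can also be checked from managed code by reading the options back with a native getsockopt call. The following is a minimal sketch, not part of the original thread: the class name ReuseOptionProbe is hypothetical, and the numeric constants are assumed to be the Linux values on common architectures (x86, x64, ARM).

```csharp
using System;
using System.Net.Sockets;
using System.Runtime.InteropServices;

// Hypothetical helper (not from the thread): reads native socket options back on Linux.
public static class ReuseOptionProbe
{
    // Assumed Linux values on x86/x64/ARM; other architectures may differ.
    public const int SOL_SOCKET = 1;
    public const int SO_REUSEADDR = 2;
    public const int SO_REUSEPORT = 15;

    [DllImport("libc", SetLastError = true)]
    private static extern int getsockopt(int socket, int level, int optionName, out int optionValue, ref uint optionLen);

    // Returns true when the given native option is enabled on the socket.
    public static bool IsSet(Socket socket, int optionName)
    {
        uint length = sizeof(int);
        int result = getsockopt(socket.Handle.ToInt32(), SOL_SOCKET, optionName, out int value, ref length);
        if (result != 0)
        {
            throw new InvalidOperationException($"getsockopt failed, errno {Marshal.GetLastWin32Error()}");
        }
        return value != 0;
    }

    // Prints which native options SocketOptionName.ReuseAddress ends up setting on this runtime.
    public static void Report()
    {
        using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp))
        {
            socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
            Console.WriteLine($"SO_REUSEADDR: {IsSet(socket, SO_REUSEADDR)}");
            Console.WriteLine($"SO_REUSEPORT: {IsSet(socket, SO_REUSEPORT)}");
        }
    }
}
```

Going by the strace output above, one would expect a 2.1 runtime to report only SO_REUSEPORT as set, but that is something to confirm by running Report() on the runtime in question.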

@tmds
Member

tmds commented Aug 30, 2018

@olijf does this happen with two instances of your own program? Or is another program also using the port?

@olijf

olijf commented Aug 30, 2018

Hi Tom,
Thanks for looking into this. I'm running socat and another Java client, all on the same binding.
I haven't tried running multiple instances but will try tomorrow.
Hope this helps

@tmds
Member

tmds commented Aug 30, 2018

I can reproduce this. The issue occurs when two applications each use a different option: one does SO_REUSEADDR and the other SO_REUSEPORT.

using System;
using System.Net;
using System.Net.Sockets;
using System.Runtime.InteropServices;

namespace console
{
    class Program
    {
        static unsafe void Main(string[] args)
        {
            bool reuseAddr = args.Length > 0;
            Socket s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
            if (reuseAddr)
            {
                System.Console.WriteLine("reuse address");
                int value = 1;
                // Linux constants: level 1 = SOL_SOCKET, option 2 = SO_REUSEADDR
                setsockopt(s.Handle.ToInt32(), 1, 2, &value, sizeof(int));
            }
            else
            {
                System.Console.WriteLine("reuse port");
                // On .NET Core 2.1, ReuseAddress only sets SO_REUSEPORT on Linux
                s.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
            }
            s.Bind(new IPEndPoint(IPAddress.Parse("0.0.0.0"), 5000));
            Console.Read();
        }


        [DllImport("libc", SetLastError = true)]
        private unsafe static extern int setsockopt(int socket, int level, int option_name, void* option_value, uint option_len);
    }
}

The fix will be to change back to changing both options for SocketOptionName.ReuseAddress.

@davidsh you can assign this to me.

@olijf

olijf commented Sep 3, 2018

Hi @tmds ,
I see you've already figured out how to reproduce this and have pushed a fix for my issue. Can you give me an indication of when I can expect this in a regular release? I'm very happy with how fast you've resolved this. Thank you.

@tmds
Member

tmds commented Sep 3, 2018

@karelz when the PR is merged on master, will it become part of the 2.2 release? Can we consider this for 2.1?

@olijf by adding the setsockopt(s.Handle.ToInt32(), 1, 2, &value, sizeof(int)); call you should be able to unblock yourself. Please give it a try, and confirm that it resolves the issue.

Pseudocode:

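using System.Net.Sockets;
using System.Runtime.InteropServices;

class Workaround // compile with <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
{
    [DllImport("libc", SetLastError = true)]
    private unsafe static extern int setsockopt(int socket, int level, int option_name, void* option_value, uint option_len);

    static unsafe UdpClient CreateClient()
    {
        UdpClient MultiCastClient = new UdpClient();
        MultiCastClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
        {
            // set SO_REUSEADDR (https://github.com/dotnet/corefx/issues/32027)
            int value = 1; // level 1 = SOL_SOCKET, option 2 = SO_REUSEADDR on Linux
            setsockopt(MultiCastClient.Client.Handle.ToInt32(), 1, 2, &value, sizeof(int));
        }
        return MultiCastClient;
    }
}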

@olijf

olijf commented Sep 3, 2018

@tmds Your workaround works fine in one scenario, but in another one I'm still getting the bind exception. I'm still investigating why this happens.

@tmds
Member

tmds commented Sep 4, 2018

@olijf When setting both SO_REUSEADDR (via setsockopt) and SO_REUSEPORT (via SocketOptionName.ReuseAddress), I only got a bind exception when a previous socket was bound that didn't set any option.
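
The combination described above (the managed ReuseAddress option plus a native SO_REUSEADDR) can also be set without an unsafe context by passing the value by reference. This is a minimal sketch under stated assumptions, not the fix that went into corefx: the class name ReuseWorkaround and the method EnableAddressReuse are hypothetical, and the constants are assumed to be the Linux values for SOL_SOCKET and SO_REUSEADDR.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Runtime.InteropServices;

// Hypothetical helper (not from the thread): applies the workaround and checks the native result.
public static class ReuseWorkaround
{
    // Assumed Linux values.
    private const int SOL_SOCKET = 1;
    private const int SO_REUSEADDR = 2;

    [DllImport("libc", SetLastError = true)]
    private static extern int setsockopt(int socket, int level, int optionName, ref int optionValue, uint optionLen);

    // Call this before Bind. On non-Linux platforms only the managed option is set.
    public static void EnableAddressReuse(Socket socket)
    {
        socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
        {
            return;
        }

        int value = 1;
        int result = setsockopt(socket.Handle.ToInt32(), SOL_SOCKET, SO_REUSEADDR, ref value, sizeof(int));
        if (result != 0)
        {
            throw new InvalidOperationException($"setsockopt failed, errno {Marshal.GetLastWin32Error()}");
        }
    }

    // Usage example: a UdpClient bound to port 5000 with both options applied.
    public static UdpClient CreateBoundClient()
    {
        var client = new UdpClient();
        EnableAddressReuse(client.Client);
        client.Client.Bind(new IPEndPoint(IPAddress.Any, 5000));
        return client;
    }
}
```

Checking the return value is a deliberate choice here: the snippets in this thread ignore it, so a failed setsockopt call would otherwise only show up later as a bind exception. As noted above, a socket that was bound earlier without any reuse option would still be expected to cause the bind to fail.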

@karelz
Member

karelz commented Sep 4, 2018

@tmds master flows into 3.0. 2.2 is almost at the servicing bar. Can you sum up the impact of this problem? If it is wide-impact, or if there is no good workaround, we could consider it for 2.2 or 2.1.x.

@karelz
Member

karelz commented Sep 4, 2018

Based on reviewing the PR in master, it seems to be a rather rare corner case, right?
Plus, the workaround sounds reasonable (although kind of ugly).

Given that we have only 1 report so far, I recommend NOT porting it to 2.2/2.1.x until we get more developers hitting the problem.

@olijf

olijf commented Sep 4, 2018

Hi @karelz, I think this is actually a pretty big issue. Because you cannot distinguish between SO_REUSEPORT and SO_REUSEADDR in dotnet core, this is a major problem if multiple clients written in different programming languages (which all have different ways of doing the same thing) share the same binding. Plus it's a regression compared to the previous 2.0 runtime. I hope my opinion helps to decide what's best.

@karelz
Member

karelz commented Sep 4, 2018

@olijf the key question is: How common is such setup?
I understand that if someone needs it, then it is bad, although there is a workaround.

@tmds
Member

tmds commented Sep 7, 2018

but in another one I'm still getting the bind exception. I'm still investigating why this happens.

@olijf have you found the reason for the exception?

@tmds
Member

tmds commented Oct 9, 2018

This issue is fixed by dotnet/corefx#32046

@karelz
Member

karelz commented Oct 9, 2018

Closing as fixed in dotnet/corefx#32046

@karelz karelz closed this as completed Oct 9, 2018
@softworkz

Copying over from dotnet/corefx#37044:

@karelz I have an app with a large number of users affected by this. Any DLNA media app will be affected by this. A back-port would be much appreciated. Thanks.

Originally posted by @LukePulverenti in https://github.com/dotnet/corefx/issues/37044#issuecomment-480127404


@LukePulverenti did you validate your problem is indeed the same root cause and is fixed in .NET Core 3.0? (and that it is not just same symptom)
There seems to be enough +1s to justify backport, we just need to be sure it is the right fix ... first step would be to validate on 3.0. Then we can cherry pick and ask for private validation on 2.2/2.1 build.

Originally posted by @karelz in https://github.com/dotnet/corefx/issues/37044#issuecomment-480349496


For us, this is the one that we need:
https://github.com/dotnet/corefx/pull/32046/files

Originally posted by @LukePulverenti in https://github.com/dotnet/corefx/issues/37044#issuecomment-494950738


@LukePulverenti did you confirm that particular change helps your case? Or did you use latest .NET Core 3.0 to validate that?

Originally posted by @karelz in https://github.com/dotnet/corefx/issues/37044#issuecomment-495010883

@softworkz

softworkz commented May 25, 2019

Repro Scenario for UDP Bug

@karelz - you wrote:

@LukePulverenti did you validate your problem is indeed the same root cause and is fixed in .NET Core 3.0? (and that it is not just same symptom)
There seems to be enough +1s to justify backport, we just need to be sure it is the right fix ... first step would be to validate on 3.0. Then we can cherry pick and ask for private validation on 2.2/2.1 build.

and

We still need someone to help us track this down:
Anyone has an environment where it happens on somewhat regular basis, where we could work with you to collect more logs and experiment? It would be great help. Thanks!

Following up on your chat with @LukePulverenti about backporting the fix to 2.2, I have created a reproduction scenario for you: https://github.com/softworkz/ReuseBug

The solution contains a native Linux app and a netcore console app, multi-targeting netcore 2.0, 2.2 and 3.0

This demonstrates:

  • works in 2.0
  • fails in 2.2
  • works again in 3.0

I hope this helps getting the fix backported to 2.2...

Originally posted by @softworkz in https://github.com/dotnet/corefx/issues/37044#issuecomment-495592262

@softworkz

@softworkz thank you!

@karelz Yes, it would be great to get this back-ported, because ever since the 2.1 release we've had to tell users to shut down all other UPnP or DLNA software on the machine in order to prevent this from happening.

Originally posted by @LukePulverenti in https://github.com/dotnet/corefx/issues/37044#issuecomment-495726635


@softworkz @LukePulverenti I think we may be dealing with multiple problems here as some people on this thread said that 3.0 does not fix it for them.
Either way, we have a repro now, so let's try it -- @tmds or @wfurt will you have time to try it out and reproduce? If we can reproduce in-house, it should be easier for us to track it down. I'd be also interested in the repro result on 2.1.

Thanks @softworkz for repro!!! That is a HUGE step towards root cause and solution. Let's hope we can reproduce it too :)

Originally posted by @karelz in https://github.com/dotnet/corefx/issues/37044#issuecomment-495762650


Thanks @softworkz for repro!!! That is a HUGE step towards root cause and solution. Let's hope we can reproduce it too :)

@karelz @softworkz is talking about a UDP issue https://github.com/dotnet/corefx/issues/32027 which was decided not to be backported: https://github.com/dotnet/corefx/issues/32027#issuecomment-418447086.

The main issue reported here is a TCP issue observed when using HttpClient.

Originally posted by @tmds in https://github.com/dotnet/corefx/issues/37044#issuecomment-495904766

@softworkz

@karelz @softworkz is talking about a UDP issue dotnet/corefx#32027 which was decided not to be backported: #32027 (comment).

And still we're asking for it. It's a bug - not a "corner case".

The main issue reported here is a TCP issue observed when using HttpClient.

Not quite. We're not the only ones referring to the UDP bug here.

Originally posted by @softworkz in https://github.com/dotnet/corefx/issues/37044#issuecomment-495927124

@softworkz

@karelz - I was able to copy over our parts of the UdpClient issue, but not the ones from the others having the same problem.

@karelz
Member

karelz commented May 25, 2019

Thanks @softworkz!
So far it seems we have 2 customers confirmed to hit UdpClient problem -- @olijf and @softworkz. If anyone else hit the UdpClient problem, please reply here and tell us so.

@softworkz I wonder if it would be acceptable for you to wait for 3.0 RC in July (see roadmap). If there are more customers hitting it, impacting their production, we could consider backporting to 2.1/2.2.

@LukePulverenti

We are only one customer but we do bring a lot of users across every OS and NAS device that we can deploy the runtime to. Right now this is creating enough troubleshooting for us that in order to save face we are passing this information onto users and saying that we'll just have to wait for an updated runtime. We would prefer to not have to start building our own fork of the runtime from source, but it looks like that's where we're going to be headed if nothing changes here.

@softworkz

Once we discovered the problem we thought we could wait until March which was the original roadmap date for netcore 3.0.
But we're getting increasing pressure from customers as nobody wants to stick to our old version based on netcore 2.0 anymore (where it was still working).
Also, migrating to a new framework version is not a trivial task, because in fact we're delivering to the widest range of platforms one could think of: Windows, Linux (7 different distributions), macOS, FreeBSD and Android.

(funny: Luke just wrote about the same...)

@tmds
Member

tmds commented May 25, 2019

You should be able to work around the issue as described here: https://github.com/dotnet/corefx/issues/32027#issuecomment-418082355

@karelz
Member

karelz commented May 26, 2019

Just to clarify @softworkz @LukePulverenti: Are you from the same company?

@softworkz AFAIK, March was never original roadmap for .NET Core 3.0.
.NET Core 2.0 is out of support, so I understand your desire to not use it. We would recommend the same.

Can you please try workaround from @tmds? Would that be acceptable until you are able to upgrade to 3.0?

@softworkz

softworkz commented May 26, 2019

Just to clarify @softworkz @LukePulverenti: Are you from the same company?

More 'for' than 'from', but yes.

@softworkz AFAIK, March was never original roadmap for .NET Core 3.0.

Well, now I'm confused because you're the one who edited the roadmap document:
https://github.com/dotnet/core/blob/1b9b75a242b09f85a6dd7916ff08e7c28154f2b5/roadmap.md

Q1 2019 was the announced ship date from May 24, 2018 until Nov 6, 2018.
The last month in Q1 2019 is March 2019.

Can you please try workaround from @tmds? Would that be acceptable until you are able to upgrade to 3.0?

We'll try and report back, thanks.

@karelz
Member

karelz commented May 29, 2019

you're the one who edited the roadmap document

Fair point. It was so long ago that I forgot :), sorry!

@softworkz

Quick update: The workaround was successful in the case of my repro scenario. We're currently adding this to a new beta and then I'll report back.

@fgheysels

I'm experiencing a similar issue.
I have a custom Azure IoT Edge module which runs in a Linux container. The IoT Edge module uses the Zeroconf library to discover devices in the network via mDNS. ZeroConf uses UdpClient to bind to a certain socket. When the IoT Edge module starts, ZeroConf is called to discover the devices.
I get the following exception:

<06/04/2019 10:03:23> Address already in use
<06/04/2019 10:03:23>    at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
   at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.Sockets.Socket.Bind(EndPoint localEP)
   at Zeroconf.NetworkInterface.NetworkRequestAsync(Byte[] requestBytes, TimeSpan scanTime, Int32 retries, Int32 retryDelayMilliseconds, Action`2 onResponse, NetworkInterface adapter, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\NetworkInterface.cs:line 107
   at Zeroconf.NetworkInterface.NetworkRequestAsync(Byte[] requestBytes, TimeSpan scanTime, Int32 retries, Int32 retryDelayMilliseconds, Action`2 onResponse, NetworkInterface adapter, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\NetworkInterface.cs:line 169
   at Zeroconf.NetworkInterface.NetworkRequestAsync(Byte[] requestBytes, TimeSpan scanTime, Int32 retries, Int32 retryDelayMilliseconds, Action`2 onResponse, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\NetworkInterface.cs:line 34
   at Zeroconf.ZeroconfResolver.ResolveInternal(ZeroconfOptions options, Action`2 callback, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\ZeroconfResolver.cs:line 79
   at Zeroconf.ZeroconfResolver.ResolveAsync(ResolveOptions options, Action`1 callback, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\ZeroconfResolver.Async.cs:line 98
   at Zeroconf.ZeroconfResolver.ResolveAsync(IEnumerable`1 protocols, TimeSpan scanTime, Int32 retries, Int32 retryDelayMilliseconds, Action`1 callback, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\ZeroconfResolver.Async.cs:line 69

When I look into the relevant ZeroConf code, I notice that UdpClient is used as follows:

using (var client = new UdpClient())
 {
    for (var i = 0; i < retries; i++)
    {
        try
        {
            var socket = client.Client;

            if (socket.IsBound) continue;

            socket.SetSocketOption(SocketOptionLevel.IP,
                        SocketOptionName.MulticastInterface,
                        IPAddress.HostToNetworkOrder(ifaceIndex));

            client.ExclusiveAddressUse = false;
            socket.SetSocketOption(SocketOptionLevel.Socket,
                                                      SocketOptionName.ReuseAddress,
                                                      true);
            socket.SetSocketOption(SocketOptionLevel.Socket,
                                                      SocketOptionName.ReceiveTimeout,
                                                      (int)scanTime.TotalMilliseconds);
            client.ExclusiveAddressUse = false;


            var localEp = new IPEndPoint(IPAddress.Any, 5353);

            Debug.WriteLine($"Attempting to bind to {localEp} on adapter {adapter.Name}");
            socket.Bind(localEp);

(The exception is thrown on the socket.Bind() call).

@matthew798

Was this ever backported to 2.2/2.1?

@davidsh
Contributor

davidsh commented Sep 4, 2019

Was this ever backported to 2.2/2.1?

It was not. There are no plans to backport it. In general, not all fixes get backported to previous releases.

@davidsh
Contributor

davidsh commented Sep 4, 2019

Check out the latest .NET Core 3.0 preview. It is suitable for 'go-live' scenarios:

https://devblogs.microsoft.com/dotnet/announcing-net-core-3-0-preview-9/

@matthew798

@davidsh Thanks, was hoping to not have to use the workaround. I am writing a library, users could be on any version.

@QTimort

QTimort commented Sep 4, 2019

@matthew798 Not sure if this can help you but I posted a hack that works, it was tested on .NET Core 2.1. It could be improved, see https://github.com/QTimort/bind-reuse-port

@matthew798

@QTimort Thanks. I'll have a look!

@tmds
Member

tmds commented Sep 5, 2019

Alternatively, you can use the code from https://github.com/dotnet/corefx/issues/32027#issuecomment-417395637

if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
{
    // set SO_REUSEADDR (https://github.com/dotnet/corefx/issues/32027)
    int value = 1;
    setsockopt(MultiCastClient.Client.Handle.ToInt32(), 1, 2, &value, sizeof(int));
}

[DllImport("libc", SetLastError = true)]
private unsafe static extern int setsockopt(int socket, int level, int option_name, void* option_value, uint option_len);

@softworkz

Alternatively, you can use the code from #32027 (comment)

if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
{
    // set SO_REUSEADDR (https://github.com/dotnet/corefx/issues/32027)
    int value = 1;
    setsockopt(MultiCastClient.Client.Handle.ToInt32(), 1, 2, &value, sizeof(int));
}

[DllImport("libc", SetLastError = true)]
private unsafe static extern int setsockopt(int socket, int level, int option_name, void* option_value, uint option_len);

We had the same problem and I can confirm that this has fixed it for us.

@matthew798

I'm not sure this is related to this issue, so if need be, I'll start another. I have observed what I think is a discrepancy in how .NET Core handles sockets compared to native code (on Linux, at least).

I am talking about the case where two sockets are bound to the same endpoint, but only one is connected to a remote endpoint. I have an SO question on the subject, and it seems that the behaviour is "undefined", yet the code here works perfectly, and does exactly that.

Specifically, the code I linked creates 2 sockets, one to listen for incoming dtls "connection" requests, and another for a connected client. Both of these sockets are bound to the same endpoint, and the second is connected to the client's endpoint. The result is that all traffic originating from the "connected" client is forwarded to the socket created specifically for them, and all other traffic is forwarded to the unconnected socket.

I tried to replicate this behaviour in C# with no luck. As I mentioned, it seems that in this specific case, the behavior is undefined and the data seems to be forwarded to a socket at random.

My code is as follows:

var localEp = new IPEndPoint(IPAddress.Loopback, 1114);

var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
socket.Bind(localEp);
...
Setting up SSL
...
var clientSocket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
clientSocket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
clientSocket.Bind(localEp);
clientSocket.Connect(bioAddr);

At this point, there is no way to guarantee that the client's dgrams will make it to clientSocket. This does not match the behavior of the code I linked above.

Is this a bug in .NET? I am using .NET Core 3.0, so I know that both SO_REUSEADDR and SO_REUSEPORT are being set. I'm not sure what I am missing...

@tmds
Member

tmds commented Sep 6, 2019

Is this a bug in .NET? I am using .NET Core 3.0, so I know that both SO_REUSEADDR and SO_REUSEPORT are being set. I'm not sure what I am missing...

In the code you linked to: SO_REUSEPORT is not set on Linux: https://github.com/nplab/DTLS-Examples/blob/226f222e528858b3a8c5fa3326b0599d25d3ef1c/src/dtls_udp_echo.c#L652-L654

@matthew798

Yes, you are correct. My code doesn't work in either dotnet core 2.2 (where SO_REUSEPORT is not set) or 3.0 (where it is). So it seems that SO_REUSEPORT has no effect on the result.

The bottom line is that the behavior I described is achievable in native code, but there seems to be no way in dotnet.

I'll open another issue.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020