Device discovery not reliable?! #36
Hi, for your setup the configured timeout does not matter. 2 seconds or so should be plenty of time. Do you know whether a response is actually received, and whether it was discarded?
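One way to check whether a response arrives at all is to log every datagram that reaches the socket during the timeout window. The following is a minimal sketch using only `java.net`, not Calimero's own receive loop; `ResponseCollector`, `collect` and `loopbackDemo` are hypothetical names, and the demo simply has a loopback socket send two datagrams to itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ResponseCollector {
    // Collects every datagram that reaches 'socket' before the deadline and logs the
    // sender and the arrival time, so you can see whether a reply arrived at all.
    static List<String> collect(DatagramSocket socket, int timeoutMillis) throws IOException {
        final List<String> received = new ArrayList<>();
        final long start = System.nanoTime();
        final long deadline = start + timeoutMillis * 1_000_000L;
        final byte[] buf = new byte[512];
        while (true) {
            final long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0)
                break; // receive window has closed
            socket.setSoTimeout((int) remainingMillis);
            final DatagramPacket p = new DatagramPacket(buf, buf.length);
            try {
                socket.receive(p);
            } catch (SocketTimeoutException e) {
                break; // nothing else arrived before the deadline
            }
            final long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            System.out.printf("after %d ms: %d bytes from %s%n",
                    elapsedMillis, p.getLength(), p.getSocketAddress());
            received.add(new String(p.getData(), p.getOffset(), p.getLength(), StandardCharsets.UTF_8));
        }
        return received;
    }

    // Loopback demo: the socket sends two datagrams to itself, then collects them.
    static List<String> loopbackDemo() {
        try (DatagramSocket socket = new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            for (String msg : new String[] { "reply-1", "reply-2" }) {
                final byte[] data = msg.getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(data, data.length, socket.getLocalSocketAddress()));
            }
            return collect(socket, 500);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loopbackDemo());
    }
}
```

Pointing `collect` at the socket used for the search request, with tcpdump running in parallel, would show whether a reply is missing on the wire or lost in the application.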
Hi, I once played with the library and even attempted to write my own Java … I have a lot of experience in the Java world as a professional with multi… Within the Discoverer class you see the following code when sending out the …
s.send(new DatagramPacket(buf, buf.length, SYSTEM_SETUP_MULTICAST, …
The time between sending out and starting up your listener thread reading …
Just my experience. Greetings, Bert Vermeiren.
2016-09-28 11:24 GMT+02:00 bmalinowsky notifications@github.com:
Thanks Bert,
I had a deeper look at the code and found more than one section where the receiver starts listening only AFTER the datagram has been sent:
Besides that, there are a few optimization options:
a) https://github.com/calimero-project/calimero-core/blob/master/src/tuwien/auto/calimero/knxnetip/Discoverer.java#L518
b) https://github.com/calimero-project/calimero-core/blob/master/src/tuwien/auto/calimero/knxnetip/Discoverer.java#L385
For 2) the fix is very easy (already changed w.r.t. a)):
// start receiver BEFORE sending out the datagram
final ReceiverLoop l = startReceiver(s, timeout, nifName + localAddr.getHostAddress());
receivers.add(l);
s.send(new DatagramPacket(buf, buf.length, SYSTEM_SETUP_MULTICAST, SEARCH_PORT));
return l;
I'll try to create a diff/patch and see whether my changes fix the issue.
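The ordering proposed above (start the receiver thread, then send, then wait for the thread) can be sketched with plain `java.net` and a `CompletableFuture`. This is an illustration under assumptions, not Calimero's `startReceiver`/`ReceiverLoop`; `ReceiveBeforeSend` and its methods are hypothetical, and the socket sends to itself over loopback. Starting the thread first does not guarantee it is already blocked in `receive()` when `send()` runs; what matters is that the socket is bound before the request goes out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

final class ReceiveBeforeSend {
    // Starts a daemon receiver thread on an already bound socket. The thread may or may
    // not be inside receive() when the caller sends; the bound socket buffers the reply.
    static Thread startReceiver(DatagramSocket socket, int timeoutMillis, CompletableFuture<String> result) {
        final Thread t = new Thread(() -> {
            final DatagramPacket p = new DatagramPacket(new byte[512], 512);
            try {
                socket.setSoTimeout(timeoutMillis);
                socket.receive(p); // throws SocketTimeoutException if nothing arrives in time
                result.complete(new String(p.getData(), p.getOffset(), p.getLength(), StandardCharsets.UTF_8));
            } catch (IOException e) {
                result.completeExceptionally(e);
            }
        }, "receiver");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Loopback demo: start the receiver, send a request to our own socket, wait for the thread.
    static String demo() {
        try (DatagramSocket socket = new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            final CompletableFuture<String> result = new CompletableFuture<>();
            final Thread receiver = startReceiver(socket, 2000, result); // 1) start receiver
            final byte[] req = "search request".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(req, req.length, socket.getLocalSocketAddress())); // 2) send
            receiver.join(); // 3) block until the receiver thread finishes
            return result.join(); // rethrows a receive failure as CompletionException
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```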
I modified the already mentioned sections 1) and 2). Here's the patch:
diff --git a/src/tuwien/auto/calimero/knxnetip/Discoverer.java b/src/tuwien/auto/calimero/knxnetip/Discoverer.java
index 0b39734..29a3dba 100644
--- a/src/tuwien/auto/calimero/knxnetip/Discoverer.java
+++ b/src/tuwien/auto/calimero/knxnetip/Discoverer.java
@@ -451,14 +451,21 @@
try {
final byte[] buf = PacketHelper.toPacket(new DescriptionRequest(nat ? null
: (InetSocketAddress) s.getLocalSocketAddress()));
+
+ // start receiver thread
+ final ReceiverLoop receiver = startDescriptionReceiver(s, timeout, server, s.toString());
s.send(new DatagramPacket(buf, buf.length, server));
- final ReceiverLoop looper = new ReceiverLoop(s, 256, timeout * 1000,
- server);
- looper.loop();
- if (looper.thrown != null)
- throw looper.thrown;
- if (looper.res != null)
- return looper.res;
+ try {
+ // block until receiver finishes
+ join(receiver);
+ } catch (InterruptedException ex) {
+ // forward the exception to outer try/catch
+ throw new IOException(ex);
+ }
+ if (receiver.thrown != null)
+ throw receiver.thrown;
+ if (receiver.res != null)
+ return receiver.res;
}
catch (final IOException e) {
final String msg = "network failure on getting description";
@@ -513,14 +520,12 @@
final InetSocketAddress res = mcast ? new InetSocketAddress(SYSTEM_SETUP_MULTICAST,
s.getLocalPort()) : nat ? null : new InetSocketAddress(localAddr,
s.getLocalPort());
- final byte[] buf = PacketHelper.toPacket(new SearchRequest(res));
- s.send(new DatagramPacket(buf, buf.length, SYSTEM_SETUP_MULTICAST, SEARCH_PORT));
- synchronized (receivers) {
- final ReceiverLoop l = startReceiver(s, timeout,
- nifName + localAddr.getHostAddress());
- receivers.add(l);
- return l;
- }
+ final byte[] buf = PacketHelper.toPacket(new SearchRequest(res));
+ // start receiver BEFORE sending out the datagram
+ final ReceiverLoop l = startReceiver(s, timeout, nifName + localAddr.getHostAddress());
+ receivers.add(l);
+ s.send(new DatagramPacket(buf, buf.length, SYSTEM_SETUP_MULTICAST, SEARCH_PORT));
+ return l;
}
catch (final IOException e) {
if (mcast)
@@ -618,6 +623,16 @@
looper.t.start();
return looper;
}
+
+ private ReceiverLoop startDescriptionReceiver(final DatagramSocket socket, final int timeout, final InetSocketAddress queriedServer,
+ final String name)
+ {
+ final ReceiverLoop looper = new ReceiverLoop(socket, 256, timeout * 1000, queriedServer);
+ looper.t = new Thread(looper, "Discoverer " + name);
+ looper.t.setDaemon(true);
+ looper.t.start();
+ return looper;
+ }
private final class ReceiverLoop extends UdpSocketLooper implements Runnable
{
It would be great if you, @bmalinowsky, could check this and create a 2.3-beta2 or so, to have this fix available before the 2.4 release.
For b): this code seems correct. For what I can think of (didn't read the …) … If I were to design this class, I would make sure you can only use this … Regards, Bert.
2016-09-29 8:24 GMT+02:00 Alex notifications@github.com:
@bmalinowsky
I haven't looked into your problem in detail due to current time constraints. Therefore, I also can't state an exact schedule for v2.3 (which would contain a resolution to this and #37, if applicable). In general (as said, without further insight), I have doubts about Bert's underlying hypothesis that the issue lies in the order of the execution sequence. I have to take a closer look ...
Trust me, the network can reply much faster than your Java code can run and … Regards, Bert.
2016-10-11 14:52 GMT+02:00 bmalinowsky notifications@github.com:
I even expect it to be faster on certain devices. By saying there is a race condition, you imply a happens-before requirement on receive -> send (and with respect to that, tuxedo's patch is actually not correct). Sockets use rx/tx buffers. You can wait a day before invoking receive; the response will still be there (the usual exceptions apply).
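The buffering argument can be checked with a small loopback experiment: the receiving socket is bound first, the datagram is sent while no thread is blocked in `receive()`, and `receive()` is only called after a pause. This is a minimal sketch with hypothetical names (`BufferedDatagramDemo`, `sendThenWaitThenReceive`); datagrams that arrive before the socket is bound, or while its receive buffer is full, are still dropped, which covers the usual exceptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

final class BufferedDatagramDemo {
    // Sends a datagram to a bound socket, sleeps, and only then calls receive().
    static String sendThenWaitThenReceive(long pauseMillis) {
        try (DatagramSocket receiver = new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket sender = new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            final byte[] msg = "search response".getBytes(StandardCharsets.UTF_8);
            // The receiver is bound, but no thread is blocked in receive() yet.
            sender.send(new DatagramPacket(msg, msg.length, receiver.getLocalSocketAddress()));
            Thread.sleep(pauseMillis);
            // The datagram has been waiting in the socket's receive buffer the whole time.
            receiver.setSoTimeout(1000);
            final DatagramPacket p = new DatagramPacket(new byte[512], 512);
            receiver.receive(p);
            return new String(p.getData(), p.getOffset(), p.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendThenWaitThenReceive(1000));
    }
}
```

`sendThenWaitThenReceive(1000)` returns the payload even though the datagram arrived about a second before `receive()` was called.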
I still fully agree with Bert. If you use UDP, you should start listening before you send a request. Otherwise you have a receive gap in which you can miss datagram packets. The receive method blocks until a packet is received; as far as I know, it does not return already received packets. Regarding my patch: could you please explain what "is not correct" means?
Of course.
That's true for TCP streams. But with UDP you don't have a stream. You don't even have a "connection".
You can always miss datagrams. UDP is an unreliable protocol after all.
It only returns already received "packets".
Actual thread execution, and invocation of receive (note that my answer was to Bert). I won't go into another discussion here.
What do TCP and its definition of a connection have to do with that?
I have to admit: you're right. I created a short demo, sending a UDP datagram from A to B with a giant pause and a minimum rx buffer size of 1... and the datagram is received. I'm currently sitting at a different PC, so I cannot re-test with the same environment. With this PC, tcpdump captured too many UDP packets (the filter was not explicit enough), so I was not able to capture a screenshot or copy and paste a situation where the MDT IP Router answered more than 5 s after the search request. My discoverer setup is configured to 5 s. Maybe the whole issue depends on a stressed KNX interface?! I will do some more tests on my other machine, where the problem was visible, and report back the findings.
Re-open if there are any conclusive findings...
The issue is still present, but no additional details are available. The provided "fix" in fact solves the issue, reproducibly on Windows and Linux. Besides participating in this discussion: did you, @bmalinowsky, try to run the posted code snippet and thus try to reproduce it?
Hi,
I'm using 2.3-beta and face the following problem when trying to discover KNX interface devices:
The result is not deterministic: sometimes I find my routing and tunneling device, sometimes not. I then just need to repeat the procedure a few times, and the device is detected again...
Here's the related code snippet I'm using:
Some words on the implementation:
When working with multiple network interfaces on a single machine, I have to know on which network interface the KNX IP Router is reachable. So I first get all relevant network interfaces and then run a discovery on each of them. It makes no difference whether I use a timeout of 1 s, 5 s or 15 s.
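Selecting the interfaces to search on can be done with `java.net.NetworkInterface` alone. The sketch below assumes a candidate should be up, not a loopback, multicast-capable and have an IPv4 address, since the search request is sent to the `SYSTEM_SETUP_MULTICAST` group; `DiscoveryInterfaces.candidates` is a hypothetical helper, and the per-interface Calimero discovery call is left out because this thread doesn't show its signature.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class DiscoveryInterfaces {
    // Returns interfaces that are up, not loopback, multicast-capable and have an
    // IPv4 address: candidates on which a discovery (search request) could be run.
    static List<NetworkInterface> candidates() {
        final List<NetworkInterface> result = new ArrayList<>();
        try {
            final Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
            if (all == null)
                return result; // no interfaces reported
            for (NetworkInterface ni : Collections.list(all)) {
                if (!ni.isUp() || ni.isLoopback() || !ni.supportsMulticast())
                    continue;
                for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        result.add(ni);
                        break;
                    }
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (NetworkInterface ni : candidates())
            System.out.println(ni.getName() + " " + Collections.list(ni.getInetAddresses()));
    }
}
```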
I personally faced this issue on Debian Testing AMD64, but other people have faced it on Win7 as well.
I have no firewall running. The PC is connected via LAN to a switch, to which the MDT IP Router is also connected (no WiFi or Fritzbox in between).
Any ideas what's wrong?!