Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNS lookup on connect causes delay #873

Closed
madpsy opened this issue Nov 20, 2021 · 11 comments
Closed

DNS lookup on connect causes delay #873

madpsy opened this issue Nov 20, 2021 · 11 comments
Labels
Milestone

Comments

@madpsy
Copy link

madpsy commented Nov 20, 2021

It appears when a connection comes in to rigctld it performs a reverse loopup of the source address. Depending on circumstance this can cause quite a delay.

In this case rigctld is running on a Windows 11 box and is being queried from a Pi over the local network.

For example:

pi@rpi4:~/tools $ time ./get_rigctl.sh 192.168.9.25 52222 f
14173000

real    0m4.679s
user    0m0.000s
sys     0m0.008s

That's nearly 5 seconds in this case. I can see in the rigctl output:

Connection opened from rpi4:36714
...
...
Connection closed from rpi4:36714

The delay occurs before the Connection opened line is outputted. This all suggests it it is indeed performing some kind of DNS related shinanigans in order to resolve it as 'rpi4'. Having looked at the available options (-L) I don't see any mention of being able to disable these lookups.

I can also see the PTR lookup hitting the local nameserver (answered in 1.3ms):

2021-11-20 12:37:23 | PTR | 35.9.168.192.in-addr.arpa | ROCKET.local | OK INSECURE | DOMAIN (1.3ms)

This lookup only hits the nameserver after this weird random delay so I'm confused what it's doing before making this PTR DNS request as that's the bit that's taking all the time. Maybe it's not DNS causing the delay but something is as I've no idea what.

If I use different software which also has a rigctld compatible interface (namely SparkSDR) there's no delay whatsoever so it's specific to rigctld.

My questions are therefore:

  1. Does the ability to disable lookup already exist?
  2. Could it be implemented?
  3. Is there a simple change I could make and recompile it myself to test?

Thanks

@madpsy
Copy link
Author

madpsy commented Nov 20, 2021

For reference and in case anyone else finds this useful here's the quick shell script being used:

#!/bin/bash

HOST=$1
PORT=$2
CMD=$3

exec 10<>/dev/tcp/$HOST/$PORT  # connect to rigctl and use FD 10
echo $CMD >&10                          # send command
read RESP line <&10                       # get response
echo $RESP

@madpsy
Copy link
Author

madpsy commented Nov 20, 2021

I've attached a wireshark screenshot from the box rigctld is running on. Highlighted first is the 'f' being sent to request frequency and from the time difference between that and the next highlighted line is this mysterious delay.

Screenshot 2021-11-20 125936

@mdblack98
Copy link
Contributor

mdblack98 commented Nov 20, 2021 via email

@madpsy
Copy link
Author

madpsy commented Nov 20, 2021

The rig is actually SDR Console over a virtual com port - it presents itself as a TS-2000.

It might not be DNS but can't think what else it is. If I run the same test locally (i.e. from localhost) there's no issue at all (returns within a few milliseconds) so it's not the 'rig' side of things.

If I use SparkSDR (which implements its own version of rigctld) it's also fine over the network. The network itself isn't the issue, it's something that rigctld on Windows is doing.

The order of events as I understand it from what I've tested is:

  1. TCP connection successfully established
  2. rigctld tries to get the hostname of the source connection (this is the bit I'd like to try disabling)
  3. 4.5 second delay (possibly trying some windows specific methods of lookup?)
  4. PTR DNS request recieved by nameserver
  5. DNS response sent/recieved (within 1.3ms)
  6. rigctl logs: Connection opened from rpi4:36714
  7. command recieved and processed ('get frequency')
  8. reply is sent
  9. TCP session closes
  10. rigctl logs: Connection closed from rpi4:36714

The fact rigctl logs 'rpi4:36714' means it's translated the source address to 'rpi4' otherwise I'd expect to see '192.168.9.35:36714'.

The PTR request is of course the standard way of doing lookups so I'm wondering if windows is trying some proprietary lookup nonsense before it resorts to a PTR lookup.

I assume rigctl is using a standard library to perform the hostname resoluition of the source address - I believe it's something in there causing the delay. I would question why rigctl even cares what the hostname of the source request is.

@madpsy
Copy link
Author

madpsy commented Nov 20, 2021

Forgot to say: hamlib 4.3.1

@madpsy
Copy link
Author

madpsy commented Nov 20, 2021

I've just reversed the test - ran rigctld on rpi4 and my test script via WSL on the Windows 11 box and there's no delay:

madpsy@ROCKET:~/tools$ time ./get_rigctl.sh 192.168.9.35 52222 f
145000000

real    0m0.028s
user    0m0.001s
sys     0m0.000s

This really does suggest a Windows specific issue with rigctld in my case, however, this delay would also occur in any situation where the lookup is slow for whatever reason.

@madpsy
Copy link
Author

madpsy commented Nov 20, 2021

I believe the lookup is being performed here:

https://github.com/Hamlib/Hamlib/blob/master/tests/rigctld.c#L960

and here too:

https://github.com/Hamlib/Hamlib/blob/master/tests/rigctld.c#L1198

I see no reason to perform these getnameinfo() calls or if there is a good reason for them it would be nice to be able to disable them.

I would try this fixing this myself but unsure how to compile for Windows.

if ((retcode = getnameinfo((struct sockaddr const *)&arg->cli_addr,
                                       arg->clilen,
                                       host,
                                       sizeof(host),
                                       serv,
                                       sizeof(serv),
                                       NI_NOFQDN))
                    < 0)
            {
                rig_debug(RIG_DEBUG_WARN,
                          "Peer lookup error: %s",
                          gai_strerror(retcode));
            }

            rig_debug(RIG_DEBUG_VERBOSE,
                      "Connection opened from %s:%s\n",
                      host,
                      serv);

@madpsy
Copy link
Author

madpsy commented Nov 20, 2021

Here's a discussion about getnameinfo() being slow and seeing the exact same ~4 second slowness:

https://social.msdn.microsoft.com/forums/windowsdesktop/en-US/5e8ce3a5-8c96-4205-a8ec-ba9a708b8c16/getnameinfo-very-slow

I would suggest either not trying to resolve IPs to hostnames or add a flag to disable it. It's been a fun ride this morning and good to get to the bottom of it (albeit completely untested)!

Would someone be willing to remove those calls and compile for me to test?

@mdblack98 mdblack98 added the bug label Nov 20, 2021
@mdblack98 mdblack98 added this to the 4.4 milestone Nov 20, 2021
mdblack98 added a commit that referenced this issue Nov 20, 2021
@mdblack98 mdblack98 added the needs test Patches have been submitted but need testing to close issue label Nov 20, 2021
@mdblack98
Copy link
Contributor

Try tonight's package from http://n0nb.users.sourceforge.net/
Fixed it so that it no longer does a lookup...doesn't really seem necessary anyways as numeric tells anybody what they need to know.

@madpsy
Copy link
Author

madpsy commented Nov 21, 2021

Nice one thanks - returns instantly:

pi@rpi4:~/tools $ time ./get_rigctl.sh 192.168.9.25 52222 f ; date
14074000

real    0m0.016s
user    0m0.010s
sys     0m0.002s
Sun 21 Nov 2021 11:27:40 AM GMT

@madpsy
Copy link
Author

madpsy commented Nov 22, 2021

The reason for this bug report was my work on the following project:

https://github.com/madpsy/xpa125b_controller

In its rawest form it's a multi-API interface for rigctl which now works as expected so thanks for all your effort.

@mdblack98 mdblack98 removed the needs test Patches have been submitted but need testing to close issue label Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants