Internet was down after recent updates. #893

Closed
magman2112 opened this issue Aug 27, 2018 · 18 comments

@magman2112

commented Aug 27, 2018

On Monday 27th August, internet access was lost at DoES Liverpool.

On calling Baltic Broadband, it was found that there was an issue with the DNS. The connection had expired approximately 24 hours after the network upgrade this weekend. Baltic tech support identified that we were apparently using 1.1.1.1 for DNS but had not told Baltic, so they had assumed we were using Baltic's own DNS server, 192.168.101.1. Baltic programmed some kind of workaround in the router to allow DNS to work for the moment.

I believe that Adrian has now added the Baltic DNS to the Unifi router config, but I suspect we need to double check the DNS settings and confirm with Baltic what we plan to use in the future.

Another consequence of the network upgrades this weekend is that DoES now has a static IP address: 185.135.106.6. This is now used on all outgoing traffic from DoES.

@amcewen

Member

commented Aug 27, 2018

I've set it in the DHCP settings on Unifi, so it'll give the Baltic Broadband one out with new leases. I didn't see anywhere obvious to set it for the main gateway (10.0.0.1), which is the first entry given via DHCP, so it might need more tweaking.

@skos-ninja

commented Aug 28, 2018

This shouldn't be an issue if 1.1.1.1 is routed correctly. If Baltic are not routing 1.1.1.1 properly, it will need to be reported to Cloudflare, who will then work with Baltic to fix it, as squatting on 1.0.0.0/8 is very much a no-no. We also shouldn't be forced to use any ISP's DNS.

@johnmckerrell

Member

commented Aug 28, 2018

Just going to mention here that the VOIP phones were all out after this but that turning them off and on again seems to have been enough to sort them out. So just unplugging the ethernet cable from the Yealinks, or pulling the power cable from the small bridge.

@skos-ninja

commented Aug 28, 2018

I have changed the DNS forwarder that the UniFi gateway uses and removed 1.1.1.1 as the alternative DNS for when the UniFi one fails. However, I would want to switch back as soon as Baltic can fix this.

I would also note that although we have a static ip with Baltic it is not passing through either the IPv4 or IPv6 to our router so port forwarding is still not available.

@amcewen

Member

commented Aug 28, 2018

I doubt Baltic are doing anything like "squatting on 1.0.0.0/8". It'd be much more likely that they're filtering traffic on UDP port #53, and I suspect we'd have more productive discussions with them if we assumed that and asked why, rather than accusing them of abusing routing options.

Port-forwarding will still be available, but as mentioned in #817, it's something that Baltic would set up rather than us. We just need to tell them what we'd want forwarding, and to where.

@skos-ninja

commented Aug 28, 2018

I would like to note here that if they were blocking external DNS running on UDP port 53 then that's worse

@MatthewCroughan

commented Aug 28, 2018

@skos-ninja I think I figured out some of what's happening.

The local DNS server isn't functioning, and some devices are not automatically adding the secondary DNS server 192.168.101.1, which Adrian provided yesterday, to their configuration. This includes my phone, the computers in the laser cutting room and Darren's laptop, whereas my laptop and Adrian's correctly added the secondary DNS to their resolv.conf. I tested this theory by rebooting a Raspberry Pi that hadn't been rebooted until today: before the reboot it could still resolve DNS, but after the reboot it could not.

Adrian's device has been rebooted, mine has not, they both have the same contents in /etc/resolv.conf:

search localdomain
nameserver 10.0.0.1
nameserver 192.168.101.1
nameserver 1.0.0.1

This is correct, Adrian had noted supplying 192.168.101.1 yesterday.
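For anyone comparing what different devices picked up from DHCP, here is a minimal Python sketch that lists the nameservers from the contents of a resolv.conf in the order they appear. The function name `parse_nameservers` is just illustrative, and it assumes the usual format where comment lines start with `#` or `;`.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


# Contents copied from the resolv.conf quoted above.
sample = """search localdomain
nameserver 10.0.0.1
nameserver 192.168.101.1
nameserver 1.0.0.1
"""
print(parse_nameservers(sample))  # ['10.0.0.1', '192.168.101.1', '1.0.0.1']
```

Running it against the file on a device that only shows 10.0.0.1 would confirm that the secondary servers never made it into its configuration.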

I noticed that lookups are incredibly slow, at around 8 seconds for google.com:

--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 10.623/10.623/10.623/0.000 ms

real    0m8.384s

So, why is lookup slow? I had a Pi running an SSH tunnel that hadn't been rebooted until I manually decided to do so to test what would happen. Prior to the reboot, this device still had a working tunnel to my home network. After rebooting, it had only nameserver 10.0.0.1 in /etc/resolv.conf and could no longer resolve google.com.

Aug 28 17:00:26 skypi autossh[414]: ssh: Could not resolve hostname matthewcroughan.co.uk: Temporary failure in name resolution

As a temporary solution, I'm removing 10.0.0.1 from my resolv.conf so I don't have to wait for it to time out.

@skos-ninja

commented Aug 28, 2018

After checking, I can confirm the local DNS server is running fine on 10.0.0.1.
That server is then set to forward to 192.168.101.1, which is Baltic's DNS.

The following are timing requests for DNS:

  • Using 10.0.0.1 with 192.168.101.1 as the forward
Mainroom-BZ.v3.9.27# time nslookup google.com 10.0.0.1
Server:    10.0.0.1
Address 1: 10.0.0.1 USG

Name:      google.com
Address 1: 2a00:1450:4009:80c::200e lhr35s07-in-x0e.1e100.net
Address 2: 216.58.213.78 lhr25s01-in-f78.1e100.net
real    0m 5.05s
user    0m 0.00s
sys     0m 0.00s
  • Using 192.168.101.1
Mainroom-BZ.v3.9.27# time nslookup google.com 192.168.101.1
Server:    192.168.101.1
Address 1: 192.168.101.1

Name:      google.com
Address 1: 2a00:1450:4009:80c::200e lhr35s07-in-x0e.1e100.net
Address 2: 216.58.213.78 lhr25s01-in-f78.1e100.net
real    0m 5.06s
user    0m 0.00s
sys     0m 0.00s
  • Using 1.1.1.1
Mainroom-BZ.v3.9.27# time nslookup google.com 1.1.1.1
Server:    1.1.1.1
Address 1: 1.1.1.1 one.one.one.one

Name:      google.com
Address 1: 2a00:1450:4009:80c::200e lhr35s07-in-x0e.1e100.net
Address 2: 216.58.213.78 lhr25s01-in-f78.1e100.net
real    0m 0.06s
user    0m 0.00s
sys     0m 0.00s

So it appears our slow DNS response times are because the Baltic DNS server that they requested we use is slow.
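If nslookup isn't available on a device, the same kind of timing can be roughly approximated with the Python standard library. This is only a sketch, not a replacement for the tests above: `build_query` and `time_lookup` are illustrative names, the packet is a bare A-record query, and the reply is not validated, only timed.

```python
import socket
import struct
import time


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for an A record."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack(">HH", 1, 1)


def time_lookup(hostname, server, timeout_s=10.0):
    """Return seconds until `server` sends any reply, or None on timeout/error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.monotonic()
        try:
            sock.sendto(build_query(hostname), (server, 53))
            sock.recvfrom(512)
        except OSError:
            return None
        return time.monotonic() - start
```

Calling `time_lookup("google.com", "10.0.0.1")` and then the same for 192.168.101.1 and 1.1.1.1 from a machine on the network should give figures that can be compared between servers. Since it sends a single query, the absolute numbers will not necessarily match nslookup's.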

@MatthewCroughan

commented Aug 28, 2018

I can't get any device to work with 10.0.0.1 or 192.168.101.1 as DNS servers. I was wrong about 192.168.101.1 functioning; it doesn't. That's why nslookup takes 8 seconds: they both fail and then it gets to 1.0.0.1. From the perspective of most devices in DoES at the moment, that's what's happening. I just had to give Jackie's laptop 1.0.0.1; prior to that it was using 10.0.0.1 without a secondary DNS, so it couldn't resolve anything.
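To illustrate the fall-through described above, here is a deliberately simplified Python model: servers are tried strictly in list order and every dead server costs one full timeout. Real resolvers also retry and may behave differently, and the 4-second timeout is a hypothetical value picked for illustration, not something measured on our network.

```python
def time_until_answer(servers, reachable, timeout_s):
    """Seconds spent waiting on dead servers before a working one is reached.

    Returns None if no server in the list answers at all.
    """
    waited = 0.0
    for server in servers:
        if server in reachable:
            return waited
        # A dead server is only given up on after the full timeout.
        waited += timeout_s
    return None


# Only 1.0.0.1 answers; timeout_s=4.0 is an assumed, illustrative value.
working = {"1.0.0.1"}
print(time_until_answer(["10.0.0.1", "192.168.101.1", "1.0.0.1"], working, 4.0))  # 8.0
print(time_until_answer(["10.0.0.1"], working, 4.0))  # None
```

The first case is a device that got all three nameservers: slow, but it resolves eventually. The second is a device that only has 10.0.0.1, which never resolves anything.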

@MatthewCroughan

commented Aug 28, 2018

@skos-ninja

[matthew@thinkpad ~]$ nslookup google.com 10.0.0.1; nslookup google.com 192.168.101.1; nslookup google.com 1.0.0.1; nslookup google.com 1.1.1.1
;; connection timed out; no servers could be reached

;; connection timed out; no servers could be reached

Server:         1.0.0.1
Address:        1.0.0.1#53

Non-authoritative answer:
Name:   google.com
Address: 216.58.213.78
Name:   google.com
Address: 2a00:1450:4009:80f::200e

Server:         1.1.1.1
Address:        1.1.1.1#53

Non-authoritative answer:
Name:   google.com
Address: 216.58.213.78
Name:   google.com
Address: 2a00:1450:4009:80f::200e

This is true also when connected via ethernet.

@skos-ninja

commented Aug 28, 2018

@MatthewCroughan can you run traceroute 10.0.0.1

@MatthewCroughan

commented Aug 28, 2018

@skos-ninja

traceroute to 10.0.0.1 (10.0.0.1), 30 hops max, 60 byte packets
 1  _gateway (10.0.0.1)  0.582 ms  0.718 ms  0.805 ms

traceroute to 192.168.101.1 (192.168.101.1), 30 hops max, 60 byte packets
 1  _gateway (10.0.0.1)  0.495 ms  0.580 ms  0.843 ms
 2  192.168.1.1 (192.168.1.1)  2.713 ms  3.660 ms  4.692 ms
 3  192.168.101.1 (192.168.101.1)  6.665 ms  7.594 ms  9.253 ms

@skos-ninja

commented Aug 28, 2018

Okay perfect. Your device can see the gateway properly (which I should hope so otherwise we would be having even more issues).

Would you now be able to run a verbose nslookup so we can see where the dns request is failing on 10.0.0.1 @MatthewCroughan?

@MatthewCroughan

commented Aug 28, 2018

@skos-ninja How do I do a verbose nslookup?
This is what I get for -debug on cloudflare, which I assume is verbose, and is mentioned as verbose in the man pages.

[matthew@thinkpad ~]$ nslookup google.com 1.0.0.1 -debug 
Server:         1.0.0.1
Address:        1.0.0.1#53

------------
    QUESTIONS:
        google.com, type = A, class = IN
    ANSWERS:
    ->  google.com
        internet address = 216.58.212.110
        ttl = 82
    AUTHORITY RECORDS:
    ADDITIONAL RECORDS:
------------
Non-authoritative answer:
Name:   google.com
Address: 216.58.212.110
------------
    QUESTIONS:
        google.com, type = AAAA, class = IN
    ANSWERS:
    ->  google.com
        has AAAA address 2a00:1450:4009:80e::200e
        ttl = 184
    AUTHORITY RECORDS:
    ADDITIONAL RECORDS:
------------
Name:   google.com
Address: 2a00:1450:4009:80e::200e

But because nothing at all is happening with 10.0.0.1, I get nothing, even from the verbose nslookup

[matthew@thinkpad ~]$ nslookup google.com 10.0.0.1 -debug
;; connection timed out; no servers could be reached
@skos-ninja

commented Aug 28, 2018

Ok for now I have reverted our DNS settings back to what they were before the network upgrades as it's clear that the Baltic DNS is not working.

Once their DNS servers are working again we are happy to move back if they still require it.

@MatthewCroughan

commented Aug 28, 2018

Huzzah. My phone works now, I didn't want to bother with DNS settings on android. Thanks.

@amcewen

Member

commented Sep 2, 2018

Thanks for the debugging @skos-ninja and @MatthewCroughan! I've pinged Baltic support to find out what DNS server we should be using (or let them know that 192.168.101.1 seems to be non-functional :-)

@johnmckerrell

Member

commented May 22, 2019

Going to close this now; the network is functioning well set up as it is.
