[BUG] crc does not start on Windows 10 due to DNS / nameserver not set successfully #1193

Closed
michaelburch opened this issue Apr 27, 2020 · 20 comments
Labels
kind/enhancement New feature or request os/windows status/pinned Prevents the stale bot from closing the issue

Comments

@michaelburch

michaelburch commented Apr 27, 2020

General information

  • OS: Windows
  • Hypervisor: Hyper-V
  • Did you run crc setup before starting it (Yes/No)? Yes

CRC version

crc version: 1.9.0+a68b5e0
OpenShift version: 4.3.10 (embedded in binary)

CRC config

- pull-secret-file                      : c:\downloads\pull-secret

Host Operating System

Host Name:                 LAPTOP-VD1LLMG5
OS Name:                   Microsoft Windows 10 Enterprise
OS Version:                10.0.18363 N/A Build 18363
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          N/A
Registered Organization:   N/A
Product ID:                00330-80000-00000-AA174
Original Install Date:     3/25/2020, 1:02:06 AM
System Boot Time:          4/9/2020, 3:55:36 PM
System Manufacturer:       LENOVO
System Model:              20QQS2K200
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 158 Stepping 10 GenuineIntel ~2592 Mhz
BIOS Version:              LENOVO N2NET28W (1.13 ), 8/27/2019
Windows Directory:         C:\Windows
System Directory:          C:\Windows\system32
Boot Device:               \Device\HarddiskVolume1
System Locale:             en-us;English (United States)
Input Locale:              en-us;English (United States)
Time Zone:                 (UTC-06:00) Central Time (US & Canada)
Total Physical Memory:     32,503 MB
Available Physical Memory: 8,292 MB
Virtual Memory: Max Size:  64,678 MB
Virtual Memory: Available: 7,948 MB
Virtual Memory: In Use:    56,730 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    WORKGROUP
Logon Server:              \\LAPTOP-VD1LLMG5
Hotfix(s):                 9 Hotfix(s) Installed.
                           [01]: KB4551762
                           [02]: KB4534132
                           [03]: KB4497727
                           [04]: KB4515383
                           [05]: KB4517245
                           [06]: KB4537759
                           [07]: KB4541338
                           [08]: KB4552152
                           [09]: KB4549951
Network Card(s):           3 NIC(s) Installed.
                           [01]: Intel(R) Wi-Fi 6 AX200 160MHz
                                 Connection Name: Wi-Fi
                                 DHCP Enabled:    Yes
                                 DHCP Server:     192.168.0.2
                                 IP address(es)
                                 [01]: 192.168.0.153
                                 [02]: fe80::ec5d:9d70:fd8f:93a7
                           [02]: Cisco AnyConnect Secure Mobility Client Virtual Miniport Adapter for Windows x64
                                 Connection Name: Ethernet 2
                                 DHCP Enabled:    No
                                 IP address(es)
                                 [01]: x.x.x.x
                                 [02]: fe80::1234:abcd:d4e0:bec6
                           [03]: Hyper-V Virtual Ethernet Adapter
                                 Connection Name: vEthernet (Default Switch)
                                 DHCP Enabled:    Yes
                                 DHCP Server:
                                 IP address(es)
                                 [01]: 172.17.35.1
                                 [02]: fe80::2c10:aabc:3925:371b
Hyper-V Requirements:      A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Steps to reproduce

  1. crc setup
  2. crc config set pull-secret-file c:\downloads\pull-secret
  3. crc start --log-level debug

Expected

INFO Check internal and public DNS query ...
INFO Check DNS query from host ...
INFO Copying kubeconfig file to instance dir ...

Actual

INFO Check internal and public DNS query ...
INFO Check DNS query from host ...
WARN foo.apps-crc.testing resolved to [23.202.231.169 23.217.138.110] but 172.17.35.9 was expected
ERRO Failed to query DNS from host: Invalid IP for foo.apps-crc.testing
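For anyone reproducing this, the failing step compares what the host resolver returns for foo.apps-crc.testing with the address of the VM. A minimal Python sketch of that comparison follows. It assumes the check wants exactly the expected address; the helper names are hypothetical and this is not crc's actual code.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the host resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        # Treat "no such host" as an empty answer.
        return []
    return sorted({info[4][0] for info in infos})


def matches_expected(resolved, expected_ip):
    """True only if the resolver returned exactly the expected address."""
    return list(resolved) == [expected_ip]
```

Calling matches_expected(resolve_ipv4("foo.apps-crc.testing"), "172.17.35.9") on an affected host would return False, because the name is answered by a public DNS server instead of the VM.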

Logs

https://gist.github.com/michaelburch/b549777ee4ae8ac689838a6b6d8ae815

@michaelburch michaelburch added the kind/bug Something isn't working label Apr 27, 2020
@michaelburch
Author

michaelburch commented Apr 27, 2020

The DNS server entry is correctly added to the Hyper-V default switch. However, in the default configuration on Windows 10, this interface does not have a low enough metric for the entry to make any difference.

I have been able to work around this issue by doing the following:

  1. crc delete (since start did not complete successfully)
  2. crc start
  3. observe the IP being used by the netsh command (172.17.35.10 for example)
  4. Get-DnsClientNrptRule | ? {$_.namespace -like '*testing*'} | Remove-DnsClientNrptRule -Force
  5. Add-DnsClientNrptRule -Namespace ".apps-crc.testing" -NameServers "172.17.35.10"
  6. Add-DnsClientNrptRule -Namespace ".crc.testing" -NameServers "172.17.35.10"
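To avoid retyping the address, the PowerShell commands from steps 4 to 6 can be generated from the VM IP. This is a minimal Python sketch; the function name is hypothetical, and it only builds the command strings shown above, to be pasted into an elevated PowerShell prompt.

```python
import ipaddress


def nrpt_commands(vm_ip, namespaces=(".apps-crc.testing", ".crc.testing")):
    """Build the PowerShell commands from the workaround for a given VM IP."""
    # Raises ValueError if vm_ip is not a valid IP address.
    ipaddress.ip_address(vm_ip)
    commands = [
        # Step 4: drop any existing rules for the testing domains.
        "Get-DnsClientNrptRule | ? {$_.namespace -like '*testing*'} "
        "| Remove-DnsClientNrptRule -Force"
    ]
    for namespace in namespaces:
        # Steps 5 and 6: point each namespace at the crc VM.
        commands.append(
            f'Add-DnsClientNrptRule -Namespace "{namespace}" -NameServers "{vm_ip}"'
        )
    return commands
```

For example, print("\n".join(nrpt_commands("172.17.35.10"))) prints the three commands for the address used above.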

@gbraad
Contributor

gbraad commented Apr 28, 2020

ERRO error: CreateFile C:\Users\MichaelBurch\.crc\machines\crc\kubeconfig: The system cannot find the file specified.

This is not the issue. @code-ready/crc-devel can we make this a WARN instead?

The DNS server entry is correctly added to Hyper-V default switch,

The metric is usually not an issue. Were you connected to the VPN when you tried the initial run?

observe the IP being used by the netsh command (172.17.35.10 for example)

hvc ip crc might work better, as this shows the IP as soon as the VM reports it to the hypervisor

@praveenkumar
Member

This is not the issue. @code-ready/crc-devel can we make this a WARN instead?

@gbraad no, this is used by every oc command we run even the csr approval one. As the logs show, this file is not copied because crc start fails quite early, when it is unable to resolve the external server.

@michaelburch
Author

The metric is usually not an issue. Were you connected to the VPN when you tried the initial run?

Yes, I was. I have repeated the process while disconnected, with the same results.

hvc ip crc might work better, as this shows the IP as soon as the VM reports it to the hypervisor

I meant the netsh command used by:

add dns server address to interface vEthernet (Default Switch)

@michaelburch
Author

ERRO Failed to query DNS from host: Invalid IP for foo.apps-crc.testing

This early failure is problematic. Could this error be made a warning?

@michaelburch
Author

It seems like adding entries to the Name Resolution Policy Table (as demonstrated above) would be more similar to the way systemd-resolved handles this on Linux. It could be less disruptive than modifying nameservers or the hosts file.
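To illustrate the idea, here is a simplified Python sketch of split DNS by namespace suffix. It is a model of the behaviour only, not how Windows implements the NRPT: it assumes a namespace with a leading dot matches subdomains, that the longest matching suffix wins, and it uses 192.0.2.53 as a placeholder for the normal resolver.

```python
def pick_nameserver(query, rules, default_server):
    """Return the server of the longest namespace suffix matching query."""
    name = query.lower().rstrip(".")
    best = None
    for namespace, server in rules.items():
        suffix = namespace.lower()
        if suffix.startswith("."):
            # Leading dot: match any name under this domain.
            matched = name.endswith(suffix)
        else:
            # No leading dot: match this exact name only.
            matched = name == suffix
        if matched and (best is None or len(suffix) > len(best[0])):
            best = (suffix, server)
    return best[1] if best else default_server


rules = {
    ".crc.testing": "172.17.35.10",
    ".apps-crc.testing": "172.17.35.10",
}
```

With these rules, api.crc.testing and foo.apps-crc.testing are sent to the crc VM, while every other name still goes to the default resolver, so the adapter's nameserver list and the hosts file are left alone.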

@gbraad
Contributor

gbraad commented Apr 28, 2020

@gbraad no, this is used by every oc command we run even the csr approval one.

If so, this should exit the process, as it now seems to error and continue on? I mention this because it pops up in several issues and it is not the actual issue (it confuses users, as there is no obvious solution to it).

@gbraad
Contributor

gbraad commented Apr 28, 2020

seems like adding entries to the Name Resolution Policy Table would be more similar to the way systemd-resolved

Looking into this now. Thanks! Basically, it provides the much-needed split DNS functionality.

Note: https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/dn593632(v=ws.11) is available on/since Windows 2012 R2. Will look into this for use on Win 10 Pro (commands are available, but unsure if they work as expected).

@gbraad
Contributor

gbraad commented Apr 28, 2020

kubeconfig

This is due to the crc status command and leads to unnecessary confusion. Need to create a separate issue for this.

@michaelburch
Author

I've adapted my workaround for this based on your feedback, running these commands while 'crc start' shows:

INFO Verifying validity of the cluster certificates ..

# Remove any NRPT rules for testing domains
Get-DnsClientNrptRule | Where-Object { $_.Namespace -like '*.testing' } | Remove-DnsClientNrptRule -Force

# Add rule for .crc.testing, using the first available IP of the crc VM
Add-DnsClientNrptRule -Namespace ".crc.testing" -NameServers (Get-VM -Name crc).NetworkAdapters[0].IPAddresses[0]

# Add rule for .apps-crc.testing, using the first available IP of the crc VM
Add-DnsClientNrptRule -Namespace ".apps-crc.testing" -NameServers (Get-VM -Name crc).NetworkAdapters[0].IPAddresses[0]

This has the desired effect, and everything starts successfully:

INFO Check DNS query from host ...
INFO Copying kubeconfig file to instance dir ...
INFO Adding user's pull secret ...
INFO Updating cluster ID ...
INFO Starting OpenShift cluster ... [waiting 3m]
INFO

@stale

stale bot commented Jun 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale Issue went stale; did not receive attention or no reply from the OP label Jun 27, 2020
@anjannath anjannath added kind/enhancement New feature or request os/windows status/pinned Prevents the stale bot from closing the issue and removed kind/bug Something isn't working status/stale Issue went stale; did not receive attention or no reply from the OP labels Jun 29, 2020
@gbraad
Contributor

gbraad commented Aug 19, 2020

@praveenkumar @kevprice83 Can you have a look at this?

@gbraad
Contributor

gbraad commented Aug 20, 2020

@par02 please report if this helps

@par02

par02 commented Aug 20, 2020

Thanks @gbraad, that lets me script a workaround - much better than editing the hosts file!
I'll test it out later when I'm off VPN again.

@kevprice83

@gbraad just to let you know: so far so good. Executing those commands as an admin-privileged user solved the issue for me. I will deploy 3scale on top of the cluster over the weekend to see how it goes, especially when creating new routes, and will provide further feedback. So far it looks like CRC on Windows 10 Home is not just a pipe dream :)

@kevprice83

@gbraad this helped get round the issue I had deploying the cluster, but it seems that the stability is not great. The cluster is running fine in general, and I can navigate the console and deploy 3scale via the OperatorHub. The 3scale application also works okay until I begin trying to access some routes via curl. I either get a 503 Service Unavailable or a timeout waiting for a response. I don't know how much of this is simply an OpenShift issue or a CRC issue.

I am seeing errors in the events stream like the following:

Status for clusteroperator/authentication changed: Degraded message changed from "RouteHealthDegraded: failed to GET route: dial tcp: lookup oauth-openshift.apps-crc.testing on 172.25.0.10:53: no such host" to ""

OR

Status for clusteroperator/console changed: Degraded message changed from "RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps-crc.testing/health): Get https://console-openshift-console.apps-crc.testing/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" to ""

It looks like the unstable network is causing some failures. On their own they are not enough to make the cluster fail, but they do make certain things, such as monitoring, impossible.

Interestingly all the UI pages for the 3scale app seem to work very well but executing a curl request directly to one of those routes fails.

@kevprice83

@gbraad @praveenkumar after assigning more resources to the VM it looks like the cluster has stabilised and I am using it just fine for the moment. I would say the steps provided by @michaelburch have resolved the issue for Windows 10 Home users also :)

@chrislovecnm

chrislovecnm commented Sep 15, 2020

The problem I was having occurred when the cluster was using the Default vSwitch, which is internal only and not exposed to the internet. Digging around in the issues, I noticed that you can name a vSwitch "crc" and crc will start using that vSwitch. That got the VM an external address, but then DNS configuration was failing. The Add-DnsClientNrptRule PowerShell commands above seem to have helped.

  1. Please can we make the crc vSwitch name configurable and documented.
  2. Also, I am guessing that there is a bug in the code that keeps DNS configuration from being applied correctly.

Anyone who is struggling to create the "crc" vSwitch correctly can run these commands in PowerShell.

First view the network adapters and choose the nic you want to use:

 Get-NetAdapter

This will return the adapter names.

Then set up a variable to hold the correct adapter. On my machine the adapter name is "Ethernet".

$net = Get-NetAdapter -Name 'Ethernet'

Then create a vSwitch using the above variable

New-VMSwitch -Name "crc" -AllowManagementOS $True -NetAdapterName $net.Name

I started crc and it failed, but the VM then got an IP address. I stopped crc and added the DNS entries by hand.

Replace the IP address below with the IP address that was assigned to the crc VM:

Add-DnsClientNrptRule -Namespace ".apps-crc.testing" -NameServers "172.17.35.10"
Add-DnsClientNrptRule -Namespace ".crc.testing" -NameServers "172.17.35.10"

Restart crc and it should use the crc vSwitch and have the correct DNS setup.

@gbraad gbraad changed the title from "[BUG] crc does not start on Windows 10" to "[BUG] crc does not start on Windows 10 due to DNS / nameserver not set successfully" Nov 24, 2020
@adrianriobo
Contributor

I tested the solution with the default switch and ended up with two scenarios:

Scenario 1: crc start command fails due to DNS check

crc start command fails with message:

Failed to query DNS from host: lookup foo.apps-crc.testing: no such host

From the log we can see the IP given to the VM, and using @michaelburch's script we can set up the DnsClientNrptRules for the domains. After that, a stop and start is required to start the cluster properly:

crc stop
crc start -p pull-secret-file -n 8.8.8.8 --log-level debug

Scenario 2: crc start finishes but DNS resolution fails randomly

Domain resolution fails randomly. This can be tested by trying to ping the subdomain:

C:\> ping api.crc.testing                                                                                                        
Pinging api.crc.testing [192.168.185.205] with 32 bytes of data:
Reply from 192.168.185.205: bytes=32 time<1ms TTL=64
Ping statistics for 192.168.185.205:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
Control-C
C:\> ping api.crc.testing                                                                                          
Ping request could not find host api.crc.testing. Please check the name and try again.
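Because the failure is intermittent, a single ping can be misleading. A minimal Python sketch that repeats the lookup and counts failures is shown here. The function name and defaults are hypothetical, and the resolve parameter exists only so that the lookup can be replaced when testing.

```python
import socket
import time


def probe(hostname, attempts=10, delay=1.0, resolve=socket.gethostbyname):
    """Resolve hostname repeatedly and return (successes, failures)."""
    ok = failed = 0
    for i in range(attempts):
        try:
            resolve(hostname)
            ok += 1
        except OSError:
            # socket.gaierror ("no such host") is a subclass of OSError.
            failed += 1
        if i < attempts - 1:
            time.sleep(delay)
    return ok, failed
```

Running probe("api.crc.testing") before and after adding the NRPT rules gives a rough measure of how often resolution fails. Results from the local resolver cache may hide some failures.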

So, oc login fails:

C:\> oc login -u kubeadmin -p hxYtC-KLeQC-kfNkm-ppi8i https://api.crc.testing:6443                                               
error: dial tcp: lookup api.crc.testing: no such host - verify you have provided the correct host and port and that the server is currently running.

After setting up the DNS rules with the VM IP using @michaelburch's workaround, it is possible to log in to the cluster directly, without any restart:

C:\> oc login -u kubeadmin -p hxYtC-KLeQC-kfNkm-ppi8i https://api.crc.testing:6443                                               
Login successful.
You have access to 58 projects, the list has been suppressed. You can list all projects with ' projects'
Using project "default".

@guillaumerose
Contributor

In the meantime, we worked on a new solution for the network. It doesn't modify DNS servers and only modifies the hosts file. It also removes the use of the vSwitch.

The feature is still alpha but you can already try it with the latest version.

Instructions are here: https://github.com/code-ready/crc/wiki/VPN-support--with-an--userland-network-stack
