Unable to grab VM's IP after 30s #61

Closed
quinncomendant opened this issue Feb 18, 2016 · 33 comments

@quinncomendant

I've successfully downloaded the images and booting has commenced! Unfortunately, there is a new problem: after booting completes, the VM is unable to get an IP address. I see this error: Error: Unable to grab VM's IP after 30s (!)... Aborting.

I also had a similar problem when trying to use the xhyve docker-machine driver directly. You can see my report of that issue in issue #97 of docker-machine-driver-xhyve. I assume the underlying cause is the same in both. Does corectl use docker-machine-driver-xhyve? Or is it trying to get the VM's IP in the same way?

Full corectl run output attached here:
sudo corectl run output.txt

BTW, notice how the shell freaks out after coreos aborts (linefeeds are absent) at the end of that output. Same problem as in #60.

@AntonioMeireles
Member

tricky. we've been hitting that issue with the latest 2 alpha versions, but not in a fully reproducible way... it smells like systemd sometimes takes too much time to act for some reason. i hope to fully debug this over the next weekend. OTOH, we only have reports of this on the latest alphas, so booting beta or stable (or older alpha revisions) should be a valid workaround for now.

(MANY THANKS for your patience too!)

@quinncomendant
Author

Cool, I'm downloading the beta now…

You might want to collaborate with @zchee since lots of people are having the same problem with docker-machine-driver-xhyve.

@quinncomendant
Author

@AntonioMeireles, I've installed the beta version (sudo corectl run --channel beta) and have the same problem. The downloads were verified (it said beta/899.7.0 ready) and booted, then finally, Error: Unable to grab VM's IP after 30s (!)... Aborting. =(

@quinncomendant
Author

And same problem with stable/835.13.0!

If I'm having problems with all three versions of corectl+coreos, as well as docker-machine-driver-xhyve, then the problem is either something shared between the two projects, or something in my local environment. Where does corectl+coreos discover its IP? Same place as docker-machine-driver-xhyve? How can I troubleshoot?

@rimusz
Member

rimusz commented Feb 23, 2016

@quinncomendant it could be something with your local environment

@quinncomendant
Author

Yep, it was a setting on my MacBook that prevented DHCP access from corectl. I had my OS X firewall set to "Block all incoming connections," and when I disabled that, CoreOS was able to get an IP. The following firewall settings are required for it to work:

[Screenshot: OS X firewall settings (screen shot 2016-02-22 at 23 21 45)]

Unfortunately, disabling the firewall didn't fix the problem I'm having with docker-machine-driver-xhyve. I'll continue working with @zchee to see why it can't get an IP.

@rimusz
Member

rimusz commented Feb 23, 2016

@quinncomendant did you manage to get corectl working?

@quinncomendant
Author

Yes, corectl works well. See my previous comment in case you missed the solution.

@quinncomendant
Author

I'm reopening this because the problem has recurred occasionally. I do think that my original problem may have been the OS X firewall “Block all incoming connections” setting, because with it enabled corectl would succeed 0% of the time. But now that I've been using it for almost a week, I still get the “Unable to grab VM's IP after 30s” error about 50% of the time. When I get this error, I simply run the same corectl run command again and eventually it works. But just now, I started my computer from a cold boot, connected to Wi-Fi, connected to VPN, then launched Terminal and executed corectl run (all as I normally do), and I got this error six times in a row before it would start. =(

How can I help troubleshoot this?

@v1k0d3n

v1k0d3n commented Apr 13, 2016

@rimusz just as an fyi: i'm running into this issue on one MBP, and not the other. if you want, we can try to dissect the issue together...or i can start peeling back the onion later today when i have some time (i haven't had a lot of "free" time lately). i'd be happy to help though.

@rimusz
Member

rimusz commented Apr 13, 2016

@v1k0d3n the problem comes from the corectl tool; when @AntonioMeireles finds free time, he is going to investigate it.

@chrislovecnm

@rimusz I am running into this now as well. Been working like a champ ... and now splat 😦

@chrislovecnm

And my firewall is off

@AntonioMeireles
Member

@chrislovecnm quick question: does it happen to you only sometimes, or are you able to consistently reproduce it (and if so, how)?

@chrislovecnm

chrislovecnm commented Jun 14, 2016

@AntonioMeireles It is currently hosed... are you on Slack?

$ sudo corectl --debug load settings/k8solo-01.toml
Password:
> booting k8solo-01
[corectl] stable/1010.5.0 already available on your system
[corectl] '/Users/clove' was already available to VMs via NFS
Error: Unable to grab VM's IP after 30s (!)... Aborting

Usage:
  corectl load path/to/yourProfile [flags]

Examples:
  corectl load profiles/demo.toml

Global Flags:
      --debug   adds extra verbosity, and options, for debugging purposes and/or power users

All flags can also be configured via upper-case environment variables prefixed with "COREOS_"
For example, "--debug" => "COREOS_DEBUG"

@AntonioMeireles
Member

trying to get my daughter to sleep (11pm here), hence away from a real keyboard. Will reach you tomorrow morning :/

@chrislovecnm

k... will try to get debug output

@chrislovecnm

chrislovecnm commented Jun 14, 2016

This is an awesome feature:

clove-mbp:kube-solo root# whoami
root

clove-mbp:kube-solo root# /Users/clove/kube-solo/bin/corectl load /Users/clove/kube-solo/settings/k8solo-01.toml
Error: not enough previleges to start or forcefully halt VMs. use 'sudo'

Usage:
  corectl load path/to/yourProfile [flags]

Examples:
  corectl load profiles/demo.toml

Global Flags:
      --debug   adds extra verbosity, and options, for debugging purposes and/or power users

All flags can also be configured via upper-case environment variables prefixed with "COREOS_"
For example, "--debug" => "COREOS_DEBUG"

Ummmmm I am root :)

@chrislovecnm

chrislovecnm commented Jun 14, 2016

More debug

# clove @ clove-mbp in ~/kube-solo [15:54:51]
$ sudo COREOS_DEBUG=true /Users/clove/kube-solo/bin/corectl load /Users/clove/kube-solo/settings/k8solo-01.toml
> booting k8solo-01
[corectl] stable/1010.5.0 already available on your system
[corectl] '/Users/clove' was already available to VMs via NFS
{
    "Name": "k8solo-01",
    "Channel": "stable",
    "Version": "1010.5.0",
    "Cpus": 2,
    "Memory": 3072,
    "UUID": "90394897-5956-4F9D-AFC4-E274C45DAB31",
    "MacAddress": "a6:fe:46:15:cf:32",
    "CloudConfig": "/Users/clove/kube-solo/cloud-init/user-data",
    "CClocation": "localfs",
    "SSHkey": "ssh-rsa REDACTED",
    "Root": -1,
    "Ethernet": [
        {
            "Type": 1
        }
    ],
    "Storage": {
        "HardDrives": {
            "0": {
                "Slot": 0,
                "Type": "HDD",
                "Path": "/Users/clove/kube-solo/data.img"
            }
        }
    },
    "InternalSSHauthKey": "ssh-rsa REDACTED",
    "InternalSSHprivKey": "-----BEGIN RSA PRIVATE KEY-----REDACTED",
    "Detached": true,
    "PreferLocalImages": true,
    "Pid": -1,
    "PublicIP": "",
    "CreatedAt": "2016-06-14T15:54:57.985406177-06:00"
}
Error: Unable to grab VM's IP after 30s (!)... Aborting

Usage:
  corectl load path/to/yourProfile [flags]

Examples:
  corectl load profiles/demo.toml

Global Flags:
      --debug   adds extra verbosity, and options, for debugging purposes and/or power users

All flags can also be configured via upper-case environment variables prefixed with "COREOS_"
For example, "--debug" => "COREOS_DEBUG"

@chrislovecnm

Switched my channel to Beta

sudo COREOS_DEBUG=true /Users/clove/kube-solo/bin/corectl load /Users/clove/kube-solo/settings/k8solo-01.toml
> booting k8solo-01
[corectl] downloading and verifying beta/1010.4.0
Trusted hex key id 50E0885593D2DCB4 is decimal [80 224 136 85 147 210 220 180]
Trusted key id 5827807818951089332 matches keyid 5827807818951089332
30.78 MB / 30.78 MB [==================================================================================================================================================] 100.00 %
[corectl] SHA512 hash for coreos_production_pxe.vmlinuz OK
Trusted hex key id 50E0885593D2DCB4 is decimal [80 224 136 85 147 210 220 180]
Trusted key id 5827807818951089332 matches keyid 5827807818951089332
64.96 MB / 217.31 MB [==========================================>--------------------------------------------------------------------------------------------------] 29.89 % 1m45s

@chrislovecnm

I pulled down the latest source and built it locally (I code in Go).

I added some debug code to run.go:

go func() {
        timeout := time.After(30 * time.Second)
        select {
        case <-timeout:
            fmt.Printf("launching c.Args %v", c.Args)
            fmt.Printf("launching c.Path %v", c.Path)
            fmt.Printf("launching c.Dir %v", c.Dir)
            fmt.Printf("launching c.Evn %v", c.Env)
            fmt.Printf("launching c.ProcessState %v", c.ProcessState)
            fmt.Printf("launching c.Stderr %v", c.Stderr)
            fmt.Printf("launching c.Stdout %v", c.Stdout)
            fmt.Printf("launching c.Stdin %v", c.Stdin)
            fmt.Printf("launching c.Process.Pid %s", c.Process.Pid)
            if p, ee := os.FindProcess(c.Process.Pid); ee == nil {
                p.Signal(os.Interrupt)
            }
            vm.errch <- fmt.Errorf("Unable to grab VM's IP after " +
                "30s (!)... Aborting")

Not very helpful... I haven't played with exec before :)

# clove @ clove-mbp in ~/kube-solo [16:31:13] C:255
$ sudo COREOS_DEBUG=true time ~/Workspace/src/github.com/TheNewNormal/corectl/corectl load /Users/clove/kube-solo/settings/k8solo-01.toml
> booting k8solo-01
[corectl] beta/1010.4.0 already available on your system
[corectl] '/Users/clove' was already available to VMs via NFS
{
    "Name": "k8solo-01",
    "Channel": "beta",
    "Version": "1010.4.0",
    "Cpus": 2,
    "Memory": 3072,
    "UUID": "90394897-5956-4F9D-AFC4-E274C45DAB31",
    "MacAddress": "a6:fe:46:15:cf:32",
    "CloudConfig": "/Users/clove/kube-solo/cloud-init/user-data",
    "CClocation": "localfs",
    "SSHkey": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDkUoAaqKkB1x8w6YnDnugyz29q0iz03rJR56KHFvH102CGgXETMczBfcRGgvIaf81Q2T3Nc0+LxLR336iyNwgHdk0zo7JuseV09IR1fhmsBcyrwi/yIklfyU4WvF5K9h+4taw8PpaRqbVFolU+W1cTYYhPcyN5zBrtLBBA8tRGD/11UYcND4mDgwCowV86fwXLR0jdSXd/b/FeYkc6Q6XEAjqbBhIKcSyRl7bcsUIOhHNBDpFi65ws8rDTCVo4Wej5KHzSzpwaHzGZzSNGhpLEBVyP78Wzn6zUKuzQ+6PsDQr6u/gXYQ4nymfFVxKC2tqWR0BlnRA92AOrluQYF7Bv clove@clove-mbp",
    "Root": -1,
    "Ethernet": [
        {
            "Type": 1
        }
    ],
    "Storage": {
        "HardDrives": {
            "0": {
                "Slot": 0,
                "Type": "HDD",
                "Path": "/Users/clove/kube-solo/data.img"
            }
        }
    },
    "InternalSSHauthKey": "ssh-rsa REDACTED",
    "InternalSSHprivKey": "-----BEGIN RSA PRIVATE KEY-----REDACTED",
    "Detached": true,
    "PreferLocalImages": true,
    "Pid": -1,
    "PublicIP": "",
    "CreatedAt": "2016-06-14T16:32:35.985836065-06:00"
}
launching c.Args [/Users/clove/Workspace/src/github.com/TheNewNormal/corectl/corectl xhyve bGlieGh5dmVfYnVnIC1zIDA6MCxob3N0YnJpZGdlIC1zIDUsdmlydGlvLXJuZCAtbCBjb20xLHN0ZGlvIC1zIDMxLGxwYyAtVSA5MDM5NDg5Ny01OTU2LTRGOUQtQUZDNC1FMjc0QzQ1REFCMzEgLW0gMzA3Mk0gLWMgMiAtQSAtdSAtcyAyOjAsdmlydGlvLW5ldCAtcyA0OjAsdmlydGlvLWJsaywvVXNlcnMvY2xvdmUva3ViZS1zb2xvL2RhdGEuaW1n a2V4ZWMsL1VzZXJzL2Nsb3ZlLy5jb3Jlb3MvaW1hZ2VzL2JldGEvMTAxMC40LjAvY29yZW9zX3Byb2R1Y3Rpb25fcHhlLnZtbGludXosL1VzZXJzL2Nsb3ZlLy5jb3Jlb3MvaW1hZ2VzL2JldGEvMTAxMC40LjAvY29yZW9zX3Byb2R1Y3Rpb25fcHhlX2ltYWdlLmNwaW8uZ3os ZWFybHlwcmludGs9c2VyaWFsIGNvbnNvbGU9dHR5UzAgY29yZW9zLmF1dG9sb2dpbiB1dWlkPTkwMzk0ODk3LTU5NTYtNEY5RC1BRkM0LUUyNzRDNDVEQUIzMSBzc2hrZXk9InNzaC1yc2EgQUFBQUIzTnphQzF5YzJFQUFBQURBUUFCQUFBQkFRRGtVb0FhcUtrQjF4OHc2WW5EbnVneXoyOXEwaXowM3JKUjU2S0hGdkgxMDJDR2dYRVRNY3pCZmNSR2d2SWFmODFRMlQzTmMwK0x4TFIzMzZpeU53Z0hkazB6bzdKdXNlVjA5SVIxZmhtc0JjeXJ3aS95SWtsZnlVNFd2RjVLOWgrNHRhdzhQcGFScWJWRm9sVStXMWNUWVloUGN5TjV6QnJ0TEJCQTh0UkdELzExVVljTkQ0bURnd0Nvd1Y4NmZ3WExSMGpkU1hkL2IvRmVZa2M2UTZYRUFqcWJCaElLY1N5Umw3YmNzVUlPaEhOQkRwRmk2NXdzOHJEVENWbzRXZWo1S0h6U3pwd2FIekdaelNOR2hwTEVCVnlQNzhXem42elVLdXpRKzZQc0RRcjZ1L2dYWVE0bnltZkZWeEtDMnRxV1IwQmxuUkE5MkFPcmx1UVlGN0J2IGNsb3ZlQGNsb3ZlLW1icCIgZW5kcG9pbnQ9aHR0cDovLzE5Mi4xNjguNjQuMTo1NDg1MC9rOHNvbG8tMDEgY2xvdWQtY29uZmlnLXVybD1odHRwOi8vMTkyLjE2OC42NC4xOjU0ODUwL2s4c29sby0wMS9jbG91ZC1jb25maWc=]
launching c.Path /Users/clove/Workspace/src/github.com/TheNewNormal/corectl/corectl
launching c.Dir
launching c.Evn []
launching c.ProcessState <nil>
launching c.Stderr <nil>
launching c.Stdout <nil>
launching c.Stdin <nil>
launching c.Process.Pid %!s(int=16765)
Error: Unable to grab VM's IP after 30s (!)... Aborting

Usage:
  corectl load path/to/yourProfile [flags]

Examples:
  corectl load profiles/demo.toml

Global Flags:
      --debug   adds extra verbosity, and options, for debugging purposes and/or power users

All flags can also be configured via upper-case environment variables prefixed with "COREOS_"
For example, "--debug" => "COREOS_DEBUG"

       30.49 real         0.33 user         0.04 sys

Running the corectl xhyve bGlieG... command above by hand started a CoreOS shell, but kube-solo is not able to find it...
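The c.Args dump above shows corectl re-invoking its own binary as corectl xhyve with base64-encoded arguments. Decoding them reveals the xhyve flags and kernel command line that were actually passed; for instance, the first argument begins with libxhyve_bug -s 0:0,hostbridge. A minimal decoder sketch, assuming standard base64 with optional padding; decodeArg and the output format are arbitrary choices for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"strings"
)

// decodeArg decodes one base64 argument, accepting input with or without
// trailing "=" padding.
func decodeArg(s string) (string, error) {
	b, err := base64.RawStdEncoding.DecodeString(strings.TrimRight(s, "="))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Usage: go run decode.go <arg> [<arg> ...]
	// Pass the base64 strings that follow "xhyve" in the c.Args output.
	for i, a := range os.Args[1:] {
		s, err := decodeArg(a)
		if err != nil {
			fmt.Fprintf(os.Stderr, "arg %d: not base64: %v\n", i, err)
			continue
		}
		fmt.Printf("arg %d: %s\n", i, s)
	}
}
```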

@chrislovecnm

Any ideas?

@AntonioMeireles
Member

ok, please do this: brew install corectl, then just run sudo corectl run (to exit at the end, call sudo halt).
inside this minimal VM, do you still have no networking? (also, can you ping me in the Kubernetes Slack?)

@AntonioMeireles
Member

@chrislovecnm the issue was an undetermined host network problem that seemed to go away just by restarting NFS... after that, everything seems to behave as expected.

@quinncomendant
Author

quinncomendant commented Jun 15, 2016 via email

@AntonioMeireles
Member

just run sudo nfsd restart on the Mac

@chrislovecnm

Also, what was the command that showed us that NFS was hung? I am not on my Mac, but one of the commands did not return.

@AntonioMeireles
Member

AntonioMeireles commented Jun 15, 2016

showmount -ea localhost was hanging, even though nfsd status was telling us things were OK.

@clubanderson

this solved the problem for me also - sudo nfsd restart

@AntonioMeireles
Member

guys - i think we've cut most of the corner cases you've been hitting. please reopen if the issue comes back from the grave in the currently shipping releases. and, once again, many thanks for your patience!

@vkrishnasamy

Please try the following to fix the "unable to grab VM's IP" issue:

  • Change the CoreOS channel to Alpha
    • corectl stop / corectl run
    • sudo corectld stop / start
    • Kube-cluster UP

@maikelvl

I noticed that Container Linux 1967.5.0 is the latest image that does not suffer from this problem.

I've tested these images:

2079.6.0
2079.5.1
2079.3.0
2023.5.0
2023.4.0
1967.6.0
1967.5.0
...
1911.3.0

@maikelvl

After the update to macOS Big Sur this problem reoccurs.
