VM becomes unresponsive to bosh after reboot #35

Closed
gossion opened this issue Oct 19, 2018 · 6 comments

@gossion

gossion commented Oct 19, 2018

Not sure if you are aware of this issue.

I ran a test that deployed a simple workload to Windows VMs on Azure, and the deployment succeeded. The problem is that after I restarted the VMs (via the Azure portal), they became unresponsive to bosh and never came back into service.

guwe@guwecf0628:~/workspace/sample-go-windows-boshrelease$ bosh -e azure vms
Using environment '10.0.0.4' as client 'admin'

Task 967. Done

Deployment 'webapp'

Instance                                      Process State       AZ  IPs  VM CID                                                                      VM Type  Active
webapp1/56d68506-4ed6-4e0c-a9bf-fa9624a29b09  unresponsive agent  z2  -    agent_id:f95268c3-a067-4b09-ba9f-65c0ae2d2fe8;resource_group_name:guwe0628  small    true
webapp1/8452e005-bb6e-49ff-b7dc-0b74b492db0e  unresponsive agent  z1  -    agent_id:066b3e8d-6c9e-4c8e-84a3-fcf4b3b262a8;resource_group_name:guwe0628  small    true

2 vms

Succeeded

My env:

---
name: webapp

instance_groups:
- name: webapp1
  azs: [z1, z2, z3]
  instances: 2
  vm_type: small
  stemcell: windows1803
  networks:
    - name: default
      default: [gateway, dns]
    #- name: network2
  jobs:
  - name: simple-go-web-app
    release: sample-go-windows
    properties:
      port: 3000


variables: []

stemcells:
- alias: windows
  os: windows2012R2
  version: latest
- alias: "windows1803"
  os: "windows1803"
  version: "1803.2"

update:
  canaries: 1
  canary_watch_time: 1000-120000
  update_watch_time: 1000-120000
  max_in_flight: 1
  serial: false

releases:
- name: sample-go-windows
  version: 1.0.0
  url: https://github.com/cloudfoundry-community/sample-go-windows-boshrelease/releases/download/v1.0.0/sample-go-windows-1.0.0.tgz
  sha1: 7d15b2bd43acf849fac5f6ec805e0b6cfa1b9bb5

I believe the VMs were up because they responded to RDP. There may be an issue with bosh-agent, but I don't have credentials to log in to the VMs and check.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/161338351

The labels on this github issue will be updated when the story is started.

@thisisnotashwin

Hey @gossion

BOSH does not currently support a workflow where VM restarts are managed outside of the BOSH director. If you do need to restart VMs, the bosh CLI has commands for that. Restarting a VM from the portal leads the BOSH director to think the VM has been torn down, and it tries to recreate it.

Was there a particular use case that you had which required the VMs to be restarted from the Azure portal? It would help us understand the requirement a little better.

Thanks!!

@gossion
Author

gossion commented Oct 23, 2018

Thanks @ashwin-venkatesh. Currently I don't have any workload blocked; I just saw the error and thought it could be an issue.

One use case I can think of is a test environment: people stop the VMs to save cost, start them again later when they need the environment, and then find that the VMs do not recover.

@thisisnotashwin

Hey @gossion

That is a reasonable use case. BOSH currently supports this workflow through the bosh stop command. It also accepts the --hard flag, which deletes the VMs but holds onto the persistent disks.

That would be the recommended way to achieve the desired result, and it should ensure things work seamlessly. BOSH maintains its own state of the world and does not work as intended when a VM's state is changed via mechanisms external to it.

I hope this addresses the above concern. Do you have other questions or would it be alright if I closed this issue?

@thisisnotashwin

Additionally, you can use bosh start to restart a stopped deployment.
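
As a rough sketch of the stop/start workflow described above, the commands could be wrapped as below. The environment alias `azure` and the deployment name `webapp` come from the output earlier in this thread; the `run` helper and the `DRY_RUN`, `BOSH_ENV`, and `DEPLOYMENT` variables are hypothetical names used only for this example, and by default the script only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch only: stop and later start a deployment through BOSH rather than the Azure portal.
# Assumed values: env alias "azure" and deployment "webapp" (taken from the thread).
# run(), DRY_RUN, BOSH_ENV and DEPLOYMENT are hypothetical helpers for this example.
set -euo pipefail

BOSH_ENV="${BOSH_ENV:-azure}"
DEPLOYMENT="${DEPLOYMENT:-webapp}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, anything else = execute it

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop the jobs but keep the VMs (the VMs keep running, so this saves no VM cost).
stop_soft() { run bosh -e "$BOSH_ENV" -d "$DEPLOYMENT" stop; }

# Delete the VMs but keep the persistent disks, as described for --hard above.
stop_hard() { run bosh -e "$BOSH_ENV" -d "$DEPLOYMENT" stop --hard; }

# Bring the stopped deployment back.
start_all() { run bosh -e "$BOSH_ENV" -d "$DEPLOYMENT" start; }
```

Calling `stop_hard` in the evening and `start_all` the next morning would approximate the cost-saving use case, with the director staying aware of every state change.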

@gossion
Author

gossion commented Oct 25, 2018

Thanks @ashwin-venkatesh.

The CPI has no API to stop (that is, deallocate on Azure) a VM, so to save cost I need to delete the VM (bosh stop --hard). That is inconvenient, because the instance ID is hard to find once the VM is deleted (it is no longer shown in bosh vms). So I think it would be better if the VM could come back into service after an expected or unexpected reboot.

Anyway, I am not blocked. I will close this issue, but I think it would be a nice-to-have feature.
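
For looking up instance IDs after a hard stop, one option that may help is `bosh instances`, which to my understanding lists a deployment's instances rather than its VMs and so should still show instances whose VM has been deleted. The sketch below assumes the same `azure` environment alias and `webapp` deployment as the output above; `run`, `DRY_RUN`, `list_instances`, and `start_instance` are hypothetical names for this example, and the script only prints the commands by default.

```shell
#!/usr/bin/env bash
# Sketch only: find instance IDs after "bosh stop --hard" and start one instance again.
# Assumed values: env alias "azure" and deployment "webapp" (taken from the thread).
# run(), DRY_RUN, list_instances and start_instance are hypothetical helpers.
set -euo pipefail

BOSH_ENV="${BOSH_ENV:-azure}"
DEPLOYMENT="${DEPLOYMENT:-webapp}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, anything else = execute it

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List instances of the deployment (unlike "bosh vms", this is not limited to existing VMs).
list_instances() { run bosh -e "$BOSH_ENV" -d "$DEPLOYMENT" instances; }

# Start a single instance, given as <instance-group>/<instance-id>.
start_instance() { run bosh -e "$BOSH_ENV" -d "$DEPLOYMENT" start "$1"; }
```

For example, `start_instance webapp1/56d68506-4ed6-4e0c-a9bf-fa9624a29b09` would target the first instance shown in the original report.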
