Deployment locked by a vm. Error Unmarshalling monit status #1754
Comments
I saw the same thing some time ago. The Director's error handling isn't very good in these situations, so we should definitely fix this.
In case others stumble upon this:
Still to do: make the Director able to cope with that message and show the unresponsive VM to the user.
Fixed with c9cc3ff
@voelzmo As far as I can tell, that fix is only included in the 262.x series. It doesn't appear to be included in 265.x. We're currently hitting this issue running v265.2.0. I see that https://www.pivotaltracker.com/n/projects/1456570/stories/151434806 is marked as accepted. What needs to happen to get this fix included in 265?
Good catch, @alext! I think we might have missed this when branching off the 265.x branch. Version 266 should contain the fix again: bosh/src/bosh-director/lib/bosh/director/deployment_plan/assembler.rb Lines 120 to 125 in fa2d5cf
@cppforlife can you also backport c9cc3ff to 265.x? It's not in 264.x or 263.x either; I'm not sure what your long-term maintenance plans are for those.
Turns out that fix was not included in 262.x either: https://github.com/cloudfoundry/bosh/blob/262.x/src/bosh-director/lib/bosh/director/deployment_plan/assembler.rb#L90 It's just an incorrect GitHub tag that suggests this commit is in a 262 release. 266 will be the first version containing this fix.
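For anyone who wants to double-check this kind of thing locally instead of relying on the tag shown in the GitHub UI, here is a minimal sketch. It assumes a local clone of the repository with the release branches fetched; the helper name `commit_in_branch` is made up for illustration. It relies only on `git merge-base --is-ancestor`, which exits 0 when the first commit is reachable from the second ref.

```bash
# Hypothetical helper (not part of BOSH): succeeds if COMMIT is reachable
# from BRANCH in the repository in the current working directory.
commit_in_branch() {
  commit="$1"
  branch="$2"
  # --is-ancestor exits 0 if $commit is an ancestor of $branch, 1 if not.
  git merge-base --is-ancestor "$commit" "$branch"
}
```

Usage, assuming the remote is called `origin`, would look something like `commit_in_branch c9cc3ff origin/265.x && echo included || echo missing`.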
We have encountered an issue [1] of a process putting a VM under load that causes monit to drop the connections. This is fixed in the 266.x series, therefore we should upgrade to that.

We bump the stemcell to the latest versions because:

* we want to have the latest security updates
* the precompiled BOSH release is only created for the latest stemcell versions

[1] cloudfoundry/bosh#1754
We have encountered an issue [1] of a process putting a VM under load that causes monit to drop the connections. This is fixed in the 266.x series, therefore we should upgrade to that. Version 266.6.0 may have fixed another issue as well [2].

[1] cloudfoundry/bosh#1754
[2] cloudfoundry/bosh#1990

For reference, the precompiled releases are available in the public bucket: https://s3.amazonaws.com/bosh-compiled-release-tarballs

Version IDs can be extracted using something like:

```bash
aws s3api list-object-versions --bucket bosh-compiled-release-tarballs --prefix bosh-266.6
```
Closing this issue as it has been fixed in v266+.
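As a rough aid for operators deciding whether they need to upgrade, the "v266+" rule can be sketched as a small shell check. The function name `bosh_version_has_fix` is hypothetical, and the check only looks at the major version number, on the assumption (taken from this thread) that 266 is the first release containing the fix.

```bash
# Hypothetical helper: succeeds if the given BOSH release version is 266 or later.
# Accepts versions such as "266.6.0", "v265.2.0" or "262.3".
bosh_version_has_fix() {
  version="${1#v}"          # drop an optional leading "v"
  major="${version%%.*}"    # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 2 ;;  # not a plain number: treat as invalid input
  esac
  [ "$major" -ge 266 ]
}
```

For example, `bosh_version_has_fix v265.2.0 || echo "upgrade needed"` would print the message, since 265 is below 266.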
I consistently have a problem with the BOSH Director losing control of a deployment (see traces below).
A single VM under load in a deployment prevents us from running any operation on it (instances / vms / deploy).
Tested with bosh 262.3
I'd expect an inconsistent VM not to block BOSH Director operations on a deployment.
Here is the related bosh task --debug output
bosh agent logs