bosh cck / hm resurrector broken with multi-cpi and different stemcells (eg: vpshere and openstack) #2287
Comments
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/175639525 The labels on this GitHub issue will be updated when the story is started.
Any insight / update on this issue?
Summary of the reverse-engineering performed with @o-orand: the problem also reproduces with a bosh deployment on a single AZ when the director has stemcells of multiple types available (vsphere and openstack). With a manifest file structure as below, and the cpi-config.yml and cloud-config.yml described in #2287 (comment)
When the cloud-check command fails, the following stack trace is displayed in bosh debug task
We're observing that the We suspect that this
Bosh::Director::DeploymentPlan::InstancePlan in its nested Bosh::Director::DeploymentPlan::Instance.stemcell field
We're suspecting that unlike a bosh/src/bosh-director/lib/bosh/director/deployment_plan/instance_group.rb Lines 30 to 31 in b4d0df7
and bosh/src/bosh-director/lib/bosh/director/deployment_plan/instance_group_spec_parser.rb Lines 240 to 248 in 7bcc9bc
This reverse engineering is based on source code analysis, as well as on debugging traces added to the local source to dump models and stack traces. We will now see whether the bosh team can help us find software architecture documentation, or point us in the right direction to contribute a fix.
thanks a lot @cunnie for your work. We're eager to help testing any fix you might be able to develop in our 3 az lab (2 vsphere and 1 openstack). As a stop-gap workaround, we're currently patching our director instances
model = @models.last . In our 3 AZ environments (2 vsphere and 1 openstack), this should select the vsphere stemcell, which is present on more AZs than the openstack one. After a bosh deploy --recreate, this makes it possible to use bosh cloud-check on vsphere instances that are part of a multi-cpi deployment.
This issue was marked as |
Hello @cunnie, did you have an opportunity to investigate this issue?
This issue was marked as |
@o-orand I'll try to look at it tomorrow.
I think I may have a fix. Am testing it now.
It seems promising, great!
Prior to this commit, `bosh cck` would sometimes fail on multi-CPI installations with a "Required stemcell ... not found for cpi" message.

This commit fixes that failure by selecting the stemcell appropriate for the particular CPI (rather than merely grabbing the first stemcell, which was the prior behavior).

An obvious question is, "why could `bosh deploy` find the correct stemcell, but `bosh cck` couldn't?" The answer is that `bosh deploy` follows a different codepath (deploy uses `CreateVmStep`).

Fixes, during `bosh cck` on a multi-CPI Director:

```
Task 41558 | 21:16:40 | Applying problem resolutions: VM for 'dummy-vsphere/812e9491-a615-465a-b975-5f0c044f7739 (0)' with cloud ID 'dummy-vsphere_sslipio_f41a348db816' is not responding. (unresponsive_agent 1945): Recreate VM without waiting for processes to start (00:00:21)
                      L Error: Required stemcell {"name"=>"bosh-aws-xen-hvm-ubuntu-bionic-go_agent", "version"=>"1.25"} not found for cpi vsphere, please upload again
```

[fixes #2287]

Signed-off-by: Maria Shaldybin <mariash@vmware.com>
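The difference between "grabbing the first stemcell" and "selecting the stemcell appropriate for the particular CPI" can be illustrated with a minimal Ruby sketch. This is not the Director's actual code: the `Stemcell` struct, the `StemcellNotFound` error, the helpers `first_stemcell` and `stemcell_for_cpi`, and the sample records are all hypothetical names introduced only for illustration.

```ruby
# Hypothetical stand-in for a stemcell record; the Director's real models differ.
Stemcell = Struct.new(:name, :os, :version, :cpi, keyword_init: true)

# Hypothetical error class used only in this sketch.
class StemcellNotFound < StandardError; end

# Prior behavior as described in the commit message: take the first record,
# regardless of which CPI it was uploaded for.
def first_stemcell(models)
  models.first
end

# Behavior described by the fix: pick the record uploaded for the given CPI,
# and fail loudly when no such record exists.
def stemcell_for_cpi(models, cpi)
  match = models.find { |m| m.cpi == cpi }
  raise StemcellNotFound, "Required stemcell not found for cpi #{cpi}, please upload again" if match.nil?

  match
end

# Illustrative data: the same OS and version uploaded once per CPI.
models = [
  Stemcell.new(name: 'bosh-openstack-kvm-ubuntu-bionic-go_agent', os: 'ubuntu-bionic', version: '1.25', cpi: 'openstack'),
  Stemcell.new(name: 'bosh-vsphere-esxi-ubuntu-bionic-go_agent', os: 'ubuntu-bionic', version: '1.25', cpi: 'vsphere'),
]

puts first_stemcell(models).name              # the openstack record, wrong for a vsphere VM
puts stemcell_for_cpi(models, 'vsphere').name # the vsphere record
```

With position-based selection, whether a VM gets the right stemcell depends on the order of the records, which would be consistent with the failure only showing up on some CPIs.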
Successful Multi-CPI
Describe the bug
When leveraging bosh multi-cpi feature, targeting 2 different iaas (vsphere iaas, and openstack iaas)
To Reproduce
Steps to reproduce the behavior :
2b. Upload openstack stemcell
Expected behavior
Since bosh deploy works with multi-CPI across 2 target IaaS types, I expect bosh cck (and also the bosh hm resurrector) to be usable in that context.
Logs
see https://github.com/orange-cloudfoundry/paas-templates/issues/840
==> mismatch, as region 2 is configured as vsphere iaas type
Versions (please complete the following information):
Deployment info:
Plain bosh deployment, using bosh ops files, with multi cpi az
Deployment:
Here is the cpi-config.yml :
Here is the cloud-config.yml:
Additional context