-
Notifications
You must be signed in to change notification settings - Fork 657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for multiple CPIs (mvp) #1471
Conversation
…(solution without schema changes)
…the deployment model
* develop: (34 commits) Introducing the DiskManagerFactory. Extracting OrphanDiskManager from DiskManager. Fix cis spec Don't log interpolated values in output Refactoring to replace InstanceGroup#persistent_disk_type with InstanceGroup#persistent_disks. Bump bosh-agent stemcell_builder: force-remove SSH host keys stemcell/google: Disable SSH set host keys by Google agent Disable IPv6 Redirect Acceptance Integration test for resource pool placeholders refactor deep_hash_replacement ignore Allow to use root partition as ephemeral disk Fully qualify syslog reload command when logrotating Quote all proxy environment variables to account for spaces more robust config server integration test Config server better SSL error messages Fix config server integration test fix hash first level interpolation integration tests for env interpolation uninterpolated resource pool env are now handled ...
* develop: (203 commits) More go cli integration test migration Continue porting test to go cli Add additional tests to gocli integration suite Copy syslog to tmp directory before scp'ing Bump Agent Attach disk if necessary before update settings Fix unit test Switch to use update settings call for associate disks Adding final release for build 3262.14 (v257.14) finalized on master Remove tests for removed gocli functionality Un-pend external CPI tests for gocli Add gocli integration tests Add azure-hyperv to list of stemcell artifacts Use xen hypervisor when building softlayer stemcells Add stemcell smoketest for /var/log permission fixes Change concourse build stemcell task when moving a stemcell from tmp -> output to use the stemcell filename instead of a wildcard since tmp now also has bosh-257.3.tgz. Bump OS images Only use Syslog::Logger with Ruby >= 2 Write director events also to syslog Bump os-images ...
…context['cpi_properties']
* develop: Instance returns 'lifecycle' from spec if available Add 'lifecycle' attribute to instance apply spec disable bosh_micro building in new stemcell pipeline Add additional int. test for cert generation Split out separate go cli instance class
…ic test results" This reverts commit 6f7f86f.
* develop: (49 commits) Bump bosh agent Remove polluted test setup Revert "Temporarily remove managed_persistent_disk_spec.rb" Allow custom configuration of semver resource for stemcells Removing policy from README in favor of file Bump the parallel_tests gem Rename os images directories to reflect move to containers Revert "Prepend namespace to config keys." Bump OS images Prepend namespace to config keys. Create IAM policy set to the bosh:os-image s3 resource Migrate tests for gocli integration Bump bosh-agent for fixing ephemeral disk test Temporarily remove managed_persistent_disk_spec.rb Move git tag step into publish job Push new os-images with public-read permissions Fix incorrect semver syntax in initial_version bosh pipeline should create candidate release tarballs clarify use of dry_run flag in job_runner Bump bosh-agent ...
… vars (which might no be set globally, i.e. for rvm)
@MatthiasWinzeler @voelzmo Thanks for this PR! We've been looking over the code and doing local testing and it's looking good so far. There are a couple issues that we'd like to bring to your attention before we finalize the review, though.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good overall. There's just a handful of things we'd like you to take care of, plus a few questions we had.
@@ -66,12 +67,15 @@ def ping(*arguments); invoke_cpi_method(__method__.to_s, *arguments); end | |||
private | |||
|
|||
def invoke_cpi_method(method_name, *arguments) | |||
context = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If context is always going to be initialized in this way, then the CPIs (Openstack CPI currently does this) should not test if context && context[<CPI TYPE>]
before merging in the context CPI properties because context
will always contain a value (holds true for 3146.x, 256.x, 257.x - backwards compatibility)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I removed the check for context
in the openstack cpi: https://github.com/swisscom/bosh-openstack-cpi-release/commit/77970370f428447586925001414c4a84cb892d6d
@@ -79,6 +90,32 @@ def spec | |||
} | |||
end | |||
|
|||
def cid_for_az(az) | |||
raise 'please bind model first' if @models.nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a more detailed error/specific error type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it a generic runtime exception on purpose - it should always happen during develop time and bubble up to the developer (it means he didn't bind the model before using cid_for_az
).
What's the preferred way in such situations? Should we still introduce a more specific error? Does it need to be put in errors.rb
(even though it should never leave the system)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point, this will be fine for now.
module Bosh::Director | ||
class CloudFactory | ||
def self.create_from_deployment(deployment, | ||
cpi_config = Bosh::Director::Api::CpiConfigManager.new.latest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since CloudFactoryHelper is the only place that this method is called and it doesn't override the default_cloud
, remove the default_cloud
method parameter. (See comment in CloudFactoryHelper as well)
# that could be passed to cloud factory, we use the latest cloud config | ||
CloudFactory.new(CloudFactory.create_cloud_planner(Bosh::Director::Api::CloudConfigManager.new.latest), | ||
CloudFactory.parse_cpi_config(Bosh::Director::Api::CpiConfigManager.new.latest), | ||
Config.cloud) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already the default for this parameter. Remove, also see comment on cloud_factory.rb:15
status(201) | ||
end | ||
|
||
get '/', scope: :read do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tighten access to this endpoint; cpi_configs can have sensitive credentials so we don't want to be as permissive as we are with CloudConfig.
@@ -5,6 +5,7 @@ module Jobs | |||
class UpdateStemcell < BaseJob | |||
include ValidationHelper | |||
include DownloadHelper | |||
include CloudFactoryHelper | |||
|
|||
UPDATE_STEPS = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can no longer be hard-coded because it's dependent on number of configured CLIs. Currently update_stemcell
output with multiple CPIs looks like
Started update stemcell
Started update stemcell > Extracting stemcell archive. Done (00:00:00)
Started update stemcell > Verifying stemcell manifest. Done (00:00:00)
Started update stemcell > Checking if this stemcell already exists on cloud foo. Done (00:00:00)
Started update stemcell > Uploading some-name/some-version to the cloud foo. Done (00:00:12)
Started update stemcell > Save some-name/some-version (some-ami) for cloud foo. Done (00:00:00)
Done update stemcell (00:00:12)
Started update stemcell > Checking if this stemcell already exists on cloud bar. Done (00:00:00)
Started update stemcell > Uploading stemcell some-name/some-version to the cloud bar. Done (00:00:07)
Started update stemcell > Save stemcell some-name/some-version (some-ami) for cloud bar. Done (00:00:00)
The Done update
message should print after all stemcells have been updated successfully.
} | ||
|
||
subject(:parsed_cpi_config) { described_class.new(cpis) } | ||
describe '#find_cpi_by_name' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since cpi_manifest_parser
doesn't allow CPIs with duplicate names, is there value in testing that the ParsedCpiConfig returns the first cpi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there's not. I changed it.
@@ -15,7 +15,7 @@ module Bosh::Director | |||
let(:dns_manager) { DnsManagerProvider.create } | |||
let(:event_log) { Config.event_log } | |||
|
|||
let(:cloud) { instance_double('Bosh::Cloud') } | |||
let(:cloud) { Config.cloud } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can be deleted; cloud
was only needed for initializing the subject.
@@ -9,8 +9,8 @@ def make_deployment(name) | |||
BD::Models::Deployment.make(:name => name) | |||
end | |||
|
|||
def make_stemcell(name, version, os = 'os1') | |||
BD::Models::Stemcell.make(:name => name, :operating_system=>os, :version => version) | |||
def make_stemcell(name, version, os = 'os1', params = {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like cpi
could be a separate argument and default to nil. Then params
arg could be removed; doesn't seem necessary to have a params hash when only one param is ever used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not only cpi
, but also cid
(see further below in the spec). Should I add both as separate arguments? I prefer the params
way since it's more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, didn't catch the cid
. I don't have strong feelings about params vs separate args in this case, so this seems fine as is.
@@ -329,43 +328,6 @@ | |||
end | |||
end | |||
end | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this test removed? Don't see how it relates to this PR.
@cppforlife @MatthiasWinzeler I realize this thought is coming a little late, but do we actually need a separate The distinction is more or less an implementation detail as the cloud config is already full of properties that will be passed to the CPI. I'm worried users not familiar with the internals of BOSH won't understand why some IaaS properties need to go in cloud-config and others in cpi-config. I'd imagine this also simplifies the multi-cpi implementation and solves the issue mentioned by @tjvman where multi-cpi could break |
thinking behind separation was that cpi config isnt versioned just like cloud config. its likely that you will have to update/shift configuration over time for your cpi (cred change, etc) and i dont think we want users to have to redeploy for that. Sent from my iPhone
|
@ljfranklin @cppforlife Furthermore, cpi-config contains credentials and should potentially handled differently by admins and also the director than the rest of the cloud-config. |
…traint for stemcell cpi
Thanks for reviewing the changes! |
It's best explained when looking at this integration test which tests the switch from non-cpi-config to cpi-config:
My assumption was that this works out of the box. Without my ugly fix, this did not work - BOSH used the older cloud config while deploying, so it did not know which AZ uses which CPI and thus deployed to the director cpi. You can easily reproduce this by removing my hack and run the integration test above. Maybe this is some other bug? |
* develop: (159 commits) Fix various gocli integration tests Separate stemcell buckets for candidate and publish Fix `environment` command calls in gocli integration test Migrate lifecycle to instance spec Fix ruby integration test Fix go cli integration test If a http with a payload returns a 401, then return a response instead of an error Fix gocli integration tests that used stateful env Guard against credentials being nil. Raise an auth error when director returns 401 Fix stemcell upload expectations in ruby CLI Quietly ignore stemcells which already exist try to fix integration tests for gocli statefullness removal Fix broken integration tests Unpend dry-run integration tests Unpend passing test Drop agent_client from GemComponents list Allow release sha1 to be passed in body or as query param Drop `agent_client` source folder Drop simple_blobstore_server Remove genisoimage package from release ...
* develop: When excluding dns records by ip, cast ip to string Add tags from runtime_config to deployment_plan Formatting Add logic to ensure duplicate dns agent records are not sent to agent. Add logic to remove stale dns records for an instance Fix publish stemcell and integration test Use the proper variable Unpend cck --auto integration tests Bump bosh-agent Upload blobs for s3cli/s3cli-0.0.53-linux-amd64 Bump s3cli to 0.0.53 Allow trimming empty lines in templates Adding mkdir of tmp because it fails without folder created
Pushed some last improvements - it's ready now for another review - and merged current develop. integration-1.9-postgres has some broken specs, but it's the same on develop. |
@MatthiasWinzeler we are planning to take a look at the changes next week. |
@MatthiasWinzeler Thanks for making those changes. We looked into how the change to when the deployment model gets updated, and we believe we've identified the source of the bug as well as a way to fix it. With that in mind, we intend to merge this PR as it currently stands and make our change to how |
@MatthiasWinzeler the |
Nice work @MatthiasWinzeler ! Thanks tons for contributing this! |
This PR provides basic support for multiple CPIs in a single BOSH director.
The high level concept is outlined here: https://github.com/cloudfoundry/bosh-notes/blob/master/multi-cpi.md
@cppforlife supported me with implementation details and reviews of some code parts.
Some notable points:
CpiConfig
which stores the multiple CPI endpoint configurations. CloudFactory then looks up the cpi config for a given AZ and returns an instantiatedCloud
. To achieve this lookup, the cloud factory uses the latest cpi config and the deployment's cloud config (which contains the AZ -> CPI relation).deployment.cloud_config
. In some cases, i.e. while updating the deployment,deployment.cloud_config
did not contain the most recentCloudConfig
but an older one, because this association was only set and persisted after updating the deployment. I had to do some ugly hacks to setdeployment.cloud_config
before every cloud interaction. I guess this needs some improvements.cpi
attribute. But since postgresql seems to completely ignore the uniqueness constraint if one of the fields isNULL
(which is the case for thecpi
attribute when not using cpi config), I added a second validation. I assume this validation could fail if there is a stemcell with name=x, version=y, cpi=z in DB and we add one with name=x, version=y, cpi=nil. Is there a better way to do this? A custom validator? Ignore this case?context
hash. We implemented this small change for the openstack cpi.We tested this in our lab and were successfully able to stretch deployments over multiple openstacks. I will file the PR there as soon as this PR is merged and a new version of the
bosh_cpi
gem is ready for downstream use. (cc @voelzmo)