Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime resource discovery #188

Closed
wants to merge 3 commits into from
Closed

Runtime resource discovery #188

wants to merge 3 commits into from

Conversation

stefanmb
Copy link
Contributor

@stefanmb stefanmb commented Oct 17, 2017

WARNING: Some of this description is now obsolete (Jan26/2018)

What
This PR introduces automatic discovery of CustomResourceDefinitions (1.7+) and ThirdPartyResources (1.6) in a backwards compatible manner. Once this PR is implemented, the monitoring logic for these resources can be removed from kubernetes-deploy.

Motivation
Currently we are handling each resource type by adding a new subclass of KubernetesResource to handle its state (as per the documentation).

This approach makes sense for core resources, but is not scalable for CustomResourceDefinitions (and ThirdPartyResources) for multiple reasons:

  • We are including logic to handle CRDs such as cloudsql and redis which are not open sourced.
  • There is too tight a coupling between our buddy/controller framework (closed source) and k8s-deploy
  • We cannot support 3rd party use cases unless they contribute a resource definition back to k8s-deploy - we cannot centralize all logic for all possible resources users may wish to implement.

The goals of the PR are:

  • Automatically detect custom resources present in the cluster at deploy time
  • Decouple the concern of determining a custom resource's status from k8s-deploy by enabling controllers to annotate resources with their respective statuses.
  • Remain fully backward compatible so that controllers can be upgraded with annotation support incrementally, and the CloudSQL, Redis, etc subclasses of KubernetesResource can be removed one at a time.
  • Support both TPR and CRD custom resources.

How
TODO

TODO
Update README file.

Review
cc: @Shopify/cloudplatform @KnVerey

@stefanmb stefanmb changed the title (WIP - Don't look at me) Runtime resource discovery Runtime resource discovery Oct 19, 2017
@@ -24,11 +24,12 @@ Gem::Specification.new do |spec|

spec.required_ruby_version = '>= 2.3.0'
spec.add_dependency "activesupport", ">= 4.2"
spec.add_dependency "kubeclient", "~> 2.4"
spec.add_dependency "kubeclient", "~> 2.5.1"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required for raw json support.

spec.add_dependency "googleauth", ">= 0.5"
spec.add_dependency "ejson", "1.0.1"
spec.add_dependency "colorize", "~> 0.8"
spec.add_dependency "statsd-instrument", "~> 2.1"
spec.add_dependency "jsonpath", "0.8.8"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required to parse out the custom status field.

discover_tpr(v1beta1_kubeclient(context))
begin
discover_crd(v1beta1_crd_kubeclient(context))
rescue KubeException => err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is thrown for Kubernetes < 1.7 - I am also open to either:

  1. Eliminating this check and dropping support for < 1.7.
  2. Making the check against the version number of the client/server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we merged the server version check, maybe we can use that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a version check as Karan suggested. I also think that these discovery methods should be retried a couple times so we don't fail the deploy on server blips here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a version check now.

end

def self.discovered(group:, type:, version:, annotations:)
resource_class = Class.new(self) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I previously spoke about using meta-programming with @mkobetic. Since that time I've convinced myself this is the right approach.

We are modelling a class/instance relationship (resource definition vs resource instances). Each definition has specific properties shared by all instances (e.g. whether it's prunable, or its configured timeout). This information should be stored somewhere, since Ruby supports OOP it makes sense to generate classes that can contain the specific settings shared by all resources (instances).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DISCLAIMER: I've only worked with this kind of thing a couple times in the past, and I'm going to want to pull in another experienced Rubyist before we merge this PR.

Agreed, but I think we can take it a step further and make the generated classes "normal" rather than using class variables, by using class_eval. Here's a working example:

module ScratchPadEphemeral
  class DynamicParent
    require 'active_support/core_ext/class/subclasses'

    def self.define_child(child_name, child_type, some_bool)
      new_class = self.class_eval <<-STRING
        class #{child_name.capitalize} < DynamicParent
          def type
            '#{child_type}'
          end

          def some_bool
            #{some_bool}
          end
        end
      STRING
    end
  end
end
[35] pry(ScratchPadEphemeral):1> DynamicParent.define_child("foo", "test", true)
=> :some_bool
[36] pry(ScratchPadEphemeral):1> DynamicParent.define_child("bar", "test", false)
=> :some_bool
[37] pry(ScratchPadEphemeral):1> DynamicParent.subclasses
=> [ScratchPadEphemeral::DynamicParent::Bar, ScratchPadEphemeral::DynamicParent::Foo]
[38] pry(ScratchPadEphemeral):1> foo = DynamicParent::Foo.new
=> #<ScratchPadEphemeral::DynamicParent::Foo:0x007fd8e4106fe8>
[39] pry(ScratchPadEphemeral):1> foo.type
=> "test"
[40] pry(ScratchPadEphemeral):1> foo.some_bool
=> true

This should incidentally be more performant than define_method?, from what I've been told in the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted this approach in the rewritten version, thanks.

getter = "get_#{type.downcase}"
@client ||= DiscoverableResource.kubeclient(context: @context, resource_class: self.class)
raw_json = @client.send(getter, @name, @namespace, as: :raw)
query_path = JsonPath.new(status_field)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We allow the CRD implementors to specify how the resources should be queried via arbitrary jsonpath to determine its status.

add_resource(resource_class)
end

def self.add_resource(resource_class)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we generate resource classes dynamically, we still need a way to map resource names to their classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very possible that I'm missing something, but I believe that if you use the class_eval technique I mentioned above, we can subclass from KubernetesResource as usual, and the existing lookup technique (KubernetesDeploy.const_defined?(definition["kind"])) should work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part has been re-written.

@@ -0,0 +1,26 @@
# frozen_string_literal: true
module KubernetesDeploy
class GenericResource < KubernetesResource
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extracted this base class that represents a generic resource. This code is shared by several existing resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, KubernetesResource (the superclass) is already kinda "generic resource", no? Should we simply change the defaults in that class to what you have here? Regardless, please extract this change to a separate PR. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -0,0 +1,6 @@
# frozen_string_literal: true
module KubernetesDeploy
class CustomResourceDefinition < GenericResource
Copy link
Contributor Author

@stefanmb stefanmb Oct 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class allows k8s-deploy to monitor deployment of static custom resource definitions.

@@ -0,0 +1,12 @@
# frozen_string_literal: true
module KubernetesDeploy
class ThirdPartyResource < GenericResource
Copy link
Contributor Author

@stefanmb stefanmb Oct 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class allows k8s-deploy to monitor deployment of static third party resource definitions.

end

def self.kubeclient(context:, resource_class:)
_build_kubeclient(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value cannot be memoized because we must refresh what CRD/TPRs are available on each invocation (even if it's during a single lifespan of ak8s-deploy process, for example if using this code as a library instead of a standalone executable during test).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that because we might have deployed CRDs? If we want to make that work, we should predeploy them before doing the discovery (or maybe just accept that this does not work for v1, since that'd be really complicated). Doing discovery every time we need to use the client sounds expensive. Alternatively, could we be using instances of this class and discarding them on every run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Predeploying CRDs now, thanks.

end
end

def has_resource?(res)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary when running without TPR support (k8s > 1.7) or without CRD support (k8s < 1.7).

end

def self.build(namespace:, context:, definition:, logger:)
return super if KubernetesDeploy.const_defined?(definition["kind"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows us to temporarily fall back to the custom classes in k8s-deploy if they are present for a given resource. This behaviour is useful for progressively moving the monitoring to the controllers and outside of this gem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(irrelevant if we can make the instances direct subclasses of KubernetesResource, but...) Doesn't this end up covering the regular resources, i.e. is not temporary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more explicit in the re-written version. If a matching (== kind) symbol exists in the k8s-deploy module it is used, otherwise a class is generated dynamically.

@stefanmb
Copy link
Contributor Author

We may want to also override failure_message to read a custom message (and possibly fail fast), for example by introducing:

  • (Optional) kubernetes-deploy.shopify.io/status-failed: The value that indicates unrecoverable failure.
  • (Optional) kubernetes-deploy.shopify.io/status-failure-message-field: jsonpath query for detailed human-readable status message. Should be populated when status-failed is returned.

Any thoughts? @KnVerey

end

def self.prunable?
return @prunable if defined?(@prunable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instance vars that don't exist should still be falsey, just being explicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my Java/C#/C background. ;) Simplified, thanks!


def self.discover(context:, logger:)
logger.info("Discovering custom resources:")
@resources = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also a nop afaik

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless you are resetting them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be reset each time we (re)discover.

@stefanmb
Copy link
Contributor Author

cc: @kirs I'd like to hear your opinion on this PR as well please.

@stefanmb stefanmb force-pushed the resource_discovery branch 3 times, most recently from a6ff545 to 2627ec1 Compare October 27, 2017 18:26
@stefanmb
Copy link
Contributor Author

Rebased on master and addressed some cosmetic comments.

@resources[group][type][version] = resource_class
end

def self.parse_bool(value)
Copy link
Contributor

@devonboyer devonboyer Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes also uses yes for the value of a boolean annotation. These annotations are added to persistent volume claims once it is bound.

pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes

This should also look for yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets do ON, yes, true and 1 wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value needs to be a string so "1" might get confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to ActiveModel, thanks!

@stefanmb
Copy link
Contributor Author

This PR rebased pretty cleanly. I know you have a full plate, but when you get a chance I'd like to hear your thoughts @KnVerey, also with respect to #188 (comment).

Copy link
Contributor

@karanthukral karanthukral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

👍 on the cleanup of the common code too

discover_tpr(v1beta1_kubeclient(context))
begin
discover_crd(v1beta1_crd_kubeclient(context))
rescue KubeException => err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we merged the server version check, maybe we can use that?

Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a bunch of comments, but this is a great start. I'm really excited about this feature!


def exists?
# TPRs take time to become available.
_, _err, st = kubectl.run("get", @name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only sync methods should make API calls. I really should look into adding a test to enforce that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

def predeploy_sequence
resources = DiscoverableResource.all.select(&:predeploy?)
identities = resources.map(&:identity)
PREDEPLOY_SEQUENCE + identities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless/until we support more powerful ordering, it's probably safer to assume that the custom resources need to go first, and pods (which are at the end of the constant list) need to be very last. This was true for all our own CRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@@ -206,13 +223,15 @@ def validate_definitions(resources)

def discover_resources
resources = []
# Explicitly discovering will discard all cached resources.
DiscoverableResource.discover(context: @context, logger: @logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should be its own step in the "Initializing deploy" phase. It isn't resource discovery in the same sense as this method used to mean; this method should probably be renamed to something like "parse_templates" or "load_resource_from_file" or whatever, now that we have something that is "discovery" in the k8s api sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to separate method, renamed existing method as suggested.

discover_tpr(v1beta1_kubeclient(context))
begin
discover_crd(v1beta1_crd_kubeclient(context))
rescue KubeException => err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a version check as Karan suggested. I also think that these discovery methods should be retried a couple times so we don't fail the deploy on server blips here.

end

def self.discover_crd(client)
return unless client.respond_to? :get_custom_resource_definitions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are basically another version check, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

end

def self.build(namespace:, context:, definition:, logger:)
return super if KubernetesDeploy.const_defined?(definition["kind"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(irrelevant if we can make the instances direct subclasses of KubernetesResource, but...) Doesn't this end up covering the regular resources, i.e. is not temporary?

return super if KubernetesDeploy.const_defined?(definition["kind"])

# We only discover once per kubernetes-deploy invocation
discover(context: context, logger: logger) unless @resources
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should assume discovery is has already been done. It would be very heavy to have it actually happen here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing discovery once now, removed the call from here.

STATUS_SUCCESS_ANNOTATION = 'kubernetes-deploy.shopify.io/status-success'
TIMEOUT_ANNOTATION = 'kubernetes-deploy.shopify.io/timeout'
PREDEPLOY_ANNOTATION = 'kubernetes-deploy.shopify.io/predeploy'
PRUNABLE_ANNOTATION = 'kubernetes-deploy.shopify.io/prunable'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are so many annotations here that I'm wondering if we should instead have a single one that contains JSON.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed timeout as it's been superseded by a different PR, collected the rest of the methods under metadata. I'm open to suggestions on the json field name.

@@ -0,0 +1,26 @@
# frozen_string_literal: true
module KubernetesDeploy
class GenericResource < KubernetesResource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, KubernetesResource (the superclass) is already kinda "generic resource", no? Should we simply change the defaults in that class to what you have here? Regardless, please extract this change to a separate PR. 😄


def cleanup(*resources)
resources.each do |res|
_, err, st = kubectl.run("delete", res, "--all")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it impossible/hard to do this with kubeclient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it with kubeclient now.

@stefanmb
Copy link
Contributor Author

stefanmb commented Jan 18, 2018

@KnVerey I've updated this branch with the latest approach, it's still a work in progress (+ need some more tests) so feel free to ignore it for now. I'll ping you for review once I feel it's ready.

Edit: That said, comments are always welcome, thank you.

@stefanmb stefanmb changed the title Runtime resource discovery (WIP) Runtime resource discovery Jan 18, 2018
@stefanmb stefanmb force-pushed the resource_discovery branch 2 times, most recently from 2a7b298 to c5f60de Compare January 23, 2018 19:23
@stefanmb stefanmb changed the title (WIP) Runtime resource discovery Runtime resource discovery Jan 24, 2018
@stefanmb stefanmb force-pushed the resource_discovery branch 4 times, most recently from fd740a0 to 1361834 Compare January 27, 2018 03:16
bin/test Outdated
if [[ ${CI:="0"} == "1" ]]; then
echo "--- :ruby: Bundle Install"
bundle install --jobs 4
bundle install --jobs 4 || exit 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests should exit if bundler fails.

resources = DiscoverableResource.all + KubernetesResource.all
# Omit unqualified kinds (they are unsupported on this cluster, or discovery hasn't been performed yet)
resources.select do |res|
res.constants.include?(:GROUP) && res.constants.include?(:VERSION)
Copy link
Contributor Author

@stefanmb stefanmb Jan 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is somewhat inelegant, but the GROUP and VERSION are now detected via discovery for most resources. If these fields are not present we can't query the API server, so we skip over them. This is not an issue if the user runs discovery first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for the "user" to run discovery? i.e. would it be a programming error in the gem if it doesn't happen before this is called? If so, can we detect (e.g. by setting an ivar when we do it) whether or not it has been done and raise an exception if not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for the "user" to run discovery?

It means invoking discover_resources at some point, normally before attempting to deploy.

I don't think we use k8s-deploy as a library anywhere, so I'm not sure what the normal entry point to it would be if used as a library.

If used normally as an executable discover_resources is executed as part of run in the deploy_task so this is a non-issue.

The other important part is that not all statically defined resources are supported on all k8s versions.

DEPLOY_METADATA_ANNOTATION = 'kubernetes-deploy.shopify.io/metadata'

def self.inherited(child_class)
DiscoverableResource.child_classes.add(child_class)
Copy link
Contributor Author

@stefanmb stefanmb Jan 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overriding allows us to distinguish dynamically created resources (discovery) versus statically defined resources (static subclass of KubernetesResource).

def self.discover_kinds(context)
kinds = {}

# At the top level there is the core group (everything below /api/v1),
Copy link
Contributor Author

@stefanmb stefanmb Jan 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately for historic reasons not all resources are available under /apis/$NAME/$VERSION, so we have to look in two places.

resources = json_response['resources']
resources.each do |res|
kind = res['kind']
next if kinds.key?(kind) # Respect the preferred version
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each group we'll attempt to detect resources under the preferred group version, but some resources are only available under newer (beta/alpha) versions, so we have to check those too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mentioning in a comment. My first thought in reading through the above was "why don't we check only the preferred version?" even though I should have known the answer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this behave correctly when a resource exists in multiple GV, but not the preferred one? batch would be an example of this situation: its preferred version in 1.8 is batch/v1, but CronJob only exists in v1beta1 and v2alpha1. We'd want to make sure to pick the beta.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this code iterates through each version of each group.

Preference is given to the preferred version, if something is not present in the preferred version then it will be added based on the first version that it is encountered in.

The order of the version list returned by the API server is oldest version to newest version, so the old(er) version will be found first.

e.g.

kubectl get --raw /apis/batch | jq .
{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "batch",
  "versions": [
    {
      "groupVersion": "batch/v1",
      "version": "v1"
    },
    {
      "groupVersion": "batch/v1beta1",
      "version": "v1beta1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "batch/v1",
    "version": "v1"
  },
  "serverAddressByClientCIDRs": null
}

In this case CronJob will be encountered first in v1beta1 and associated with that version.

@@ -2,10 +2,13 @@
module KubernetesDeploy
class Bugsnag < KubernetesResource
TIMEOUT = 1.minute
PREDEPLOY = true
GROUP = 'stable.shopify.io'
VERSION = 'v1'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We provide these values in code for statically defined classes. They can also be provided dynamically by annotations to CRDs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should delete this class, actually... we 🔥 this CR a while ago.

@@ -0,0 +1,29 @@
# frozen_string_literal: true
module KubernetesDeploy
class HorizontalPodAutoscaler < KubernetesResource
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HPA was previously defined only as a string in the prune list, we now need an actual class backing it.

@@ -2,7 +2,9 @@
module KubernetesDeploy
class Pod < KubernetesResource
TIMEOUT = 10.minutes

PREDEPLOY = true
PREDEPLOY_DEPENDENCIES = %w(ResourceQuota ServiceAccount ConfigMap PersistentVolumeClaim)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted from the previous static predeploy dependency list. Note that this is a partial order (the previous list by definition was a total order).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the fact that this can depend on custom resources, such as cloudsql/redis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we'll need some way to specify such custom dependencies in the pod spec.

Alternatively we could optionally specify something like PREDEPLOY_DEPENDENTS on resources, so for e.g. CloudSQL could specify that it is a dependency of Pod which is its dependent.

# frozen_string_literal: true
require 'test_helper'

class ResourceDiscoveryTest < KubernetesDeploy::IntegrationTest
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests affect global resources (CRD) so they are not parallel-friendly.

klass.predeploy_dependencies.each do |dep|
pos = order.map.with_index { |e, i| e == dep ? i : nil }.compact.first
# The current resource is deployed *after* its deps
assert_operator idx, :>, pos, "#{r} requires #{dep} but got #{order}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not want to hardcode a particular total order here because that's not important. What's important is that the result is in any valid toplogical sort order, which may not be unique.

@stefanmb
Copy link
Contributor Author

@KnVerey Could I ask you to have another look? A lot has changed since the initial version. All tests are currently passing.

@stefanmb stefanmb mentioned this pull request Jan 29, 2018
Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only did one pass and didn't make it through the tests, but this is looking great! Can you let me know when you have time to sit down and walk me through some of it IRL? Some things I'd like to discuss:

  • jq vs jsonpath
  • timeouts/failures for dynamic resources
  • dynamic classes for core resources
  • the predeploy graph (I think I'm forgetting something about why we need to do this)
  • any performance implications

@prune_whitelist ||= _build_prune_whitelist
end

def _build_prune_whitelist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be more conventional to make these methods private rather than prefixing to suggest they should be. Couldn't the prune_whitelist be private too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, was just following the previous examples in the file.

resources = DiscoverableResource.all + KubernetesResource.all
# Omit unqualified kinds (they are unsupported on this cluster, or discovery hasn't been performed yet)
resources.select do |res|
res.constants.include?(:GROUP) && res.constants.include?(:VERSION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for the "user" to run discovery? i.e. would it be a programming error in the gem if it doesn't happen before this is called? If so, can we detect (e.g. by setting an ivar when we do it) whether or not it has been done and raise an exception if not?

@@ -217,14 +238,21 @@ def validate_definitions(resources)
end

def discover_resources
# (Lazily) rebuild these lists after discovery if they were present.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazily likely means in the middle of the mutating part of the deploy. Since they can fail, I'm thinking it would be better to build them eagerly, during the "Initializing deploy" stage... I'd suggest making prune_whitelist a simple attr_reader and setting them explicitly here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

predeploy_sequence and prune_whitelist are local computations only without any API calls, they should never fail.

resources = json_response['resources']
resources.each do |res|
kind = res['kind']
next if kinds.key?(kind) # Respect the preferred version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mentioning in a comment. My first thought in reading through the above was "why don't we check only the preferred version?" even though I should have known the answer.

resources = json_response['resources']
resources.each do |res|
kind = res['kind']
next if kinds.key?(kind) # Respect the preferred version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this behave correctly when a resource exists in multiple GV, but not the preferred one? batch would be an example of this situation: its preferred version in 1.8 is batch/v1, but CronJob only exists in v1beta1 and v2alpha1. We'd want to make sure to pick the beta.

@found = st.success?
end

def deploy_succeeded?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's actually a more sophisticated success condition available for this; check out waitCRDReady in cloudbuddies.

exists?
end

def deploy_failed?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to look at the NamesAccepted condition here (see waitCRDReady in cloudbuddies)

@@ -0,0 +1,29 @@
# frozen_string_literal: true
module KubernetesDeploy
class Job < KubernetesResource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't insist on this, but it could be helpful if you extracted these new class additions into a separate PR. Making sure each one has appropriate checks and test coverage is not really related to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a prerequisite PR? Without the new classes this PR won't work.

@@ -2,7 +2,9 @@
module KubernetesDeploy
class Pod < KubernetesResource
TIMEOUT = 10.minutes

PREDEPLOY = true
PREDEPLOY_DEPENDENCIES = %w(ResourceQuota ServiceAccount ConfigMap PersistentVolumeClaim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the fact that this can depend on custom resources, such as cloudsql/redis?


def self.discover(context:, logger:, server_version:)
logger.info("Discovering custom resources:")
with_retries { discover_groups(context) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I reading correctly that this is solely for the purpose of determining which GV to use in the prune whitelist for the statically defined classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the chain of calls here is unclear. In order to discover groups (including for the statically defined classes) we have to discover everything based on what the API server returns, i.e. discover_groups requires discover_kinds which performs all discovery.

But yes, discover_groups itself is responsible for populating info in the static classes.

@airhorns
Copy link
Contributor

@stefanmb this would be so awesome to get in for all of us without all the *Buddies nowadays!!! <3<3

@dturn dturn self-assigned this Jul 18, 2018
@stefanmb stefanmb closed this Oct 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants