Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Claim portability using portable classes #743

Merged
merged 1 commit into from
Sep 10, 2019

Conversation

hasheddan
Copy link
Member

This PR updates the default class design doc to include the iterations that have taken place since the original implementation, as well as a proposal to re-establish claim portability using Policies as discussed in #703 and #723.

Checklist

I have:

  • Run make reviewable to ensure this PR is ready for review.
  • Ensured this PR contains a neat, self documenting set of commits.
  • Updated any relevant documentation, examples, or release notes.
  • Updated the RBAC permissions in clusterrole.yaml to include any new types.

@upbound-bot
Copy link
Collaborator

62% (+0.05%) vs master 62%

@upbound-bot
Copy link
Collaborator

62% (+0.03%) vs master 62%

Copy link
Member

@suskin suskin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! :coffee_parrot:

the first iteration of default classes allowed developers to have the option to
omit the class reference and rely on falling back to whatever operators deemed
an appropriate default. The default resource class was distinguished via the
`defaultForClaimKinds` field:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the defaultForClaimKinds field? Since this says via the . . . field: and ends with a colon right before the example, I was expecting the example to have it, but it doesn't. Should this be updated to talk about the labeling thing that appears to be describing a default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a typo that should be corrected! Thanks for pointing that out :)

With the implementation of [strongly-typed resource
classes](./one-pager-strongly-typed-class.md), the generic `ResourceClass`
became obsolete and the `Policy` kind was introduced. Each claim kind had a
corresponding policy kind (i.e. `MySQLInstancePolicy` for `MySQLInstance`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: these and other uses of i.e. seem like examples to me, which means they should probably be e.g. instead of i.e.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the attention to detail, will correct!

`ResourceClass` that specified
`mysqlinstance.database.crossplane.io/default:true`, there were now many
different class `kinds` that the claim could potentially reference (i.e. GCP
`CloudsqlInstanceClass`, AWS `RDSInstanceClass`, etc.). `Policies` were
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that I feel like I'm reading a story right now 📖

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably change "there were now many" to just "there were many". I took a very different approach with this design doc in that I tried to preserve historical thinking because I believe it informs our current direction. I think the verdict is still out on if this is a good way to propose updated designs though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a strong believer in including this kind of context in design documents. I like to include it in a "background" section ahead of any goals or proposals, presuming that my audience has limited familiarity with the topic at hand. Even folks who know the project well might be rusty on on the deep technical subject matter, so it helps you to help them reach the same conclusion you have by telling a story.


Currently, each Crossplane resource kind (i.e. GKE Cluster, AWS S3 Bucket, etc.) has a controller that reconciles claims for that resource by binding them to the corresponding managed type. These controllers use [predicates](https://github.com/crossplaneio/crossplane/blob/master/pkg/resource/predicates.go) to ensure that there is a provisioner defined for the class referenced by the claim. If the claim contains no reference to a class, the controller will not act on the claim.
They would then create a `MySQLInstancePolicy` that referenced this class:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most "real" use-cases, the administrator would most likely create a Namespace next, because they would probably want to create their things in a namespace which is not crossplane-system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I will incorporate that thinking here.

claims. Previously, claims could reference a generic `ResourceClass` by `name`
and `namespace`, and could be moved across providers as long as there existed a
class with that `name`. Now, claims must omit a `classRef` and rely on the
existence of a `Policy` to achieve portability.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we already define the term portable and its variants? I am realizing this would probably be good to define, in case any readers are not sure what we mean by it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe "portable" is defined anywhere in this doc, and I agree that it is important to do so.

[`Policy`](https://github.com/crossplaneio/crossplane-runtime/blob/e4d61ee2805af680baf16fc2a1d8f79538d0f9bb/apis/core/v1alpha1/resource.go#L93),
then bumping the dependency in core Crossplane.
1. Remove the `classRef` field from all claims entirely. This would require the
same steps as above but with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems as though the motivation for removing classRef is to force claims to always be portable (in the sense that there will never be references to a strongly-typed Thing which is provider-specific). If this is the case, I would suggest that we also state that as one of the goals, instead of simply saying we want the level of portability that we had previously.

Here is the same thing as a question: for the stated goals, do we really need to remove the classRef field? What is the motivation for doing it this way instead of just adding the support for the policy ref?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this same question, but @prasek had some good arguments as to why it should be removed. I will let him provide further context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for doing it this way instead of just adding the support for the policy ref?

One consideration is that if we support both a policyRef and a classRef we support two ways to specify resource class a claim should use; directly or via a policyRef. This is not necessarily an argument for removing classRef (though my intuition is that we should), but if we did keep it we'd have to be able to clearly communicate why we support setting the classRef two ways, which way is preferred in which situations, and what happens if both are specified.

claims that have a `policyRef` to a `Policy` that has a `classRef` of that
kind's strongly-typed class. For instance, the `RDSInstance` claim controller
will only reconcile `MySQLInstance` claims that have `policyRef` to a
`MySQLInstancePolicy` that has a `classRef` to an `RDSInstanceClass`. This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we imagine the mapping of RDSInstance to RDSInstanceClass as a generated/dynamic mapping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://godoc.org/github.com/crossplaneio/crossplane-runtime/pkg/resource#NewClaimReconciler

This mapping is currently specified when instantiating each resource claim controller's generic reconciler. I imagine under this proposal we'd also have to specify which which policy kind the reconciler should use. We could do so using the same pattern, but might want to rethink it since the NewClaimReconciler signature is getting fairly long.

[`ResourceClaimSpec`](https://github.com/crossplaneio/crossplane-runtime/blob/e4d61ee2805af680baf16fc2a1d8f79538d0f9bb/apis/core/v1alpha1/resource.go#L49).
1. Alter all claim controllers (which live in provider stacks) to only reconcile
claims that have a `policyRef` to a `Policy` that has a `classRef` of that
kind's strongly-typed class. For instance, the `RDSInstance` claim controller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the performance considerations to this approach, if any? I am having trouble imagining an efficient way to do this lookup without having every controller inspect every claim and its policy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be totally honest, I am not quite sure of potential performance implications. However, I imagine a simple object reference lookup would be relatively inexpensive. This would obviously grow with number of claim reconcilers introduced though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @hasheddan implies, we instantiate a resource claim controller for each (resource claim, managed resource) tuple. So there's a resource claim controller for (MySQLInstance, RDSInstance), (KubernetesCluster, GKECluster), etc. This allows resource claim controllers to live in cloud provider stacks, making it easy for new cloud providers to add managed resources that satisfy the set of portable resource claims defined by Crossplane core.

Before we introduced strongly typed resource classes the generic ResourceClass kind contained a .provisioner string that would specify the kind of managed resource that should be used to satisfy any claim referencing the class. This field read like provisioner: rdsinstance.database.aws.crossplane.io/v1alpha1. This meant that the resource claim controllers determined which claims they should operate on by:

  1. Watching for changes to all claims of the kind they were concerned with.
  2. Using a watch predicate function to filter out claims that did not reference a ResourceClass whose provisioner mapped to the managed resource kind they were concerned with.

The predicate function mentioned in step 2 of the above process used a controller-runtime client.Client to Get the referenced ResourceClass. So every time a resource claim changed in the Kubernetes API, one controller for each managed resource kind that could possibly satisfy that claim would Get its ResourceClass to check whether they should. This sounds pretty bad, but fortunately the default client.Client instance is backed by a cache of API objects so we were not likely to make an actual API call for each controller. We managed to cut out this client.Client when we switched to strongly typed resource classes, instead using the HasClassReferenceKind predicate.

I imagine we'd return to a similar predicate under this proposal, only we'd Get and inspect the relevant policy instead. For example the (MySQLInstance, RDSInstance) resource claim controller would:

  1. Watch for changes to all MySQLInstance resource claims.
  2. Use a watch predicate to filter out claims that did not reference a MySQLInstancePolicy whose classRef was of kind: RDSInstanceClass. This would involve a Get, but frequently hit the cache as described above.


This feature is similar to the original default class model in that it uses
labels to specify which class of service to use as default when a `policyRef` is
omitted. For each `namespace`, there must be only one policy that has the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there must be only one policy

How do we expect to enforce this? Is there a way to enforce this at the time the policy is created?

Related: what is the reason for this restriction? What if I want to round-robin my created resources between multiple configurations? This is sounding more like a Best Practice than a Hard Requirement to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now this requirement is because we would have no way of knowing which policy to default to if multiple were defined as default. I definitely see a more intelligent and varied defaulting pattern in the future. The first iteration could involve a weight label or something of that nature where the highest weighted policy in the namespaces could be applied. However, I am not sure that this is in scope for this iteration. Would love to hear other thoughts on this.

cc @prasek @negz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way I'm aware to enforce this would be to use an admission webhook (or rather a series thereof) that rejected new policies that were labelled default for any namespace that already contained the same kind of policy labelled such. Crossplane currently has no webhooks, and likely won't for 0.3, so at least at first we'd probably just have to make it clear that the default policy is undefined when multiple policies are labelled default.

resource classes in the same `Policy` and picking one based on specific
parameters defined in the claim.

It is also likely that strongly-typed claims could be introduced at the provider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strongly-typed claims could be introduced

How are we imagining this working? Is this talking about using something like bringing back the classRef field in a claim, or is it talking about something like a new claim type which only works with a single provider?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter. I think it is definitely possible that we will want non-portable resources to still be able to be dynamically provisioned.

Copy link
Member

@negz negz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal seems sound to me. Thanks @hasheddan and @prasek for working through this!

@@ -36,10 +55,12 @@ providerRef:
reclaimPolicy: Delete
```

This class would likely be created by an operator as a type of database that developers may deploy as part of their application. Currently, for a developer to deploy an RDS instance on AWS, they would have to explicitly reference it:
This class would likely be created by an operator as a type of database that
developers may deploy as part of their application. Currently, for a developer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
developers may deploy as part of their application. Currently, for a developer
developers may deploy as part of their application. Originally, for a developer

`ResourceClass` that specified
`mysqlinstance.database.crossplane.io/default:true`, there were now many
different class `kinds` that the claim could potentially reference (i.e. GCP
`CloudsqlInstanceClass`, AWS `RDSInstanceClass`, etc.). `Policies` were
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a strong believer in including this kind of context in design documents. I like to include it in a "background" section ahead of any goals or proposals, presuming that my audience has limited familiarity with the topic at hand. Even folks who know the project well might be rusty on on the deep technical subject matter, so it helps you to help them reach the same conclusion you have by telling a story.

classes](./one-pager-strongly-typed-class.md), the generic `ResourceClass`
became obsolete and the `Policy` kind was introduced. Each claim kind had a
corresponding policy kind (i.e. `MySQLInstancePolicy` for `MySQLInstance`,
etc.). Claims controllers could no longer simply omit a `classRef` because their
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
etc.). Claims controllers could no longer simply omit a `classRef` because their
etc.). Claims could no longer simply omit a `classRef` because their

```

And a resource claim that defaulted to the resource class above could look like this in the general case:
To continue to provide the same level of portability for claims that was
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything from the Proposal: Default Class Reference v2 & Claim Portability heading to this point of the document is more continued background than a proposal. I suggest moving it under some kind of "background" or "problem" heading.

The `Policy` method continues to enable default class references, but
strongly-typed resource classes introduce a reduction in portability of resource
claims. Previously, claims could reference a generic `ResourceClass` by `name`
and `namespace`, and could be moved across providers as long as there existed a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Moved across providers" reads a little like a satisfied claim could be moved from one provider to another, suggesting that perhaps Crossplane could move your MySQLInstance from AWS to Azure which it sadly cannot.

Perhaps "could be satisfied by a compatible managed resource regardless of cloud provider"?

[`Policy`](https://github.com/crossplaneio/crossplane-runtime/blob/e4d61ee2805af680baf16fc2a1d8f79538d0f9bb/apis/core/v1alpha1/resource.go#L93),
then bumping the dependency in core Crossplane.
1. Remove the `classRef` field from all claims entirely. This would require the
same steps as above but with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for doing it this way instead of just adding the support for the policy ref?

One consideration is that if we support both a policyRef and a classRef we support two ways to specify resource class a claim should use; directly or via a policyRef. This is not necessarily an argument for removing classRef (though my intuition is that we should), but if we did keep it we'd have to be able to clearly communicate why we support setting the classRef two ways, which way is preferred in which situations, and what happens if both are specified.

[`ResourceClaimSpec`](https://github.com/crossplaneio/crossplane-runtime/blob/e4d61ee2805af680baf16fc2a1d8f79538d0f9bb/apis/core/v1alpha1/resource.go#L49).
1. Alter all claim controllers (which live in provider stacks) to only reconcile
claims that have a `policyRef` to a `Policy` that has a `classRef` of that
kind's strongly-typed class. For instance, the `RDSInstance` claim controller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @hasheddan implies, we instantiate a resource claim controller for each (resource claim, managed resource) tuple. So there's a resource claim controller for (MySQLInstance, RDSInstance), (KubernetesCluster, GKECluster), etc. This allows resource claim controllers to live in cloud provider stacks, making it easy for new cloud providers to add managed resources that satisfy the set of portable resource claims defined by Crossplane core.

Before we introduced strongly typed resource classes the generic ResourceClass kind contained a .provisioner string that would specify the kind of managed resource that should be used to satisfy any claim referencing the class. This field read like provisioner: rdsinstance.database.aws.crossplane.io/v1alpha1. This meant that the resource claim controllers determined which claims they should operate on by:

  1. Watching for changes to all claims of the kind they were concerned with.
  2. Using a watch predicate function to filter out claims that did not reference a ResourceClass whose provisioner mapped to the managed resource kind they were concerned with.

The predicate function mentioned in step 2 of the above process used a controller-runtime client.Client to Get the referenced ResourceClass. So every time a resource claim changed in the Kubernetes API, one controller for each managed resource kind that could possibly satisfy that claim would Get its ResourceClass to check whether they should. This sounds pretty bad, but fortunately the default client.Client instance is backed by a cache of API objects so we were not likely to make an actual API call for each controller. We managed to cut out this client.Client when we switched to strongly typed resource classes, instead using the HasClassReferenceKind predicate.

I imagine we'd return to a similar predicate under this proposal, only we'd Get and inspect the relevant policy instead. For example the (MySQLInstance, RDSInstance) resource claim controller would:

  1. Watch for changes to all MySQLInstance resource claims.
  2. Use a watch predicate to filter out claims that did not reference a MySQLInstancePolicy whose classRef was of kind: RDSInstanceClass. This would involve a Get, but frequently hit the cache as described above.

claims that have a `policyRef` to a `Policy` that has a `classRef` of that
kind's strongly-typed class. For instance, the `RDSInstance` claim controller
will only reconcile `MySQLInstance` claims that have `policyRef` to a
`MySQLInstancePolicy` that has a `classRef` to an `RDSInstanceClass`. This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://godoc.org/github.com/crossplaneio/crossplane-runtime/pkg/resource#NewClaimReconciler

This mapping is currently specified when instantiating each resource claim controller's generic reconciler. I imagine under this proposal we'd also have to specify which which policy kind the reconciler should use. We could do so using the same pattern, but might want to rethink it since the NewClaimReconciler signature is getting fairly long.

as well as the actual logic of the [shared claim
reconciler](https://github.com/crossplaneio/crossplane-runtime/blob/e4d61ee2805af680baf16fc2a1d8f79538d0f9bb/pkg/resource/claim_reconciler.go#L281),
both in `crossplane-runtime`. It should require minimal, if any, updates to
the actual claim controllers in each of the provider stacks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the claim controllers will need to be updated to tell their generic claim reconcilers which policy kind they should use, but this should be a fairly mechanical, one line change for each controller.


This feature is similar to the original default class model in that it uses
labels to specify which class of service to use as default when a `policyRef` is
omitted. For each `namespace`, there must be only one policy that has the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way I'm aware to enforce this would be to use an admission webhook (or rather a series thereof) that rejected new policies that were labelled default for any namespace that already contained the same kind of policy labelled such. Crossplane currently has no webhooks, and likely won't for 0.3, so at least at first we'd probably just have to make it clear that the default policy is undefined when multiple policies are labelled default.

@upbound-bot
Copy link
Collaborator

62% (-0.4%) vs master 62%

Copy link
Member

@negz negz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a solid direction to me. I'm approving, but please ensure all outstanding comments are address where relevant. We should follow up on @bassam's thoughts about possibly renaming the Policy type before merging.

@prasek
Copy link
Member

prasek commented Sep 6, 2019

LGTM.

Was talking with @bassam about naming and his concern was that Policies seemed to be playing the role that Resource Classes used to play before strongly-typed classes were introduced, and suggested that Policies should be called Classes.

If you consider that the Generic ResourceClass played 2 roles: (a) provided a named class of service and could be marked as the default class of service, and (b) provided a specTemplate that was moving from generic to strongly typed templates, then should we move from claim->policy->class to claim->class->template.

The claim->class->template naming would let us say “default resource classes” once again, and it feels natural to say a cluster admin would create a namespace for a team or project and populate it with the available classes of service that could be used in that namespace.

Having a future intelligent class-of-service could make sense as well and the claim->class->template names would likely map more closely to what they represent.

...
policyRef:
name: standard-mysql
namespace: crossplane-system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently in master, policies need to live in the same namespace as the claim. Is this a new ability in this PR as well, that policies can live in a different namespace from a claim that references them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that policies can be referenced by the claim, they can reference a policy that exists in a different namespace (just like they can currently reference a class in a different namespace). If a policyRef was omitted, they would only be able to fall back on the policy specified as default in their namespace though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this thread might be out of date now that we're planning to use portable classes rather than policies.

Given defaulting behaviour will be namespace scoped (i.e. we'll look for a portable class with label default="true" in the namespace of the claim) I suggest we consider requiring that portable classes live in the same namespace as the claims that reference them; i.e. we use a corev1.LocalObjectReference from claim to (portable) class.

As long as we use a corev1.ObjectReference from the portable class to its underlying non-portable class this still gives us flexibility on which namespace the non-portable class (and thus managed resource) are created in.

Consider the following flow:

  1. Infrastructure operator creates infra namespace (e.g. prod).
  2. Infrastructure operator creates a provider and some non-portable classes (e.g. RDSInstanceClass) in said namespace.
  3. Infrastructure operator "publishes" the aforementioned non-portable classes to the app operator's namespace by creating portable some classes (e.g. MySQLInstanceClass) that reference the non-portable classes created in step 2, and labelling one of each kind as the default for said app namespace.
  4. App operator lists the portable classes that are available in their app namespace, picks one, and creates a resource claim that references it.

@jbw976
Copy link
Member

jbw976 commented Sep 6, 2019

@prasek @hasheddan I'm struggling a bit with the suggested claim -> class -> template naming, I think mostly because I liked the idea of Policies having more useful functionality in the future, such as ranges and constraints to help matching a class to a claim, and others.

if we went with claim -> class -> template, do you foresee us introducing the concept of policies again in the future when we'd actually have useful functionality to put in them? Or would that type of functionality fit OK on these "classes", instead of creating a new policy type of object?

@hasheddan
Copy link
Member Author

@jbw976 it seems like it would make sense to have ranges and constraints in a "class". The naming is indicating that they represent a "class of service", and ranges and constraints would still make it a class of service IMO, just a more flexible / intelligent one. It is also important to note that this naming change would require significant updates and refactoring of docs.

@prasek
Copy link
Member

prasek commented Sep 6, 2019

+1 for intelligent classes of service in the future.

@bassam
Copy link
Member

bassam commented Sep 6, 2019

Stepping back a bit, I think we are trying to accomplish the following:

  • Claims need to reference a Class for a provisioning/binding to happen
  • Claims without a classRef should use a DefaultClass within a scope (namespace to start with)
  • Claims should reference compatible Classes
  • Portable Claims should not reference non-portable Classes

This solution described here (and also as a result of supporting strongly typed classes and defaulting) solves this by introducing a Policy construct that plays two roles:

  1. Defines a default classRef for claims in a given scope (namespace)
  2. Acts as a more portable indirection to a class.

I think doing both of these in a Policy object can be confusing. I would recommend that we separate the two. Specifically:

  1. Rename Policy to DefaultClass and let it handle the defaulting of classRef within a scope only.
  2. Introduce a portable Class that is similar to the concept of Class today but points at another classRef that is non-portable.

@suskin
Copy link
Member

suskin commented Sep 6, 2019

Policy construct that plays two roles:

  1. Defines a default classRef for claims in a given scope (namespace)
  2. Acts as a more portable indirection to a class.

I would suggest that a third role is also being discussed:

  1. Match a claim with a class based on desired capabilities or characteristics of the class, in a portable way (for example, available memory).

I believe this is why people have been talking about "default" classes as more like policies; defaulting is just an edge case of matching. The edge case is: how does matching work if no capabilities were specified?

Because of this, I would also suggest that "match" or "resolve" could be a clearer term than "reference". When we say:

Claims need to reference a Class for a provisioning/binding to happen

it would probably be clearer to say that Claims need to match with (or resolve to) a Class for provisioning/binding to happen. The matched Class can be explicitly specified on the claim. Or, the desired Class can be described by listing capabilities, and a matchmaker (the Policy, using current terms) can match the claim with an actual class.

@bassam
Copy link
Member

bassam commented Sep 7, 2019

I've been thinking of the claim to class matching as a "scheduling" problem. Specifically you might need to match multiple claims and workloads all together to a specific set of classes.

Also seeing some of the slack discussion I wanted to clarify one thing in my proposal. I’m not suggesting that we re-add ResourceClass or even more generally a Class. Instead, I'm suggested that we continue with the strongly typed approach and create a portable class definition for each claim. So, add MySqlClass for the MySqlClaim. This is a consistent portable experience — every claim type point to exactly one class type. It might also enable us to go back to simple classNames instead of complex ObjectReferences classRef

@hasheddan hasheddan changed the title Claim portability using Policies Claim portability using portable classes Sep 9, 2019
@upbound-bot
Copy link
Collaborator

62% (-0.88%) vs master 63%

@negz
Copy link
Member

negz commented Sep 9, 2019

"match" or "resolve" could be a clearer term than "reference".

@suskin The use of the term reference is due to Kubernetes API conventions. One object (the claim) specifies another (the class) via a reference. API conventions state that references should be named fooRef or fooName.

The idea of a claim "matching" (rather than referencing) a class would imply using a label selector, which typically implies the matcher can match with one or more matchees. This may be appropriate for the claim -> class relationship, but we'd need to determine what happens when a claim matched more than one class.

Perhaps it is worth diverging from Kubernetes conventions and terminology in this case, but as software within the Kubernetes ecosystem we should make that predictability tradeoff carefully.

@negz
Copy link
Member

negz commented Sep 9, 2019

I've thought about this design over the weekend, and broadly agree with @bassam's proposal. I wrote up a use case from a perspective I understand well to help work through this:

As an infrastructure administrator I support various teams of app operators. I want to empower these app operators to make requests for infrastructure in a fashion that is portable across clouds. App operators request infrastructure using resource claims. The portable nature of these resource claims means they cannot be high fidelity representations of the external resources they represent; they suffer the “lowest common denominator” problem.

Resource classes are the solution to the lowest common denominator problem. They allow me to provide a set of configuration parameters, or managed resource "shapes", that app operators authoring resource claims may choose from. I’d like most app operators to choose a class like “mysql-large” or “kubernetes-prod” without concerning themselves with which managed resource kind or even which cloud will be used to satisfy their claim.

I expect my some of my app operators will have use cases that I have not considered. For example, I may not have predicted that one of my app operators has a need for very small SSD backed PostgreSQL databases. Because app operators can’t use portable resource claims to configure parameters like storage type and disk size, and because I don’t want my infrastructure team to block the creation of new resources, it’s important that my app operators be allowed to author and select their own resource classes.

While I want to enable my app operators to author their own resource classes, I also want to provide some constraints as to what they can choose. I’m concerned about VM cost so I don’t want to allow any Kubernetes cluster to have more than 20 nodes. My organisation gets deep discounts on Azure, so I don’t want to allow any resources to use GCP or AWS without prior discussion.

I like the terms "portable" and "non-portable" for user facing concerns, but I'll use the terms "resource claim aligned" (e.g. MySQLInstanceClass) and "managed resource aligned" (e.g. RDSInstanceClass) in this post because I think they're a little more explicit in this context.

I believe we should today:

  • Introduce resource claim aligned classes such as MySQLInstanceClass as a simple "pointer" type to managed resource aligned classes such as RDSInstanceClass.
  • Limit resource claims to referencing resource claim aligned classes. That is, a MySQLInstance classRef omits type metadata and inherently references a MySQLInstanceClass. A MySQLInstance author who wishes to reference an RDSInstance must reference (and possibly author) a MySQLInstanceClass that in turn points to their desired RDSInstanceClass. This limitation simplifies both our user's mental model and our codebase logic at the expense of a small amount of extra YAML.
  • Use a simple label, i.e. default="true" to specify the default resource class for any claim that omits a class reference. These default classes would always be resource claim aligned. Both classes and claims are namespaced, so this implies resource classes would be configured in the "app" namespace, not the "infrastructure" namespace where managed resources are created.

The idea behind the contemporary resource claim aligned "policy" kinds was that they might, in addition to specifying a default class, be used to constrain the parameters a resource claim could specify (via a class) in future. For example a MySQLInstancePolicy might have said "use the cheap RDSInstanceClass by default, and don't let anyone select a CloudSQLInstanceClass over 100GB". Conceptually I see "policy" as something imposed on app operators (e.g. resource claim authors) by infrastructure operators to constrain what they can do, so on reflection I don't think it makes sense for an app operator to be able to choose which policy applies to their resource claims. I think we should consider introducing managed resource aligned "policy" kinds in future in order to constrain the kinds and shapes of managed resources app operators can deploy. These policies would be specified in the namespace of the managed resources they apply to, not the namespace where claims are made.

Putting this all together looks something like the following YAML.

This first block pertains exclusively to infrastructure. It defines a namespace that will logically group production infrastructure managed resources, and some policies that restrict the shape of those resources. In organisations with a distinct team of infrastructure operators this block will almost certainly be authored by infrastructure operators:

---
# A namespace that will logically group "production" managed resources.
apiVersion: v1
kind: Namespace
metadata:
  name: production
---
# A GCP Provider with service account secret reference
apiVersion: gcp.crossplane.io/v1alpha1
kind: Provider
metadata:
  namespace: production
  name: coolorg
spec:
  credentialsSecretRef:
    name: coolorg-gcp
    key: credentials.json
  projectID: coolorg-prod
---
# A policy that limits CloudSQL instances to 100GB of storage in production.
apiVersion: v1alpha1
kind: CloudSQLInstancePolicy
metadata:
  namespace: production
  name: maximum-storage-size
constraints:
  storageSizeGB:
    max: 100
---
# A policy that prevents the creation of RDS instances in production.
apiVersion: v1alpha1
kind: RDSInstancePolicy
metadata:
  namespace: production
  name: disallow-rds
enabled: false

This second block pertains to an application. It creates a namespace to logically group the application's Kubernetes resources, and a claim and classes used to provision infrastructure. Whether the classes are authored by the app operators, the infrastructure operators, or both, depend on how liberal the organisation is with regard to allowing app operators to choose their own configuration.

---
# A namespace that will logically group the "coolapp" application.
apiVersion: v1
kind: Namespace
metadata:
  name: coolapp
---
apiVersion: database.gcp.crossplane.io/v1alpha1
kind: CloudSQLInstanceClass
metadata:
  namespace: coolapp
  name: coolapp-production
providerRef:
  # This reference, not the classRef, determines which namespace the managed
  # resource will be created in.
  namespace: production
  name: coolorg
specTemplate:
  storageSizeGB: 80
  # Pretend all the other required fields are set here...
---
apiVersion: database.crossplane.io/v1alpha1
kind: MySQLInstanceClass
metadata:
  namespace: coolapp
  name: coolapp-production
  labels:
    # This class is the default for any MySQLInstance that omits a classRef
    default: "true"
underlyingClassRef:
  # This reference is implicitly to the named kind in the same namespace.
  apiVersion: database.gcp.crossplane.io/v1alpha1
  kind: CloudSQLInstanceClass
  name: coolapp-production
---
apiVersion: database.crossplane.io/v1alpha1
kind: MySQLInstance
metadata:
  namespace: coolapp
  name: coolapp-production-a
spec:
  # This reference is implicitly to a MySQLInstanceClass in the same namespace.
  classRef:
    name: coolapp-production
  engineVersion: MYSQL_5_7

@jbw976
Copy link
Member

jbw976 commented Sep 9, 2019

Solid write-up @negz, thanks for adding the detailed use case as well to keep this grounded.

I assume that the policy and constraints aspects of this design is not for v0.3. Is the key part of this design that is in scope for v0.3 simply the MySQLInstance -> MySQLInstanceClass -> Strongly typed class, as well as supporting a simple default: "true" label on portable classes (e.g. MySQLInstanceClass)?

@hasheddan
Copy link
Member Author

I believe the document is currently aligned with @negz write-up. To address one comment:

Limit resource claims to referencing resource claim aligned classes.

This kind of restriction is implemented at the claim reconciler logic level. In the current proposal in this PR, I have proposed a flexible model with the shared claim reconciler that allows for claim controllers to either limit claims to a portable class instance (i.e. "claim-aligned class") or allow for references to strongly typed class instances (i.e. "managed-aligned class") directly. Referencing strongly-typed classes directly would not be employed in any stack at the moment, but it leaves the door open for strongly-typed claims in the future. Any thoughts on this model @jbw976 @negz ?

@negz
Copy link
Member

negz commented Sep 9, 2019

I assume that the policy and constraints aspects of this design is not for v0.3.

Correct. I'm definitely not suggesting we implement policy and constraint support for 0.3.

Is the key part of this design that is in scope for v0.3 simply the MySQLInstance -> MySQLInstanceClass -> Strongly typed class, as well as supporting a simple default: "true" label on portable classes (e.g. MySQLInstanceClass)?

It is. I wrote my comment without reading the latest update of the one pager, but I see it sounds like I'm in agreement with what is currently proposed.

I think there are two ideas to track that do not need to be in scope for 0.3:

  • Creating managed resources in the namespace of the provider, not the namespace of the non-portable, managed resource aligned, class.
  • Introducing policy or constraint support.

I'll raise separate issues to track these.

@negz
Copy link
Member

negz commented Sep 9, 2019

Any thoughts on this model @jbw976 @negz ?

@hasheddan I would strongly prefer to simplify the model such that a claim can only reference a portable class. I'll add comments to the doc to this effect.

* 2.0
* Updating document to reflect the current method of defaulting using
`Policies`
* Introducing `Policies` as a class of service definition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this now be portable classes?

- Allow operators and administrators the opportunity to provide a well defined, sane default class of commonly used resources that developers commonly submit claims for within a team or organization
- Minimize the burden of determining acceptable resource claims to submit for approval to an operations team. The ability to fall back on the underlying resource that has been deemed acceptable reduces unnecessary workflow stoppages and side channel communication
- Provide an *optional* feature that does not necessarily have to be implemented within a team or organization
* Allow for claim portability by using policies, much how the generic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reference to policies; should this be updated?

to define a "class of service" for a given claim `kind` by referencing a
strongly-typed resource class instance. However, in the future, these portable
classes will likely be expanded to define ranges and constraints for portable
claim kinds that reference them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some reflection I personally don't think portable classes (or non-portable classes) are the right place to define ranges and constraints, so I suggest we think of portable classes as simply a pointer to a non-portable class.

In organisations with distinct infrastructure operator and application operators I expect infrastructure operators would be placing constraints on the parameters application operators could choose. I think this should be done separately from resource classes because I think it should be possible for an application operator to author their own resource classes, but still be subject to the constraints configured by the infrastructure operator. I discuss this in a little more detail in #743 (comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would tend to agree. However, if others have differing thoughts, I don't think the work designated for v0.3 has to make this decision.

In order to reintroduce the ability to select a class of service in claim, the
following steps can be taken:

1. Change all `Policy` types to `Class` types and include a `classRef` field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially bikeshedding, but I wonder whether this should be underlyingClassRef, nonPortableClassRef, or something similar, to clarify that it must point to a non portable class rather than another portable class.

reconciler](https://github.com/crossplaneio/crossplane-runtime/blob/e4d61ee2805af680baf16fc2a1d8f79538d0f9bb/pkg/resource/claim_reconciler.go#L281)
in `crossplane-runtime`. It should require minimal updates to the actual
claim controllers in each of the provider stacks in order to indicate the
portable class `kind` that they should use (example below).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we avoid designing for the non-portable (aka strongly typed) resource claim scenario for now; it may make sense to write a separate generic resource claim reconciler to handle non-portable resource claims instead of introducing complexity to the existing reconciler.

I believe it would make both our code and our user's mental model of how Crossplane resources plumb together simpler if we required that portable claims reference only portable classes, even in cases where a user explicitly wanted to reference a non-portable class.

resource.ClaimKind(databasev1alpha1.MySQLInstanceGroupVersionKind),
resource.PortableClassKind(databasev1alpha1.MySQLInstanceClassGroupVersionKind),
resource.ClassKind(v1alpha1.RDSInstanceClassGroupVersionKind),
resource.ManagedKind(v1alpha1.RDSInstanceGroupVersionKind),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could introduce a container type to keep the signature of NewClaimReconciler a little more manageable. For example:

type ClassKinds struct {
    Portable    ClassKind
    NonPortable ClassKind
}

As a side note, writing out "nonportable" feels kind of gross in code. I wish we had a better antonym for portable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this container pattern. How would you feel about Strong? I don't love it but it is succinct and descriptive.

...
policyRef:
name: standard-mysql
namespace: crossplane-system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this thread might be out of date now that we're planning to use portable classes rather than policies.

Given defaulting behaviour will be namespace scoped (i.e. we'll look for a portable class with label default="true" in the namespace of the claim) I suggest we consider requiring that portable classes live in the same namespace as the claims that reference them; i.e. we use a corev1.LocalObjectReference from claim to (portable) class.

As long as we use a corev1.ObjectReference from the portable class to its underlying non-portable class this still gives us flexibility on which namespace the non-portable class (and thus managed resource) are created in.

Consider the following flow:

  1. Infrastructure operator creates infra namespace (e.g. prod).
  2. Infrastructure operator creates a provider and some non-portable classes (e.g. RDSInstanceClass) in said namespace.
  3. Infrastructure operator "publishes" the aforementioned non-portable classes to the app operator's namespace by creating portable some classes (e.g. MySQLInstanceClass) that reference the non-portable classes created in step 2, and labelling one of each kind as the default for said app namespace.
  4. App operator lists the portable classes that are available in their app namespace, picks one, and creates a resource claim that references it.

@upbound-bot
Copy link
Collaborator

62% (-0.92%) vs master 63%

@upbound-bot
Copy link
Collaborator

62% (-0.92%) vs master 63%

Copy link
Member

@prasek prasek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the quick turn on this!

- Provide an *optional* feature that does not necessarily have to be implemented within a team or organization
* Allow for claim portability by using portable classes, much how the generic
`ResourceClass` operated
* Enable claims with no reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... with no class reference

```

And a resource claim that defaulted to the resource class above could look like this in the general case:
A claim referencing a portable class will now look as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like this, much simpler for the consumer.

```

Or in the more specific case:
With an arbitrary number of potential portable `MySQLInstanceClass` objects that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line makes it seem like the claim above could map to an arbitrary number of potential portable class object, when it's classRef.name is standard-mysql so it will only map to a single portable class object in a given environment.

Suggest rewording to something like this: "A portable class can be referenced by name, and the portable class definition can be customized to use the cloud services a given application or team wants to use, e.g. RDS in one environment and CloudSQL in another."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in most recent commit. I didn't use the exact wording but I think I communicated the same sentiment.

namespace: crossplane-system
```

### Denote Default via Label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like the use of this simple label to mark the default.

@upbound-bot
Copy link
Collaborator

62% (-0.92%) vs master 63%

Copy link
Member

@bassam bassam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great. Thanks for the collaborative effort on this one. Small question remaining for me above.

spec:
...
classRef:
name: standard-mysql
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this become className: instead of classRef: name:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this may be tricky. It would involve changing the embedded ResourceClaimSpec in crossplane-runtime which embeds corev1.ObjectReference. We are going to be changing this to corev1.LocalObjectReference, which only has a name field. If we were to make this inline, the field would just be called name, not className. I am sure we could do some trickery to have a string field called className then create a LocalObjectReference from it, but the Spec would be less familiar to Kubernetes users and we would not be following common practice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should block the design being merged, but wold be great to avoid naming the group/version given the 1:1 mapping between claim and class in this design.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, it will only accept name as it will now be LocalObjectReference

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming in late here, but either className or classRef using a LocalObjectReference (with only a name field) seem to be blessed by the Kubernetes API conventions. I think LocalObjectReference is a little more consistent with the rest of our API (we don't currently have any fooName fields), but we would benefit from a general API overhaul in this area; we have plenty of LocalObjectReference types that could be simple fooName fields, and I believe we may have a few ObjectReference fields that could be LocalObjectReference.

Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
@upbound-bot
Copy link
Collaborator

62% (-0.89%) vs master 63%

@jbw976 jbw976 merged commit c12d116 into crossplane:master Sep 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants