Design sketch: Crossplane as a Kubernetes native OAM runtime #1132

Closed
wants to merge 1 commit into from

Member

@negz commented Dec 21, 2019

This PR introduces a design that proposes how Crossplane might become an Open Application Model (OAM) runtime: a project capable of enacting OAM configuration. The design is currently an early sketch, and is the result of discussions with @bassam and @prasek. It proposes that Crossplane support a variant of the OAM specification that maintains OAM's core concepts while aligning them more closely with Crossplane's "Kubernetes (API) native" approach to application and infrastructure orchestration.

  namespace: default
  name: coolest-app-n3v3y
spec:
  replicaCount: 3
Member Author

Traits should be applied into the runtime at installation/upgrade time. Traits should not be checked "lazily", but should be checked when the trait-holding components are created.

Traits are applied in order so that should some set of traits express dependency (e.g. ingress must be set up before SSL), this can be resolved by setting an explicit ordering.

A deployment SHOULD NOT be marked complete until all trait configurations are completed. For example, if a web server component is deployed with an autoscaler trait, the web server should not be considered "running" until (a) the web server itself is running, as determined by health checks, and (b) the autoscaler trait is running, as determined by the underlying platform.

Each ContainerizedWorkload resource would need to maintain references to the traits that apply to it, so that the controller that processes it can satisfy the above requirements of the contemporary OAM spec. Perhaps this would take the form of a .spec.traitRefs array, or perhaps we could model it using owner references (from each trait to the component it applies to).

https://github.com/oam-dev/spec/blob/master/5.traits.md#traits-system-rules-and-characteristics
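For illustration, a minimal sketch of the two options above, assuming the resource names used elsewhere in this design. The traitRefs field is hypothetical, and the apiVersion of ManualScalerTrait is assumed; neither is defined by the contemporary OAM spec or by Crossplane.

```yaml
---
# Option 1 (hypothetical field): the workload lists the trait instances that
# apply to it, so its controller can wait for them before reporting "running".
apiVersion: core.oam.dev/v1alpha1
kind: ContainerizedWorkload
metadata:
  namespace: default
  name: coolest-app-f92dm
spec:
  traitRefs:
  - apiVersion: core.oam.dev/v1alpha1  # assumed group/version
    kind: ManualScalerTrait
    name: coolest-app-n3v3y
---
# Option 2: the trait carries a standard Kubernetes owner reference to the
# workload it applies to. A namespaced owner must live in the same namespace,
# and the reference must include the owner's UID.
apiVersion: core.oam.dev/v1alpha1  # assumed group/version
kind: ManualScalerTrait
metadata:
  namespace: default
  name: coolest-app-n3v3y
  ownerReferences:
  - apiVersion: core.oam.dev/v1alpha1
    kind: ContainerizedWorkload
    name: coolest-app-f92dm
    uid: 00000000-0000-0000-0000-000000000000  # placeholder for the workload's real UID
spec:
  replicaCount: 3
```

One trade-off worth noting: owner references also make the garbage collector delete the trait when its workload is deleted, which may or may not be the desired lifecycle.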

Member

Well, the deployment of traits only happens in AppConfig:

kind: AppConfig
metadata: ...
spec:
  components:
  - componentName: ContainerizedWorkload
    instanceName: coolest-app-f92dm
    parameterValues: ...
    traits:
    - type: ManualScalerTrait
      properties:
        replicaCount: 3

Member Author

We're suggesting that an ApplicationConfiguration effectively be a template for instances of components and their traits, a little like how KubernetesApplication is a template for arbitrary Kubernetes resources (Deployment, etc.). Under this pattern the OAM runtime (i.e. a Kubernetes controller) watching for ApplicationConfiguration would not actually deploy anything; instead it would create an instance of each templated custom resource (in this case a ContainerizedWorkload, a ManualScalerTrait, and a RedisCluster). This would allow separate controllers to reconcile each of these custom resources. One advantage of this pattern is that it allows Crossplane's current resource claim controllers to work with OAM without an extra layer of abstraction (e.g. a RedisCluster workload type). What do you think?
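A minimal sketch of what such a templating ApplicationConfiguration could look like. The components[].component shape follows the fragment quoted later in this design; the traits[].trait field, the apiVersion values for ApplicationConfiguration and ManualScalerTrait, and the name coolest-cache are illustrative assumptions. A controller watching this resource would create each templated object as-is and leave reconciliation to that kind's own controller.

```yaml
apiVersion: core.oam.dev/v1alpha1  # assumed group/version
kind: ApplicationConfiguration
metadata:
  namespace: default
  name: coolest-app
spec:
  components:
  # Template for a ContainerizedWorkload instance, plus templates for the
  # trait instances that apply to it (traits[].trait is hypothetical).
  - component:
      apiVersion: core.oam.dev/v1alpha1
      kind: ContainerizedWorkload
      metadata:
        name: coolest-app-f92dm
      # workload spec omitted for brevity
    traits:
    - trait:
        apiVersion: core.oam.dev/v1alpha1
        kind: ManualScalerTrait
        metadata:
          name: coolest-app-n3v3y
        spec:
          replicaCount: 3
  # An existing Crossplane resource claim templated directly as a component,
  # with no intermediate workload type.
  - component:
      apiVersion: cache.crossplane.io/v1alpha1
      kind: RedisCluster
      metadata:
        name: coolest-cache  # hypothetical name
      spec:
        classSelector:
          matchLabels:
            environment: production
            region: europe
```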

classSelector:
  matchLabels:
    environment: production
    region: europe
Member Author

Ideally this classSelector would relate to/be derived from the hypothetical SchedulingScope. Doing so would require either:

  • The referenced SchedulingScope be used to modify the templated RedisCluster (which would then differ from its template).
  • The RedisCluster resource maintain references to any referenced Scope resources, and the RedisCluster controllers be aware of those scope resources.
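A sketch of the second option, assuming a hypothetical scopeRefs field on RedisCluster and a hypothetical classSelector field on SchedulingScope; neither field exists today, and treating the scope as cluster-scoped is also an assumption.

```yaml
---
# Hypothetical SchedulingScope instance; classSelector is assumed for
# illustration, and the scope is treated as cluster-scoped (no namespace).
apiVersion: oam.crossplane.io/v1alpha1
kind: SchedulingScope
metadata:
  name: scheduling-scope-eu-prod
spec:
  classSelector:
    matchLabels:
      environment: production
      region: europe
---
# The RedisCluster references the scope instead of embedding a selector.
# scopeRefs is hypothetical; the RedisCluster controller would need to read
# the referenced SchedulingScope and use its classSelector.
apiVersion: cache.crossplane.io/v1alpha1
kind: RedisCluster
metadata:
  namespace: default
  name: coolest-cache
spec:
  scopeRefs:
  - apiVersion: oam.crossplane.io/v1alpha1
    kind: SchedulingScope
    name: scheduling-scope-eu-prod
```

Because the controller resolves the selector at reconcile time, the RedisCluster stays identical to its template, which is the property the first option gives up.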

Member

Ideally this classSelector would relate to/be derived from the hypothetical SchedulingScope.

Exactly!
In OAM this relationship is declared via AppConfig:

kind: AppConfig
metadata: ...
spec:
  components:
  - componentName: ContainerizedWorkload
    instanceName: coolest-app-f92dm
    scopes:
    - name: scheduling-scope-eu-prod
    parameterValues: ...
    traits:
    - type: ManualScalerTrait
      properties:
        replicaCount: 3

@upbound-bot
Collaborator

81% (0.0%) vs master 81%

Signed-off-by: Nic Cope <negz@rk0n.org>

Member

@hongchaodeng left a comment

Awesome work!

I think there are some gaps between what is an instance and what is a schema in OAM concepts :)
I have commented inline; let's collaborate more closely on this!

```yaml
---
apiVersion: oam.crossplane.io/v1alpha1
kind: SchedulingScope
```
Member

Actually a new scope is defined as follows:

kind: ApplicationScope
metadata:
  name: scheduling-scope
spec:
  type: oam.crossplane.io/v1alpha1.SchedulingScope
  properties:
    ... the spec schema ...

Then a deployment would be done via AppConfig:

kind: ApplicationConfiguration
metadata:
  name: scheduling-scope-prod-eu
spec:
  scopes:
    - name: scheduling-scope-prod-eu
      type: oam.crossplane.io/v1alpha1.SchedulingScope
      properties:
        ... the real config ...

Currently, an AppConfig containing only scopes is a deployment of those scopes, and scopes are global.

Member Author

In this document we're suggesting a change to how OAM scopes are specified. Specifically, we're exploring the idea that each type of scope would be a specific kind (like SchedulingScope, NetworkScope, etc), defined by a custom resource definition. This way the scope is an instance of configuration, and the CRD for that scope is its schema. We feel this is more "Kubernetes native" than having one CR (ApplicationScope) define a schema (i.e. define parameters) and another CR (ApplicationConfiguration) instantiate a configuration (i.e. set parameters). What do you think about this pattern?
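To make "the CRD is the schema" concrete, a minimal sketch of a CustomResourceDefinition for SchedulingScope, using the apiextensions.k8s.io/v1 API. The classSelector property and the choice of cluster scope are assumptions for illustration, not part of this design.

```yaml
# Sketch of a CRD acting as the schema for one kind of scope.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # CRD names must be <plural>.<group>.
  name: schedulingscopes.oam.crossplane.io
spec:
  group: oam.crossplane.io
  names:
    kind: SchedulingScope
    listKind: SchedulingScopeList
    plural: schedulingscopes
    singular: schedulingscope
  # Assumption: scopes are cluster-scoped ("global").
  scope: Cluster
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # Hypothetical property, for illustration only.
              classSelector:
                type: object
                properties:
                  matchLabels:
                    type: object
                    additionalProperties:
                      type: string
```

An instance is then an ordinary custom resource of kind SchedulingScope, which the API server validates against this schema on create and update.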

Contributor

It's great; we see the core idea of this proposal. This is mainly about whether we could use CRDs as the generic definition of schema. Essentially, OAM has a "schematic object" to carry the schema of a Component/Trait/Scope. Non-k8s runtimes can interpret this "schematic object" as the OAM object schema.

If we change to using CRDs to model these schemas, how would you imagine, for example, defining a workload on FaaS (or another non-K8s platform)? I can think of: 1. always install a kube-apiserver; 2. implement an OAM apiserver.

Member Author

If we change to using CRDs to model these schemas, how would you imagine, for example, defining a workload on FaaS (or another non-K8s platform)? I can think of: 1. always install a kube-apiserver; 2. implement an OAM apiserver.

We're strong proponents of approach one; it's exactly the approach we've taken with Crossplane, and where we're betting the Kubernetes ecosystem will head. I think it's telling that the OAM spec is inspired by Kubernetes API documents, and that the first OSS OAM implementation (Rudr) builds on Kubernetes. Kubernetes is a great general infrastructure control plane, even if you're not doing anything with containers.

We've always imagined Crossplane would grow to include a ServerlessApplication, VMApplication, etc. alongside KubernetesApplication. These resources could orchestrate applications running outside of Kubernetes, but we'd still use the Kubernetes API server and controllers to orchestrate them.

Contributor

Got it. I agree we could assume K8s as the generic underlying control plane. This aligns with our current practice internally in Alibaba for VM workloads.

```yaml
---
apiVersion: core.oam.dev/v1alpha1
kind: ContainerizedWorkload
```
Member

Actually this would be kind ComponentSchematic:

kind: Component
metadata:
  name: containerized-workload
spec:
  parameters:
    ... the schema of the spec of ContainerizedWorkload ...

A deployment would be done in the ApplicationConfiguration components section:

kind: AppConfig
metadata: ...
spec:
  components:
  - componentName: ContainerizedWorkload
    instanceName: coolest-app-f92dm
    parameterValues: ...

Note that we are trying to migrate all of the schema to WorkloadType. So ideally ContainerizedWorkload is a Workload type, as its name suggests :)

Member Author

Similar to my comment above regarding application scopes, we're proposing a change to how OAM components are defined here. Specifically, we're exploring the idea that the single ComponentSchematic kind be broken down into many kinds of component, e.g. ContainerizedWorkload, RedisCluster, etc. The schema for each of these kinds would be defined by its custom resource definition, not by a workload type. In general we feel that Kubernetes CRDs should define configuration schema, and CRs should instantiate configuration. What do you think about this idea?

  replicaCount: 3
---
apiVersion: cache.crossplane.io/v1alpha1
kind: RedisCluster
Member

This should be modelled as an OAM Workload.

Proposal oam-dev/spec#281 will map it more intuitively.

Member Author

Similar to my comments above, we're proposing that custom resource definitions (not component schematics) be used to define the available kinds of workload and their configuration schema.

Member Author

@negz left a comment

Thanks for the fast feedback @hongchaodeng, and apologies for my slow reply - I just returned from vacation. I've replied to most of your comments, mostly clarifying that we're actually proposing changes to, not misunderstanding, the contemporary OAM specification. We feel these proposed changes would allow a runtime to implement OAM using idiomatic Kubernetes resources, which would be well aligned with Crossplane's current approach. We'd love your feedback on the proposed changes!

      replicaCount: 3
  - component:
      apiVersion: cache.crossplane.io/v1alpha1
      kind: RedisCluster
Contributor

Nit: it seems this RedisCluster needs to be changed to an extended workload spec.

Member Author

This is actually by design. One thing we were going for in this proposal is that existing Kubernetes kinds like RedisCluster (a Crossplane resource claim) could effectively be "extended workloads". For a runtime like Crossplane this means we can tightly integrate our first-class concepts (like resource claims) with OAM, instead of needing to define a new layer of abstraction (an extended workload kind) that would produce a resource claim behind the scenes.

Contributor

@resouer Jan 18, 2020

I see.

Can you accept:

- component:
    instance:
      apiVersion: cache.crossplane.io/v1alpha1
      kind: RedisCluster
      spec:
        classSelector:
          matchLabels:
            environment: production
            region: europe
            ...

The only tiny compromise is an instance field.

The intention is that we want to support both annotations and an additional wrapper object layer for manageability and discoverability fields.

* How could we maintain the existing user experience around easily discovering
Member

CRD categories will list all resources belonging to that category. In fact we just want to list Trait CRDs, so we may need to use a label.

kubectl get crds -l oam.dev/part-of=trait

Member Author

CRD categories will list all resources belong to that category.

I see your point that listing the CRDs (the schema) would be closer to today's kubectl get trait, given that today traits are schema. It could also be useful to be able to list all instances of traits (even if they were different CR kinds), though.

Member

But I think we could still use WorkloadType and Trait to handle registration and discovery.

Traits are meant to be behavioral characteristics on instances of components. It wouldn't make sense to list instances of traits.

Member Author

It wouldn't make sense to list instances of traits.

Would it make sense if these characteristics were modelled as distinct API objects when instantiated? If I follow correctly you're saying a trait, applied by the app config, might somehow mutate the component instance produced by a schematic. Our thinking was more along the lines that a component with traits (at the app config level) would produce a component instance and several trait instances - there would be relationships (references) between these documents such that the runtime (controller) reconciling the component instance could factor in the trait instances.

* What is the best approach to validate the array of components and traits under
the `ApplicationConfiguration` `components` field? One option may be to use a
validating webhook that finds the CRD that defines the templated kind and uses
its OpenAPI schema to validate it.
Member

Why don't we just use WorkloadType and Trait to define the schema? We can also use a validating webhook to check the schema. In fact, since we use ApplicationConfiguration, we already have a layer between the user application and the K8s CRDs. So WorkloadType and Trait don't add any burden, and they can help unify different CRDs.
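Whichever source of schema is chosen, the validating webhook option raised above could be registered roughly as follows. This is a sketch only: it assumes ApplicationConfiguration is served as the applicationconfigurations resource in core.oam.dev/v1alpha1, and that a webhook Service named oam-webhook runs in a hypothetical oam-system namespace. Only the registration is shown; the server behind it would, for each templated object, look up the CRD that defines its kind and validate the object against that CRD's openAPIV3Schema.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: oam-applicationconfiguration-validation  # hypothetical name
webhooks:
- name: applicationconfigurations.core.oam.dev
  admissionReviewVersions: ["v1"]
  sideEffects: None
  # Reject ApplicationConfigurations if the webhook cannot be reached.
  failurePolicy: Fail
  rules:
  - apiGroups: ["core.oam.dev"]       # assumed group
    apiVersions: ["v1alpha1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["applicationconfigurations"]
  clientConfig:
    service:
      namespace: oam-system            # hypothetical namespace
      name: oam-webhook                # hypothetical Service
      path: /validate-applicationconfiguration
    # caBundle: base64-encoded CA that signed the webhook's serving certificate
```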

@vturecek commented Jan 7, 2020

Thanks for pulling this together @negz. So the overall idea, if I'm understanding it correctly, is to define OAM in terms of the Kubernetes API, rather than just using Kubernetes conventions, and in doing so, do it in a way that's more natural to the Kubernetes API, i.e., CRDs for each type within the primary constructs (traits, scopes, workloads).

A few questions that came up for me:

  • How does this affect the separation of concerns between an application developer, an application operator, and an infrastructure operator?
  • How does it affect portability or consistency across implementations of the spec?
  • Does it still make sense to think of OAM as a platform-neutral specification?
  • How does this change tooling and user experience around discovery of capabilities for a given implementation of the spec?

Member Author

@negz commented Jan 8, 2020

Thanks for pulling this together @negz.

My pleasure!

So the overall idea, if I'm understanding it correctly, is to define OAM in terms of the Kubernetes API, rather than just using Kubernetes conventions, and in doing so, do it in a way that's more natural to the Kubernetes API, i.e., CRDs for each type within the primary constructs (traits, scopes, workloads).

Exactly. Our feeling is that if we can assume the use of the Kubernetes API server, and thus of the actual Kubernetes API, the OAM spec could be both simpler and a more natural fit for projects like Crossplane.

How does this affect the separation of concerns between an application developer, an application operator, and an infrastructure operator?

I don't believe there's a dramatic change here - it's more a reshaping of existing concepts. An application developer can still publish the shape/schema of their components by publishing a component CRD. The same is true for infrastructure operators with trait/scope CRDs. Application operators would tie these all together by publishing an ApplicationConfiguration that references these things.

How does it affect portability or consistency across implementations of the spec?

I don't believe this would be affected in any meaningful way. We could require all OAM runtimes to support a certain set of resource kinds, just as we currently require all OAM runtimes to support a certain set of workload types and scopes.

Does it still make sense to think of OAM as a platform-neutral specification?

Is it important that the specification be platform-neutral, or that the runtimes that implement the specification be platform-neutral? We're proposing that the specification assume it is implemented atop a specific API and technology (the Kubernetes API server), but this is distinct from proposing that the specification only manage Kubernetes applications or only be useful to folks who use Kubernetes. I'd argue that Crossplane is a platform-neutral runtime, for example. We built Crossplane atop the API server, but we expect folks who do not use Kubernetes clusters at all will use Crossplane to manage their applications and infrastructure.

How does this change tooling and user experience around discovery of capabilities for a given implementation of the spec?

It seems that the main impact under this proposal would be that a concept ("a component", or "a scope") would span more than one resource kind. Our thinking is that labels on CRDs could help with schema discovery (e.g. kubectl get crd -l component=true), and CRD categories could help with instance discovery.
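As a sketch of both mechanisms together, assuming a ManualScalerTrait served in the core.oam.dev group: the CRD below carries the oam.dev/part-of: trait label suggested earlier in this thread for schema discovery, and a trait category for instance discovery. Both the label key and the category name are illustrative conventions, not something Kubernetes or OAM defines.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: manualscalertraits.core.oam.dev
  labels:
    # Schema discovery (illustrative label):
    #   kubectl get crds -l oam.dev/part-of=trait
    oam.dev/part-of: trait
spec:
  group: core.oam.dev  # assumed group
  names:
    kind: ManualScalerTrait
    listKind: ManualScalerTraitList
    plural: manualscalertraits
    singular: manualscalertrait
    # Instance discovery: lists instances of every kind in this category.
    #   kubectl get trait
    categories:
    - trait
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicaCount:
                type: integer
                minimum: 0
```

With a CRD like this installed for each trait kind, the label answers "which traits exist?" and the category answers "which trait instances exist?", covering both readings of today's kubectl get trait.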

Member Author

@negz commented Jan 8, 2020

Hi folks, thanks for the great discussion around this proposal yesterday. It seems like we have a lot of alignment around the need for separation of concerns between application and infrastructure personas, and around the problems we're trying to solve. Do you think we can align on OAM runtimes assuming the actual Kubernetes API (e.g. the use of the API server)? I think that once we do that we can workshop any potential improvements to the spec it might enable.

@vturecek

@negz the Alibaba Cloud team has a second draft of this proposal. It retains the main point of using unique resource kinds for each trait, scope, and workload. I posted it up to our mailing list so we can get more eyes on it (with a link back to here for the source material).

https://groups.google.com/forum/#!forum/oam-dev

Thanks again for putting in the time on the first draft, it's exciting to see this take shape.

@negz self-assigned this Jan 23, 2020
Member Author

@negz commented Jan 30, 2020

Hi all,

The latest iteration of this design has moved to a Google Document. I'm going to close this pull request - please comment on the document!

Thanks,
Nic

@negz closed this Jan 30, 2020

6 participants