Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sidecar KEP API implementation #919

Merged
merged 1 commit into from Jun 19, 2019

Conversation

Joseph-Irving
Copy link
Member

@Joseph-Irving Joseph-Irving commented Mar 26, 2019

This is my suggestion for how we could implement the API for defining the sidecars #753.
Having a bool was not very popular, having an enum instead seemed to have more support so I'm suggesting we have lifecycle.type: Sidecar

I'm not sure what the default value should be for type, I suggested Standard here but some other suggestions could be Default, Main, Primary.

/sig apps
/sig node
/assign @kow3ns @enisoc

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 26, 2019
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 26, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @Joseph-Irving. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Mar 26, 2019
@Joseph-Irving Joseph-Irving mentioned this pull request Mar 26, 2019
11 tasks
@Joseph-Irving
Copy link
Member Author

/kind api-change

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Mar 26, 2019
@fejta-bot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this. It's consistent with guidance.

@@ -88,7 +88,7 @@ spec:
command: ['do something']
- name: sidecar
image: sidecar-image
sidecar: true
containerLifecycle: Sidecar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/containerLifecycle/lifecycle/g

It's already nested under "containers[]"

disagree?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thockin yeah I would've preferred that but lifecycle is already a field in the container spec, where PostStart and PreStop are defined.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though lifecycle up until now has only been used to specify lifecycle hooks, my opinion is that it's better to extend it with something like type: Sidecar or behavior: Sidecar instead of introducing an additional lifecycle field (one called containerLifecycle).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using lifecycle is definitely cleaner, the struct is currently just used for lifecycle hooks, but the name is more generic than that. So unless people think it would cause confusion to add a new non hook field to it, might be the best solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a field under the existing lifecycle: struct makes sense to me. It would only need a slight rewording of the documented purpose of the lifecycle section, which is a bit hook-specific right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, seems reasonable, I can rework this to be within the Lifecycle struct. In regards to what the field should be called, I originally liked behaviour, but then I remembered that Americans spell that in their own way, so maybe type is a bit more internationally friendly.

//One of Standard, Sidecar.
//Defaults to Standard
//+optional
ContainerLifecycle ContainerLifecycle `json:"containerLifecycle,omitempty" protobuf:"bytes,22,opt,name=containerLifecycle,casttype=ContainerLifecycle"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InitContainers are also using type Container - this is meaningless for them, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are use-cases where you want to have a sidecar running during the pod init phase. For example, you might want to run the Envoy proxy as a sidecar during init in case any init containers try to perform network calls and you'd like to shape that network traffic also (in the same way you do with regular containers).

@@ -156,9 +181,9 @@ Please view this [video](https://youtu.be/4hC8t6_8bTs) if you want to see what t

### Risks and Mitigations

You could set all containers to be `sidecar: true`, this would cause strange behaviour in regards to shutting down the sidecars when all the non-sidecars have exited. To solve this the api could do a validation check that at least one container is not a sidecar.
You could set all containers to be `containerLifecycle: Sidecar`, this would cause strange behaviour in regards to shutting down the sidecars when all the non-sidecars have exited. To solve this the api could do a validation check that at least one container is not a sidecar.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think requiring at least one non-sidecar makes sense.


Init containers would be able to have `sidecar: true` applied to them as it's an additional field to the container spec, this doesn't currently make sense as init containers are ran sequentially. We could get around this by having the api throw a validation error if you try to use this field on an init container or just ignore the field.
Init containers would be able to have `containerLifecycle: Sidecar` applied to them as it's an additional field to the container spec, this doesn't currently make sense as init containers are ran sequentially. We could get around this by having the api throw a validation error if you try to use this field on an init container or just ignore the field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly your problem, but maybe we should just not be sharing the base type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.

--- a/pkg/apis/core/types.go
+++ b/pkg/apis/core/types.go
@@ -1932,7 +1932,7 @@ type ResourceRequirements struct {
 }
 
 // Container represents a single container that is expected to be run on the host.
-type Container struct {
+type BaseContainer struct {
        // Required: This must be a DNS_LABEL.  Each container in a pod must
        // have a unique name.
        Name string
@@ -1993,6 +1993,9 @@ type Container struct {
        // If set, the fields of SecurityContext override the equivalent fields of PodSecurityContext.
        // +optional
        SecurityContext *SecurityContext
+}
+type Container struct {
+       BaseContainer // Embed
 
        // Variables for interactive containers, these have very specialized use-cases (e.g. debugging)
        // and shouldn't be used for general purpose containers.
@@ -2003,6 +2006,9 @@ type Container struct {
        // +optional
        TTY bool
 }
+type InitContainer struct {
+       BaseContainer // Embed
+}

This has wide-ranging code impact and probably openapi impact, but may be worth doing anyway...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think refactoring fields into a nested struct a level deeper is wire-compatible with our proto serializations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the direction we're heading for ephemeral containers, but the primary motivator is to have separate docs so we can label some fields as "this doesn't work with ephemeral containers".

Separating fields would mean we wouldn't need to do this and an EphemeralContainer would only have fields allow for ephemeral containers, but complexity would increase with the differences between container types. Something like:

type BaseContainer struct {
    // All containers have a name
    Name string
    ...
}

type ServingContainer struct {
    // Only serving containers have ports, probes, etc.
    Ports []ContainerPorts
    ReadinessProbe *Probe
    Lifecycle *Lifecycle
    ...
}

type InteractiveContainer struct {
    TTY bool
    ...
}

type Container {
    BaseContainer
    ServingContainer
    InteractiveContainer
}

type InitContainer {
    BaseContainer
    ServingContainer
}

type EphemeralContainer {
    BaseContainer
    InteractiveContainer
}

It would be painful if we added a new container type or if we wanted to add a previously-disallowed field to EphemeralContainer or add a new container type that doesn't fit in these categories, especially considering @liggitt's point about proto serialization.

@Joseph-Irving
Copy link
Member Author

Ok, I've updated this to be nested in lifecycle as this seems preferable to having containerLifecycle and lifecycle.

So it would now be

name: sidecarContainer
image: foo
lifecycle:
  type: Sidecar

Let me know what you think

@mattfarina
Copy link
Contributor

This looks ok to me but an API reviewer needs to look at the new design.

@kow3ns @liggitt @thockin Can you please look at the changed API being proposed.

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 15, 2019
@Joseph-Irving
Copy link
Member Author

@kow3ns @liggitt @thockin if any of you could have a look at the revised design that would be appreciated.

@thockin thockin self-assigned this May 14, 2019
@thockin
Copy link
Member

thockin commented May 14, 2019

Sorry to lose this, I wasn't assigned. Will look as soon as I can, but I am crushed with KubeCon prep :(

@smarterclayton
Copy link
Contributor

@verb Are ephemeral containers going to have to deal with sidecar status? Since ephemeral is a little further ahead of sidecar, I'd like you to indicate whether an ephemeral container has "lifecycle type" or not.

@smarterclayton
Copy link
Contributor

Actually, both ephemeral and sidecar containers have parallel api concerns. The current discussion with ephemeral around how the docs change - I would also expect the docs for sidecar containers to change. The current argument with ephemeral containers is that field can change over the lifetime of the pod, while init and regular containers cannot. Right now, I might argue ephemeral containers have a different lifecycle type, but are still in a separate field. But I'd like to hear good arguments for/against folding ephemeral containers into regular containers and lifecycleType being Ephemeral, vs Sidecar.

@verb
Copy link
Contributor

verb commented May 27, 2019

Replying to @smarterclayton:

I'd like you to indicate whether an ephemeral container has "lifecycle type" or not.

Lifecycle is currently disallowed for ephemeral containers. They have an implicit lifecycle type which is similar to sidecars in some ways (they will not prevent a pod from exiting) and different in others (there's no startup/shutdown ordering, they're not restarted).

Since we're defining a lifecycle type, we should either:

  1. create an Ephemeral type because it does have a different lifecycle as defined by this field, and we should fill this field accurately.
  2. disallow it completely in ephemeral containers because it's redundant. Anything in Pod.Spec.EphemeralContainers must have a type of Ephemeral, so why make the user specify it in two locations? We could always add it later.
  3. Require the field for EphemeralContainers to be Ephemeral if present and default it to Ephemeral if not. Having an Ephemeral type would probably make the kubelet lifecycle changes more straightforward.

I suggest the third option.

I'd like to hear good arguments for/against folding ephemeral containers into regular containers and lifecycleType being Ephemeral, vs Sidecar.

The primary motivation for the API change for ephemeral containers is the desire for most of containers spec to remain immutable. I think there's general consensus on this point that this encourages best practices and doesn't disturb long standing assumptions in code.

This is easiest with a separate Pod.Spec.EphemeralContainers list, but we could also make Pod.Spec.Containers semi-mutable for containers with lifecycle type Ephemeral. We already require changes to EphemeralContainers happen through a subresource of the pod, we could instead have the subresource update Pod.Spec.Containers.

Arguments for:

  • A more succinct API with fewer container types and lists of containers in Pod.Spec. This would simplify the frequent operation of "iterate through all containers in a pod".
  • Eliminating the EphemeralContainer type eliminates the need for frequent type conversions. (Though this is only an argument for using the Container type for ephemeral containers, not for merging the lists.)

Arguments against:

  • We would need to extend Container with fields specific to ephemeral containers such as TargetContainerName for targeting the namespace of a particular container.
    • This specific field could become generic if sidecar containers wanted to tie their lifecycle to a particular subset of containers, but I doubt the increase in complexity would be justified.
  • Some fields in Container would be incompatible with lifecycle type Ephemeral (for example, Resouces or ReadinessProbe). It's probably already the case that some fields are incompatible with others, but it could lead to a frustrating user experience. Perhaps it's just a documentation challenge.
  • We're not sure what makes assumptions that containers will not be added to Pod.Spec.Containers, either within Kubernetes or external code operating on serialized resources.
  • Patch strategy will be complicated without a separate list of ephemeral containers.
  • Humans interact frequently with these objects, and separate lists are easier for humans than a longer single list that changes (based on my sample set of 1 human)

I'll reflect on it a bit more, but so far I'm leaning towards separate list of ephemeral containers. Not to kick the can down the road, but I'm starting to become curious how a v2.Pod would look.

@verb
Copy link
Contributor

verb commented May 28, 2019

@Joseph-Irving @smarterclayton @liggitt I'm no API reviewer, but I have bandwidth and this seems to have some overlap with ephemeral containers. Let me know if I can help with first-level reviews or anything to speed along sidecar lifecycle.

@Joseph-Irving
Copy link
Member Author

@verb any help with reviews, etc is most appreciated :)

If you like you could look at the POC branch over here kubernetes/kubernetes#75099, it's very much a first draft and hasn't been touched in a while as I'm waiting for us to settle on the API before continuing development, but any comments on the general implementation would be handy.

@smarterclayton
Copy link
Contributor

re: separate ephemeralContainers array

The strongest argument you didn't mention is that disabling the field (for alpha) is a requirement, and filtering out ephemeral containers from containers would be difficult / impossible to convey to a user (a container disappears from an array).

We need to consider that for sidecars as well (alpha disablement), especially when it comes to sidecars in run once pods. Since lifecycle type will be ignored, there would be no way to have a long running sidecar, but that's less of an issue.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 31, 2019
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an API this seems fine tome -- I hope this is not blocking the actual work?

When you have API code changes, we can do an API review properly, even before the full impl is done

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 13, 2019
@Joseph-Irving
Copy link
Member Author

Joseph-Irving commented Jun 14, 2019

Hey @janetkuo, @kow3ns, please can you have a look at this when you get a chance. we need an approve from a sig-apps lead

@kow3ns
Copy link
Member

kow3ns commented Jun 17, 2019

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2019
@JoelSpeed
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, Joseph-Irving, kow3ns, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet