Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

[request] Support to create a K8s Deployment instead of Statefulset resource #94

Open
cmoulliard opened this issue May 15, 2020 · 12 comments

Comments

@cmoulliard
Copy link

Request

When cf is used with Eirini and command cf push executed, then a k8s statefulset resource is created.

While this resource addresses a specific use case needed for master/slave topology, this is not the best k8s resource to be created to deploy a microservice application which only requires to be stateless (see : https://stackoverflow.com/questions/41583672/kubernetes-deployments-vs-statefulsets).

Would it be possible to offer an option or config parameter that we could pass between CAPI and Eirini in order to specify the type fo the resource to be created ?

@cf-gitbot
Copy link

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/172858637

The labels on this github issue will be updated when the story is started.

@julz
Copy link
Contributor

julz commented May 18, 2020

Hi @cmoulliard - the main reason we use StatefulSet is to continue to support the INSTANCE_INDEX variable, which some apps rely on. We'd like to deprecate that environment variable (in favour of cf task support), but it would break some apps so we're not rushing to do that.

I'm interested if there are specific problems with StatefulSet vs. Deployment you're aware of that would make this higher priority? We set .spec.podManagementPolicy=Parallel on the created StatefulSets so in practice there isn't (we think!) much of a difference between the two approaches.

@cmoulliard
Copy link
Author

the main reason we use StatefulSet is to continue to support the INSTANCE_INDEX variable, which some apps rely on. We'd like to deprecate that environment variable (in favour of cf task support), but it would break some apps so we're not rushing to do that.

Very interesting. If the INSTANCE_INDEX is needed, then we could use a kubernetes label which offers the possibility to search easily about the pod running an application using the query kubectl get pod -l app=INSTANCE_INDEX and by consequence it is not needed to use a statefulset resource

@cmoulliard
Copy link
Author

in favour of cf task support

Which kind of kubernetes resource are you gonna to create for a cf task ?

@julz
Copy link
Contributor

julz commented May 18, 2020

Very interesting. If the INSTANCE_INDEX is needed, then we could use a kubernetes label

Yeah, unfortunately I think we looked in to this and it's quite difficult to do with Deployment, even with a label (it's hard to apply a label to individual pods in a deployment in an ordered way). Again the good news is we can't actually find many ways in which a StatefulSet with .spec.podManagementPolicy=Parallel is worse, other than that it looks bad :-D.

We'd still definitely like to move back to Deployments once we deprecate Instance_Index, but we'd be more urgent about it if we had a use case where using StatefulSets actively hurt anything :).

@julz
Copy link
Contributor

julz commented May 18, 2020

Which kind of kubernetes resource are you gonna to create for a cf task ?

Creates a Kubernetes Job.

@loewenstein
Copy link

Again the good news is we can't actually find many ways in which a StatefulSet with .spec.podManagementPolicy=Parallel is worse, other than that it looks bad :-D.

Did you happen to write down the results of investigating the differences?

I didn't (yet) look any deeper, but what immediately comes to mind is that Deployment rolling update strategy supports a maxSurge parameter, which will make sure that the number of Pods doesn't drop below the requested number of instances.

This alone will not help cover the drain behaviour as executed by Diego and discussed in cloudfoundry/cf-for-k8s#600, but also application updates should rather overprovision an app than underprovision it imho.

@jsimao71
Copy link

jsimao71 commented May 6, 2022

Actually it is a big problem to use StafefulSets rather than Deployments.
As pointed out by above, CF apps are intended to be mostly stateless -- so they are a better match for K8s Deployments. Namely, they don't rely on permanent disk storage by concept design.
Statefulsets have a key characteristic that pods are assigned to workers and don't move around if the worker fails.
In particular, if the worker where the pod is running is shutdown/crashes, the pod will stay in Terminate state forever (till the node restarts or is remove from the cluster). We experimented with this extensively.
That is, there is no automatic migration of the pod to a healthy worker -- which is a key feature and hallmark of Kubernetes. In fact, one of the main reason, if not the main reason, to use a Kubernetes managed cluster.
This is might be acceptable and desirable for a pod that needs to be close to the local disk storage, but a big no-no for all other pods.
With Deployment, a failed pod will be migrated to other worker. Thus providing reliably and some HA, even if there a single application instance running. See an example in this blog post: https://medium.com/tailwinds-navigator/kubernetes-tip-what-happens-to-pods-running-on-node-that-become-unreachable-3d409f734e5d
The current state of affairs in CF4K8s is very sad. Because to have some decent level of HA, apps need to be deployed always with multiple instances. Unfortunately, ssome apps might not be prepared to run with multiple instance (e.g. they interact with external systems, and not all operations are idempotent). This a big problem for us at this point.
The justification that a StatefulSet is used because of the env var INSTANCE_INDEX does not seam to be too strong. It should be possible to emulate with other way, even if with a slightly weaker semantics, or at least should be optional. In clusters where there are no legacy apps that need this feature (most of the time), this is an unnecessary deal breaker.
My recommendation and polite request, is to have a global option setting for CF4K8s at installation time where is possible to specify if statefulsets vs. deployments should be should be used. In case deployments is select, no strong guarantee is given about the semantics of INSTANCE_INDEX. Does not really matter which option is the default option, as long as it is possible to set the option.
This will allow to use CF4K8s in Kubernetes without loss of key functionality.
As it is now, using CF4K8s kills a good part of the rationale of Kubernetes. And it might in itself be a big put back to adoption of CF4K8s in may projects.
The proposed change should be relatively easy to implement, fortunatelly.

@gcapizzi
Copy link
Contributor

gcapizzi commented May 6, 2022

Hey @jsimao71,
we've been aware of this issue for quite some time, see also cloudfoundry/eirini-release#207. Addressing this in cf-for-k8s was pretty much impossible as the assumption of numeric instance IDs was sprinkled across many components from the original Cloud Foundry we relied on.

Nowadays the community is focusing on Korifi, which is a reimplementation of CF done from scratch on top of Kubernetes. It still supports the CF v3 API via a shim, so we still try to be compatible with the original CF as much as we can and to support existing clients. That said, we don't reuse any of the original CF components, which should make it easier to introduce changes like this.

We're currently in the process of introducing abstractions that would allow anyone to customise how Korifi builds container images and runs workloads. Once these abstractions are in, it should be relatively easy to introduce a Deployment-based runner to replace the current StatefulSet-based one.

We'd still have to figure out the best way not to break existing clients which expect a numeric instance ID, and probably introduce string IDs too. It should still be a lot easier than on cf-for-k8s.

I understand these issues might be frustrating but please keep in mind that we're trying to reimplement a system on top of a different one with different behaviours, guarantees and design choices, while trying hard not to break existing clients and not to surprise existing users too much. Keeping a balance between Kubernetes functionality and CF compatibility is one of our biggest challenges and we put a lot of thought into it: most of the times it's not as simple as it looks.

@jsimao71
Copy link

jsimao71 commented May 9, 2022

Hello, @gcapizzi !
Thanks for the pointer to Korifi! Looks promising and interesting.
Unfortunately, for you current project this is not an option anymore. Since we are going to production in a few months, after 1+year of development, and CF4K8s is an integral part of the overall project architecture.
So we really need to find a way to deploy CF4K8s apps as Deployment.
We don't have the need for INSTANCE_INDEX.
The key point is that most apps and projects (99%+) don't need this. Services that are deployed as StatefulSet in K8s are usually prepared to deal with coordination across multiple instances, using a service specific protocol (e.g. a DB leader/primary->replica(s)/secondary) so it makes sense that they are bound to a specific worker. But apps very rarely need or have this requirement.
So my question to you is:
can you add the option to support Deployments rather than StatefulSets in the CF4K8s codebase?!? It could be a simple global configuration option.

The INSTANCE_INDEX could be always 0 in this installation mode.
If you really want to have INSTANCE_INDEX different per instance, this can be implemented as an additional mechanisms.
Set the environment of the pod to give access the POD id (metadata.gui), and optionally the DeploymentID (Controlled By field), and the cf4k8s app UID (labels['cloudfoundry.org/app_guid'])
https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/
https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/

env:
        - name: POD_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid

There are several ways to then implement or emulate this:

  • the app container opi can make a callback to CF4K8s with app ID, and pod ID, and ask for the index.
  • callback the k8s api (e.g. with kubectl installed) to find index by querying the list of pods for the Deployment
  • Emulate by hashing the pod.gui with some max value (pod.gui % MAX_INSTANCES)
  • Set INSTANCE_INDEX always to 0 for all instances
    For the first 3 approaches, the container opi bootstrap need to set the INSTANCE_INDEX env for the app before starting the app code. (i.e. changes/augments the env before the app starting).
    For the first 2 approaches a callback is needed -- so require more chang.

The global config of cf4k8s could be used to select this two options:
cf48s.workload_type=deployment|statefulset (|sts)
cf4k8s.instance_index_mode=strict|zero|mod|hash|callback|...

If you and the team of CF4K8s don't have bandwidth to work on this option(s) in the next couple of months, can you provide some hints about the code base so we can do it in our own private fork?!?
We are already doing some minor changes in the installation (e.g. to work in airgapped mode), so I guess in theory we can also make this change in the code if it not too complicated. Although, we would prefer the CF4K8s team to do it -- so give more confidence on the change. But we still need to know how to do it. So we would need your tips where/how to change the codebase to that the resource type create is Deployment rather than StatefulSet.
The "introducing abstractions" roadmap for Korifi you mention, does it have any relevance to CF4K8s?!?
Thanks,
Jorge.

@gcapizzi
Copy link
Contributor

gcapizzi commented May 9, 2022

Hey @jsimao71,
I wouldn't say INSTANCE_INDEX is legacy: it is a current feature of Cloud Foundry and yes, it is used in a bunch of places (e.g. metric and log collections need it, some frameworks use it, etc).

cf-for-k8s isn't actively developed anymore, so no, we won't be implementing this. We're working on Korifi which is a completely new project, so any work we do there doesn't impact cf-for-k8s.

If you want to do this yourself, you should check cloudfoundry/eirini-release#207, which was an experiment to test the consequences of doing the change. That issue references a bunch of commits which should guide through the changes. Our constraints back then were different from yours, so you might get away with it with fewer or no issues, but I'm afraid it might be tricky.

@jsimao71
Copy link

jsimao71 commented May 9, 2022

OK! Understood! We will study the eirini-release#207 to see if we can make sense of it and use it. Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants