
Integration in error phase can't be scaled: why don't we just rebuild it? #2640

Closed · nicolaferraro opened this issue Sep 16, 2021 · 2 comments · Fixed by #2645

nicolaferraro (Member) commented Sep 16, 2021

It may seem pointless to scale an integration that is not working, but the error condition now also includes pod failures.

So if a container fails its liveness probe because it receives too much traffic, the integration can't be scaled up.

The number of replicas is not synced back into the deployment/ksvc when the phase is "Error", and we know it:

```go
// TODO check also if deployment matches (e.g. replicas)
```

What prevents us from just adding "replicas" to the set of fields considered for the digest, so that when the number of replicas changes the integration restarts from "Initialization"? All steps from "Initialization" to "Running" are idempotent, so why is there such a shortcut for the "replicas" field, just to keep the phase at "Running"? It often causes issues with the GC, such as #2639.
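For illustration, a minimal sketch of what folding replicas into the digest could look like; the types and names here are hypothetical stand-ins, not the actual Camel K digest code:

```go
package digest

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// IntegrationSpec is a stand-in for the real CRD spec; only the
// fields relevant to this example are shown.
type IntegrationSpec struct {
	Sources  []string
	Replicas *int32
}

// Compute returns a digest of the spec. Folding Replicas into the
// hash means any scale operation changes the digest, which would send
// the Integration back through "Initialization".
func Compute(spec IntegrationSpec) string {
	h := sha256.New()
	for _, s := range spec.Sources {
		h.Write([]byte(s))
	}
	if spec.Replicas != nil {
		fmt.Fprintf(h, "%d", *spec.Replicas)
	}
	return base64.RawURLEncoding.EncodeToString(h.Sum(nil))
}
```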

cc: @astefanutti, @squakez, @lburgazzoli

@nicolaferraro nicolaferraro added the kind/bug Something isn't working label Sep 16, 2021
@nicolaferraro nicolaferraro added this to the 1.7.0 milestone Sep 16, 2021
astefanutti (Member) commented Sep 16, 2021

The replicas field is already a bit special for Deployment, as it does not change the PodSpec hash and does not trigger a rollout when changed. We kind of do the same thing for Integration. I'd be inclined to think the core k8s developers had good reasons to do so, and I would find it a bit weird to re-initialise an Integration when it is scaled.
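As a minimal sketch of that Deployment behaviour (simplified stand-in types, not the actual Kubernetes code): the pod-template hash is derived from the pod template only, so changing replicas leaves the hash, and therefore the rollout, untouched:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// PodTemplate and DeploymentSpec are simplified stand-ins for the
// real Kubernetes types.
type PodTemplate struct {
	Image string
	Args  []string
}

type DeploymentSpec struct {
	Replicas int32
	Template PodTemplate
}

// podTemplateHash mirrors the idea behind the pod-template-hash label:
// only the template feeds the hash; Replicas is deliberately excluded.
func podTemplateHash(spec DeploymentSpec) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%v", spec.Template)
	return h.Sum32()
}

func main() {
	spec := DeploymentSpec{Replicas: 1, Template: PodTemplate{Image: "example:1.0"}}
	before := podTemplateHash(spec)
	spec.Replicas = 5 // scale up
	fmt.Println(before == podTemplateHash(spec)) // true: same hash, no new rollout
}
```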

Now for the relation to GC, there are multiple things. In the context of #2639, I still think that creating a Kit owned by the Integration in that case is a shortcut, see #2365. In the context of other GC issues, AFAIR, the problem was forgetting to add both the IntegrationPhaseDeploying and IntegrationPhaseRunning phases. It turns out both are always added, so the question arises: what is the point of having both?

I'm working on scaling right now, as I've found issues in the context of #2443, so I can piggy-back fixes for any new scaling-related issues.

astefanutti (Member) commented Sep 16, 2021

Let me assign this to myself. The Error phase is kind of messy, and I'll fix that in the context of #2443.
