Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-apply old canary selector label to canary service is rollout is aborted #2781

Closed
meeech opened this issue May 15, 2023 · 3 comments
Closed
Labels
bug Something isn't working

Comments

@meeech
Copy link
Contributor

meeech commented May 15, 2023

(Not sure if this is a bug or enhancement) is bug.

Summary

When AR manages a canary service, it manages the selectors to ensure the canary service points at the canary pods.

Once a rollout is fully promoted, the service continues to point at the stable pods. So if you have something pointing at the service, it continues to operate.

If you abort a rollout, the canary service selector label is not rolled back as well, leading to a 'dead'; service. I would like to change this behaviour that when you rollback, the selector should be restored as well.

Use Cases

Whats interesting about how this works is that is allows you to rig up a fully canary path for your services assuming a certain naming convention.

eg: canary.ingress.example.com > canary-service.ns.svc.local > another-service.ns.svc.local

You can then have a trigger that whenever a request comes in to host canary.ingress.example.com that you hit up all the canary-* service endpoints internally. If other services are running a canary at the moment, great! if not, also great, since everything will respond on that code path.

Wanted to check if this is desired before doing the work.
Let me know if this needs more details.

[edit] Since this seems to be a bug (#2540) a few more details

  • Reproduced using 1.4 and 1.5
  • Abort was done via dashboard

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@meeech meeech added the enhancement New feature or request label May 15, 2023
@zachaller
Copy link
Collaborator

Is this just a missing condition of this #2540

@meeech
Copy link
Contributor Author

meeech commented May 16, 2023

@zachaller probably? This is a rollout not using any service mesh. Just number of pods to control %.

@zachaller zachaller added bug Something isn't working and removed enhancement New feature or request labels May 17, 2023
@meeech
Copy link
Contributor Author

meeech commented May 31, 2023

@zachaller I started looking into this, but it looks like the issue is higher up - it exits early if there is no TrafficRouting Reconcilers found.

https://github.com/zachaller/argo-rollouts/blob/820f516bd1aa0b003e9ebc956528053d8f69e8a8/rollout/trafficrouting.go#L124

I think the the solution would be in modifying the reconcileStableAndCanaryService: https://github.com/zachaller/argo-rollouts/blob/820f516bd1aa0b003e9ebc956528053d8f69e8a8/rollout/service.go#L256

see related pr

meeech added a commit to meeech/argo-rollouts that referenced this issue May 31, 2023
meeech added a commit to meeech/argo-rollouts that referenced this issue May 31, 2023
Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
meeech added a commit to meeech/argo-rollouts that referenced this issue Jun 2, 2023
Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
meeech added a commit to meeech/argo-rollouts that referenced this issue Jun 6, 2023
…ting is used

fixes argoproj#2781

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
meeech added a commit to meeech/argo-rollouts that referenced this issue Jun 12, 2023
…ting is used

fixes argoproj#2781

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
meeech added a commit to meeech/argo-rollouts that referenced this issue Jun 19, 2023
…ting is used

fixes argoproj#2781

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
meeech added a commit to meeech/argo-rollouts that referenced this issue Jun 21, 2023
…ting is used

fixes argoproj#2781

* remove redundant block of code

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
meeech added a commit to meeech/argo-rollouts that referenced this issue Jun 29, 2023
…ting is used

fixes argoproj#2781

* remove redundant block of code

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
meeech added a commit to meeech/argo-rollouts that referenced this issue Jun 29, 2023
…ting is used

fixes argoproj#2781

* remove redundant block of code

Signed-off-by: mitchell amihod <4623+meeech@users.noreply.github.com>
agaudreault added a commit to agaudreault/argo-rollouts that referenced this issue Mar 22, 2024
… on rollout abort argoproj#2781 (argoproj#2818)"

This reverts commit 3c5ac36.

Signed-off-by: Alexandre Gaudreault <alexandre_gaudreault@intuit.com>
zachaller added a commit to zachaller/argo-rollouts that referenced this issue Mar 25, 2024
… on rollout abort argoproj#2781 (argoproj#2818)"

This reverts commit 3c5ac36.

Signed-off-by: Zach Aller <zachaller@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants