controller stuck when rollout fails #138

Open
Jean-Daniel opened this issue Dec 8, 2023 · 2 comments

@Jean-Daniel

When a change to a DragonFly resource triggers a rollout and another update is pushed while that rollout is still in progress, the controller waits until the first rollout is done before applying any further change.

This is a problem: if the first change contains a typo (an invalid image URL, for instance), there is no way to fix it, because the change with the correct image URL will never be applied.

To reproduce:

  • Deploy the operator and a DragonFly instance with 2 replicas.
  • Update the DragonFly instance and specify an image with a non-existent tag.
  • Push another change with a valid tag.

The controller waits for the first change to be fully applied, but that never happens because the pods fail to start with ImagePullError.

@Pothulapati
Collaborator

This is a good find! The fix would be to give up on the rollout if the first deleted pod isn't coming back, and then accept further updates. @Jean-Daniel, do you want to take it up?

@parera10

parera10 commented Feb 7, 2024

I've just run into this situation after modifying a dragonfly resource with not enough memory. The operator can neither roll back nor interrupt the current rollout with a new one that applies valid settings.

@Abhra303 Abhra303 self-assigned this Feb 14, 2024