Is your feature request related to a problem? Please describe.
Usually chaos engineering experiments run over a predetermined period of time. This has many benefits:
Users don't need to terminate the experiments manually.
It acts as a safety net; it is fairly easy to forget to terminate experiments.
Many times users of chaos engineering frameworks don't even have access to tools like kubectl to run a kubectl delete. This is the case for most of our users internally.
CI/CD platforms like Spinnaker can be used to run experiments, in which case the entire lifecycle of an experiment needs to be handled by the platform. Examples of integrations with CI/CD platforms include Chaos Monkey, Litmus, and more.
Describe the solution you'd like
The duration of the experiment (in seconds) is defined in the CRD. The controller sleeps for duration seconds and, once that time has elapsed, sends an exit signal (SIGINT/SIGTERM) to the injector pods.
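To make the proposal concrete, here is a minimal sketch of the sleep-then-signal idea. It is written in Python for brevity and uses a local subprocess as a stand-in for an injector pod, so it illustrates the lifecycle only, not the controller's actual code. The names `run_with_duration` and `grace_seconds` are hypothetical, and the kill fallback after the grace period is an assumption that goes beyond what is described above.

```python
import signal
import subprocess


def run_with_duration(cmd, duration_seconds, grace_seconds=5.0):
    """Run cmd and send SIGTERM once duration_seconds have elapsed.

    Returns the process exit code. Hypothetical helper: the subprocess
    stands in for an injector pod.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Wait for at most the configured duration. If the process
        # finishes on its own before that, return its exit code.
        return proc.wait(timeout=duration_seconds)
    except subprocess.TimeoutExpired:
        # Duration exceeded: ask the process to clean up and exit.
        proc.send_signal(signal.SIGTERM)
        try:
            return proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Assumption: force-kill if the process ignores SIGTERM.
            proc.kill()
            return proc.wait()
```

For example, `run_with_duration(["sleep", "3600"], duration_seconds=60)` would stop the command after about a minute instead of relying on someone to terminate it manually.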
Describe alternatives you've considered
The alternative is to handle the lifecycle of an experiment manually, but this is not always possible or desirable, as mentioned earlier.
Hello @nikos912000 and thanks for the feedback. It is a feature that we are planning to implement soon, for most of the reasons you mentioned. The general idea would be to have a default timeout on disruptions (customizable for long experiments) so that a disruption expires by itself.
We monitor long running disruptions on our side and we indeed see a lot of people forgetting about an applied disruption.
No ETA yet for the feature, but it is definitely planned as a high-priority feature in our Q3 OKRs (starting July), so you can expect it to be done soon.
We have this feature in a similar controller internally so happy to provide feedback. The main difference is the architecture (we don't have an injector pod) but I think it'll work in a similar way as described in my message above.
If that helps:
We also set a default duration in the CRD
We have validation on both the CRD and the server side
For certain disruptions which are high-risk (e.g. AZ failures) we set a maximum duration which is sensible in the context of the disruption.
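The three points above (a default duration, validation, and a per-disruption maximum) could be combined on the server side roughly as follows. This is only a sketch under stated assumptions: the function name `resolve_duration`, the 300-second default, the `az-failure` kind name, and its 600-second cap are all made-up values for illustration, not taken from either controller.

```python
from typing import Optional

# Hypothetical values, chosen only for illustration.
DEFAULT_DURATION_SECONDS = 300
MAX_DURATION_SECONDS = {"az-failure": 600}  # caps for high-risk kinds


def resolve_duration(kind: str, requested: Optional[int] = None) -> int:
    """Return the effective duration in seconds for a disruption.

    Falls back to the default when none is requested, rejects
    non-positive or non-integer values, and enforces the maximum
    configured for the given kind, if any.
    """
    duration = DEFAULT_DURATION_SECONDS if requested is None else requested
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer number of seconds")
    if duration <= 0:
        raise ValueError("duration must be positive")
    cap = MAX_DURATION_SECONDS.get(kind)
    if cap is not None and duration > cap:
        raise ValueError(f"duration for {kind} must not exceed {cap} seconds")
    return duration
```

Rejecting an over-long request, rather than silently clamping it, is a design choice in this sketch: the user learns immediately that the experiment would not run for as long as they asked.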