
This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

User Request: Support for duration #350

Closed
nikos912000 opened this issue Jun 23, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@nikos912000
Contributor

Is your feature request related to a problem? Please describe.
Usually chaos engineering experiments run over a predetermined period of time. This has many benefits:

  • Users don't need to terminate the experiments manually.
  • It acts as a safety net; it is fairly easy to forget to terminate experiments.
  • Users of chaos engineering frameworks often don't even have access to tools like kubectl to run a kubectl delete. This is the case for most of our users internally.
  • CI/CD platforms like Spinnaker can be used to run experiments, in which case the entire lifecycle of an experiment needs to be handled by the platform. Examples of integrations with CI/CD platforms include Chaos Monkey, Litmus, and more.

Describe the solution you'd like
The duration of the experiment (in seconds) would be defined in the CRD. The controller sleeps for that many seconds and then sends an exit signal (SIGINT/SIGTERM) to the injector pods once the duration has elapsed.
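
The sleep-then-signal behaviour described above could look roughly like the sketch below. It is only an illustration, not the controller's implementation: `run_for_duration` and the `send_signal` callback are hypothetical names, and how the signal actually reaches an injector pod is deliberately left to the caller.

```python
import signal
import time
from typing import Callable


def run_for_duration(
    duration_seconds: float,
    send_signal: Callable[[int], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait for the experiment duration, then ask the injector to exit.

    `send_signal` is a hypothetical callback that delivers the signal to
    the injector; `sleep` is injectable so the wait can be faked in tests.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Block until the experiment has run for the requested time.
    sleep(duration_seconds)
    # SIGTERM is used here; the request mentions SIGINT as an alternative.
    send_signal(signal.SIGTERM)
```

For a local process, the callback could be something like `lambda sig: os.kill(pid, sig)`; a real controller would more likely delete or signal the pod through the Kubernetes API.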

Describe alternatives you've considered
The alternative is to handle the lifecycle of an experiment manually, but this is not always possible or desirable, as mentioned earlier.

@Devatoria
Contributor

Hello @nikos912000 and thanks for the feedback. It is a feature that we are planning to implement soon, for most of the reasons you mentioned. The general idea would be to have a default timeout on disruptions (customizable for long experiments) so that a disruption expires by itself.
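
One way to picture a default-but-overridable timeout is an expiry check that a controller could run on each reconcile. This is a sketch under assumptions: `DEFAULT_TIMEOUT`, its one-hour value, and `is_expired` are made up for illustration and do not reflect the project's eventual design.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical default; the thread does not say what the real value would be.
DEFAULT_TIMEOUT = timedelta(hours=1)


def is_expired(
    created_at: datetime,
    timeout: Optional[timedelta] = None,
    now: Optional[datetime] = None,
) -> bool:
    """Return True once a disruption has outlived its timeout.

    `timeout=None` means the user did not customize it, so the default applies.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    effective = DEFAULT_TIMEOUT if timeout is None else timeout
    return now >= created_at + effective
```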

We monitor long-running disruptions on our side and we indeed see a lot of people forgetting about an applied disruption.

No ETA yet for the feature, but it is definitely planned as a high-priority feature in our Q3 OKRs (starting July), so you can expect it to be done soon.

@nikos912000
Contributor Author

nikos912000 commented Jun 23, 2021

Awesome, thanks @Devatoria.

We have this feature in a similar controller internally so happy to provide feedback. The main difference is the architecture (we don't have an injector pod) but I think it'll work in a similar way as described in my message above.

If that helps:

  • We also set a default duration in the CRD
  • We validate it both in the CRD and on the server side
  • For certain high-risk disruptions (e.g. AZ failures) we set a maximum duration that is sensible in the context of the disruption.
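
The three points above (a default, validation, and a per-disruption cap) can be sketched as a single server-side check. Everything named here is an assumption for illustration: the `az-failure` and `pod-failure` kind names, the numeric values, and `resolve_duration` are not taken from either controller.

```python
from typing import Optional

# Hypothetical values; neither controller's real numbers appear in the thread.
DEFAULT_DURATION_SECONDS = 300
MAX_DURATION_SECONDS = {"az-failure": 600}  # caps for high-risk kinds only


def resolve_duration(kind: str, requested: Optional[int] = None) -> int:
    """Apply the default, then validate the duration for a disruption kind."""
    duration = DEFAULT_DURATION_SECONDS if requested is None else requested
    if duration <= 0:
        raise ValueError("duration must be a positive number of seconds")
    cap = MAX_DURATION_SECONDS.get(kind)
    # Kinds without an entry in the cap table are not limited here.
    if cap is not None and duration > cap:
        raise ValueError(f"duration for {kind} must not exceed {cap} seconds")
    return duration
```

Rejecting an over-long request, rather than silently clamping it, keeps the user aware that the experiment will not run as long as they asked; clamping would be an equally plausible design.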

@ptnapoleon ptnapoleon added the enhancement New feature or request label Jun 23, 2021
@DataDog DataDog locked and limited conversation to collaborators Jul 29, 2021
