Chaos Engineering as a Service #1462

Open
WangXiangUSTC opened this issue Jan 28, 2021 · 5 comments
Labels
type/enhancement Request for an improvement.

Comments

@WangXiangUSTC
Contributor

WangXiangUSTC commented Jan 28, 2021

Feature Request

Is your feature request related to a problem? Please describe:

The following problems currently exist in Chaos Mesh:

  • Poor observability: the results of chaos experiments are not easy to observe and judge; users have to check manually whether the chaos took effect.
  • Chaosd (for physical nodes) is too limited: it only supports command-line operation and does not support task scheduling or lifecycle management.
  • The cost of learning, operation, and maintenance is high: the maintenance of Chaos Mesh and Chaosd is not unified.

As a result, Chaos Mesh does not yet work like Chaos Engineering as a Service.

Describe the feature you'd like:

There should be a unified place to manage chaos experiments across multiple platforms and multiple clusters, where you can also see the monitoring data for each experiment.

The new architecture may look like this:
[Image: diagram of the proposed architecture]

@WangXiangUSTC WangXiangUSTC added the type/enhancement Request for an improvement. label Jan 28, 2021
@shivanshs9
Contributor

Hey @WangXiangUSTC, I'd like to work on these issues.

Poor observability: the results of chaos experiments are not easy to observe and judge; users have to check manually whether the chaos took effect.

From what I understand, the Prometheus integration (as described in your proposed new architecture) could help with that.

I'm not yet clear about the other two issues. Could you please elaborate on those?

@WangXiangUSTC
Contributor Author

Welcome! Yes, we could use Prometheus to store the metrics. Another question is how to collect them: for example, how do we measure network latency when injecting Network Chaos, and how do we get CPU/memory usage when injecting Stress Chaos?

The other two issues focus on implementing a unified manager (the Dashboard (manager) in the architecture picture) to manage chaos experiments across multiple k8s clusters and multiple platforms. @shivanshs9
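To make the metric-collection question above more concrete, here is a minimal sketch of one possible approach, not anything Chaos Mesh actually implements: time a TCP handshake as a rough latency probe and render the sample in the Prometheus text exposition format. The function names, the metric name `chaos_probe_latency_seconds`, and the label names are all hypothetical and chosen only for illustration.

```python
import socket
import time


def tcp_connect_latency_seconds(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP handshake to host:port as a rough stand-in for network latency."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the connection is closed immediately; only the handshake is timed
    return time.perf_counter() - start


def to_prometheus_line(name: str, labels: dict, value: float) -> str:
    """Render one sample as a line of the Prometheus text exposition format.

    Assumes label values contain no quotes, backslashes, or newlines
    (no escaping is done in this sketch).
    """
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"


# Hypothetical metric and label names, for illustration only.
sample = to_prometheus_line(
    "chaos_probe_latency_seconds",
    {"experiment": "network-delay", "target": "web"},
    0.25,
)
print(sample)
```

Comparing such samples taken before and during a Network Chaos experiment would give a rough indication of whether the injected delay took effect; a real collector would more likely expose the values through a Prometheus client library or an exporter.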

@WangXiangUSTC
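Contributor Author

This issue has been taken into CNCF Community Bridge. cncf/mentoring@master/lfx-mentorship/2021/01-Spring/project_ideas.md#chaos-engineering-as-a-service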

@shivanshs9
Contributor

The other two issues focus on implementing a unified manager (the Dashboard (manager) in the architecture picture) to manage chaos experiments across multiple k8s clusters and multiple platforms.

Ah, got it. So the Dashboard will be a unified manager for Kubernetes clusters and physical nodes, and each cluster will have its own controller operator to manage the lifecycle of its CRDs.

This issue has been taken into CNCF Community Bridge. cncf/mentoring@master/lfx-mentorship/2021/01-Spring/project_ideas.md#chaos-engineering-as-a-service

Cool, so I need to apply for this project through the Linux Foundation then, right?

@WangXiangUSTC
Contributor Author

Yes, you need to apply for it 😸
