Debug information collector for chaos-mesh #694
Comments
I support this feature. We can create a single command-line tool to help us collect the debug information.
Hi @YangKeao, I tried to get the debug info of NetworkChaos and StressChaos following your steps. But to me, the project still looks like a script, i.e. using one command instead of several. The only problem with this is that I can't see the extensibility of it... I think I need some guidance on this problem. Or could you provide some specific examples where a script would not work, like what cannot be obtained from ctr?
Hi @cwen0 @YangKeao, since we are running the daemonset in privileged mode, and if I understand correctly the examples you mentioned (NetworkChaos, StressChaos and IOChaos) are also executed by the daemonset, is it possible to have the daemonset collect the debug data we need and stream it back to the controller through gRPC? I believe this approach is extensible and could be helpful if we decide to expose the debug info on the dashboard to give users a one-stop service.
@namco1992 Integrating with the dashboard is a really great idea. It would be much easier to use, especially in complicated situations like one chaos affecting multiple nodes. Using the daemonset to collect debug info is also quite promising: from what we need right now, the debug info is usually the config of commands executed by the daemonset, so it is very easy to collect. Actually, I don't think we need to transfer the data to the controller; we just need to save the latest snapshot somewhere and have it ready to use.
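The "save the latest one somewhere" idea could be sketched as a small store keyed by chaos name, where each collection overwrites the previous snapshot. This is only an illustration of the approach; all type and field names here are hypothetical, not chaos-mesh's actual API:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// DebugSnapshot holds one collected dump for a chaos object.
// The fields are illustrative placeholders.
type DebugSnapshot struct {
	ChaosName string
	Collected time.Time
	Output    string // e.g. a dump of tc/iptables/cgroup state
}

// SnapshotStore keeps only the latest snapshot per chaos object,
// matching the "save the latest one" idea above.
type SnapshotStore struct {
	mu   sync.Mutex
	data map[string]DebugSnapshot
}

func NewSnapshotStore() *SnapshotStore {
	return &SnapshotStore{data: make(map[string]DebugSnapshot)}
}

// Save overwrites any previous snapshot for the same chaos object.
func (s *SnapshotStore) Save(snap DebugSnapshot) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[snap.ChaosName] = snap
}

// Latest returns the most recent snapshot for a chaos object, if any.
func (s *SnapshotStore) Latest(chaosName string) (DebugSnapshot, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	snap, ok := s.data[chaosName]
	return snap, ok
}

func main() {
	store := NewSnapshotStore()
	store.Save(DebugSnapshot{ChaosName: "network-delay", Output: "netem delay 100ms"})
	store.Save(DebugSnapshot{ChaosName: "network-delay", Output: "netem delay 200ms"})
	snap, _ := store.Latest("network-delay")
	fmt.Println(snap.Output) // only the latest dump survives
}
```

A daemonset-side collector would call `Save` after each run, and the CLI or dashboard would read via `Latest`, so nothing needs to be streamed to the controller.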
Hello @Yiyiyimu, I think the debug mode probably doesn't have to be always on; most of the time it could just be a one-off data collection to get a snapshot of the current status. Another approach could be to just return a stream when calling. One of the use cases I can think of is running StressChaos: currently, we just use. IMO it's important to have an all-in-one dashboard for a better user experience, but I'm happy to discuss this more. I just feel it could be a good opportunity to have a standard data-collection interface, and it might be somewhat related to the
@namco1992 I don't think streaming debug information (like logs or statistics) back to controller-manager is a good choice for this issue, because:
IMO, in the first stage of the debug collector, we only need a simple tool that collects this information (for a chaos resource) and prints it to stdout or the dashboard, so that the user can C-c & C-v it into GitHub issues 😸. In the future it may diagnose chaos-mesh automatically, but I think it's too early to consider that here.
@Yiyiyimu Extensibility means that developers can extend the collector easily when they add a new kind of chaos resource.
@YangKeao Yes, I agree with you: regarding the scope of this issue, streaming back the debug info is not the ideal way. I was trying to shove live status and reporting into this scenario, and it might not be a good idea. 😅
This issue is stale because it has been open 60 days with no activity. Remove the stale label or comment, or it will be closed in 15 days.
Feature Request
Nowadays it's really hard to collect debug information for some chaos. For example, the user should follow four manual steps to collect debug information (see #688).
It's also hard to write a script to collect the debug information: several nodes may be affected by one chaos, and some information cannot be collected from the CLI (the CLI of `containerd` is really feature-poor 😭).
So we need a program to help us collect the related debug information. This program can be shipped as a standalone executable or embedded in the controller-manager image, and it can be invoked by running it directly (with enough privilege) or via `kubectl exec`.
Here is a list of the information we need for different kinds of chaos:
- NetworkChaos: iptables, ipset and tc qdisc/filter configs, and the related `podnetworkchaos`.
- StressChaos: cgroup configs for stress-ng and the target process/container.
- IoChaos: mount information.
This list may grow as development progresses, so extensibility should be considered carefully.
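The list above could be kept as a simple table from chaos kind to the node-level commands whose output the collector needs, which makes growing the list a one-line change per entry. The exact command strings below (and the `<pid>` placeholders) are illustrative assumptions, not a fixed specification:

```go
package main

import "fmt"

// debugCommands maps each chaos kind to the node-level commands
// whose output the collector should gather, following the list
// in the feature request above. Angle-bracket parts are
// placeholders to be filled in at collection time.
var debugCommands = map[string][]string{
	"NetworkChaos": {
		"iptables-save",
		"ipset list",
		"tc qdisc show",
		"tc filter show",
	},
	"StressChaos": {
		"cat /proc/<stress-ng-pid>/cgroup", // cgroup config of stress-ng and the target
	},
	"IoChaos": {
		"cat /proc/<target-pid>/mounts", // mount information
	},
}

func main() {
	// Print the commands the collector would run for NetworkChaos.
	for _, cmd := range debugCommands["NetworkChaos"] {
		fmt.Println(cmd)
	}
}
```

Extending the collector for a new chaos kind would then mostly mean adding one more entry to this table (plus any kind-specific post-processing).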