rfc: Network Chaos on Kubernetes Service #36
Signed-off-by: dengruilong <dengruilong@zuoyebang.com>
I don't have a specific detailed design for this feature yet, but here is a rough proposal: move the network chaos experiment from Pods to Nodes, transparently pass the Pods' info (IP, namespace, app name, etc.) to the Nodes, and add tc/iptables rules on the Nodes according to the specified namespace/app name.
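The selection step in this proposal could look like the minimal Go sketch below. A node-level agent receives Pod info and picks out the IPs that match the requested namespace/app name. Those IPs would then be the targets of its tc/iptables rules. The `podInfo` type and the `matchTargets` function are hypothetical names used only for illustration. In a real cluster the app name would come from Pod labels rather than from a plain string field.

```go
package main

import "fmt"

// podInfo is a hypothetical record of the Pod details the proposal would
// forward to a node-level agent (IP, namespace, app name).
type podInfo struct {
	IP        string
	Namespace string
	App       string
}

// matchTargets (hypothetical helper) returns the IPs of Pods in the given
// namespace whose app name equals app; a node agent could then install
// tc/iptables rules for exactly these IPs.
func matchTargets(pods []podInfo, namespace, app string) []string {
	var ips []string
	for _, p := range pods {
		if p.Namespace == namespace && p.App == app {
			ips = append(ips, p.IP)
		}
	}
	return ips
}

func main() {
	pods := []podInfo{
		{IP: "10.244.1.5", Namespace: "shop", App: "cart"},
		{IP: "10.244.1.6", Namespace: "shop", App: "payment"},
		{IP: "10.244.2.9", Namespace: "shop", App: "cart"},
	}
	fmt.Println(matchTargets(pods, "shop", "cart")) // [10.244.1.5 10.244.2.9]
}
```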
I think we could apply tc/iptables in the node's network namespace for Service NetworkChaos.
The Linux network namespace is the basic mechanism for keeping isolation and controlling the blast radius, so I prefer to keep using the pods' network namespace for Pod NetworkChaos.
This approach sounds great for `ClusterIP` and in-cluster network traffic. Another concern is that the network path for a `LoadBalancer` service and a `ClusterIP` service might not be exactly the same.
I have no idea how cloud providers treat the `LoadBalancer` service. I would be very glad if you could provide more background on that, @dengruilong. ❤️
@STRRL As we discussed at the last weekly meeting, a Service should only be allowed as the destination of the connection (e.g. the target selector for direction `to`). The network path doesn't matter, because the source pod only sees the target IP, no matter how the traffic reaches the other end.
The only task for us is to get the IP of the service. For `LoadBalancer`, it seems that `.status.loadBalancer.ingress[].ip` and `port` are reliable, and we could also add `.spec.clusterIP` (if they are not nil).
Another question: I wonder whether service discovery works for the `LoadBalancer` service, and which IP it returns.
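The minimal Go sketch below illustrates what "getting the IP of the service" could mean. It is a sketch, not a definitive implementation. To stay self-contained, it decodes the JSON of a Service object (for example, the output of `kubectl get svc <name> -o json`) into a hypothetical `serviceView` struct instead of using the real `k8s.io/api/core/v1` types. That struct mirrors only the fields named above. `targetIPs` is likewise a hypothetical helper. The sketch skips ingress entries that carry only a `hostname`, as some cloud load balancers report. A real implementation would need to revisit that assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceView is a hypothetical, trimmed-down mirror of a Kubernetes Service
// object; only the fields discussed in this thread are decoded.
type serviceView struct {
	Spec struct {
		Type      string `json:"type"`
		ClusterIP string `json:"clusterIP"`
	} `json:"spec"`
	Status struct {
		LoadBalancer struct {
			Ingress []struct {
				IP       string `json:"ip"`
				Hostname string `json:"hostname"`
			} `json:"ingress"`
		} `json:"loadBalancer"`
	} `json:"status"`
}

// targetIPs (hypothetical helper) collects the destination IPs that a
// direction "to" NetworkChaos targeting this Service would need to match:
// .spec.clusterIP plus every .status.loadBalancer.ingress[].ip.
func targetIPs(raw []byte) ([]string, error) {
	var svc serviceView
	if err := json.Unmarshal(raw, &svc); err != nil {
		return nil, err
	}
	var ips []string
	// Headless Services report clusterIP "None"; skip it as well as empty values.
	if ip := svc.Spec.ClusterIP; ip != "" && ip != "None" {
		ips = append(ips, ip)
	}
	for _, ing := range svc.Status.LoadBalancer.Ingress {
		// Entries that only carry a Hostname are ignored in this sketch.
		if ing.IP != "" {
			ips = append(ips, ing.IP)
		}
	}
	return ips, nil
}

func main() {
	raw := []byte(`{
	  "spec": {"type": "LoadBalancer", "clusterIP": "10.96.0.10"},
	  "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.7"}]}}
	}`)
	ips, err := targetIPs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ips) // [10.96.0.10 203.0.113.7]
}
```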
> @STRRL As we discussed at the last weekly meeting, a Service should only be allowed as the destination of the connection (e.g. the target selector for direction `to`).

That could be the first stage of the feature "NetworkChaos on Kubernetes Service".
I wonder if we could do much more to inject chaos into the Kubernetes Service, not only into network traffic inside the Kubernetes cluster. For example, an "external service" (not located in the Kubernetes cluster) that accesses this `LoadBalancer` service could also be affected by Chaos Mesh NetworkChaos.
> not only network traffic in the Kubernetes cluster

It seems we could do that if we modify the tc/iptables rules on the Node.
> Another question: I wonder whether service discovery works for the `LoadBalancer` service, and which IP it returns.

> "Normal" (not headless) Services are assigned a DNS A or AAAA record, depending on the IP family of the service, for a name of the form `my-svc.my-namespace.svc.cluster-domain.example`. This resolves to the cluster IP of the Service.

As the documentation says, only the ClusterIP is mentioned, and it does behave as documented (on my local minikube cluster). I did not dig into the code of the CoreDNS kubernetes plugin, so I'm not sure there is no "magic" in it for resolving the external IP. :P
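The small Go sketch below checks the naming rule quoted above. `serviceFQDN` is a hypothetical helper that builds the name, and the standard library resolver looks it up. The sketch assumes the default cluster domain `cluster.local`. The lookup can only succeed from inside a Pod in the cluster. Per the quoted documentation, a normal Service's record should return its ClusterIP, not a LoadBalancer ingress IP.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// serviceFQDN (hypothetical helper) builds the DNS name Kubernetes assigns
// to a normal (not headless) Service: <svc>.<namespace>.svc.<cluster-domain>.
func serviceFQDN(name, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.svc.%s", name, namespace, clusterDomain)
}

func main() {
	// Assumes the default cluster domain "cluster.local".
	fqdn := serviceFQDN("my-svc", "my-namespace", "cluster.local")

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	addrs, err := net.DefaultResolver.LookupHost(ctx, fqdn)
	if err != nil {
		// Expected outside a cluster, or if the Service does not exist.
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(fqdn, "->", addrs)
}
```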
> This approach sounds great for `ClusterIP` and in-cluster network traffic. Another concern is that the network path for a `LoadBalancer` service and a `ClusterIP` service might not be exactly the same.
> I have no idea how cloud providers treat the `LoadBalancer` service. I would be very glad if you could provide more background on that, @dengruilong. ❤️

We didn't use cloud providers' LoadBalancers, because they usually depend on provider-specific differences and are hard to turn into a universal solution. Instead, we used Ingress for north-south communication and Services for east-west communication. In practice, these Services are backed by IPVS, which runs at kernel level with high performance and is widely used in the Kubernetes industry.
I hope the info above helps clear up your concerns. Many thanks ❤️
Any update on the service injection design? @STRRL
No new outcomes yet 🤔
This feature is not in chaos-mesh/chaos-mesh#2608. It may need more time for consideration.
If you are interested in helping us with the design and implementation, PRs are welcome! ❤️
Hi @dengruilong, I noticed that the "detailed design" section is not described in enough detail. I think we could help enrich this part. Would you mind if we edited this RFC directly in the future?
@STRRL Glad to hear that. It would be great if you could help enrich the detailed design.
Chaos Mesh is a better K8s fault injection tool than other products. However, with the trend toward multi-cloud and hybrid cloud, a fault injection tool that works only inside the K8s cluster cannot meet business disaster-tolerance needs. Its fault injection capability must go beyond the cluster to stay ahead of the competition and become a great fault injection product.
We need this feature very much.
Looks good, looks good!
@STRRL Another (rough) thought: would ingress network chaos work like a charm in this situation 🤔? If the … (Though I agree that supporting a Service as the target is much more straightforward and intuitive.)