[CPDEV-94547] Rework network connectivity check and implement VRRP IP check (#551)

* Clarify requirements for opened TCP ports

* Major rework in network connectivity checks

1. Clarified the listening and target ports for each node role.
2. Added a UDP port check (53, CoreDNS).
3. Connect every node to every other node.
4. Do not skip the check if some node does not have Python or if some ports are already listened on.
5. Make connection attempts in parallel.
6. Connect only to listening ports or container ports instead of sending a random stream of bytes (a sketch of this approach follows the list).
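A minimal, hypothetical sketch of the each-to-each, parallel probing described in points 3, 5, and 6. The node addresses, port list, and function names are illustrative only; the actual Kubemarine check derives them from the cluster inventory and runs the probes on the nodes themselves:

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Illustrative inputs only: the real check takes target addresses and
# role-specific port lists from the cluster inventory, not hard-coded values.
NODES: List[str] = ["192.168.0.10", "192.168.0.11", "192.168.0.12"]
TCP_PORTS: List[int] = [6443, 10250, 179]

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Plain TCP connect to a port that is expected to be listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0

def check_ports() -> Dict[Tuple[str, int], bool]:
    """Probe every (node, port) pair in parallel (each-to-each connectivity)."""
    pairs = [(node, port) for node in NODES for port in TCP_PORTS]
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = pool.map(lambda pair: tcp_port_open(*pair), pairs)
        return dict(zip(pairs, results))

if __name__ == "__main__":
    for (node, port), is_open in check_ports().items():
        print(f"{node}:{port} -> {'open' if is_open else 'unreachable'}")
```

Probing only ports that are expected to be listening (point 6) keeps the check free of side effects on the target services.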

* Implement a check of connectivity to the VRRP IP (see the sketch below)
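A hypothetical illustration of such a probe; the address and port are placeholders, and the real check takes the VRRP IPs from the inventory and runs only when they are assigned to balancer nodes (see the Kubecheck documentation below):

```python
import socket

# Placeholder values: in Kubemarine the VRRP IPs come from the inventory.
VRRP_IP = "192.168.0.250"
PROBE_PORT = 80  # a port the balancer behind the VIP is expected to listen on

def vip_reachable(ip: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the VRRP IP succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print(f"{VRRP_IP}:{PROBE_PORT} reachable: {vip_reachable(VRRP_IP, PROBE_PORT)}")
```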

* Update Installation.md

* Fix a flaky failure at port listener startup

If the process exits right after the first failed attempt to read from the pipe, the output is never checked and the connectivity check terminates with an error.

The solution is to check whether the process has exited before giving up on its output (see the sketch below).
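A hypothetical sketch of the corrected ordering (not the actual Kubemarine listener code; the command and readiness marker are placeholders):

```python
import subprocess
import time

def start_listener(cmd: list, ready_marker: str, attempts: int = 50) -> None:
    """Start a port listener and wait until it reports readiness or exits."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    collected = ""
    for _ in range(attempts):
        # Capture the exit status *before* reading the pipe, so output that
        # arrives together with a quick exit is still examined below.
        exited = proc.poll() is not None
        collected += proc.stdout.read() if exited else proc.stdout.readline()
        if ready_marker in collected:
            return                      # the listener reported that it is up
        if exited:
            # The process is gone, but its output has already been collected,
            # so the failure is reported with the real cause.
            raise RuntimeError(f"listener exited during startup:\n{collected}")
        time.sleep(0.1)
    proc.kill()
    raise TimeoutError("listener did not report readiness in time")
```

Checking the exit status before the read guarantees that the final output is drained and examined even when the process dies immediately.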

* Calico Typha can be deployed on control planes

projectcalico/calico#7979

---------

Co-authored-by: Shoaib Mohammed <94443646+shmo1218@users.noreply.github.com>
ilia1243 and shmo1218 committed Nov 28, 2023
1 parent 4993e65 commit 934a02e
Showing 10 changed files with 743 additions and 295 deletions.
25 changes: 14 additions & 11 deletions documentation/Installation.md
@@ -181,30 +181,33 @@ The actual information about the supported versions can be found in `compatibili
* Opened TCP-ports:
* Internal communication:
* 22 : SSH
* 53 : CoreDNS (TCP & UDP), if access is needed from services bound to the host network.
* 80 (or 20080, if balancers are presented): HTTP
* 179 : Calico BGP
* 443 (or 20443, if balancers are presented): HTTPS
* 5473 : Calico netowrking with Typha enabled
* 5443 : Calico API server, if enabled
* 5473 : Calico networking with Typha enabled
* 6443 : Kubernetes API server
* 8443 : Kubernetes dashboard
* 8443 : Ingress NGINX validating webhook
* 2379-2380 : ETCD server & client API
* 9091 - Calico metric port
* 9093 - Calico Typha metric port
* 9094 - Calico kube-controller metric port
* 9091 : Calico metrics port
* 9093 : Calico Typha metrics port
* 10250 : Kubelet API
* 10257 : Kube-scheduler
* 10259 : Kube-controller-manager
* 10254 : Prometheus port
* 30000-32767 : NodePort Services
* Other ports if communication happens between services with any participant bound to the host network.
* External communication:
* 22 : SSH, if you use external nodes' IP addresses to deploy the cluster.
* 80
* 443
* 6443 : Kubernetes API server, if necessary to access externally. For example, using the [helm](#helm) plugin.
* Internal network bandwidth not less than 1GBi/s.
* Dedicated internal address, IPv4, and IPv6 are supported as well, for each VM.
* Any network security policies are disabled or whitelisted. This is especially important for OpenStack environments.
* Traffic is allowed for pod subnet. Search for address at`services.kubeadm.networking.podSubnet`. By default, `10.128.0.0/14` for IPv4 or `fd02::/48` for IPv6.
* Traffic is allowed for service subnet. Search for address at `services.kubeadm.networking.serviceSubnet`. By default `172.30.0.0/16` for IPv4 or `fd03::/112` for IPv6).
* Traffic to/from podSubnet, serviceSubnet is allowed inside the cluster.
* Traffic is allowed for **Opened TCP-ports** between the nodes inside the cluster's internal subnet.
* TCP & UDP traffic is allowed for pod subnet between the nodes inside the cluster.
Search for address at `services.kubeadm.networking.podSubnet`. By default, `10.128.0.0/14` for IPv4 or `fd02::/48` for IPv6.
* TCP & UDP traffic is allowed for service subnet between the nodes inside the cluster.
Search for address at `services.kubeadm.networking.serviceSubnet`. By default, `172.30.0.0/16` for IPv4 or `fd03::/112` for IPv6.

**Warning**: `Kubemarine` works only with `firewalld` as an IP firewall, and switches it off during the installation.
If you have other solution, remove or switch off the IP firewall before the installation.
34 changes: 24 additions & 10 deletions documentation/Kubecheck.md
@@ -22,9 +22,11 @@ This section provides information about the Kubecheck functionality.
- [007 RAM Amount - Control-planes](#007-ram-amount---control-planes)
- [007 RAM Amount - Workers](#007-ram-amount---workers)
- [008 Distributive](#008-distributive)
- [009 PodSubnet](#009-podsubnet)
- [010 ServiceSubnet](#010-servicesubnet)
- [011 TCPPorts](#011-tcpports)
- Network
- [009 PodSubnet](#009-podsubnet)
- [010 ServiceSubnet](#010-servicesubnet)
- [011 TCP & UDP Ports](#011-tcp--udp-ports)
- [016 VRRP IPs](#016-vrrp-ips)
- [012 Thirdparties Availability](#012-thirdparties-availability)
- [013 Package Repositories](#013-package-repositories)
- [014 Package Availability](#014-package-availability)
@@ -156,8 +158,8 @@ The task tree is as follows:
* network
* pod_subnet_connectivity
* service_subnet_connectivity
* check_tcp_ports
* thirdparties_available
* ports_connectivity
* vips_connectivity
* hardware
* members_amount
* vips
@@ -175,8 +177,14 @@ The task tree is as follows:
* workers
* system
* distributive
* thirdparties
* availability
* software
* kernel
* version
* thirdparties
* availability
* packages
* repositories
* availability

##### 001 Connectivity

@@ -303,11 +311,17 @@ This test checks the connectivity between nodes inside a pod's subnetwork.

This test checks the connectivity between nodes inside the service's subnetwork.

##### 011 TCPPorts
##### 011 TCP & UDP Ports

*Task*: `network.ports_connectivity`

This test checks the connectivity between nodes for the predefined set of ports inside the nodes' internal subnetwork.

##### 016 VRRP IPs

*Task*: `network.check_tcp_ports`
*Task*: `network.vips_connectivity`

This test checks if necessary ports are opened on the nodes.
This test checks the connectivity between nodes and the VRRP IPs when they are assigned to the balancer nodes.

##### 012 Thirdparties Availability

4 changes: 4 additions & 0 deletions kubemarine/core/cluster.py
@@ -111,6 +111,10 @@ def get_addresses_from_node_names(self, node_names: List[str]) -> List[str]:
def get_node(self, host: _AnyConnectionTypes) -> NodeConfig:
return self.make_group([host]).get_config()

def get_node_name(self, host: _AnyConnectionTypes) -> str:
name: str = self.get_node(host)['name']
return name

def make_group_from_nodes(self, node_names: List[str]) -> NodeGroup:
ips = self.get_addresses_from_node_names(node_names)
return self.make_group(ips)
3 changes: 3 additions & 0 deletions kubemarine/core/executor.py
@@ -121,6 +121,9 @@ def repr_out(self, *, hide_already_printed: bool = False) -> str:
f"{val.rstrip()}\n")
return "\n".join(ret)

def grep_returned_nothing(self) -> bool:
return not self.stdout and not self.stderr and self.exited == 1


class UnexpectedExit(Exception):
def __init__(self, result: RunnersResult):
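For context, `grep` exits with status 1 and prints nothing when it simply finds no matches, which is what the new helper distinguishes from a genuine failure (exit codes above 1). A hypothetical caller, not taken from this commit, might use it as follows (assuming the kubemarine package is importable):

```python
from typing import List

from kubemarine.core.executor import RunnersResult  # class extended above

def grep_matches(result: RunnersResult) -> List[str]:
    """Interpret the result of a `grep` run executed with warnings enabled."""
    if result.grep_returned_nothing():
        return []                          # exit code 1 with no output: no matches
    if result.exited != 0:
        raise RuntimeError(result.stderr)  # exit code > 1: grep itself failed
    return result.stdout.splitlines()      # the matching lines, one per entry
```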
9 changes: 4 additions & 5 deletions kubemarine/core/group.py
@@ -793,7 +793,7 @@ def flush(self) -> None:
class GroupException(Exception):
def __init__(self, cluster: object, results: Dict[str, List[GenericResult]]):
self.cluster = cluster
self._results = results
self.results = results

def _make_group(self, hosts: Iterable[str]) -> NodeGroup:
return NodeGroup(hosts, self.cluster)
@@ -813,7 +813,7 @@ def get_excepted_hosts_list(self) -> List[str]:
:return: List with hosts
"""
excepted_hosts: List[str] = []
for host, results in self._results.items():
for host, results in self.results.items():
if any(isinstance(result, Exception) for result in results):
excepted_hosts.append(host)
return excepted_hosts
@@ -834,7 +834,7 @@ def get_exited_hosts_list(self) -> List[str]:
:return: List with hosts
"""
exited_hosts: List[str] = []
for host, results in self._results.items():
for host, results in self.results.items():
if all(isinstance(result, RunnersResult) for result in results):
exited_hosts.append(host)
return exited_hosts
@@ -853,7 +853,7 @@ def __str__(self) -> str:
# for the reason that the user code might want to print output of some commands in the batch,
# but failed to do that because of the exception.
host_outputs = []
for host, results in self._results.items():
for host, results in self.results.items():
output = f"{host}:"

# filter out transfer results and the last exception if present
@@ -884,4 +884,3 @@ def __init__(self, result: GenericGroupResult[GenericResult]):
class RemoteGroupException(GroupException):
def __init__(self, cluster: object, results: Dict[str, TokenizedResult]):
super().__init__(cluster, {host: list(res.values()) for host, res in results.items()})
self.results = results
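
With `results` now a public attribute of `GroupException`, callers can inspect per-host outcomes of a failed batch directly on the exception. A hypothetical sketch, assuming a `NodeGroup` instance and that a failed batch surfaces as a `GroupException`:

```python
from kubemarine.core.group import GroupException, NodeGroup

def report_failures(group: NodeGroup, command: str) -> None:
    """Run a command on the group and summarize per-host failures, if any."""
    try:
        group.run(command)
    except GroupException as exc:
        # `results` maps each host to the list of results of its actions.
        for host, host_results in exc.results.items():
            failed = sum(isinstance(res, Exception) for res in host_results)
            print(f"{host}: {failed} of {len(host_results)} action(s) failed")
        raise
```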
