[provider-local] VPN tunnel check succeeds even if VPN is broken #9604
This issue also affects non-HA scenarios in the local setup. As there is no node range defined for shoots in the local setup, the network connectivity for the VPN check in the reconciliation will be the following:
In real scenarios, it should be like this:
Fixing this can prevent regressions, but there are also validations in place preventing shoot/seed network overlaps, which may make this somewhat challenging.
You're right. In the non-HA scenario, kube-apiserver will always connect to vpn-seed-server because of the `EgressSelectorConfiguration`.
We can verify this route by breaking the VPN connection on the shoot-side this time:
To summarize:

**Problem**

In the provider-local setup, the VPN tunnel check performed by gardenlet (port-forward check) does not detect a broken VPN tunnel, because either kube-apiserver (HA clusters) or vpn-seed-server (non-HA clusters) routes requests to the kubelet API directly via the seed's pod network. We should strive to resolve this discrepancy between the local setup and cloud setups regarding the VPN connection, so that bugs are prevented by validating the real setup in e2e tests.

**Proposal**

1. Set
/assign @rfranzke @timebertt
How to categorize this issue?
/area networking testing
/kind bug
What happened:
In the provider-local HA setup (tested with single-zone but should also apply to multi-zone), kube-apiserver talks directly to the kubelet API instead of using the VPN connection.
With this, operations like `kubectl logs` and `kubectl port-forward` (for which the kubelet API is called by kube-apiserver) work even if the VPN connection is broken. As the VPN tunnel check performed by gardenlet uses a port-forward operation (code), the shoot can be reconciled successfully and be marked as healthy even if the VPN connection is broken.
This problem might cause bugs and regressions in the VPN setup to go unnoticed.
E.g., in #9597 there was a problem in the HA VPN configuration (fixed in a later commit). Nevertheless, most test cases of `pull-gardener-e2e-kind-ha-{single,multi}-zone` succeeded. I.e., shoot creations were successful although the VPN connection was never working. The problem was only discovered by chance in the credentials rotation test case (ref).
What you expected to happen:
If the VPN connection cannot be established successfully, the VPN tunnel check should fail and the shoot should not be marked as healthy.
How to reproduce it (as minimally and precisely as possible):
1. `make kind-ha-single-zone-up gardener-ha-single-zone-up`
2. Configure HA for the shoot in `example/provider-local/shoot.yaml`
3. `kubectl apply -f example/provider-local/shoot.yaml`
4. Check that the kubelet API is reachable via kube-apiserver:
   - `k -n kube-system logs deploy/metrics-server --request-timeout 2s`
   - `k -n kube-system port-forward svc/metrics-server 8443:443 --request-timeout 2s`
   - `k top no`
5. Break the VPN connection:
   - `k -n shoot--local--local scale sts vpn-seed-server --replicas 0`
   - `k -n shoot--local--local delete po -l role=apiserver`
6. Observe that `logs` and `port-forward` still work, while the connection to the metrics-server (`k top no`) doesn't work.

Anything else we need to know?:
This only applies to HA clusters, where routes to the shoot networks are configured explicitly in the kube-apiserver pods.
For non-HA clusters, there is an `EgressSelectorConfiguration` that connects to the `envoy-proxy` container in the `vpn-seed-server` using `HTTPConnect` instead of using explicitly configured IP routes.
E.g.:
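The original example block did not survive extraction. As a rough illustration, a minimal sketch of what such an `EgressSelectorConfiguration` can look like is shown below; the proxy URL, port, and certificate paths are assumptions for illustration, not taken from this issue or from Gardener's actual configuration:

```yaml
# Illustrative sketch: route kube-apiserver "cluster" egress traffic
# (e.g. requests to the kubelet API) through an HTTPConnect proxy
# instead of relying on directly configured IP routes.
# URL, port, and file paths below are assumptions.
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: cluster
  connection:
    proxyProtocol: HTTPConnect
    transport:
      tcp:
        url: https://vpn-seed-server:9443
        tlsConfig:
          caBundle: /srv/kubernetes/envoy/ca.crt
          clientCert: /srv/kubernetes/envoy/tls.crt
          clientKey: /srv/kubernetes/envoy/tls.key
```

With such a configuration, kube-apiserver tunnels node/pod traffic through the named proxy, which is why a broken tunnel would normally surface as failing `logs`/`port-forward` operations.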
Note that there is no route for the shoot node network. This is because `Shoot.spec.networking.nodes` is empty, as it overlaps with `Seed.spec.networks.pods` (provider-local starts pods in the seed as shoot nodes). Hence, kube-apiserver can talk directly to the kubelet API via the seed pod network.
There are even multiple mechanisms allowing this direct communication path from kube-apiserver to kubelet:

- the `allow-machine-pods` `NetworkPolicy`: gardener/pkg/provider-local/controller/infrastructure/actuator.go, lines 61 to 65 in 70fe495
- the `machines` `Service`: https://github.com/gardener/machine-controller-manager-provider-local/blob/aa28b3aede72b45440183187c23db89ea76840d5/pkg/local/create_machine.go#L67-L86
- the `networking.resources.gardener.cloud/to-machines-tcp-10250=allowed` label on `kube-apiserver`: gardener/pkg/provider-local/webhook/controlplane/ensurer.go, line 88 in 70fe495
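For illustration, a `NetworkPolicy` of the first kind could look roughly like the sketch below. This is a hedged approximation: the pod selector labels are assumptions, and the authoritative definition is the one referenced in actuator.go above.

```yaml
# Illustrative sketch only: admit ingress to the machine pods
# (which act as shoot "nodes" in provider-local) on the kubelet
# API port 10250. Labels are assumptions; see actuator.go for
# the real definition.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-machine-pods
  namespace: shoot--local--local
spec:
  podSelector:
    matchLabels:
      app: machine            # assumed label of the machine pods
  policyTypes:
  - Ingress
  ingress:
  - ports:
    - protocol: TCP
      port: 10250             # kubelet API
```

Deleting this policy (as in the verification steps below) removes the direct path, forcing kubelet API traffic onto the VPN.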
To verify that kube-apiserver of local HA shoots talks directly to the kubelet API, use the following steps:
1. `k -n shoot--local--local delete netpol allow-machine-pods`
2. `k -n shoot--local--local delete svc machines`
3. `k -n shoot--local--local delete po -l role=apiserver`
4. Observe that `logs` and `port-forward` don't work (they don't use the VPN connection), while the connection to the metrics-server (`k top no`) works (it uses the working VPN connection). I.e., the `port-forward` operation used by the VPN tunnel check doesn't work.

Environment:
- Gardener version: `v1.93.0-dev`