This repository was archived by the owner on Oct 24, 2023. It is now read-only.

In-cluster API calls to kubernetes.default.svc from master nodes fail on multi-master deployments #622

@kosta709


Is this a request for help?:
yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:
v0.22.2

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes

What happened:
On a multi-master k8s installation ("masterProfile": {"count": 3, ...}),
a pod scheduled to a master node gets sporadic errors when accessing kubernetes.default.svc (the internal endpoint for the k8s API) with kubectl or curl:
Unable to connect to the server: dial tcp 10.0.0.1:443: i/o timeout

What you expected to happen:
Access to kubernetes.default.svc from master nodes should be stable.

How to reproduce it (as minimally and precisely as possible):
Deploy a 3-master k8s cluster ("masterProfile": {"count": 3, ...} in the JSON).
Ensure that all 3 masters are up:

kubectl get nodes -l kubernetes.io/role=master
NAME                    STATUS    ROLES     AGE       VERSION
k8s-master-17552040-0   Ready     master    6d        v1.10.8
k8s-master-17552040-1   Ready     master    6d        v1.10.8
k8s-master-17552040-2   Ready     master    6d        v1.10.8

Submit a pod like the one below:

---
apiVersion: v1
kind: Pod
metadata:
  name: kubectl-test
  labels:
    app: kubectl
spec:
  containers:
  - image: lachlanevenson/k8s-kubectl:latest
    name: kubectl
    command:
      - sleep 
      - "1000000"
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: "Exists"
  nodeSelector:
    kubernetes.io/role: master
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet  

Open a shell in the pod and run kubectl or curl against the API server:

kubectl get pods -owide
NAME           READY     STATUS    RESTARTS   AGE       IP             NODE
kubectl-test   1/1       Running   0          14h       10.240.254.6   k8s-master-17552040-1
kubectl exec -it  kubectl-test sh
~ # kubectl version --short
Client Version: v1.12.1
Server Version: v1.10.8

~ # kubectl version --short
Client Version: v1.12.1
Server Version: v1.10.8

~ # kubectl version --short
Client Version: v1.12.1
Unable to connect to the server: dial tcp 10.0.0.1:443: i/o timeout
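
To put a number on "sporadic", the same command can be repeated in a loop and the failures counted. The following is only a minimal sketch: `count_failures` is a hypothetical helper name, not part of kubectl or acs-engine, and it assumes a POSIX shell such as the `sh` inside the test pod.

```shell
# count_failures N CMD [ARGS...]
# Runs CMD N times and reports how many of the runs exited non-zero.
# (Hypothetical helper, written for illustration only.)
count_failures() {
    total="$1"; shift
    failed=0
    i=0
    while [ "$i" -lt "$total" ]; do
        # Discard the output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "failed $failed of $total"
}
```

Inside the pod it could be called as `count_failures 30 kubectl version --short`. Each failed attempt waits for the dial timeout, so a long run can take a while.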

Anything else we need to know:
In an acs-engine installation, such requests go through the internal load balancer, which has an endpoint on each master on port 4443. There is also an iptables nat PREROUTING rule that redirects 4443 to 443; see https://github.com/Azure/acs-engine/blob/master/parts/k8s/kubernetesmastercustomdata.yml

{{if gt .MasterProfile.Count 1}}
    # Azure does not support two LoadBalancers(LB) sharing the same nic and backend port.
    # As a workaround, the Internal LB(ILB) listens for apiserver traffic on port 4443 and the External LB(ELB) on port 443
    # This IPTable rule then redirects ILB traffic to port 443 in the prerouting chain
    iptables -t nat -A PREROUTING -p tcp --dport 4443 -j REDIRECT --to-port 443
{{end}} 

It looks like the PREROUTING rule does not take effect when the request is sent to the same host where the pod is running.
In my case the pod was scheduled to k8s-master-17552040-1, so this node cannot be reached on port 4443, but the 2 others are OK, which is why we get the error in ~1/3 of attempts.
Accessing the local node:

~ # curl -k https://k8s-master-17552040-1:4443
curl: (7) Failed to connect to k8s-master-17552040-1 port 4443: Connection refused 

Accessing the other master nodes:

~ # curl -k https://k8s-master-17552040-0:4443
{
  "kind": "Status",
.....
~ # curl -k https://k8s-master-17552040-2:4443
{
  "kind": "Status",
.....
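
The per-node checks above can be repeated for all masters in one pass. This is a sketch under assumptions: `probe_masters` and the `PROBE` override are hypothetical names introduced here, and the default probe relies on curl exiting non-zero when the TCP connection is refused or times out.

```shell
# probe_masters PORT HOST...
# Prints "<host>:<port> ok" or "<host>:<port> FAILED" for every host.
# PROBE is a hypothetical hook for swapping out the probe command;
# by default curl is used and any HTTP response counts as reachable.
probe_masters() {
    port="$1"; shift
    for host in "$@"; do
        # Left unquoted on purpose so the default command splits into words.
        if ${PROBE:-curl -k -s -o /dev/null --connect-timeout 5} "https://$host:$port"; then
            echo "$host:$port ok"
        else
            echo "$host:$port FAILED"
        fi
    done
}
```

For the cluster in this report the call would be `probe_masters 4443 k8s-master-17552040-0 k8s-master-17552040-1 k8s-master-17552040-2`, run from the pod's shell.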

I tried changing PREROUTING to OUTPUT on all masters (iptables -t nat -A OUTPUT -p tcp --dport 4443 -j REDIRECT --to-port 443). This fixes curl, but does not fix kubectl.
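
One way to check which nat chain actually sees the port 4443 traffic is to compare the packet counters of the redirect rules before and after a test request. In netfilter, locally generated packets pass through the nat OUTPUT chain rather than PREROUTING, so a hostNetwork pod talking to its own node would be expected to bypass the PREROUTING rule. The sketch below assumes root on a master node; `show_redirect_counters` and the `IPTABLES` override are hypothetical names used only for illustration.

```shell
# show_redirect_counters
# Lists the nat PREROUTING and OUTPUT chains with packet counters and
# keeps only the column header and the rules that match port 4443.
# IPTABLES is a hypothetical override for the iptables binary.
show_redirect_counters() {
    ipt="${IPTABLES:-iptables}"
    for chain in PREROUTING OUTPUT; do
        echo "== nat/$chain =="
        # -n: numeric output, -v: include the pkts/bytes counters
        "$ipt" -t nat -L "$chain" -n -v | grep -E 'pkts|dpt:4443' || true
    done
}
```

Running it once, issuing a single curl or kubectl request, and running it again shows which rule's counter moved, if any.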
