In-cluster API calls to kubernetes.default.svc from master nodes fail on multi-master deployments #622
Description
Is this a request for help?:
yes
Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE
What version of acs-engine?:
v0.22.2
Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes
What happened:
On a Kubernetes multi-master installation ("masterProfile": {"count": 3, ...}), a pod scheduled to a master node gets sporadic errors when accessing kubernetes.default.svc (the internal endpoint for the Kubernetes API) via kubectl or curl:
Unable to connect to the server: dial tcp 10.0.0.1:443: i/o timeout
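For context, 10.0.0.1 is the ClusterIP of the kubernetes service in the default namespace, which is what kubernetes.default.svc resolves to; this can be confirmed with a standard lookup (a sanity check on my part, not part of the failure itself):
# the CLUSTER-IP column should show 10.0.0.1 in this cluster
kubectl get svc kubernetes -n default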
What you expected to happen:
Access to kubernetes.default.svc from master nodes should be stable.
How to reproduce it (as minimally and precisely as possible):
Deploy a 3-master Kubernetes cluster ("masterProfile": {"count": 3, ...} in the API model JSON).
Ensure that the 3 masters are up:
kubectl get nodes -l kubernetes.io/role=master
NAME STATUS ROLES AGE VERSION
k8s-master-17552040-0 Ready master 6d v1.10.8
k8s-master-17552040-1 Ready master 6d v1.10.8
k8s-master-17552040-2 Ready master 6d v1.10.8
Submit a pod like the one below:
---
apiVersion: v1
kind: Pod
metadata:
  name: kubectl-test
  labels:
    app: kubectl
spec:
  containers:
  - image: lachlanevenson/k8s-kubectl:latest
    name: kubectl
    command:
    - sleep
    - "1000000"
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: "Exists"
  nodeSelector:
    kubernetes.io/role: master
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
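Assuming the manifest above is saved as kubectl-test.yaml (the filename is arbitrary), it can be submitted with:
kubectl apply -f kubectl-test.yaml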
Enter the pod's shell and try kubectl or curl against the API server:
kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE
kubectl-test 1/1 Running 0 14h 10.240.254.6 k8s-master-17552040-1
kubectl exec -it kubectl-test sh
~ # kubectl version --short
Client Version: v1.12.1
Server Version: v1.10.8
~ # kubectl version --short
Client Version: v1.12.1
Server Version: v1.10.8
~ # kubectl version --short
Client Version: v1.12.1
Unable to connect to the server: dial tcp 10.0.0.1:443: i/o timeout
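For completeness, the same sporadic timeout shows up without kubectl when curling the service VIP directly with the pod's mounted service account token (a sketch using the standard in-cluster paths, not output from the original run):
# call the apiserver through the service name using the mounted service account token
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -k -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc/version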
Anything else we need to know:
In an acs-engine installation such requests go through the internal load balancer, whose backend pool targets each master on port 4443. There is also an iptables NAT PREROUTING rule that redirects 4443 to 443; see https://github.com/Azure/acs-engine/blob/master/parts/k8s/kubernetesmastercustomdata.yml:
{{if gt .MasterProfile.Count 1}}
# Azure does not support two LoadBalancers(LB) sharing the same nic and backend port.
# As a workaround, the Internal LB(ILB) listens for apiserver traffic on port 4443 and the External LB(ELB) on port 443
# This IPTable rule then redirects ILB traffic to port 443 in the prerouting chain
iptables -t nat -A PREROUTING -p tcp --dport 4443 -j REDIRECT --to-port 443
{{end}}
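To check that this rule is actually present on a given master, one quick way (my own check, not part of the template) is:
# list the NAT PREROUTING rules and look for the 4443 -> 443 redirect
sudo iptables -t nat -L PREROUTING -n --line-numbers | grep 4443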
It looks like the PREROUTING chain is not traversed when the traffic goes to the same host the pod is running on.
In my case the pod was scheduled to k8s-master-17552040-1, so that node cannot be reached, while the other two are fine; that is why the error shows up in roughly 1/3 of attempts.
Accessing the local node:
~ # curl -k https://k8s-master-17552040-1:4443
curl: (7) Failed to connect to k8s-master-17552040-1 port 4443: Connection refused
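This is consistent with nothing actually listening on 4443: the apiserver binds 443, and the REDIRECT only rewrites connections as they arrive via the nat table. A quick way to confirm on the master itself (my own check, assuming ss is available there):
# only 443 should show up as a listening socket; 4443 is never bound
sudo ss -tlnp | grep -E ':(443|4443) '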
Accessing the other master nodes:
~ # curl -k https://k8s-master-17552040-0:4443
{
"kind": "Status",
.....
~ # curl -k https://k8s-master-17552040-2:4443
{
"kind": "Status",
.....
I tried changing PREROUTING to OUTPUT on all masters (iptables -t nat -A OUTPUT -p tcp --dport 4443 -j REDIRECT --to-port 443); it fixes curl, but does not fix kubectl.
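For reference, a quick way to see which redirect rules are in place after such a change (my own check, not from the original run):
# locally generated traffic traverses the OUTPUT chain rather than PREROUTING,
# so both chains need to be checked
iptables -t nat -S PREROUTING | grep 4443
iptables -t nat -S OUTPUT | grep 4443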