With dual NICs, coredns, dashboard, and metrics-server cannot reach kube-apiserver #479

Closed
4220182 opened this issue Mar 12, 2019 · 7 comments

Comments


4220182 commented Mar 12, 2019

Version: kubeasz-1.0.0rc1

Host network configuration:
master:
internal: 10.2.2.120
external: 192.168.5.120
gateway: 192.168.5.1

node1:
internal: 10.2.2.121
external: 192.168.5.121
gateway: 192.168.5.1

node2:
internal: 10.2.2.122
external: 192.168.5.122
gateway: 192.168.5.1

Installed following example/hosts.s-master.example (1 master, 2 nodes)

# cat hosts 
# Deploy node: usually the node that runs the ansible scripts
# Variable NTP_ENABLED (=yes/no) controls whether chrony time synchronization is installed in the cluster
[deploy]
10.2.2.120 NTP_ENABLED=no

# Provide NODE_NAME for each etcd member as below; note that an etcd cluster must have an odd number of nodes (1, 3, 5, 7, ...)
[etcd]
10.2.2.120 NODE_NAME=etcd1

[kube-master]
10.2.2.120

[kube-node]
10.2.2.121
10.2.2.122

# Parameter NEW_INSTALL: yes means install a new harbor, no means use an existing harbor server
# If you do not use a domain name, set HARBOR_DOMAIN=""
[harbor]
#192.168.1.8 HARBOR_DOMAIN="harbor.yourdomain.com" NEW_INSTALL=no

# [Optional] External load balancer, used in self-hosted environments to forward traffic to services exposed via NodePort, etc.
[ex-lb]
#192.168.1.6 LB_ROLE=backup EX_VIP=192.168.1.250
#192.168.1.7 LB_ROLE=master EX_VIP=192.168.1.250

[all:vars]
# ---------Main cluster parameters---------------
# Cluster deployment mode: allinone, single-master, multi-master
DEPLOY_MODE=single-master

# Cluster MASTER IP, generated automatically
MASTER_IP="{{ groups['kube-master'][0] }}"
KUBE_APISERVER="https://{{ MASTER_IP }}:6443"

# Cluster network plugin; currently supported: calico, flannel, kube-router, cilium
CLUSTER_NETWORK="flannel"

# Service network (Service CIDR); must not conflict with existing internal networks
SERVICE_CIDR="10.68.0.0/16"

# Pod network (Cluster CIDR); must not conflict with existing internal networks
CLUSTER_CIDR="172.20.0.0/16"

# Service port range (NodePort Range)
NODE_PORT_RANGE="20000-40000"

# kubernetes service IP (pre-allocated, usually the first IP in SERVICE_CIDR)
CLUSTER_KUBERNETES_SVC_IP="10.68.0.1"

# Cluster DNS service IP (pre-allocated from SERVICE_CIDR)
CLUSTER_DNS_SVC_IP="10.68.0.2"

# Cluster DNS domain
CLUSTER_DNS_DOMAIN="cluster.local."

# Username and password for cluster basic auth (a random password is generated at run time)
BASIC_AUTH_USER="admin"
BASIC_AUTH_PASS="51942f94bc136c5d"

# ---------Additional parameters--------------------
# Default directory for binaries
bin_dir="/opt/kube/bin"

# Certificate directory
ca_dir="/etc/kubernetes/ssl"

# Deployment directory, i.e. the ansible working directory
base_dir="/etc/ansible"

Pod status and logs for coredns, dashboard, and metrics-server:

# kubectl get po -o wide --all-namespaces=true
NAME                                    READY   STATUS             RESTARTS   AGE     IP           NODE         NOMINATED NODE   READINESS GATES
coredns-dc8bbbcf9-4rsfl                 0/1     CrashLoopBackOff   18         55m     172.20.1.5   10.2.2.121   <none>           <none>
coredns-dc8bbbcf9-7rz2p                 0/1     CrashLoopBackOff   18         55m     172.20.2.4   10.2.2.122   <none>           <none>
kubernetes-dashboard-6685cb584f-nvc8p   0/1     CrashLoopBackOff   20         55m     172.20.2.5   10.2.2.122   <none>           <none>
metrics-server-79558444c6-gtt4t         0/1     CrashLoopBackOff   6          9m27s   172.20.1.6   10.2.2.121   <none>           <none>

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.5.1     0.0.0.0         UG    100    0        0 enp0s3
10.2.2.0        0.0.0.0         255.255.255.0   U     101    0        0 enp0s8
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.20.1.0      172.20.1.0      255.255.255.0   UG    0      0        0 flannel.1
172.20.2.0      172.20.2.0      255.255.255.0   UG    0      0        0 flannel.1
192.168.5.0     0.0.0.0         255.255.255.0   U     100    0        0 enp0s3

# kubectl logs metrics-server-79558444c6-56qmd -n kube-system
panic: Get https://10.68.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.68.0.1:443: connect: connection refused

goroutine 1 [running]:
main.main()
	/go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x13b

# kubectl logs kubernetes-dashboard-6685cb584f-nvc8p -n kube-system
2019/03/11 13:19:06 Starting overwatch
2019/03/11 13:19:06 Using in-cluster config to connect to apiserver
2019/03/11 13:19:06 Using service account token for csrf signing
2019/03/11 13:19:06 No request provided. Skipping authorization
2019/03/11 13:19:06 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.68.0.1:443/version: dial tcp 10.68.0.1:443: getsockopt: connection refused
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

# kubectl logs coredns-dc8bbbcf9-7rz2p -n kube-system

E0311 13:46:20.106731       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.68.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.68.0.1:443: connect: connection refused

Checking iptables:

# iptables-save |grep KUBE-SEP-VPBSGNC2TAY6H4RC

-A KUBE-SERVICES -d 10.68.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
:KUBE-SEP-VPBSGNC2TAY6H4RC - [0:0]
-A KUBE-SEP-VPBSGNC2TAY6H4RC -s 192.168.5.120/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-VPBSGNC2TAY6H4RC -p tcp -m tcp -j DNAT --to-destination 192.168.5.120:6443
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-VPBSGNC2TAY6H4RC

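Why does this rule use --to-destination 192.168.5.120:6443, which is the external IP? My guess is that the iptables rule was generated incorrectly, so kube-apiserver cannot be reached.

kube-apiserver itself is bound to 10.2.2.120:6443: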

# netstat -anp |grep LISTEN |grep 6443
tcp        0      0 10.2.2.120:6443         0.0.0.0:*               LISTEN      10996/kube-apiserve
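
For anyone comparing the two values on their own cluster, here is a minimal sketch of the check above. The helper names `dnat_targets` and `listen_addrs` are made up for illustration, and the sample lines are copied from the output in this issue. On a live master you would pipe `iptables-save` and `netstat -anp` into them instead.

```shell
#!/bin/sh
# Hypothetical helper: print the DNAT targets found in iptables-save output on stdin.
dnat_targets() {
  sed -n 's/.*-j DNAT --to-destination \([0-9.]*:[0-9]*\).*/\1/p'
}

# Hypothetical helper: from `netstat -anp` style output on stdin, print the local
# address of every listening socket whose port equals $1.
listen_addrs() {
  port="$1"
  awk -v p=":$port" '$6 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { print $4 }'
}

# Sample lines taken from this issue, not read from a live system.
sample_rule='-A KUBE-SEP-VPBSGNC2TAY6H4RC -p tcp -m tcp -j DNAT --to-destination 192.168.5.120:6443'
sample_sock='tcp        0      0 10.2.2.120:6443         0.0.0.0:*               LISTEN      10996/kube-apiserve'

target=$(printf '%s\n' "$sample_rule" | dnat_targets)
listen=$(printf '%s\n' "$sample_sock" | listen_addrs 6443)

if [ "$target" = "$listen" ]; then
  echo "match: $target"
else
  echo "mismatch: DNAT -> $target, apiserver listens on $listen"
fi
```

With the sample lines this reports a mismatch, which is the symptom described here: traffic to 10.68.0.1:443 is rewritten to an address where nothing is listening.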

How can I solve this problem?

Contributor

weilinqwe commented Mar 12, 2019

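I ran into exactly this problem today. The main cause is that your default gateway sits on the external NIC. Run kubectl describe svc kubernetes and check whether the endpoints list the apiserver's external address while the apiserver actually listens on the internal address. That is why pods inside the cluster (including coredns) cannot reach 10.68.0.1:443. The reason is that the apiserver determines its address at startup by looking at the gateway. Fix: set the gateway of the admin node to the internal gateway and restart kube-apiserver.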

Author

4220182 commented Mar 12, 2019

Thanks @weilinqwe
Following your suggestion I checked, and the endpoint of the kubernetes service is indeed the external address:

[root@k8s-master ansible]# kubectl get svc kubernetes
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1    <none>        443/TCP   19h
[root@k8s-master ansible]# kubectl get ep kubernetes
NAME         ENDPOINTS            AGE
kubernetes   192.168.5.120:6443   19h

I then changed the default gateway to the internal gateway:

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.2.2.1        0.0.0.0         UG    0      0        0 enp0s8
10.2.2.0        0.0.0.0         255.255.255.0   U     101    0        0 enp0s8
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.20.1.0      172.20.1.0      255.255.255.0   UG    0      0        0 flannel.1
172.20.2.0      172.20.2.0      255.255.255.0   UG    0      0        0 flannel.1
192.168.5.0     0.0.0.0         255.255.255.0   U     100    0        0 enp0s3

After restarting the apiserver, DNS and dashboard could both reach kube-apiserver normally.

[root@k8s-master ansible]# kubectl get ep kubernetes
NAME         ENDPOINTS         AGE
kubernetes   10.2.2.120:6443   19h

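However, this leaves the master host unable to reach the external network (the default route now points to the internal gateway, which has no connection to the Internet), so installing software or updating the system later will be difficult.

Alternatively, is there some other way to stop the apiserver from using the default gateway at startup to set the endpoint of svc kubernetes?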

@weilinqwe
Contributor

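I'm not sure about the internal mechanism without reading the source; if anyone knows, please explain it here. I only found out because, after I brought the external NIC down, the apiserver startup log reported an error along the lines of not being able to find a default gateway. For Internet access or package installation on the master, I'd suggest using another server (for example the deploy node) as a proxy and setting proxy in yum.conf.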
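
A rough sketch of the yum.conf part of that suggestion, in case it helps. `proxy=` is a standard yum.conf option, but the helper name `set_yum_proxy` and the proxy URL below are placeholders, not values from this thread. The sketch also assumes the file has only a `[main]` section, so an appended line lands inside it. The demo works on a scratch file, not the real /etc/yum.conf.

```shell
#!/bin/sh
# Hypothetical helper: set (or replace) the proxy= line in a yum-style config file.
# Assumes the file only has a [main] section, so an appended line lands inside it.
set_yum_proxy() {
  conf="$1"
  url="$2"
  tmp=$(mktemp) || return 1
  # Drop any existing proxy= line, then append the new one.
  sed '/^proxy=/d' "$conf" > "$tmp" || return 1
  printf 'proxy=%s\n' "$url" >> "$tmp"
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}

# Demo on a scratch file rather than the real /etc/yum.conf.
demo_conf=$(mktemp)
printf '[main]\nkeepcache=0\n' > "$demo_conf"
set_yum_proxy "$demo_conf" "http://proxy.example.internal:3128"  # placeholder proxy URL
cat "$demo_conf"
```

On a real host the first argument would be /etc/yum.conf, and the URL would point at whatever proxy you run on a machine that can reach the Internet.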

Contributor

AEGQ commented Mar 17, 2019

On a 1.13.1 cluster I ran into the kubernetes.default endpoint being the VM's public IP.

After adding the startup flag --advertise-address with the internal IP to every kube-apiserver, it worked. Give it a try :) @4220182

Author

4220182 commented Mar 17, 2019

Solved perfectly, thanks @AEGQ!
Flag description: --advertise-address  # the address on which the API service is advertised to members of the cluster

[root@k8s-master ~]# cat /etc/systemd/system/kube-apiserver.service 
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
ExecStart=/opt/kube/bin/kube-apiserver \
......
  --bind-address=10.2.2.120 \
  --advertise-address=10.2.2.120 \
......

After restarting kube-apiserver, I checked the endpoint again and it is now correct:

# kubectl get ep kubernetes
NAME         ENDPOINTS         AGE
kubernetes   10.2.2.120:6443   6d1h
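
To check whether a unit file already pins the advertised address, something like the sketch below could be used. `flag_value` is a hypothetical helper, and the fragment contains only the lines shown in this comment (the elided parts are left out). On a real master you would feed it /etc/systemd/system/kube-apiserver.service.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a --NAME=value option found in text on stdin.
flag_value() {
  flag="--$1="
  awk -v f="$flag" '{
    for (i = 1; i <= NF; i++)
      if (index($i, f) == 1) print substr($i, length(f) + 1)
  }'
}

# Fragment of the unit file as quoted in this issue; elided lines are not reproduced.
unit_fragment='ExecStart=/opt/kube/bin/kube-apiserver \
  --bind-address=10.2.2.120 \
  --advertise-address=10.2.2.120 \'

bind=$(printf '%s\n' "$unit_fragment" | flag_value bind-address)
adv=$(printf '%s\n' "$unit_fragment" | flag_value advertise-address)

if [ -z "$adv" ]; then
  echo "advertise-address is not set; the apiserver will pick the advertised address itself"
else
  echo "advertise-address is pinned to $adv (bind-address: $bind)"
fi
```

After changing the unit file, systemd needs a `systemctl daemon-reload` before `systemctl restart kube-apiserver` picks up the new flags.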

4220182 closed this as completed Mar 17, 2019

xieydd commented Apr 20, 2019

@AEGQ This solved my problem perfectly. How about sending a PR to this repo?

Collaborator

gjmzj commented May 23, 2019

Fix committed in 148dce5.
Thanks, everyone.
