bug: Unable to reconnect to apisix, when all ep are deleted under svc of apisix #769

han6565 · 2021-11-24T09:39:31Z

Issue description

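I deployed one APISIX instance and one apisix-ingress-controller on k8s. After both were configured, the controller's health check passed.
When the APISIX pod is killed, the controller still cannot reconnect to APISIX, even after the pod has been restarted.
However, when I later access the Admin API manually at the same pod address, it works fine.
With two or more APISIX pods there is no problem as long as at least one pod stays alive. But if no pod is alive, even for a short time, the controller can never connect again.
Is this a configuration problem on my side, or is the controller supposed to stop reconnecting for good once no pod has been reachable?

After the controller has started syncing, deleting the Service and then restoring it reproduces the problem 100% of the time.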

2021-11-23T20:51:39+08:00	warn	apisix/cluster.go:452	failed to check health for cluster default: dial tcp 172.24.150.14:9180: connect: connection refused, will retry
2021-11-23T20:51:39+08:00	warn	ingress/controller.go:660	failed to check health for default cluster: timed out waiting for the condition, give up leader
2021-11-23T20:51:39+08:00	info	ingress/endpoint.go:83	endpoints controller exited
2021-11-23T20:51:39+08:00	info	ingress/apisix_tls.go:71	ApisixTls controller exited
2021-11-23T20:51:39+08:00	error	ingress/ingress.go:63	cache sync failed
2021-11-23T20:51:39+08:00	info	ingress/ingress.go:64	ingress controller exited
2021-11-23T20:51:39+08:00	info	ingress/apisix_upstream.go:71	ApisixUpstream controller exited
2021-11-23T20:51:39+08:00	info	ingress/secret.go:76	secret controller exited
2021-11-23T20:51:39+08:00	info	ingress/service.go:61	svc controller exited
2021-11-23T20:51:39+08:00	info	ingress/namespace.go:82	namespace controller exited
2021-11-23T20:51:39+08:00	info	ingress/pod.go:56	pod controller exited
2021-11-23T20:51:39+08:00	info	ingress/apisix_route.go:71	ApisixRoute controller exited
2021-11-23T20:51:39+08:00	info	ingress/apisix_consumer.go:67	ApisixConsumer controller exited
2021-11-23T20:51:39+08:00	info	ingress/apisix_cluster_config.go:69	ApisixClusterConfig controller exited
2021-11-23T20:51:39+08:00	info	ingress/controller.go:354	controller now is running as a candidate	{"namespace": "apisix", "pod": "apisix-ingress-controller-7994d7bb49-z5hms"}
I1123 20:51:39.105127       1 leaderelection.go:243] attempting to acquire leader lease apisix/ingress-apisix-leader...
2021-11-23T20:51:39+08:00	info	ingress/controller.go:307	LeaderElection	{"message": "apisix-ingress-controller-7994d7bb49-z5hms became leader", "event_type": "Normal"}
I1123 20:51:39.111962       1 leaderelection.go:253] successfully acquired lease apisix/ingress-apisix-leader
2021-11-23T20:51:39+08:00	info	ingress/controller.go:387	controller tries to leading ...	{"namespace": "apisix", "pod": "apisix-ingress-controller-7994d7bb49-z5hms"}
2021-11-23T20:51:39+08:00	error	ingress/controller.go:414	failed to wait the default cluster to be ready: dial tcp 172.24.150.14:9180: connect: connection refused
E1123 20:51:39.112199       1 leaderelection.go:325] error retrieving resource lock apisix/ingress-apisix-leader: Get https://172.24.144.1:443/apis/coordination.k8s.io/v1/namespaces/apisix/leases/ingress-apisix-leader: context canceled
2021-11-23T20:51:39+08:00	info	ingress/controller.go:307	LeaderElection	{"message": "apisix-ingress-controller-7994d7bb49-z5hms stopped leading", "event_type": "Normal"}
I1123 20:51:39.112223       1 leaderelection.go:278] failed to renew lease apisix/ingress-apisix-leader: timed out waiting for the condition
2021-11-23T20:51:39+08:00	info	ingress/controller.go:354	controller now is running as a candidate	{"namespace": "apisix", "pod": "apisix-ingress-controller-7994d7bb49-z5hms"}
I1123 20:51:39.112241       1 leaderelection.go:243] attempting to acquire leader lease apisix/ingress-apisix-leader...
2021-11-23T20:51:39+08:00	info	apisix/cluster.go:156	syncing cache	{"cluster": "default"}
2021-11-23T20:51:39+08:00	info	apisix/cluster.go:347	syncing schema	{"cluster": "default"}
2021-11-23T20:51:39+08:00	error	apisix/plugin.go:46	failed to list plugins' names: Get http://172.24.150.14:9180/apisix/admin/plugins/list: context canceled
2021-11-23T20:51:39+08:00	error	apisix/cluster.go:367	failed to list plugin names in APISIX: Get http://172.24.150.14:9180/apisix/admin/plugins/list: context canceled
2021-11-23T20:51:39+08:00	warn	apisix/cluster.go:330	failed to sync schema: Get http://172.24.150.14:9180/apisix/admin/plugins/list: context canceled
2021-11-23T20:51:39+08:00	error	apisix/route.go:119	failed to list routes: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:39+08:00	error	apisix/cluster.go:200	failed to list route in APISIX: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:39+08:00	info	ingress/controller.go:307	LeaderElection	{"message": "apisix-ingress-controller-7994d7bb49-z5hms became leader", "event_type": "Normal"}
I1123 20:51:39.118355       1 leaderelection.go:253] successfully acquired lease apisix/ingress-apisix-leader
2021-11-23T20:51:39+08:00	info	ingress/controller.go:387	controller tries to leading ...	{"namespace": "apisix", "pod": "apisix-ingress-controller-7994d7bb49-z5hms"}
2021-11-23T20:51:39+08:00	warn	apisix/cluster.go:307	waiting cluster default to ready, it may takes a while
2021-11-23T20:51:41+08:00	error	apisix/route.go:119	failed to list routes: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:41+08:00	error	apisix/cluster.go:200	failed to list route in APISIX: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
[GIN] 2021/11/23 - 20:51:42 | 200 |      42.863µs |    172.24.248.6 | GET      "/healthz"
2021-11-23T20:51:43+08:00	error	apisix/route.go:119	failed to list routes: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:43+08:00	error	apisix/cluster.go:200	failed to list route in APISIX: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:45+08:00	error	apisix/route.go:119	failed to list routes: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:45+08:00	error	apisix/cluster.go:200	failed to list route in APISIX: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
[GIN] 2021/11/23 - 20:51:46 | 200 |      41.607µs |    172.24.248.6 | GET      "/healthz"
2021-11-23T20:51:47+08:00	error	apisix/route.go:119	failed to list routes: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:47+08:00	error	apisix/cluster.go:200	failed to list route in APISIX: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:47+08:00	error	apisix/cluster.go:166	failed to sync cache	{"cost_time": "8.001080415s", "cluster": "default"}
2021-11-23T20:51:47+08:00	error	ingress/controller.go:414	failed to wait the default cluster to be ready: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:47+08:00	info	ingress/controller.go:354	controller now is running as a candidate	{"namespace": "apisix", "pod": "apisix-ingress-controller-7994d7bb49-z5hms"}
I1123 20:51:47.113474       1 leaderelection.go:243] attempting to acquire leader lease apisix/ingress-apisix-leader...
2021-11-23T20:51:47+08:00	info	apisix/cluster.go:347	syncing schema	{"cluster": "default"}
2021-11-23T20:51:47+08:00	error	apisix/plugin.go:46	failed to list plugins' names: Get http://172.24.150.14:9180/apisix/admin/plugins/list: context canceled
2021-11-23T20:51:47+08:00	error	apisix/cluster.go:367	failed to list plugin names in APISIX: Get http://172.24.150.14:9180/apisix/admin/plugins/list: context canceled
2021-11-23T20:51:47+08:00	info	apisix/cluster.go:156	syncing cache	{"cluster": "default"}
2021-11-23T20:51:47+08:00	error	apisix/route.go:119	failed to list routes: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:47+08:00	error	apisix/cluster.go:200	failed to list route in APISIX: Get http://172.24.150.14:9180/apisix/admin/routes: context canceled
2021-11-23T20:51:47+08:00	warn	apisix/cluster.go:330	failed to sync schema: Get http://172.24.150.14:9180/apisix/admin/plugins/list: context canceled
2021-11-23T20:51:47+08:00	info	ingress/controller.go:307	LeaderElection	{"message": "apisix-ingress-controller-7994d7bb49-z5hms became leader", "event_type": "Normal"}
I1123 20:51:47.119539       1 leaderelection.go:253] successfully acquired lease apisix/ingress-apisix-leader
2021-11-23T20:51:47+08:00	info	ingress/controller.go:387	controller tries to leading ...	{"namespace": "apisix", "pod": "apisix-ingress-controller-7994d7bb49-z5hms"}
2021-11-23T20:51:47+08:00	warn	apisix/cluster.go:307	waiting cluster default to ready, it may takes a while

ConfigMap

data:
  config.yaml: |
    # log options
    log_level: "info"
    log_output: "stderr"
    http_listen: ":8080"
    enable_profiling: true
    kubernetes:
      kubeconfig: ""
      resync_interval: "6h"
      app_namespaces:
      - "*"
      ingress_class: "apisix"
      ingress_version: "networking/v1"
      apisix_route_version: "apisix.apache.org/v2beta1"
    apisix:
      default_cluster_base_url: "http://172.24.150.14:9180/apisix/admin"
      default_cluster_admin_key: "edd1c9f034335f136f87ad84b625c8f1"
      default_cluster_name: ""

Environment

your apisix-ingress-controller version (output of apisix-ingress-controller version --long);
1.3.0
your Kubernetes cluster version (output of kubectl version);
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.14", GitCommit:"89182bdd065fbcaffefec691908a739d161efc03", GitTreeState:"clean", BuildDate:"2020-12-18T12:02:35Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
if you run apisix-ingress-controller in Bare-metal environment, also show your OS version (uname -a).

tao12345666333 · 2021-11-24T10:38:46Z

I will try to reproduce.

tao12345666333 · 2021-11-24T12:20:43Z

Thanks!

tao12345666333 · 2021-11-24T12:22:56Z

cc @gxthrj PTAL

tokers · 2021-11-25T01:53:11Z

I can give you some clues about this issue.

When the controller gets the opportunity to become the new leader, it tries to add the cluster (named default) again. Since that cluster already exists (it was added during the previous term), the new one is not added and the controller keeps using the old cluster object. In the old cluster, cacheSyncErr was cached and is returned directly whenever HasSynced is called, so the controller never enters the state where it watches Kubernetes resources.

A simple solution is to destroy the old cluster when the controller gives up the leader role.
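The mechanism described above can be modeled in a few lines. This is a minimal Go sketch, not the controller's actual code: `registry`, `AddCluster`, `DeleteCluster`, and `lead` are hypothetical names, and only `cacheSyncErr` and `HasSynced` come from the comment. It shows why a cluster object kept across leadership terms keeps returning a stale sync error, and how deleting it when giving up leadership (the idea behind "fix: delete the cluster object when give up the leadership #774") lets the next term sync again.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// cluster is a hypothetical stand-in for the controller's APISIX cluster
// object. The result of the first cache sync is stored in cacheSyncErr.
type cluster struct {
	name         string
	cacheSyncErr error
}

// HasSynced returns the cached sync result; it never re-syncs.
func (c *cluster) HasSynced() error { return c.cacheSyncErr }

// registry is a hypothetical name -> cluster map shared across leader terms.
type registry struct {
	mu       sync.Mutex
	clusters map[string]*cluster
}

func newRegistry() *registry { return &registry{clusters: map[string]*cluster{}} }

// AddCluster keeps an existing cluster with the same name instead of
// replacing it, so a stale cacheSyncErr survives into the next term.
func (r *registry) AddCluster(name string, syncFn func() error) *cluster {
	r.mu.Lock()
	defer r.mu.Unlock()
	if c, ok := r.clusters[name]; ok {
		return c // old object (and its cached error) is reused
	}
	c := &cluster{name: name, cacheSyncErr: syncFn()}
	r.clusters[name] = c
	return c
}

// DeleteCluster forgets the cluster so the next AddCluster syncs again.
func (r *registry) DeleteCluster(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.clusters, name)
}

// lead simulates one leadership term. deleteOnGiveUp=false models the
// reported behavior; deleteOnGiveUp=true models destroying the cluster.
func lead(r *registry, apisixUp, deleteOnGiveUp bool) error {
	c := r.AddCluster("default", func() error {
		if !apisixUp {
			return errors.New("dial tcp: connection refused")
		}
		return nil
	})
	err := c.HasSynced()
	if err != nil && deleteOnGiveUp {
		r.DeleteCluster("default") // giving up leadership: drop the cluster
	}
	return err
}

func main() {
	for _, fix := range []bool{false, true} {
		r := newRegistry()
		term1 := lead(r, false, fix) // APISIX Service has no endpoints
		term2 := lead(r, true, fix)  // APISIX is reachable again
		fmt.Printf("fix=%v term1=%v term2=%v\n", fix, term1, term2)
	}
}
```

Running it prints `fix=false term1=dial tcp: connection refused term2=dial tcp: connection refused` followed by `fix=true term1=dial tcp: connection refused term2=<nil>`: without the deletion, the second term fails even though APISIX is back, which matches the endless "failed to wait the default cluster to be ready" loop in the logs.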

chzhuo · 2021-11-30T07:03:36Z

I encountered the same situation, and I also hit a problem where the leader would not switch.
The failed leader gives up but then quickly starts acquiring the lock again.
As a result, the leader has not switched over in the last two days.

Zhang21 · 2021-12-09T08:43:19Z

Same error here when the APISIX ingress controller accesses the APISIX Admin API.

2021-12-09T16:37:00+08:00	info	ingress/controller.go:290	LeaderElection	{"message": "apisix-ingress-controller-769ddc5457-68d2z became leader", "event_type": "Normal"}
I1209 16:37:00.635044       1 leaderelection.go:253] successfully acquired lease ingress-apisix/ingress-apisix-leader
2021-12-09T16:37:00+08:00	info	ingress/controller.go:370	controller tries to leading ...	{"namespace": "ingress-apisix", "pod": "apisix-ingress-controller-769ddc5457-68d2z"}
2021-12-09T16:37:00+08:00	warn	apisix/cluster.go:304	waiting cluster default to ready, it may takes a while
2021-12-09T16:37:02+08:00	error	apisix/route.go:117	failed to list routes: Get http://apisix-admin.ingress-apisix.svc.cluster.local:9180/apisix/admin/routes: context canceled
2021-12-09T16:37:02+08:00	error	apisix/cluster.go:197	failed to list route in APISIX: Get http://apisix-admin.ingress-apisix.svc.cluster.local:9180/apisix/admin/routes: context canceled
[GIN] 2021/12/09 - 16:37:03 | 200 |     162.264µs |    172.16.2.248 | GET      "/healthz"

I can access the APISIX Admin API with curl:

curl http://apisix-admin.ingress-apisix.svc.cluster.local:9180/apisix/admin/routes -H 'X-API-KEY: xxxxx'
{ result xxxxxx }

stone2world · 2021-12-22T10:15:32Z

I get it.

tao12345666333 · 2021-12-24T09:57:08Z

#774 has been merged. It will be released in v1.4 (next week)

If you want to try it now, you can also build the Docker image yourself and use it.

I will close this one.

tao12345666333 added the bug and triage/accepted labels Nov 24, 2021

tao12345666333 changed the title ~~request help: after the controller starts, once the single APISIX pod is deleted, the controller cannot reconnect even after the pod recovers~~ bug: Unable to reconnect to apisix, when all ep are deleted under svc of apisix Nov 24, 2021

gxthrj self-assigned this Nov 24, 2021

tokers assigned tokers and unassigned gxthrj Nov 26, 2021

tokers mentioned this issue Nov 26, 2021

fix: delete the cluster object when give up the leadership #774

Merged

tao12345666333 closed this as completed Dec 24, 2021

tao12345666333 added this to To do in v1.4 Planning via automation Dec 24, 2021

tao12345666333 added this to the 1.4.0 milestone Dec 24, 2021

tao12345666333 linked a pull request Dec 24, 2021 that will close this issue

fix: delete the cluster object when give up the leadership #774

Merged

tao12345666333 mentioned this issue Jan 30, 2022

bug: Ingress controller error: apisix/route.go:117 failed to list routes #851

Closed