Server memory under heavy load due to too many connections?
Our Environment and Config:
os: centos 6
etcd Version: 3.2.11
Git SHA: 1e1dbb2
Go Version: go1.8.5
Go OS/Arch: linux/amd64
Etcd config:
name: 'etcdserver-test02'
data-dir: /usr/etcd/data
wal-dir: /usr/etcd/data/wal
snapshot-count: 5000
heartbeat-interval: 1000
election-timeout: 5000
quota-backend-bytes: 0
listen-peer-urls: https://192.168.1.100:2380
listen-client-urls: https://192.168.1.100:2379
max-snapshots: 5
max-wals: 5
cors:
initial-advertise-peer-urls: https://192.168.1.100:2380
advertise-client-urls: https://192.168.1.100:2379
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcdserver-test01=https://192.168.1.101:2380,etcdserver-test02=https://192.168.1.100:2380,etcdserver-test03=https://192.168.1.102:2380,etcdserver-test04=https://192.168.1.103:2380,etcdserver-test05=https://192.168.1.104:2380,etcdserver-test06=https://192.168.1.105:2380,etcdserver-test07=https://192.168.1.106:2380'
initial-cluster-token: 'etcd-cluster-kfefeiifeNHHEfeifek'
initial-cluster-state: 'existing'
strict-reconfig-check: false
enable-v2: false
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
ca-file: '/usr/etcd/config/ssl/ca.pem'
cert-file: '/usr/etcd/config/ssl/server.pem'
key-file: '/usr/etcd/config/ssl/server-key.pem'
client-cert-auth: true
trusted-ca-file: '/usr/etcd/config/ssl/ca.pem'
auto-tls: true
peer-transport-security:
ca-file: '/usr/etcd/config/ssl/ca.pem'
cert-file: '/usr/etcd/config/ssl/member2.pem'
key-file: '/usr/etcd/config/ssl/member2-key.pem'
peer-client-cert-auth: true
trusted-ca-file: '/usr/etcd/config/ssl/ca.pem'
auto-tls: true
debug: true
log-package-levels:
log-output: default
force-new-cluster: false
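Since the config above sets enable-pprof: true, the heap of the etcd process itself can be inspected over the client URL. A minimal sketch, assuming the certificate files from the config above can double as a client certificate (client-cert-auth is enabled, so some client certificate must be presented; a dedicated client cert may be required in practice):

# Fetch a heap profile from etcd's pprof endpoint on the client URL.
curl --cacert /usr/etcd/config/ssl/ca.pem \
     --cert /usr/etcd/config/ssl/server.pem \
     --key /usr/etcd/config/ssl/server-key.pem \
     -o heap.pprof https://192.168.1.100:2379/debug/pprof/heap
# Inspect the top allocators (with go1.8, pass the etcd binary before the profile).
go tool pprof -top heap.pprof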
This issue occurred when more than 10,000 clients were connected to the etcd cluster (7 nodes). After a period of time, some clients failed to connect with the error "context deadline exceeded". Then, through our monitoring system, I observed that memory usage on two of the hosts had exceeded 30 GB (those nodes have 32 GB of total memory each), yet Grafana showed the etcd process on those two hosts consuming no more than 5 GB, and the hosts were unresponsive.
(Monitoring screenshots attached.)
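For reference, the failure can be checked from a client machine with an explicit timeout; a minimal sketch using etcdctl, assuming the certificate paths from the config above:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.1.100:2379 \
  --cacert=/usr/etcd/config/ssl/ca.pem \
  --cert=/usr/etcd/config/ssl/server.pem \
  --key=/usr/etcd/config/ssl/server-key.pem \
  --command-timeout=5s \
  endpoint health
# In the state described above, this would be expected to time out
# rather than report "is healthy".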
Besides the above, analysing the etcd logs I found a lot of messages like the following. Why?
etcdmain: rejected connection from "192.168.1.104:32427" (error "read tcp 192.168.1.102:2380->192.168.1.104:32427: i/o timeout", ServerName "")
And within the cluster, I found that each member holds many duplicate connections on port 2380 to the other members, which is very strange:
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:14203
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:20579
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:19975
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:15983
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:20296
ESTAB 0 0 192.168.1.102:2380 192.168.1.105:60271
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:18111
ESTAB 0 0 192.168.1.102:2380 192.168.1.105:60273
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:18323
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:20451
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:18217
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:19466
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:15917
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:18318
ESTAB 0 0 192.168.1.102:2380 192.168.1.105:60194
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:20152
ESTAB 0 0 192.168.1.102:2380 192.168.1.103:16269
ESTAB 0 0 192.168.1.102:2380 192.168.1.103:16615
ESTAB 0 0 192.168.1.102:2380 192.168.1.103:17397
…… (and many more such connections)
Then, on host 192.168.1.104, I used the ss command to count the connections from local to foreign addresses:
ss | awk '{print $5}' | grep ":2380" | sort | uniq -c | sort
3 192.168.1.101:2380
3 192.168.1.102:2380
3 192.168.1.103:2380
3 192.168.1.104:2380
3 192.168.1.100:2380
3498 192.168.1.106:2380
Here there are 3498 connections to 192.168.1.106 alone, which is really puzzling!
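As an aside, the column layout of plain ss output differs across iproute2 versions (newer versions insert a Netid column), so here is a sketch of a more portable way to get the same count, doing the port filtering inside ss itself:

# Count established TCP connections per remote peer on port 2380.
# With an explicit state filter, ss prints a fixed header and the peer
# address:port is the last field of every data line.
ss -tn state established '( dport = :2380 )' | awk 'NR>1 {print $NF}' | sort | uniq -c | sort -rn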
So my questions are as follows:
1. Why does a member open so many duplicate connections to another member?
2. Why does the etcd process occupy no more than 5 GB while host memory usage exceeds 30 GB? Is this caused by the large number of connections? (A way to check this is sketched below.)
@gyuho @xiang90
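Regarding the second question: kernel-side TCP socket buffers and other kernel allocations are accounted outside any process RSS, so thousands of connections can consume host memory that never shows up against the etcd process. A hedged way to check this on the affected hosts:

# TCP kernel memory: the "mem" value in the TCP line is in 4 KiB pages.
cat /proc/net/sockstat
# Kernel slab usage, which also sits outside process RSS.
grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo
# Per-socket buffer usage (skmem) for connections involving port 2380.
ss -tm state established '( sport = :2380 or dport = :2380 )' | head -40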