Server memory under heavy load due to too many connections?
Our Environment and Config:
os: centos 6
etcd Version: 3.2.11
Git SHA: 1e1dbb2
Go Version: go1.8.5
Go OS/Arch: linux/amd64
Etcd config:
name: 'etcdserver-test02'
data-dir: /usr/etcd/data
wal-dir: /usr/etcd/data/wal
snapshot-count: 5000
heartbeat-interval: 1000
election-timeout: 5000
quota-backend-bytes: 0
listen-peer-urls: https://192.168.1.100:2380
listen-client-urls: https://192.168.1.100:2379
max-snapshots: 5
max-wals: 5
cors:
initial-advertise-peer-urls: https://192.168.1.100:2380
advertise-client-urls: https://192.168.1.100:2379
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcdserver-test01=https://192.168.1.101:2380,etcdserver-test02=https://192.168.1.100:2380,etcdserver-test03=https://192.168.1.102:2380,etcdserver-test04=https://192.168.1.103:2380,etcdserver-test05=https://192.168.1.104:2380,etcdserver-test06=https://192.168.1.105:2380,etcdserver-test07=https://192.168.1.106:2380'
initial-cluster-token: 'etcd-cluster-kfefeiifeNHHEfeifek'
initial-cluster-state: 'existing'
strict-reconfig-check: false
enable-v2: false
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
ca-file: '/usr/etcd/config/ssl/ca.pem'
cert-file: '/usr/etcd/config/ssl/server.pem'
key-file: '/usr/etcd/config/ssl/server-key.pem'
client-cert-auth: true
trusted-ca-file: '/usr/etcd/config/ssl/ca.pem'
auto-tls: true
peer-transport-security:
ca-file: '/usr/etcd/config/ssl/ca.pem'
cert-file: '/usr/etcd/config/ssl/member2.pem'
key-file: '/usr/etcd/config/ssl/member2-key.pem'
peer-client-cert-auth: true
trusted-ca-file: '/usr/etcd/config/ssl/ca.pem'
auto-tls: true
debug: true
log-package-levels:
log-output: default
force-new-cluster: false
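Since the config above sets enable-pprof: true, the heap of the etcd process itself can be inspected over the client URL. A minimal sketch, assuming the certificate files from the config above can double as a client certificate (client-cert-auth is enabled, so some client certificate must be presented; a dedicated client cert may be required in practice):

# Fetch a heap profile from etcd's pprof endpoint on the client URL.
curl --cacert /usr/etcd/config/ssl/ca.pem \
     --cert /usr/etcd/config/ssl/server.pem \
     --key /usr/etcd/config/ssl/server-key.pem \
     -o heap.pprof https://192.168.1.100:2379/debug/pprof/heap
# Inspect the top allocators (with go1.8, pass the etcd binary before the profile).
go tool pprof -top heap.pprof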
This issue occurred when more than 10,000 clients were connected to the etcd cluster (7 nodes). After a period of time, some clients failed to connect with the error "context deadline exceeded". Then, through our monitoring system, I observed that memory usage on two of the hosts had exceeded 30 GB (those nodes have 32 GB of total memory each), yet Grafana showed the etcd process on those two hosts consuming no more than 5 GB, and the hosts were unresponsive.
(Monitoring screenshots attached.)
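For reference, the failure can be checked from a client machine with an explicit timeout; a minimal sketch using etcdctl, assuming the certificate paths from the config above:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.1.100:2379 \
  --cacert=/usr/etcd/config/ssl/ca.pem \
  --cert=/usr/etcd/config/ssl/server.pem \
  --key=/usr/etcd/config/ssl/server-key.pem \
  --command-timeout=5s \
  endpoint health
# In the state described above, this would be expected to time out
# rather than report "is healthy".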
Besides the above, analysing the etcd logs I found a lot of messages like the following. Why?
etcdmain: rejected connection from "192.168.1.104:32427" (error "read tcp 192.168.1.102:2380->192.168.1.104:32427: i/o timeout", ServerName "")
And within the cluster, I found that each member holds many duplicate connections on port 2380 to the other members, which is very strange:
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:14203
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:20579
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:19975
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:15983
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:20296
ESTAB 0 0 192.168.1.102:2380 192.168.1.105:60271
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:18111
ESTAB 0 0 192.168.1.102:2380 192.168.1.105:60273
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:18323
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:20451
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:18217
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:19466
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:15917
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:18318
ESTAB 0 0 192.168.1.102:2380 192.168.1.105:60194
ESTAB 0 0 192.168.1.102:2380 192.168.1.104:20152
ESTAB 0 0 192.168.1.102:2380 192.168.1.103:16269
ESTAB 0 0 192.168.1.102:2380 192.168.1.103:16615
ESTAB 0 0 192.168.1.102:2380 192.168.1.103:17397
…… (and many more such connections)
Then, on host 192.168.1.104, I used the ss command to count the connections from local to foreign addresses:
ss | awk '{print $5}' | grep ":2380" | sort | uniq -c | sort
3 192.168.1.101:2380
3 192.168.1.102:2380
3 192.168.1.103:2380
3 192.168.1.104:2380
3 192.168.1.100:2380
3498 192.168.1.106:2380
Here there are 3498 connections to 192.168.1.106 alone, which is really puzzling!
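As an aside, the column layout of plain ss output differs across iproute2 versions (newer versions insert a Netid column), so here is a sketch of a more portable way to get the same count, doing the port filtering inside ss itself:

# Count established TCP connections per remote peer on port 2380.
# With an explicit state filter, ss prints a fixed header and the peer
# address:port is the last field of every data line.
ss -tn state established '( dport = :2380 )' | awk 'NR>1 {print $NF}' | sort | uniq -c | sort -rn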
So my questions are as follows:
1. Why does a member open so many duplicate connections to another member?
2. Why does the etcd process occupy no more than 5 GB while host memory usage exceeds 30 GB? Is this caused by the large number of connections? (A way to check this is sketched below.)
@gyuho @xiang90
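Regarding the second question: kernel-side TCP socket buffers and other kernel allocations are accounted outside any process RSS, so thousands of connections can consume host memory that never shows up against the etcd process. A hedged way to check this on the affected hosts:

# TCP kernel memory: the "mem" value in the TCP line is in 4 KiB pages.
cat /proc/net/sockstat
# Kernel slab usage, which also sits outside process RSS.
grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo
# Per-socket buffer usage (skmem) for connections involving port 2380.
ss -tm state established '( sport = :2380 or dport = :2380 )' | head -40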