Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP socket opened leak using proxy #472

Closed
KannarFr opened this issue Feb 22, 2021 · 6 comments · Fixed by #478
Closed

TCP socket opened leak using proxy #472

KannarFr opened this issue Feb 22, 2021 · 6 comments · Fixed by #478

Comments

@KannarFr
Copy link

KannarFr commented Feb 22, 2021

Expected behavior

When a client is connected, the proxy shouldn't continue to create a connection with the broker.

Actual behavior

Considering a client with one producer and one consumer,
A cluster with 20 proxies/20 brokers/7 ZK

With 500 clients
  • java client conducts proxy to have 3500 opened TCP socket and it's stable over time.
  • go client conducts proxy to have 4293 opened TCP socket and it keeps increasing rapidly to 5000, then slowly.
With 300k clients

Using go client,
The proxies continue to create connections indefinitely to brokers.
Then it conducts the proxies nodes to use all socket the host could provide.
Then, the proxy becomes unable and another gets the same problem, etc.
Finally, the cluster becomes fully unreachable.

Notes

There is no debug logs from go client that it's reconnecting or something.

For each new TCP socket opened, a connection from proxy to broker is logged.

Feb 22 14:00:16 pulsar-c2-n24 pulsar-proxy[795547]: 14:00:16.796 [pulsar-proxy-io-2-11] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xf30e2d8f, L:/192.168.11.115:51920 - R:pulsar-c2-n36/192.168.11.60:2004]] Connected to server

Steps to reproduce

Create 500 clients using go client, for each creates a producer and a consumer to cluster with multiples proxies/brokers and follow the number of opened TCP connections on proxy using:

ss | grep ESTAB | wc -l
@KannarFr
Copy link
Author

KannarFr commented Feb 23, 2021

I'm currently running 1 client with one producer and one consumer without sending or receiving messages. They are just connected and receiving/replying to broker's ping.

With the Go client:
Proxy had 7 TCP sockets established right after launching the client.

  • after 10 minutes it has 25 TCP sockets opened;
  • after 1 hour, it varies between 25 and 30 TCP sockets but seems stable.

With the Java client:
Proxy had 7 TCP sockets established right after launching the client.

  • after 10 minutes it has 7 TCP sockets opened;
  • after 1 hour it has 7 TCP sockets opened.

@freeznet
Copy link
Contributor

freeznet commented Feb 24, 2021

@KannarFr thanks for your brief description, would you mind to share some proxy's full logs for above tests with go client & java client?

@KannarFr
Copy link
Author

KannarFr commented Feb 25, 2021

For java:

12:55:40.804 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003f local:/192.168.11.104:40592 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675004 xid:4289 sent:11685 recv:15951 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n31:2009
12:55:51.522 [pulsar-proxy-io-2-12] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/46.252.181.22:39300] New connection opened
12:55:51.737 [pulsar-proxy-io-2-12] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/46.252.181.22:39300] complete connection, init proxy handler. authenticated with token role biscuit:XXX, hasProxyToBrokerUrl: false
12:55:51.752 [pulsar-proxy-io-2-14] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xa4b72e3b, L:/192.168.11.104:59336 - R:x-pulsar-c2-n1/192.168.11.1:2004]] Connected to server
12:55:52.159 [pulsar-proxy-io-2-1] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xa9c0690b, L:/192.168.11.104:59768 - R:x-pulsar-c2-n26/192.168.11.50:2004]] Connected to server
12:55:52.323 [pulsar-proxy-io-2-4] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/46.252.181.22:39302] New connection opened
12:55:52.341 [pulsar-proxy-io-2-4] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/46.252.181.22:39302] complete connection, init proxy handler. authenticated with token role biscuit:XXX, hasProxyToBrokerUrl: true
12:55:52.569 [pulsar-proxy-io-2-6] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x9e50e750, L:/192.168.11.104:49578 - R:x-pulsar-c2-n27/192.168.11.51:2004]] Connected to server
12:55:52.763 [pulsar-proxy-io-2-9] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x2994e1b3, L:/192.168.11.104:44150 - R:x-pulsar-c2-n28/192.168.11.52:2004]] Connected to server
12:55:52.924 [pulsar-proxy-io-2-12] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/46.252.181.22:39304] New connection opened
12:55:52.943 [pulsar-proxy-io-2-12] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/46.252.181.22:39304] complete connection, init proxy handler. authenticated with token role biscuit:XXX, hasProxyToBrokerUrl: true
12:55:55.126 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003f local:/192.168.11.104:40592 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675005 xid:4290 sent:11687 recv:15955 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n2:2009

Stable no more connections, it should be only one as mentioned in the pulsar documentation but there are 4.

For go:

13:00:36.788 [pulsar-proxy-io-2-11] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/77.141.83.211:40736] New connection opened
13:00:36.846 [pulsar-proxy-io-2-11] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/77.141.83.211:40736] complete connection, init proxy handler. authenticated with token role biscuit:XXX, hasProxyToBrokerUrl: false
13:00:36.861 [pulsar-proxy-io-2-13] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xe2c30cdd, L:/192.168.11.8:54822 - R:x-pulsar-c2-n32/192.168.11.56:2004]] Connected to server
13:00:37.045 [pulsar-proxy-io-2-16] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x9c3e54aa, L:/192.168.11.8:50326 - R:x-pulsar-c2-n33/192.168.11.57:2004]] Connected to server
13:00:37.241 [pulsar-proxy-io-2-3] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/77.141.83.211:40738] New connection opened
13:00:37.276 [pulsar-proxy-io-2-3] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/77.141.83.211:40738] complete connection, init proxy handler. authenticated with token role biscuit:XXX, hasProxyToBrokerUrl: true
13:00:37.320 [pulsar-proxy-io-2-5] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x10d8bee4, L:/192.168.11.8:53418 - R:x-pulsar-c2-n34/192.168.11.58:2004]] Connected to server
13:00:37.499 [pulsar-proxy-io-2-8] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x8e1071c7, L:/192.168.11.8:52996 - R:x-pulsar-c2-n35/192.168.11.59:2004]] Connected to server
13:00:37.548 [pulsar-proxy-io-2-11] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/77.141.83.211:40740] New connection opened
13:00:37.581 [pulsar-proxy-io-2-11] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/77.141.83.211:40740] complete connection, init proxy handler. authenticated with token role biscuit:XXX, hasProxyToBrokerUrl: true
13:00:50.181 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675079 xid:4346 sent:11757 recv:16081 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n2:2009
13:00:50.211 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675080 xid:4347 sent:11758 recv:16083 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n5:2009
13:00:50.613 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675081 xid:4348 sent:11759 recv:16085 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n38:2009
13:00:50.858 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675082 xid:4349 sent:11760 recv:16087 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n31:2009
13:01:00.808 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675083 xid:4350 sent:11762 recv:16089 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n29:2009
13:01:10.069 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675084 xid:4351 sent:11764 recv:16092 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n32:2009
13:01:11.760 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675085 xid:4352 sent:11765 recv:16095 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n37:2009
13:01:17.176 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675086 xid:4353 sent:11766 recv:16097 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n30:2009
13:01:24.094 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675087 xid:4354 sent:11767 recv:16099 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n35:2009
13:01:36.349 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675088 xid:4355 sent:11769 recv:16102 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n40:2009
13:01:37.321 [pulsar-proxy-io-2-13] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x8b62d43b, L:/192.168.11.8:34644 - R:x-pulsar-c2-n36/192.168.11.60:2004]] Connected to server
13:01:37.618 [pulsar-proxy-io-2-16] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xe4972e0d, L:/192.168.11.8:41384 - R:x-pulsar-c2-n37/192.168.11.61:2004]] Connected to server
13:01:38.975 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675089 xid:4356 sent:11770 recv:16104 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n4:2009
13:01:40.580 [main-EventThread] INFO  org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000 sessionid:0x1003e625d88003e local:/192.168.11.8:57296 remoteserver:x-zookeeper-c4-n1/192.168.11.9:2184 lastZxid:4299675090 xid:4357 sent:11771 recv:16106 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/x-pulsar-c2-n38:2009
13:02:37.327 [pulsar-proxy-io-2-3] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x8e02143d, L:/192.168.11.8:57936 - R:x-pulsar-c2-n38/192.168.11.62:2004]] Connected to server
13:02:37.625 [pulsar-proxy-io-2-6] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xe6509205, L:/192.168.11.8:41908 - R:x-pulsar-c2-n39/192.168.11.63:2004]] Connected to server

Still having new connection

@KannarFr
Copy link
Author

@freeznet

@freeznet
Copy link
Contributor

@KannarFr thanks for your inputs, I have to setup some test envs to debug with this issue. Will let you know if I have any updates.

@freeznet
Copy link
Contributor

freeznet commented Mar 2, 2021

@KannarFr I have compared go client and java client, one difference is go client send PartitionedTopicMetadata fetch request with new connection, but java is using lookup service's connection. And this might lead to the socket leakage issue. I have created a pr #478 to fix this issue, and tested with local cluster, PTAL if you have time, thanks.

wolfstudy pushed a commit that referenced this issue Mar 5, 2021
Fixes #472

### Motivation

#472 addressed a socket leakage in go client. with some debug works, a difference between java client and go client was found. one difference is go client send `PartitionedTopicMetadata` request with new connection, but java is using lookup service's connection. This pr added `GetPartitionedTopicMetadata` to lookup service to reuse the lookup service connection.

Through this pr, the leakage has been resolved with [this sample deployment](https://gist.github.com/freeznet/8681525ddd240877af88633363ba222f), the sockets are keep to 7 for 2 hours test. But still need to check with @KannarFr  to see if there are any other possible leakages since I cannot have a cluster with 20 proxies/20 brokers/7 ZK locally.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants