A large number of Dubbo providers restarting frequently causes frequent, persistent Full GC on consumer servers #376

Closed
zhanghw89 opened this issue Feb 16, 2017 · 13 comments

Comments

@zhanghw89

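When there are many Dubbo providers and they are restarted frequently, the consumer-side servers go into frequent Full GC. The Full GC keeps running until the service becomes unavailable, and garbage collection only stops after the consumer-side service is restarted.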

@huifrank

huifrank commented Feb 22, 2017

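Could you describe this in more detail? How many providers does it take to trigger frequent Full GC? Does "a large number of providers" mean one application publishing many services, or many applications? Does "consumer server" mean the servers subscribed to that service, or all servers?
We may also end up with a large number of providers that are restarted frequently.
For now I would like to reproduce the problem you describe.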

@zhanghw89
Author

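In our production environment, one application has roughly 300 providers, and the provider application runs on about 5 machines. When all 5 provider instances are restarted at the same time while the consumer keeps making service calls, the consumer goes into Full GC. The JVM heap dump shows a large number of instances of the following classes in memory: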
com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry
com.alibaba.dubbo.common.URL
com.alibaba.dubbo.registry.integration.RegistryDirectory

Details:
Problem Suspect 1

One instance of "com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80021b28" occupies 250,412,952 (17.39%) bytes. The memory is accumulated in one instance of "com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry" loaded by "sun.misc.Launcher$AppClassLoader @ 0x80021b28".

Keywords
sun.misc.Launcher$AppClassLoader @ 0x80021b28
com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry
Problem Suspect 2

88,920 instances of "com.alibaba.dubbo.common.URL", loaded by "sun.misc.Launcher$AppClassLoader @ 0x80021b28" occupy 473,116,384 (32.86%) bytes.

Keywords
com.alibaba.dubbo.common.URL
sun.misc.Launcher$AppClassLoader @ 0x80021b28
Problem Suspect 3

30,458 instances of "com.alibaba.dubbo.registry.integration.RegistryDirectory", loaded by "sun.misc.Launcher$AppClassLoader @ 0x80021b28" occupy 454,550,800 (31.57%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Node[]", loaded by "<system class loader>".

Keywords
java.util.concurrent.ConcurrentHashMap$Node[]
sun.misc.Launcher$AppClassLoader @ 0x80021b28
com.alibaba.dubbo.registry.integration.RegistryDirectory
Problem Suspect 4

1,638,063 instances of "java.lang.String", loaded by "<system class loader>" occupy 200,439,840 (13.92%) bytes.

Keywords
java.lang.String
Hint 1

The problem suspects 1 and 3 may be related, because the reference chains to them have a common beginning.

@zhanghw89
Author

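One workaround is to split the services into the smallest practical granularity before publishing them. That avoids producing a large amount of garbage at the same moment and triggering Full GC on the consumer.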

@YoungHu
Contributor

YoungHu commented Mar 21, 2017

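The consumer is constantly watching the service nodes in ZooKeeper. When you unregister and then re-register 300 services at the same time, the consumer also has to destroy the instances it created earlier and create new ones, which uses a lot of memory.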

@YoungHu
Contributor

YoungHu commented Mar 21, 2017

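I have a related problem. When Dubbo registers a service, the most specific ZooKeeper path goes only as far as the interface name. I register services under the same interface but with different group names. As a result, when my provider restarts, the consumers refreshing their service information saturate the ZooKeeper server's bandwidth, and the service is briefly unavailable.

For example, I have one interface with 100 implementations, all distinguished and published by group. With 1 machine, the list under the ZooKeeper path has 100 entries, and each entry is about 1 KB. When the provider restarts, every interface registration makes the consumer read the ZooKeeper node data again, at 100 KB per read. Registering 100 services means 100 reads, so one complete restart produces 10 MB of outbound traffic from ZooKeeper. That is for a single provider server restarting. For a cluster, the traffic for one restart is 10 MB x N (number of consumer nodes) x M (number of provider nodes). Apart from splitting the service, is there a better approach?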
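The arithmetic in this comment can be checked with a small sketch. It only restates the figures given above (100 groups, about 1 KB per entry, one full re-read per registration) and is not a measurement. The class and method names are hypothetical, and 1 KB is taken as 1000 bytes, which is an assumption.

```java
// Rough estimate only. Names are hypothetical; 1 KB is assumed to be 1000 bytes.
final class ZkTrafficEstimate {

    /** Bytes ZooKeeper sends to ONE consumer while ONE provider re-registers all of its groups. */
    static long bytesPerProviderRestart(int groups, long bytesPerEntry) {
        long bytesPerRead = groups * bytesPerEntry; // the full child list under the interface path
        return bytesPerRead * groups;               // one re-read for each registration
    }

    /** Scales the figure the same way the comment does: x N consumers x M providers. */
    static long bytesPerClusterRestart(int groups, long bytesPerEntry, int consumers, int providers) {
        return bytesPerProviderRestart(groups, bytesPerEntry) * consumers * providers;
    }

    public static void main(String[] args) {
        // 100 groups at 1000 bytes each: 100 reads of 100,000 bytes.
        System.out.println(bytesPerProviderRestart(100, 1000L));
        // Example cluster sizes (3 consumers, 5 providers) are made up for illustration.
        System.out.println(bytesPerClusterRestart(100, 1000L, 3, 5));
    }
}
```

The per-restart cost grows with the square of the number of groups under one interface, which is why notifying at a finer granularity than the interface reduces the traffic so sharply.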

@qct
Contributor

qct commented Mar 21, 2017

This is the same problem as #306

@YoungHu
Contributor

YoungHu commented May 28, 2017

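In the end I did solve this problem, and it has been running in production without issues. My fix was to add group and version nodes beneath the interface node in ZooKeeper. That way, when a service changes, Dubbo can notify precisely by interface + group + version, which reduces the bandwidth spent synchronizing duplicate data.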
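The thread does not show the actual patch, so the following is only a sketch of what such a path layout might look like. The first method mirrors the default ZooKeeper layout, where providers are listed under the interface node. The second method, with group and version levels, is a hypothetical reading of the description above. All names, including the example interface, are made up for illustration.

```java
// Illustrative only: the real change is not shown in this thread.
final class RegistryPathSketch {

    /** Default layout: every group and version of an interface shares one providers node. */
    static String defaultProvidersPath(String root, String iface) {
        return "/" + root + "/" + iface + "/providers";
    }

    /** Hypothetical layout: one providers node per interface + group + version. */
    static String scopedProvidersPath(String root, String iface, String group, String version) {
        return "/" + root + "/" + iface + "/" + group + "/" + version + "/providers";
    }

    public static void main(String[] args) {
        String iface = "com.example.DemoService"; // hypothetical interface name
        System.out.println(defaultProvidersPath("dubbo", iface));
        System.out.println(scopedProvidersPath("dubbo", iface, "groupA", "1.0.0"));
    }
}
```

With the scoped layout, a consumer subscribed to one group and version would watch only that subtree, so a registration in another group would not cause it to re-read the whole list.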

@foreveryang321
Contributor

@YoungHu Could you explain how "adding group and version nodes beneath the interface node in ZooKeeper" is configured? I don't understand what that sentence means.

@Tong-c

Tong-c commented Aug 5, 2017

@foreveryang321, see http://dubbo.io/user-guide/reference-registry/zookeeper.html
The lower part of that document mentions group, and an exposed service interface can set version as well as group:
http://dubbo.io/user-guide/reference-xmlconf/dubbo-service.html

@taige

taige commented Aug 8, 2017

Proxy.java has a memory leak bug.
Try this fix: taige@f869f0f

@chickenlj
Contributor

@taige Could you explain how this leak happens?

@diecui1202

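In our production environment, one application has roughly 300 providers, and the provider application runs on about 5 machines. When all 5 provider instances are restarted at the same time while the consumer keeps making service calls, the consumer goes into Full GC.
////
You need to restart the app in groups. For example, 3 groups of 2, 2, and 1 machines.

Make sure some providers are online at all times.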
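The grouped restart suggested here can be planned with a simple partition of the host list. This is a minimal sketch, not part of Dubbo: the class name, method name, and host names are hypothetical, and the actual restart and health-check steps are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits provider hosts into restart batches so that
// the hosts outside the current batch stay online.
final class RollingRestartPlan {

    static <T> List<List<T>> batches(List<T> hosts, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < hosts.size(); i += batchSize) {
            int end = Math.min(i + batchSize, hosts.size());
            result.add(new ArrayList<>(hosts.subList(i, end))); // copy so batches are independent
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("p1", "p2", "p3", "p4", "p5"); // made-up host names
        // Restart one batch at a time and wait for it to re-register before the next.
        for (List<String> batch : batches(hosts, 2)) {
            System.out.println("restart together: " + batch);
        }
    }
}
```

With 5 hosts and a batch size of 2, this yields batches of 2, 2, and 1. The batch size should stay below the total number of hosts, otherwise every provider goes offline at once.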

@diecui1202

Feel free to reopen it. &READY-TO-CLOSE&
