Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

由1.4.2升级到1.4.6后,在启动内存配置参数不变的情况下,Full GC时间增加了大约十几倍。 #10787

Closed
wyt opened this issue Jul 14, 2023 · 18 comments
Labels
kind/bug Category issues or prs related to bug.
Milestone

Comments

@wyt
Copy link

wyt commented Jul 14, 2023

升级前:
2023-05-13T11:11:03.871+0800: 23199291.478: [Full GC (Ergonomics) [PSYoungGen: 19168K->0K(4174848K)] [ParOldGen: 6290293K->1005684K(6291456K)] 6309461K->1005684K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1700654 secs] [Times: user=0.62 sys=0.12, real=0.17 secs]
2023-05-13T13:51:43.024+0800: 23208930.631: [Full GC (Ergonomics) [PSYoungGen: 18912K->0K(4174336K)] [ParOldGen: 6290870K->1005100K(6291456K)] 6309782K->1005100K(10465792K), [Metaspace: 87461K->87461K(1130496K)], 0.1707284 secs] [Times: user=0.61 sys=0.12, real=0.17 secs]
2023-05-13T16:32:24.626+0800: 23218572.233: [Full GC (Ergonomics) [PSYoungGen: 18720K->0K(4174336K)] [ParOldGen: 6289967K->1012340K(6291456K)] 6308687K->1012340K(10465792K), [Metaspace: 87461K->87461K(1130496K)], 0.1704050 secs] [Times: user=0.74 sys=0.00, real=0.17 secs]
2023-05-13T19:12:52.195+0800: 23228199.802: [Full GC (Ergonomics) [PSYoungGen: 19072K->0K(4174848K)] [ParOldGen: 6290958K->1008397K(6291456K)] 6310030K->1008397K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1670840 secs] [Times: user=0.73 sys=0.00, real=0.17 secs]
2023-05-13T21:53:26.760+0800: 23237834.367: [Full GC (Ergonomics) [PSYoungGen: 18912K->0K(4174848K)] [ParOldGen: 6290160K->1013505K(6291456K)] 6309072K->1013505K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1705765 secs] [Times: user=0.75 sys=0.00, real=0.17 secs]
2023-05-14T00:33:36.515+0800: 23247444.122: [Full GC (Ergonomics) [PSYoungGen: 19136K->0K(4174848K)] [ParOldGen: 6290931K->1016372K(6291456K)] 6310067K->1016372K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1711714 secs] [Times: user=0.74 sys=0.00, real=0.17 secs]
2023-05-14T03:14:16.425+0800: 23257084.032: [Full GC (Ergonomics) [PSYoungGen: 18848K->0K(4174848K)] [ParOldGen: 6290592K->1021930K(6291456K)] 6309440K->1021930K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1642861 secs] [Times: user=0.61 sys=0.13, real=0.16 secs]
2023-05-14T05:54:22.967+0800: 23266690.574: [Full GC (Ergonomics) [PSYoungGen: 19072K->0K(4174336K)] [ParOldGen: 6289543K->1024258K(6291456K)] 6308615K->1024258K(10465792K), [Metaspace: 87461K->87461K(1130496K)], 0.1678808 secs] [Times: user=0.60 sys=0.12, real=0.17 secs]
2023-05-14T08:34:14.287+0800: 23276281.894: [Full GC (Ergonomics) [PSYoungGen: 18816K->0K(4174848K)] [ParOldGen: 6290728K->1035862K(6291456K)] 6309544K->1035862K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1723334 secs] [Times: user=0.62 sys=0.13, real=0.17 secs]
2023-05-14T11:13:54.941+0800: 23285862.548: [Full GC (Ergonomics) [PSYoungGen: 18880K->0K(4174336K)] [ParOldGen: 6291083K->1036786K(6291456K)] 6309963K->1036786K(10465792K), [Metaspace: 87461K->87461K(1130496K)], 0.1718653 secs] [Times: user=0.74 sys=0.00, real=0.17 secs]
2023-05-14T13:53:36.794+0800: 23295444.401: [Full GC (Ergonomics) [PSYoungGen: 18816K->0K(4174336K)] [ParOldGen: 6289983K->1037363K(6291456K)] 6308799K->1037363K(10465792K), [Metaspace: 87461K->87461K(1130496K)], 0.1684250 secs] [Times: user=0.73 sys=0.00, real=0.17 secs]
2023-05-14T16:31:49.117+0800: 23304936.723: [Full GC (Ergonomics) [PSYoungGen: 18880K->0K(4174336K)] [ParOldGen: 6289020K->1073979K(6291456K)] 6307900K->1073979K(10465792K), [Metaspace: 87461K->87461K(1130496K)], 0.1959986 secs] [Times: user=0.82 sys=0.00, real=0.20 secs]
2023-05-14T19:10:24.719+0800: 23314452.325: [Full GC (Ergonomics) [PSYoungGen: 19008K->0K(4174336K)] [ParOldGen: 6289498K->1087454K(6291456K)] 6308506K->1087454K(10465792K), [Metaspace: 87461K->87461K(1130496K)], 0.1704064 secs] [Times: user=0.62 sys=0.13, real=0.17 secs]
2023-05-14T21:48:49.638+0800: 23323957.245: [Full GC (Ergonomics) [PSYoungGen: 18720K->0K(4174848K)] [ParOldGen: 6290375K->1091221K(6291456K)] 6309095K->1091221K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1704408 secs] [Times: user=0.63 sys=0.13, real=0.17 secs]
2023-05-15T00:26:59.459+0800: 23333447.066: [Full GC (Ergonomics) [PSYoungGen: 18752K->0K(4174848K)] [ParOldGen: 6289737K->1087750K(6291456K)] 6308489K->1087750K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1644770 secs] [Times: user=0.61 sys=0.12, real=0.17 secs]
2023-05-15T03:04:49.616+0800: 23342917.223: [Full GC (Ergonomics) [PSYoungGen: 18496K->0K(4174848K)] [ParOldGen: 6290349K->1009804K(6291456K)] 6308845K->1009804K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.2883000 secs] [Times: user=1.61 sys=0.00, real=0.29 secs]
2023-05-15T05:45:19.259+0800: 23352546.866: [Full GC (Ergonomics) [PSYoungGen: 18592K->0K(4174848K)] [ParOldGen: 6289310K->1011289K(6291456K)] 6307902K->1011289K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1642978 secs] [Times: user=0.58 sys=0.12, real=0.17 secs]
2023-05-15T08:25:48.615+0800: 23362176.222: [Full GC (Ergonomics) [PSYoungGen: 19168K->0K(4174848K)] [ParOldGen: 6289734K->1015392K(6291456K)] 6308902K->1015392K(10466304K), [Metaspace: 87461K->87461K(1130496K)], 0.1688850 secs] [Times: user=0.72 sys=0.00, real=0.17 secs]

升级后:

2023-07-14T00:36:43.955+0800: 1560858.593: [Full GC (Ergonomics) [PSYoungGen: 9600K->0K(4183040K)] [ParOldGen: 6288948K->2244369K(6291456K)] 6298548K->2244369K(10474496K), [Metaspace: 84355K->84355K(1128448K)], 1.3631063 secs] [Times: user=10.17 sys=0.00, real=1.36 secs]
2023-07-14T01:31:37.485+0800: 1564152.123: [Full GC (Ergonomics) [PSYoungGen: 9952K->0K(4183040K)] [ParOldGen: 6289357K->2286632K(6291456K)] 6299309K->2286632K(10474496K), [Metaspace: 84355K->84355K(1128448K)], 1.3682990 secs] [Times: user=9.58 sys=0.61, real=1.37 secs]
2023-07-14T02:25:32.474+0800: 1567387.112: [Full GC (Ergonomics) [PSYoungGen: 9824K->0K(4181504K)] [ParOldGen: 6286228K->2302335K(6291456K)] 6296052K->2302335K(10472960K), [Metaspace: 84355K->84355K(1128448K)], 1.3964071 secs] [Times: user=8.39 sys=2.02, real=1.40 secs]
2023-07-14T03:19:55.375+0800: 1570650.013: [Full GC (Ergonomics) [PSYoungGen: 9728K->0K(4183040K)] [ParOldGen: 6286803K->2298780K(6291456K)] 6296531K->2298780K(10474496K), [Metaspace: 84355K->84355K(1128448K)], 1.3909473 secs] [Times: user=10.22 sys=0.00, real=1.39 secs]
2023-07-14T04:14:24.521+0800: 1573919.159: [Full GC (Ergonomics) [PSYoungGen: 10016K->0K(4181504K)] [ParOldGen: 6286520K->2295721K(6291456K)] 6296536K->2295721K(10472960K), [Metaspace: 84355K->84355K(1128448K)], 1.4162378 secs] [Times: user=9.35 sys=1.20, real=1.41 secs]
2023-07-14T05:09:00.785+0800: 1577195.423: [Full GC (Ergonomics) [PSYoungGen: 9600K->0K(4183040K)] [ParOldGen: 6288843K->2304364K(6291456K)] 6298443K->2304364K(10474496K), [Metaspace: 84355K->84355K(1128448K)], 1.3999174 secs] [Times: user=8.48 sys=2.01, real=1.40 secs]
2023-07-14T06:02:47.382+0800: 1580422.021: [Full GC (Ergonomics) [PSYoungGen: 9920K->0K(4183552K)] [ParOldGen: 6287748K->2090612K(6291456K)] 6297668K->2090612K(10475008K), [Metaspace: 84355K->84355K(1128448K)], 1.4470640 secs] [Times: user=9.60 sys=0.00, real=1.44 secs]
2023-07-14T07:00:04.113+0800: 1583858.751: [Full GC (Ergonomics) [PSYoungGen: 10112K->0K(4182528K)] [ParOldGen: 6286867K->2123146K(6291456K)] 6296979K->2123146K(10473984K), [Metaspace: 84355K->84355K(1128448K)], 1.4455848 secs] [Times: user=8.65 sys=2.06, real=1.44 secs]
2023-07-14T07:56:28.038+0800: 1587242.676: [Full GC (Ergonomics) [PSYoungGen: 9696K->0K(4183040K)] [ParOldGen: 6287276K->2137543K(6291456K)] 6296972K->2137543K(10474496K), [Metaspace: 84355K->84355K(1128448K)], 1.4331709 secs] [Times: user=10.69 sys=0.00, real=1.44 secs]
2023-07-14T08:52:45.242+0800: 1590619.880: [Full GC (Ergonomics) [PSYoungGen: 9504K->0K(4183552K)] [ParOldGen: 6287306K->2141503K(6291456K)] 6296811K->2141503K(10475008K), [Metaspace: 84355K->84355K(1128448K)], 1.4394211 secs] [Times: user=10.76 sys=0.00, real=1.44 secs]
2023-07-14T09:48:49.616+0800: 1593984.254: [Full GC (Ergonomics) [PSYoungGen: 9952K->0K(4183040K)] [ParOldGen: 6289126K->2150238K(6291456K)] 6299078K->2150238K(10474496K), [Metaspace: 84356K->84356K(1128448K)], 1.4179921 secs] [Times: user=10.61 sys=0.00, real=1.42 secs]
2023-07-14T10:41:06.558+0800: 1597121.196: [Full GC (Ergonomics) [PSYoungGen: 9568K->0K(4182016K)] [ParOldGen: 6289797K->2142933K(6291456K)] 6299365K->2142933K(10473472K), [Metaspace: 84363K->84363K(1128448K)], 1.4261841 secs] [Times: user=10.58 sys=0.00, real=1.43 secs]

启动内存参数:

-Xms10g -Xmx10g -Xmn4g -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=320m -XX:-OmitStackTraceInFastThrow -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/apps/nacos/logs/java_heapdump.hprof -XX:-UseLargePages -Dnacos.member.list= -Djava.ext.dirs=/usr/java/jdk1.8.0_191-amd64//jre/lib/ext:/usr/java/jdk1.8.0_191-amd64//lib/ext -Xloggc:/data/apps nacos/logs/nacos_gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dloader.path=/data/apps/nacos/plugins/health,/data/apps/nacos/plugins/cmdb -Dnacos.home=/data/apps/nacos -jar /data/apps/nacos/target/nacos-server.jar --spring.config.additional-location=file:/data/apps/nacos/conf/ --logging.config=/data/apps/nacos/conf/nacos-logback.xml --server.max-http-header-size=524288 nacos.nacos

由于Full GC时间过长,造成客户端链接超时:

com.alibaba.nacos.api.exception.NacosException: org.apache.http.conn.ConnectTimeoutException: Connect to x.x.x.x:8848 [/x.x.x.x] failed: connect timed out at com.alibaba.nacos.client.naming.net.NamingProxy.callServer(NamingProxy.java:615) at com.alibaba.nacos.client.naming.net.NamingProxy.reqApi(NamingProxy.java:526) at com.alibaba.nacos.client.naming.net.NamingProxy.reqApi(NamingProxy.java:498) at com.alibaba.nacos.client.naming.net.NamingProxy.sendBeat(NamingProxy.java:433) at com.alibaba.nacos.client.naming.beat.BeatReactor$BeatTask.run(BeatReactor.java:167) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

怀疑新版本代码改动造成此问题

@wyt
Copy link
Author

wyt commented Jul 14, 2023

@KomachiSion

@wyt wyt changed the title 有1.4.2升级到1.4.6后,在启动内存配置参数不变的情况下,Full GC时间增加了大约十几倍。 由1.4.2升级到1.4.6后,在启动内存配置参数不变的情况下,Full GC时间增加了大约十几倍。 Jul 14, 2023
@KomachiSion
Copy link
Collaborator

这两个版本没有太大变化, 只有修复一些bug, 可以自行确认changelog。 可能是升级过后长轮询不均匀导致的。 可以看下config-memory, 每台节点的量和之前的对比。

@wyt
Copy link
Author

wyt commented Jul 17, 2023

这两个版本没有太大变化, 只有修复一些bug, 可以自行确认changelog。 可能是升级过后长轮询不均匀导致的。 可以看下config-memory, 每台节点的量和之前的对比。

在哪查?nacos grafna里?

@wyt
Copy link
Author

wyt commented Jul 18, 2023

@KomachiSion

@KomachiSion
Copy link
Collaborator

这两个版本没有太大变化, 只有修复一些bug, 可以自行确认changelog。 可能是升级过后长轮询不均匀导致的。 可以看下config-memory, 每台节点的量和之前的对比。

config-memory.log, naming-performance.log

如果有grafana,可以看GC的节点的用量监控指标和升级前的区别。

@wyt
Copy link
Author

wyt commented Jul 20, 2023

@wyt
Copy link
Author

wyt commented Jul 20, 2023

avatar

堆内存都是这种byte[]请求对象,full gc都回收不掉,导致full gc时间长且频繁

@KomachiSion
Copy link
Collaborator

那就是请求数变多了,导致GC提升的,并不是版本导致的,这个对象新旧版本都是存在的,是tomcat http serverlt request的byte[]。 应该就是我猜测的,升级的时候导致了流量不均,负载高的(抗连接更多的)节点出现了频繁FullGC。可以考虑扩容或者升配了,否则集群有一个节点dump了的话,可能会导致所有节点都出现严重GC甚至影响稳定性。

@wyt
Copy link
Author

wyt commented Jul 25, 2023

在哪里看出请求增多的,单机QPS一致800左右,一共五台,实际上升级配置,增大堆内存,问题会更严重,堆内存增大,导致Full GC时停顿时间更长,长达3s - 4s

@wyt
Copy link
Author

wyt commented Jul 27, 2023

而且我看gc root,都是各种nacos线程持有,类似timer之类的,需要内存dump文件我可以提供

@KomachiSion
Copy link
Collaborator

那就是请求数变多了,导致GC提升的,并不是版本导致的,这个对象新旧版本都是存在的,是tomcat http serverlt request的byte[]。 应该就是我猜测的,升级的时候导致了流量不均,负载高的(抗连接更多的)节点出现了频繁FullGC。可以考虑扩容或者升配了,否则集群有一个节点dump了的话,可能会导致所有节点都出现严重GC甚至影响稳定性。

request的byte,对应的就是请求数或长轮询数,如果占用内存的都是byte[],且来自于tomcat,因此只能说明是流量不均导致的。或者是业务整体增长,导致更多的byte[]被生成。

@wyt
Copy link
Author

wyt commented Aug 1, 2023

我们准备回退到1.4.2了,看看效果

@wyt
Copy link
Author

wyt commented Aug 3, 2023

那就是请求数变多了,导致GC提升的,并不是版本导致的,这个对象新旧版本都是存在的,是tomcat http serverlt request的byte[]。 应该就是我猜测的,升级的时候导致了流量不均,负载高的(抗连接更多的)节点出现了频繁FullGC。可以考虑扩容或者升配了,否则集群有一个节点dump了的话,可能会导致所有节点都出现严重GC甚至影响稳定性。

request的byte,对应的就是请求数或长轮询数,如果占用内存的都是byte[],且来自于tomcat,因此只能说明是流量不均导致的。或者是业务整体增长,导致更多的byte[]被生成。

回退报错,只能扩增节点,增加了三台,还是老样子,byte[]这些对象都是被nacos相关的线程持有不释放,1.4.2版本就没有这些问题。

@KomachiSion
Copy link
Collaborator

你可以看下1.4.2和1.4.6版本的change log,看下是哪些改动可能导致这个问题,欢迎提交PR修复。

@wyt
Copy link
Author

wyt commented Aug 4, 2023

好的

wyt added a commit to wyt/nacos that referenced this issue Aug 9, 2023
…e creation of a large number of sessions.

Signed-off-by: wangyongtao <wangyongtao@yiche.com>
@wyt
Copy link
Author

wyt commented Aug 9, 2023

经过和同事压测排查,发现此问题是由1.4.3版本引入的bug:
#7252
导致大量不必要的session被创建,引发内存性能问题。

@wyt
Copy link
Author

wyt commented Aug 9, 2023

@KomachiSion 麻烦review下,没问题发个版,我们好升级修复

KomachiSion pushed a commit that referenced this issue Aug 10, 2023
#10937)

* [ISSUE #10787] Fixed the memory performance issue caused by the creation of a large number of sessions.

Signed-off-by: wangyongtao <wangyongtao@yiche.com>

* Remove unused import.

Signed-off-by: wangyongtao <wangyongtao@yiche.com>

---------

Signed-off-by: wangyongtao <wangyongtao@yiche.com>
@KomachiSion KomachiSion added kind/bug Category issues or prs related to bug. and removed status/need feedback labels Aug 10, 2023
@KomachiSion KomachiSion modified the milestones: 1.4.6, 1.4.7 Aug 10, 2023
@KomachiSion
Copy link
Collaborator

#10937

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Category issues or prs related to bug.
Projects
None yet
Development

No branches or pull requests

2 participants