[Bug] FE frequently fills its 64 GB of memory and crashes; only Broker Load runs on a schedule on the cluster, and after a while memory is full again #27594
Comments
Hit the same problem on Doris 2.0.2 release. |
I only changed the -Xmx size; everything else is the default JVM configuration. |
Using G1GC |
Thanks, I will try. |
BTW, I used broker 2.0.2, not 2.0.1.1. |
Maybe you need to turn off the global profile |
After reviewing the source code, the default max_query_profile_num seems to be 100, so it wouldn't keep pushing profiles into memory? |
I don’t have a detailed understanding yet. You can continue to follow or provide more detailed log information. |
Has this memory problem affected usage? Also, is the memory reclaimed by GC? |
We have encountered the same problem, and we have now also turned off the profile. Yesterday we dumped the JVM data; our DBA is analyzing it. |
Due to the impact of dumping on the normal use of the cluster, we did not dump the JVM data. If you discover anything after dumping, please share the specific situation here. @wj215318 Thanks! |
@wj215318 BTW, the 2.0.2 release doesn't have this problem. |
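Same problem here. The minor GC frequency could not keep up with the growth of the old generation, and in the end all three FE nodes had queries queue up, time out, hang, and crash. I suggest monitoring the FE JVM with Prometheus + Grafana to see where the problem actually is. While you are at it, change your parameters: make the young generation 1/3 of the old generation, and don't use something like -XX:NewRatio=3; set a fixed -Xmn16G instead. Turn on CMS parallel remark, otherwise this much memory simply cannot be marked in the short time a minor GC takes. Then lower the CMS initiating occupancy fraction: 80% is too late, and the service may go down before GC finishes. 60 or 65 works. This is tested and effective; since I adjusted my cluster, no FE has crashed. |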
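For anyone without Prometheus + Grafana in place yet, a rough way to check the same thing is to sample `jstat -gcutil <pid> <interval>` on the FE process and see whether old-generation occupancy (the `O` column) stays high after full GCs. The sketch below only parses that text output; the sample rows, the 80% threshold, and the function names are made up for illustration and are not from any cluster in this thread.

```python
# Hypothetical jstat -gcutil output (JDK 8 column layout); values are invented.
SAMPLE = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  97.02  70.31  85.10  95.52  89.14   1200   35.300     4    6.100   41.400
 96.10   0.00  12.45  91.40  95.60  89.14   1260   37.900     6    9.800   47.700
  0.00  98.33  55.02  96.75  95.61  89.14   1310   40.200     8   14.500   54.700
"""


def parse_gcutil(output):
    """Parse `jstat -gcutil` text into a list of dicts keyed by column name."""
    lines = [ln.split() for ln in output.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    samples = []
    for row in rows:
        # jstat prints "-" for columns that do not apply; map those to None.
        values = [None if v == "-" else float(v) for v in row]
        samples.append(dict(zip(header, values)))
    return samples


def old_gen_not_reclaimed(samples, threshold=80.0):
    """Heuristic: every sample taken right after the full-GC counter (FGC)
    increased still shows old-gen occupancy (O) at or above `threshold` percent."""
    after_full_gc = [
        cur["O"]
        for prev, cur in zip(samples, samples[1:])
        if cur["FGC"] > prev["FGC"]
    ]
    return bool(after_full_gc) and all(o >= threshold for o in after_full_gc)


if __name__ == "__main__":
    print(old_gen_not_reclaimed(parse_gcutil(SAMPLE)))
```

If this returns True over a long sampling window, the old generation is filling with objects that GC cannot free, which points to a leak or retention issue rather than pure GC tuning.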
Could you share the modified startup parameters? |
JAVA_OPTS="-server -Xmx64g -Xmn16g -Xms32g -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:$DORIS_HOME/log/fe.gc.log.$DATE" @zhbdesign Set the actual memory sizes according to what your machine really has. |
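To make the sizing in these options concrete, here is a small sketch that works out the numbers: with -Xmx64g and -Xmn16g the old generation can grow to 48 GiB (so young is 1/3 of old), and an occupancy fraction of 65 means CMS starts a cycle at roughly 31.2 GiB of old-gen usage. The helper names are hypothetical, and it only understands the three flags it looks for. Note that without -XX:+UseCMSInitiatingOccupancyOnly the JVM treats the fraction as a starting hint and may adjust it on later cycles.

```python
import re

GIB = 1024 ** 3
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a JVM size such as '16g' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.lower())
    if match is None:
        raise ValueError(f"unrecognized JVM size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def heap_layout(java_opts):
    """Derive young/old sizes and the CMS start point from a JAVA_OPTS string.

    Assumes -Xmx, -Xmn and -XX:CMSInitiatingOccupancyFraction are all present.
    """
    xmx = re.search(r"-Xmx(\d+[kKmMgG]?)", java_opts)
    xmn = re.search(r"-Xmn(\d+[kKmMgG]?)", java_opts)
    cms = re.search(r"-XX:CMSInitiatingOccupancyFraction=(\d+)", java_opts)
    if not (xmx and xmn and cms):
        raise ValueError("expected -Xmx, -Xmn and CMSInitiatingOccupancyFraction")

    max_heap = parse_size(xmx.group(1))
    young = parse_size(xmn.group(1))
    old = max_heap - young  # old generation is what remains of the max heap
    fraction = int(cms.group(1))

    return {
        "young_gib": young / GIB,
        "old_gib": old / GIB,
        "young_to_old": young / old,
        # Old-gen usage at which CMS begins a concurrent cycle.
        "cms_start_gib": old * fraction / 100 / GIB,
    }


if __name__ == "__main__":
    opts = "-Xmx64g -Xmn16g -Xms32g -XX:CMSInitiatingOccupancyFraction=65"
    for key, value in heap_layout(opts).items():
        print(f"{key}: {value:.2f}")
```

With the default of 80 mentioned above, the same heap would not start a CMS cycle until about 38.4 GiB of old-gen usage, which leaves less headroom for the collector to finish before the heap fills.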
You can upload the dumped file here. |
Search before asking
Version
Version 2.0.1.1 release
What's Wrong?
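The figure below shows memory usage. Memory cannot be reclaimed; a restart is needed every time, and after a while it fills up again:
Version 2.0.1.1 release
JVM -Xmx64g
GC log before the crash:
fe.log before the crash:
The only SQL coming from this machine is broker load, show load, show partitions, drop partition, and add partition.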
What You Expected?
What is causing the FE to crash? Surely 64 GB of memory should be enough.
How to Reproduce?
No response
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct