Consumer drives CPU to 100% while receiving messages #23
Comments
Could you provide a thread dump from when the CPU is at 100%? You can capture one with `jstack pid > thread.txt`.
I tested this locally and found that most of the threads in the stack dump were netty threads. netty 3 has this problem on Windows, so for now we recommend running on Linux. An upgrade to netty 4 is planned.
This doesn't look like a netty problem. TubeMQ's consumer creates 8 FetchTaskWorker threads by default to fetch messages. When I manually set consumerConfig.setPushFetchThreadCnt(3), the 100% CPU usage clearly improved. I eventually found that the FetchTaskWorker threads request messages from the Broker in what is effectively a busy loop; when no message is available, the Broker responds with 404, and execution ends up in the `case TErrCodeConstants.NOT_FOUND` branch of the response-handling logic.
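For reference, the workaround looks roughly like the sketch below. The import path and the ConsumerConfig constructor arguments are assumptions about the TubeMQ client API; only setPushFetchThreadCnt comes from the observation above.

```java
// Sketch of the workaround; the import path and constructor arguments
// are assumptions about the TubeMQ client API, not taken from this issue.
import com.tencent.tubemq.client.config.ConsumerConfig;

public class FetchThreadWorkaround {
    public static void main(String[] args) {
        // Placeholder master address and consumer group name.
        ConsumerConfig consumerConfig =
                new ConsumerConfig("127.0.0.1:8715", "test-consumer-group");
        // The consumer starts 8 FetchTaskWorker threads by default;
        // lowering the count noticeably reduced the CPU spin.
        consumerConfig.setPushFetchThreadCnt(3);
    }
}
```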
Judging by the comments on that block, the request rate is supposed to be throttled according to the configuration; getMsgNotFoundWaitPeriodMs defaults to 200 ms, but the code below the comment performs no blocking at all. The final errRspRelease() call likewise keeps writing tasks into the HashedWheelTimer in a near busy loop.
So the 100% CPU usage most likely ends up surfacing inside netty's HashedWheelTimer.
To confirm this, I added a log statement immediately before the call to `timeouts.put(partitionKey, timer.newTimeout(new TimeoutTask(partitionKey), waitDlt, TimeUnit.MILLISECONDS));` and watched how rapidly it fires (log screenshot omitted).
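To make the pattern concrete, here is a minimal standalone sketch, not TubeMQ code: a tight loop that keeps scheduling short-delay no-op tasks on netty 3's HashedWheelTimer, the same shape as errRspRelease() re-queuing a TimeoutTask after every NOT_FOUND response.

```java
import java.util.concurrent.TimeUnit;

import org.jboss.netty.util.HashedWheelTimer;
import org.jboss.netty.util.Timeout;
import org.jboss.netty.util.Timer;
import org.jboss.netty.util.TimerTask;

// Standalone sketch (not TubeMQ source): schedule no-op tasks on the
// wheel timer as fast as the loop can run, mirroring the near busy-loop
// writes described above. The scheduling thread plus the timer's worker
// thread keep the CPU pinned.
public class WheelTimerSpin {
    public static void main(String[] args) {
        Timer timer = new HashedWheelTimer();
        while (true) {
            timer.newTimeout(new TimerTask() {
                @Override
                public void run(Timeout timeout) {
                    // In TubeMQ the TimeoutTask would release the partition
                    // so a fetch thread can request it again right away.
                }
            }, 200, TimeUnit.MILLISECONDS);
        }
    }
}
```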
It is indeed the FetchTaskWorker threads causing the 100% CPU. The problem appears to be in RmtDataCache.java:

```java
public PartitionSelectResult pushSelect() {
    // Wait until the consumer is closed or at least one partition is
    // registered; each pass sleeps 200 ms, so this loop itself is cheap.
    do {
        if (this.isClosed.get()) {
            break;
        }
        if (!partitionMap.isEmpty()) {
            break;
        }
        ThreadUtils.sleep(200);
    } while (true);
    if (this.isClosed.get()) {
        return null;
    }
    waitCont.incrementAndGet();
    try {
        rebProcessWait();
        if (this.isClosed.get()) {
            return null;
        }
        String key = indexPartition.take();
        if (key == null) {
            return null;
        }
        PartitionExt partitionExt = partitionMap.get(key);
        if (partitionExt == null) {
            // Partition disappeared between queue and map lookup:
            // return null with no back-off, so the caller retries at once.
            return null;
        }
        long curTime = System.currentTimeMillis();
        Long newTime = partitionUsedMap.putIfAbsent(key, curTime);
        if (newTime != null) {
            // Partition already marked in use: again an immediate null.
            return null;
        }
        return new PartitionSelectResult(partitionExt,
                curTime, partitionExt.getAndResetLastPackConsumed());
    } catch (Throwable e1) {
        // Any failure also yields null immediately, with no delay.
        return null;
    } finally {
        waitCont.decrementAndGet();
    }
}
```
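Every early-exit path above returns null with no back-off. A hypothetical sketch of the calling side (isShutdown, rmtDataCache, and fetchMessages are placeholder names, not the real FetchTaskWorker code) shows how those nulls become a hot loop:

```java
// Hypothetical FetchTaskWorker loop, for illustration only; the names
// here are placeholders rather than actual TubeMQ identifiers.
while (!isShutdown()) {
    PartitionSelectResult selectResult = rmtDataCache.pushSelect();
    if (selectResult == null) {
        // No sleep or park between attempts: repeated nulls make this
        // worker spin at full speed, matching the observed 100% CPU.
        continue;
    }
    fetchMessages(selectResult);
}
```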
Thanks @klboke
OS: Windows 10
Runtime: JDK 8
IDE: IDEA 2019.1.3 x64
Preliminary diagnosis: FetchTaskWorker busy-spinning drives the CPU to 100%; it reproduces every time the consumer receives messages while the Producer is sending.