After a disconnected follower reconnects, installing the leader snapshot causes data errors #10406

Closed
Lin-1997 opened this issue Apr 28, 2023 · 1 comment
Labels
area/Naming kind/bug Category issues or prs related to bug.

Lin-1997 commented Apr 28, 2023

nacos: 2.2.1

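In a three-node cluster, follower node A loses its network connection, but the process keeps running, so none of its in-memory data is lost.
After a while node A's network recovers and it pulls data from leader node B. Part of the log has already been compacted, so node A installs a snapshot from leader node B (installSnapshot). The flow is as follows: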

PersistentClientOperationServiceImpl.PersistentInstanceSnapshotOperation#readSnapshot  -->
loadSnapshot  -->  loadSyncDataToClient  -->
client.putServiceInstance && NotifyCenter publish ClientRegisterServiceEvent

ClientServiceIndexesManager receives ClientRegisterServiceEvent  -->  addPublisherIndexes

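As a result, instances that were added to a service during the outage are applied to ClientServiceIndexesManager.
However, instances that were removed from a service are never applied to ClientServiceIndexesManager: the in-memory data is never compared against the received snapshot, the in-memory data is not cleared and rebuilt, and no ClientDeregisterServiceEvent is ever published.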
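One way to picture the missing step is a reconciliation pass that compares the in-memory index with the snapshot before merging. The sketch below is only an illustration under stated assumptions: `SnapshotIndexReconciler`, `staleEntries`, and `reconcile` are hypothetical names, and the plain `Map<String, Set<String>>` (service name to client IDs) is a stand-in for the real index, not the actual ClientServiceIndexesManager API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a service -> clientId index; not a Nacos class. */
class SnapshotIndexReconciler {

    /** Entries present in memory but absent from the snapshot; each would need a deregister event. */
    static Map<String, Set<String>> staleEntries(Map<String, Set<String>> inMemory,
                                                 Map<String, Set<String>> snapshot) {
        Map<String, Set<String>> stale = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : inMemory.entrySet()) {
            Set<String> gone = new HashSet<>(e.getValue());
            gone.removeAll(snapshot.getOrDefault(e.getKey(), Collections.emptySet()));
            if (!gone.isEmpty()) {
                stale.put(e.getKey(), gone);
            }
        }
        return stale;
    }

    /** Removes stale entries from the in-memory index, then merges in the snapshot's entries. */
    static void reconcile(Map<String, Set<String>> inMemory, Map<String, Set<String>> snapshot) {
        Map<String, Set<String>> stale = staleEntries(inMemory, snapshot);
        stale.forEach((service, clients) -> {
            Set<String> current = inMemory.get(service);
            current.removeAll(clients);
            if (current.isEmpty()) {
                inMemory.remove(service);
            }
        });
        snapshot.forEach((service, clients) ->
                inMemory.computeIfAbsent(service, k -> new HashSet<>()).addAll(clients));
    }
}
```

In a real fix the stale entries would presumably be turned into ClientDeregisterServiceEvent publications rather than removed from a map directly; that wiring is left out here because it depends on Nacos internals.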

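The MetricsMonitor statistics held in memory are not lost either, but the client.putServiceInstance calls triggered while reading the snapshot increment MetricsMonitor.ipCount a second time, so the statistics also become inconsistent.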
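The double counting can be illustrated with a small gauge that resets its counter from the snapshot contents instead of incrementing on every replayed registration. This is a sketch under assumptions: `IpCountGauge` and its methods are hypothetical and do not reflect how MetricsMonitor is actually implemented.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical counter; a stand-in for the ipCount metric, not the Nacos MetricsMonitor. */
class IpCountGauge {
    private final AtomicInteger ipCount = new AtomicInteger();
    private final Set<String> instances = new HashSet<>();

    /** Counts an instance only the first time its key is seen. */
    synchronized void register(String instanceKey) {
        if (instances.add(instanceKey)) {
            ipCount.incrementAndGet();
        }
    }

    /** Replaces the tracked instances with the snapshot and resets the counter to match. */
    synchronized void loadSnapshot(Set<String> snapshotInstances) {
        instances.clear();
        instances.addAll(snapshotInstances);
        ipCount.set(instances.size());
    }

    int ipCount() {
        return ipCount.get();
    }
}
```

The design choice here is to treat the snapshot as the source of truth for the count, so replaying it any number of times yields the same value.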

PersistentIpPortClientManager#loadFromSnapshot calls oldClients.clear(); directly without calling IpPortBasedClient#release, so I suspect ClientBeatCheckTaskV2 or HealthCheckTaskV2 will leak memory.
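The suspected leak comes down to dropping references without releasing the resources they own. A minimal sketch of releasing old clients when a snapshot replaces them follows; `ClientRegistry` and `ReleasableClient` are hypothetical names used for illustration, and the sketch assumes the snapshot always supplies freshly created client objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry; a stand-in for a client manager, not the Nacos implementation. */
class ClientRegistry {

    /** Hypothetical client that owns scheduled tasks and must be released explicitly. */
    interface ReleasableClient {
        void release();
    }

    private final Map<String, ReleasableClient> clients = new ConcurrentHashMap<>();

    void put(String clientId, ReleasableClient client) {
        clients.put(clientId, client);
    }

    int size() {
        return clients.size();
    }

    /** Swaps in the snapshot's clients, then releases every client that was replaced. */
    void loadFromSnapshot(Map<String, ReleasableClient> snapshot) {
        List<ReleasableClient> old = new ArrayList<>(clients.values());
        clients.clear();
        clients.putAll(snapshot);
        // Assumption: snapshot clients are new objects, so releasing the old ones is safe.
        old.forEach(ReleasableClient::release);
    }
}
```

Releasing after the swap keeps the registry usable even if a `release()` call is slow; whether that ordering is right for Nacos depends on how its health-check tasks are cancelled.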

@KomachiSion KomachiSion added kind/bug Category issues or prs related to bug. area/Naming labels May 8, 2023
@KomachiSion KomachiSion added this to the 2.3.0 milestone May 11, 2023
@Daydreamer-ia
Contributor

I will resolve it.
