Nacos beta to full release, client access configuration inconsistency #5437

Closed
CherishCai opened this issue Apr 21, 2021 · 1 comment · Fixed by #5440
Labels
area/Config kind/bug Category issues or prs related to bug.

CherishCai commented Apr 21, 2021

Inconsistent client configuration after promoting a beta (gray) release to a full release

Nacos version used: 1.4.1

Describe the bug
A configuration is published with the beta option checked. After beta verification is finished, the full release is performed, but the client side receives the old value from before the beta.

Expected behavior
After the full release that follows a beta, all clients get the latest value.

Actual behavior
After the full release that follows a beta, clients occasionally get the old value from before the beta.

How to Reproduce

  1. Start a client that listens to the target dataId configuration.
  2. Publish a beta release of the target configuration.
  3. Promote the beta to a full release.
  4. If the problem does not appear, repeat steps 2-3.

Additional context
Log output from each node at the moment one of the changes caused the inconsistency:

[root@nacos-0 logs]# grep '2021-04-12 20:47:06' *.log
config-trace.log:2021-04-12 20:47:06,302|nacos-0.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626302|nacos-0.nacos-headless.default.svc.cluster.local|notify|ok|0|nacos-0.nacos-headless.default.svc.cluster.local:8848
config-trace.log:2021-04-12 20:47:06,303|nacos-0.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626302|nacos-0.nacos-headless.default.svc.cluster.local|notify|ok|1|nacos-2.nacos-headless.default.svc.cluster.local:8848
config-trace.log:2021-04-12 20:47:06,303|nacos-0.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626302|nacos-0.nacos-headless.default.svc.cluster.local|notify|ok|1|nacos-1.nacos-headless.default.svc.cluster.local:8848

config-trace.log:2021-04-12 20:47:06,344|nacos-0.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626302|nacos-0.nacos-headless.default.svc.cluster.local|dump|remove-ok|42|0

config-client-request.log:2021-04-12 20:47:06,943|29500|timeout|yy.yy.yy.yy|polling|4|212

[root@nacos-1 logs]# grep '2021-04-12 20:47:06' *.log
config-memory.log:2021-04-12 20:47:06,080 INFO toNotifyTaskSize = 0
config-memory.log:2021-04-12 20:47:06,081 INFO groupCount = 241, subscriberClientCount = 0, subscriberCount = 0
config-memory.log:2021-04-12 20:47:06,081 INFO [long-pulling] client count 1

config-client-request.log:2021-04-12 20:47:06,244|10|true|xx.xx.xx.xx|publish|bkqiya.properties|APP|antbank-sg|8f84adafaf354872830d2aca2397ea6e|null

config-trace.log:2021-04-12 20:47:06,244|nacos-1.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626234|nacos-1.nacos-headless.default.svc.cluster.local|persist|pub|-1|8f84adafaf354872830d2aca2397ea6e

config-trace.log:2021-04-12 20:47:06,245|nacos-1.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626234|nacos-1.nacos-headless.default.svc.cluster.local|notify|ok|11|nacos-1.nacos-headless.default.svc.cluster.local:8848
config-trace.log:2021-04-12 20:47:06,245|nacos-1.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626234|nacos-1.nacos-headless.default.svc.cluster.local|notify|ok|11|nacos-2.nacos-headless.default.svc.cluster.local:8848
config-trace.log:2021-04-12 20:47:06,245|nacos-1.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626234|nacos-1.nacos-headless.default.svc.cluster.local|notify|ok|11|nacos-0.nacos-headless.default.svc.cluster.local:8848

config-trace.log:2021-04-12 20:47:06,308|nacos-1.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626302|nacos-0.nacos-headless.default.svc.cluster.local|dump|remove-ok|6|0


[root@nacos-2 logs]# grep '2021-04-12 20:47:06' *.log
config-client-request.log:2021-04-12 20:47:06,326|0|in-advance|xx.xx.xx.xx|polling|4|232|bkqiya.properties+APP+antbank-sg
config-client-request.log:2021-04-12 20:47:06,373|0|null|xx.xx.xx.xx|get|bkqiya.properties|APP|antbank-sg|05461ab668375ff110c1c9791e6128ca|bkqiya
config-client-request.log:2021-04-12 20:47:06,514|0|null|xx.xx.xx.xx|get|bkqiya|APP|antbank-sg||bkqiya
config-client-request.log:2021-04-12 20:47:06,556|0|null|xx.xx.xx.xx|get|bkqiya.properties|APP|antbank-sg|05461ab668375ff110c1c9791e6128ca|bkqiya
config-client-request.log:2021-04-12 20:47:06,600|1|null|xx.xx.xx.xx|get|bkqiya-dev.properties|APP|antbank-sg|1928b2099033f64bb31446f8da16bd12|bkqiya

config-pull-check.log:bkqiya.properties+APP+antbank-sg|xx.xx.xx.xx|05461ab668375ff110c1c9791e6128ca|2021-04-12 20:47:06
config-pull-check.log:bkqiya.properties+APP+antbank-sg|xx.xx.xx.xx|05461ab668375ff110c1c9791e6128ca|2021-04-12 20:47:06
config-pull-check.log:bkqiya-dev.properties+APP+antbank-sg|xx.xx.xx.xx|1928b2099033f64bb31446f8da16bd12|2021-04-12 20:47:06

config-trace.log:2021-04-12 20:47:06,326|nacos-2.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|null|1618231626302|nacos-0.nacos-headless.default.svc.cluster.local|dump|remove-ok|24|0

config-trace.log:2021-04-12 20:47:06,373|nacos-2.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|bkqiya|1618231607589|pull|ok|18784|xx.xx.xx.xx
config-trace.log:2021-04-12 20:47:06,514|nacos-2.nacos-headless.default.svc.cluster.local|bkqiya|APP|antbank-sg|bkqiya|-1|pull|not-found|-1|xx.xx.xx.xx
config-trace.log:2021-04-12 20:47:06,556|nacos-2.nacos-headless.default.svc.cluster.local|bkqiya.properties|APP|antbank-sg|bkqiya|1618231607589|pull|ok|18967|xx.xx.xx.xx
config-trace.log:2021-04-12 20:47:06,600|nacos-2.nacos-headless.default.svc.cluster.local|bkqiya-dev.properties|APP|antbank-sg|bkqiya|1617960116758|pull|ok|271509842|xx.xx.xx.xx


CherishCai commented Apr 21, 2021

Root cause identified: a full release first persists the new configuration and then removes the beta configuration.

The logs above show that the two calls are less than 100 ms apart:
First, the full-release publish is handled at config-trace.log:2021-04-12 20:47:06,244
Second, the beta-config removal is handled at config-trace.log:2021-04-12 20:47:06,302
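
To make the gap concrete, the two timestamps quoted above can be subtracted directly. This is only a small sketch: the class and method names are hypothetical, and it assumes the clocks of the nodes that wrote the two log lines are in sync.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogIntervalDemo {

    // Timestamp layout used at the start of the config-trace.log lines quoted in this issue.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the number of milliseconds from the first log timestamp to the second. */
    static long millisBetween(String first, String second) {
        LocalDateTime a = LocalDateTime.parse(first, LOG_TIME);
        LocalDateTime b = LocalDateTime.parse(second, LOG_TIME);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        long gap = millisBetween("2021-04-12 20:47:06,244", "2021-04-12 20:47:06,302");
        System.out.println(gap + " ms"); // prints "58 ms"
    }
}
```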

In addition, every node logs only 'dump|remove-ok' and never 'dump|ok'.
// For where these log entries are written, see DumpConfigHandler.configDump

Tracing the code, the DumpTask is added by DumpService.dump:
[image: DumpService.dump]

String groupKey = GroupKey2.getKey(dataId, group, tenant);
dumpTaskMgr.addTask(groupKey, new DumpTask(groupKey, lastModified, handleIp, isBeta));

TaskManager extends NacosDelayTaskExecuteEngine, and its addTask is as follows:
[image: addTask]


From the analysis above, it is clear that the remove task overwrote the persist task.
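
The overwrite can be modeled with a simplified queue that keeps one pending task per key. This is a sketch, not Nacos's actual engine: the class `DumpTaskOverwriteDemo`, its `Task` type, and the "last writer wins" `addTask` are assumptions made for illustration. The groupKey and lastModified values are taken from the logs above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of a delayed-task queue keyed by groupKey; not the real Nacos engine. */
final class DumpTaskOverwriteDemo {

    /** Hypothetical stand-in for DumpTask, holding only the fields needed here. */
    static final class Task {
        final String groupKey;
        final long lastModified;
        final boolean isBeta;

        Task(String groupKey, long lastModified, boolean isBeta) {
            this.groupKey = groupKey;
            this.lastModified = lastModified;
            this.isBeta = isBeta;
        }
    }

    // One slot per key: adding a task under an existing key replaces the pending one.
    private final Map<String, Task> pending = new LinkedHashMap<>();

    /** Assumption: no merge logic, so the most recently added task wins. */
    void addTask(String key, Task task) {
        pending.put(key, task);
    }

    Map<String, Task> pending() {
        return pending;
    }

    public static void main(String[] args) {
        DumpTaskOverwriteDemo queue = new DumpTaskOverwriteDemo();
        String groupKey = "bkqiya.properties+APP+antbank-sg";

        // Full release: dump of the newly persisted (non-beta) config.
        queue.addTask(groupKey, new Task(groupKey, 1618231626234L, false));
        // Beta removal arrives before the first task has run and uses the same key.
        queue.addTask(groupKey, new Task(groupKey, 1618231626302L, true));

        Task survivor = queue.pending().get(groupKey);
        System.out.println(queue.pending().size() + " pending task, isBeta=" + survivor.isBeta);
    }
}
```

Running `main` prints `1 pending task, isBeta=true`: only the beta removal is left to execute, which is consistent with the nodes logging 'dump|remove-ok' but never 'dump|ok'.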

Possible fixes:

  1. Implement AbstractDelayTask.merge
  2. Change the key used for tasks: groupKey -> xxxx
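
As an illustration of the second option, the sketch below derives the task key from both the groupKey and the beta flag, so the formal dump and the beta removal no longer share a slot. The `taskKey` helper and its `groupKey+isBeta` format are hypothetical and are not necessarily the format adopted in the actual fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DumpTaskKeyDemo {

    /** Hypothetical task key: groupKey plus the beta flag, so the two task kinds never collide. */
    static String taskKey(String groupKey, boolean isBeta) {
        return groupKey + "+" + isBeta;
    }

    public static void main(String[] args) {
        // Pending tasks keyed by the new task key; the values are just descriptions for the demo.
        Map<String, String> pending = new LinkedHashMap<>();
        String groupKey = "bkqiya.properties+APP+antbank-sg";

        pending.put(taskKey(groupKey, false), "dump formal config");
        pending.put(taskKey(groupKey, true), "remove beta config");

        System.out.println(pending.size() + " pending tasks"); // prints "2 pending tasks"
    }
}
```

With distinct keys both tasks stay queued. The first option, implementing merge, would instead keep a single key and decide inside the task how a newer task combines with the pending one.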

CherishCai added a commit to CherishCai/nacos that referenced this issue Apr 21, 2021
KomachiSion added the area/Config and kind/bug (Category issues or prs related to bug.) labels Apr 22, 2021
KomachiSion added this to the 1.4.2 milestone Apr 22, 2021
CherishCai added a commit to CherishCai/nacos that referenced this issue Apr 22, 2021
KomachiSion pushed a commit that referenced this issue Apr 22, 2021
* feat(fix-#5437): change dump-task taskKey

* feat(fix-#5437): Handle CI issues and add doc comments.