Some data is being lost; how do I troubleshoot it? #47

JessonChan · 2020-03-30T00:35:05Z

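As shown in the screenshot below.

Here is the push log. Data is being pushed continuously.

The collector component logs no errors. How should I go about locating the problem?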

UlricQin · 2020-03-30T03:33:01Z

Please also print the timestamp field of the data you push.

Also check the transfer and tsdb logs.

JessonChan · 2020-03-30T04:01:21Z

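The timestamp is fine. It comes straight from time.Now().Unix(), which is why the vast majority of requests are correct.
transfer has no error logs.
tsdb only keeps logs for the last 2 hours. There are errors in them, but they don't seem related.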

UlricQin · 2020-03-30T04:08:03Z

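This error could well be the cause. tsdb persists data to disk, and sync_disk is the step that does the writing. If that write fails, the data ends up with gaps. Are you running on a local virtual machine? Is there anything unusual about the disk?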

JessonChan · 2020-03-30T05:19:58Z

It's my test machine. But this error doesn't affect all of the data. This metric, for example, looks normal.

UlricQin · 2020-03-30T10:01:30Z

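Right. Each metric has its own rrd file, so if some rrd files have problems, the others are not affected. Logically, all metrics go through the same code. If this were a code bug, every metric should be broken, but here only some are.

So, to be honest, I don't have a good lead either.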

JessonChan · 2020-03-30T11:30:21Z

OK, no problem. I'll find time to dig into it and get back to you if I find anything.

JessonChan · 2020-04-10T00:11:18Z

My production service is hitting the same problem now. I'm noting it here. I haven't found the cause yet.

JessonChan · 2020-04-10T09:06:39Z

I've found some clues:
2020-04-10 15:31:24.393253 WARNING rpc/push.go:61 push obj error, obj: <Endpoint:host Metric:key, Tags:, TagsMap:map[], Value:0, TS:1586503740 2020-04-10 15:29:00 DsType:GAUGE, Step:60, Heartbeat:120, Min:U, Max:U>, error: data @1586502900, timestamp old than previous chunk. currentchunk t0: 1586503800
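The log line above contains three Unix timestamps: the point's `TS`, the `data @` value, and the `currentchunk t0`. The short Go helper below converts them to wall-clock time. It assumes the machine that wrote the log runs at UTC+8, which is consistent with `TS:1586503740` being printed as `15:29:00`.

The converted times are:

- the point's TS is 15:29:00
- the `data @` value is 15:15:00
- the chunk t0 is 15:30:00

The log entry itself was written at 15:31:24. So the point was rejected after a chunk starting at 15:30 was already open. That fits the reading that it reached the server late, after a newer point.

```go
package main

import (
	"fmt"
	"time"
)

// cst is a fixed UTC+8 zone, assumed to match the host that wrote the log.
var cst = time.FixedZone("CST", 8*3600)

// show formats a Unix timestamp (seconds) as wall-clock time in UTC+8.
func show(ts int64) string {
	return time.Unix(ts, 0).In(cst).Format("2006-01-02 15:04:05")
}

func main() {
	// TS of the rejected point, the "data @" value, and the chunk t0.
	for _, ts := range []int64{1586503740, 1586502900, 1586503800} {
		fmt.Println(ts, show(ts))
	}
}
```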

UlricQin · 2020-04-10T10:05:04Z

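The server first received a newer point and then an older one. Monitoring data must arrive in time order, hence the error.

Is this data you push yourself? Are the timestamps obtained correctly? Is the machine's clock synchronized?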

JessonChan · 2020-04-10T11:53:48Z

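Yes, it's data I push myself. The timestamps are obtained correctly, and the machine clock is synchronized.
So if we can find out why the server receives the data out of order, that should solve the problem?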

JessonChan · 2020-04-10T12:07:54Z

I've adjusted the push code and will follow up if I reach any further conclusions.

JessonChan · 2020-04-11T08:49:46Z

So far, the problem has not been solved.

UlricQin · 2020-04-11T08:56:37Z

Do all metrics have the problem, or only some of them? Is there any pattern?

UlricQin · 2020-04-11T08:58:23Z

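For whichever metric is broken, you need to check every component along its path. For example, if a plugin reports metric abc, start from collector and follow it through transfer and tsdb. Check whether abc was actually reported correctly and at which stage things go wrong.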
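One way to carry out this hop-by-hop check is sketched below. It assumes you can extract (series, timestamp) pairs, in arrival order, from each component's logs; that parsing is not shown.

For each hop, the hypothetical `regressions` helper reports every position where a series' timestamp drops below the highest one already seen. The first hop (collector, transfer or tsdb) where regressions appear is a likely place for the ordering to break. `obs` and `regressions` are illustrative names, not part of n9e.

```go
package main

import "fmt"

// obs is one observation of a series at a given hop, in arrival order.
// The Series key is assumed to be something like endpoint/metric/tags.
type obs struct {
	Series string
	TS     int64 // Unix seconds
}

// regressions returns, per series, the indexes in seq whose timestamp is
// lower than the highest timestamp already seen for that series.
func regressions(seq []obs) map[string][]int {
	maxTS := map[string]int64{}
	out := map[string][]int{}
	for i, o := range seq {
		if m, ok := maxTS[o.Series]; ok && o.TS < m {
			out[o.Series] = append(out[o.Series], i)
			continue // keep the previous maximum
		}
		maxTS[o.Series] = o.TS
	}
	return out
}

func main() {
	// Hypothetical observations captured at one hop.
	seq := []obs{
		{"host/key", 1586503740},
		{"host/key", 1586503800},
		{"host/key", 1586503740}, // arrives after a newer point
		{"host/cpu", 1586503800},
	}
	fmt.Println(regressions(seq))
}
```

Running the same check on collector, transfer and tsdb observations of the same metric shows whether the points were already out of order when they left the client, or were reordered somewhere in between.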

JessonChan · 2020-04-11T11:16:06Z

OK, I'll do a full review as you suggest. So far it looks like only some metrics are affected.

JessonChan · 2020-04-16T01:30:51Z

After restarting n9e-tsdb, the problem hasn't recurred so far.

UlricQin · 2020-04-16T01:56:38Z

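Are you testing on a VM you set up yourself, or on a real production machine? Judging by the symptoms, this doesn't look like a software problem. It looks more like an environment problem.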

JessonChan · 2020-04-16T02:57:41Z

It's production, running on Alibaba Cloud ECS.

UlricQin · 2020-04-17T11:10:11Z

I'll close this issue for now. If the problem comes back, post the logs from every component and open a new issue. This one is a bit strange.

supervisoredis · 2020-11-26T10:35:12Z

On my side, restarting tsdb doesn't fix the problem either.

UlricQin · 2020-11-30T07:24:33Z

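> On my side, restarting tsdb doesn't fix the problem either.

If that doesn't help, try version 3.3.0 with M3DB as the storage engine. rrdtool seems to have problems in some scenarios.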

JessonChan closed this as completed Mar 30, 2020

JessonChan reopened this Apr 10, 2020

JessonChan closed this as completed Apr 10, 2020

JessonChan reopened this Apr 10, 2020

UlricQin closed this as completed Apr 17, 2020