[Bug] TsFile Sync failure under heavy load #11428

@pedropereira98

Search before asking

  • I searched in the issues and found nothing similar.

Version

Both machines run Ubuntu 20.04 and the Docker image of IoTDB version 1.1.1.

Describe the bug and provide the minimal reproduce step

  • Start two nodes (edge and cloud), each in standalone configuration
  • Create a pipesink targeting the cloud node
  • Create a pipe to that pipesink
  • Insert at a high rate at the edge node (e.g. 1600 distinct connections, each inserting one batch per second of 100 measurements with 30 fields each)
  • Flush at the edge node to ensure no data is left unsynced

What did you expect to see?

All data inserted at the edge node to be present in the cloud node (96000 values)

What did you see instead?

Some data present at the edge node is missing on the cloud node (800 of the 96000 values, roughly 0.8%).
The edge node shows the pipe with message status WARN.

2023-10-30 18:48:48,587 [pool-26-IoTDB-Sync-Pipe-pipe_0-2] INFO  o.a.i.d.s.p.TsFilePipeData:186 - Waiting for tsfile sequence-root.gps-1-2808-1698691708239-1-0-0.tsfile close, retry 10 / 10. 
2023-10-30 18:48:48,589 [pool-26-IoTDB-Sync-Pipe-pipe_0-2] ERROR o.a.i.d.s.t.c.IoTDBSyncClient:170 - Get TsFiles error, because java.io.FileNotFoundException: Can not find /iotdb/data/datanode/sync/sender/pipe_0-1698691662702/file-data/sequence-root.gps-1-2808-1698691708239-1-0-0.tsfile.resource, maybe the tsfile is not closed yet. 
java.io.FileNotFoundException: Can not find /iotdb/data/datanode/sync/sender/pipe_0-1698691662702/file-data/sequence-root.gps-1-2808-1698691708239-1-0-0.tsfile.resource, maybe the tsfile is not closed yet
	at org.apache.iotdb.db.sync.pipedata.TsFilePipeData.getTsFiles(TsFilePipeData.java:166)
	at org.apache.iotdb.db.sync.transport.client.IoTDBSyncClient.send(IoTDBSyncClient.java:164)
	at org.apache.iotdb.db.sync.transport.client.SenderManager.takePipeDataAndTransport(SenderManager.java:227)
	at org.apache.iotdb.db.sync.transport.client.SenderManager.lambda$registerDataRegion$1(SenderManager.java:266)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
2023-10-30 18:48:48,590 [pool-26-IoTDB-Sync-Pipe-pipe_0-2] ERROR o.a.i.d.s.t.c.SenderManager:228 - Can not transfer PipeData TsFilePipeData{serialNumber=2, tsFilePath='/iotdb/data/datanode/sync/sender/pipe_0-1698691662702/file-data/sequence-root.gps-1-2808-1698691708239-1-0-0.tsfile'}, skip it. 
2023-10-30 18:48:48,591 [pool-26-IoTDB-Sync-Pipe-pipe_0-2] ERROR o.a.i.db.sync.SyncService:363 - Warn occurred when executing PIPE [pipe_0] because Transfer PipeData 2 error, skip it.. 

Logs on the edge node show multiple instances of the same error.
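
To see which TsFiles were skipped, the edge node log can be scanned for the "Can not transfer PipeData" line shown above. The following is a minimal sketch, not part of IoTDB: the function name `skipped_tsfiles` is hypothetical, and the regular expression assumes that every skipped entry is logged in exactly the format of the excerpt in this report.

```python
import re

# Pattern derived from the "Can not transfer PipeData ... skip it" line in the
# log excerpt above; it is an assumption that all skipped entries use this format.
SKIPPED = re.compile(
    r"Can not transfer PipeData TsFilePipeData\{serialNumber=(\d+), "
    r"tsFilePath='([^']+)'\}, skip it"
)


def skipped_tsfiles(log_lines):
    """Return (serial_number, tsfile_path) pairs for every skipped PipeData entry."""
    found = []
    for line in log_lines:
        match = SKIPPED.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found
```

Any iterable of lines works as input, for example an open file handle on the edge node's log file (its location depends on the deployment). Comparing the returned paths with the TsFiles present on the cloud node should indicate whether the missing values correspond to the skipped files.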

Anything else?

The edge node runs in a Docker container limited to 4 CPU cores, 4 GB of RAM, 44 MB/s disk reads, 40 MB/s disk writes, 2700 read IOPS, and 1200 write IOPS.
The cloud node runs in a Docker container with no limits. Its host machine has a 6-core CPU and 16 GB of RAM.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
