[IOTDB-6267] Load 2.0 #11705

Closed
yschengzi wants to merge 4 commits into apache:master from yschengzi:IOTDB-6267

@yschengzi
Contributor

MPP Load has stability problems, is not fully compatible with the Pipe system, and leaves room for faster loading.

Issue 1: The total memory used by concurrently executing Load statements has no upper bound.
  • Currently, Load strictly limits only the memory used by a single Load statement over its execution life cycle.
  • When many Load statements execute concurrently, their combined memory usage is uncontrolled.
  • For the memory usage of a single Load statement over its execution life cycle, see "MPP Load memory footprint".
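
One way to picture a bound on the combined memory of concurrent Load statements is a single process-wide budget that every statement must reserve from before it buffers data. The sketch below is only an illustration under that assumption: `LoadMemoryBudget` and its methods are hypothetical names, not IoTDB classes, and it is not the design proposed in this PR.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical budget shared by all concurrently running Load statements. */
final class LoadMemoryBudget {
    private final long capacityBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    LoadMemoryBudget(long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacityBytes must be positive");
        }
        this.capacityBytes = capacityBytes;
    }

    /** Reserves bytes only if the combined usage stays within capacity; never blocks. */
    boolean tryReserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        while (true) {
            long current = usedBytes.get();
            if (bytes > capacityBytes - current) {
                return false; // granting this would exceed the global limit
            }
            if (usedBytes.compareAndSet(current, current + bytes)) {
                return true;
            }
            // Another statement changed the counter concurrently; retry.
        }
    }

    /** Returns previously reserved bytes to the shared budget. */
    void release(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        long after = usedBytes.addAndGet(-bytes);
        if (after < 0) {
            usedBytes.addAndGet(bytes); // undo the invalid release
            throw new IllegalStateException("released more than was reserved");
        }
    }

    long used() {
        return usedBytes.get();
    }
}
```

With a budget of 100 bytes, a statement that reserves 60 succeeds, a second statement asking for 50 is refused until the first releases its share. A refused statement could wait, retry, or fail fast; that policy is left open here.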

Issue 2: Data added by Load is not correctly recognized by the Pipe system.
  • The Pipe system currently attaches a ProgressIndex to all new data written to IoTDB (see the discussion of Key Issues in "Pipe System Design and Implementation").
  • In the normal write path, the consensus layer attaches this index. Load's current two-phase commit does not go through the consensus layer, so loaded data carries no valid progress index and the Pipe system cannot correctly recognize it when a task restarts.

Issue 3: The Load TsFile process runs too many steps serially.
  • In the LoadTsFileScheduler class, MPP Load 1.0:
    • Iterates over the TsFiles to be loaded.
    • For each TsFile, splits it first, then sends the pieces, and starts the second phase only after all sends have completed.
    • Loads the next TsFile only after the second phase completes.
  • Because splitting a TsFile may involve in-memory computation, disk IO capacity is not fully utilized while that computation runs.
  • A single TsFile is split and sent via Thrift serially, so each step waits on disk IO and network IO in turn.
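
To illustrate what overlapping the split and send steps could look like, the sketch below runs splitting on the calling thread and sending on a second thread, connected by a bounded queue so that buffered pieces stay limited. It is a generic producer/consumer sketch, not the LoadTsFileScheduler code or the approach taken in this PR: `PipelinedLoadSketch`, `splitter`, `sender`, and `queueCapacity` are hypothetical names, and the second commit phase is left out.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: split on the calling thread while a second thread sends. */
final class PipelinedLoadSketch {

    /** Pieces returned by the splitter must be non-null. */
    static <F, P> void run(
            List<F> files,
            Function<F, List<P>> splitter,
            Consumer<P> sender,
            int queueCapacity) {
        // The bounded queue caps how many split pieces wait in memory at once.
        BlockingQueue<Optional<P>> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicReference<RuntimeException> sendFailure = new AtomicReference<>();

        Thread sendThread = new Thread(() -> {
            try {
                while (true) {
                    Optional<P> next = queue.take();
                    if (!next.isPresent()) {
                        return; // empty Optional marks the end of input
                    }
                    if (sendFailure.get() == null) {
                        try {
                            sender.accept(next.get());
                        } catch (RuntimeException e) {
                            // Remember the failure but keep draining so the
                            // splitting side never blocks on a full queue.
                            sendFailure.set(e);
                        }
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "load-sender");
        sendThread.start();

        try {
            try {
                for (F file : files) {
                    for (P piece : splitter.apply(file)) {
                        queue.put(Optional.of(piece)); // blocks while the queue is full
                    }
                }
            } finally {
                queue.put(Optional.empty()); // always let the sender terminate
            }
            sendThread.join();
        } catch (InterruptedException e) {
            sendThread.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }

        RuntimeException failure = sendFailure.get();
        if (failure != null) {
            throw failure;
        }
    }
}
```

Because a single sender thread drains a FIFO queue, pieces are sent in the order they were split. While one piece is on the network, the next one can already be read and split from disk, which is the overlap the serial flow lacks.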

Document link: https://apache-iotdb.feishu.cn/docx/UE9Od5caDoLoYJxt4Ptc4s0hnof

@SteveYurongSu SteveYurongSu self-assigned this Dec 13, 2023
