[IOTDB-6267] Load 2.0 by yschengzi · Pull Request #11705 · apache/iotdb

yschengzi · 2023-12-13T07:56:47Z

MPP Load has problems with stability and with compatibility with the Pipe system, and its loading speed leaves room for optimization.
Issue 1: No upper bound on total memory usage when multiple Load statements run concurrently.
Currently, Load strictly bounds only the memory used by a single Load statement over its execution lifecycle.
When many Load statements run concurrently, their combined memory usage is not bounded.
Please refer to MPP Load memory footprint for the memory usage during the execution life cycle of a single Load statement.
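To illustrate what Issue 1 is missing, here is a minimal Java sketch of one memory budget shared by every concurrently executing Load statement, so that the sum of their reservations cannot exceed a fixed cap. LoadMemoryBudget and its methods are hypothetical names, not IoTDB APIs, and the sketch assumes each Load statement can estimate how many bytes it needs before buffering split data.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical process-wide memory budget shared by all concurrent Load statements.
 * Each statement reserves bytes before buffering data and releases them afterwards,
 * so the total reserved never exceeds totalBytes.
 */
final class LoadMemoryBudget {
    private final long totalBytes;
    private long usedBytes = 0;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition freed = lock.newCondition();

    LoadMemoryBudget(long totalBytes) {
        if (totalBytes <= 0) throw new IllegalArgumentException("totalBytes must be positive");
        this.totalBytes = totalBytes;
    }

    /** Non-blocking: reserve bytes if enough budget is left, otherwise return false. */
    boolean tryAcquire(long bytes) {
        checkRequest(bytes);
        lock.lock();
        try {
            if (usedBytes + bytes > totalBytes) return false;
            usedBytes += bytes;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Blocking: wait until other Load statements release enough memory. */
    void acquire(long bytes) throws InterruptedException {
        checkRequest(bytes);
        lock.lock();
        try {
            while (usedBytes + bytes > totalBytes) freed.await();
            usedBytes += bytes;
        } finally {
            lock.unlock();
        }
    }

    /** Return a reservation and wake up any waiting statements. */
    void release(long bytes) {
        lock.lock();
        try {
            if (bytes <= 0 || bytes > usedBytes) throw new IllegalStateException("invalid release: " + bytes);
            usedBytes -= bytes;
            freed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    long usedBytes() {
        lock.lock();
        try {
            return usedBytes;
        } finally {
            lock.unlock();
        }
    }

    // A single request larger than the whole budget could never be satisfied, so reject it.
    private void checkRequest(long bytes) {
        if (bytes <= 0 || bytes > totalBytes) throw new IllegalArgumentException("request out of range: " + bytes);
    }
}
```

A Load statement would call acquire(n) before buffering split data and release(n) in a finally block, so a failed statement cannot leak its reservation. Blocking on the condition makes extra statements wait rather than fail; tryAcquire is the non-blocking alternative.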
Issue 2: New data added by Load is not properly recognized by the Pipe system.
The Pipe system currently attaches a ProgressIndex to all new data written to IoTDB (see the discussion of Key Issues in Pipe System Design and Implementation).
In the normal write process, this index is attached by the consensus layer. However, Load's current two-phase transaction commit bypasses the consensus layer, so loaded data carries no valid progress index and cannot be correctly recognized by the Pipe system when a task restarts.
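The following minimal Java sketch models why the missing identifier matters. For illustration only, it assumes a progress index is a single increasing long and that a restarted pipe resends everything newer than its last committed index; this is a simplification, not the actual ProgressIndex interface, and ProgressIndexGap with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical model of Issue 2: writes on the consensus path get an increasing
 * progress index, while a Load committed outside the consensus layer gets none.
 */
final class ProgressIndexGap {
    static final class Entry {
        final String payload;
        final Long progressIndex; // null means no progress identifier was attached

        Entry(String payload, Long progressIndex) {
            this.payload = payload;
            this.progressIndex = progressIndex;
        }
    }

    private final AtomicLong nextIndex = new AtomicLong(1);
    private final List<Entry> entries = new ArrayList<>();

    void writeViaConsensus(String payload) {
        entries.add(new Entry(payload, nextIndex.getAndIncrement()));
    }

    void loadBypassingConsensus(String payload) {
        entries.add(new Entry(payload, null));
    }

    /** Data a restarted pipe would resend, given the last index it had committed. */
    List<String> resendAfter(long committedIndex) {
        List<String> resend = new ArrayList<>();
        for (Entry e : entries) {
            // An entry without an index cannot be compared with the checkpoint, so the pipe
            // cannot tell whether it was already transferred; this sketch skips it.
            if (e.progressIndex != null && e.progressIndex > committedIndex) resend.add(e.payload);
        }
        return resend;
    }

    long unindexedCount() {
        return entries.stream().filter(e -> e.progressIndex == null).count();
    }
}
```

With one consensus write before and one after the load, a pipe restarted from index 1 resends only the second write. The loaded data cannot be placed relative to the checkpoint, so it is either lost (as here) or, under the opposite policy, resent as a duplicate.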
Issue 3: Too many serial steps in the Load TsFile process.
In the LoadTsFileScheduler class, MPP Load 1.0 works as follows:
- Iterate through the TsFiles to be loaded, one at a time.
- For each TsFile, split it first, then send the pieces, and run the second phase only after all sends have completed.
- After the second phase completes, load the next TsFile.
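The sequential Load 1.0 flow described above can be outlined in Java as follows. This is a simplified sketch, not the actual LoadTsFileScheduler code: split, send and secondPhase are hypothetical placeholders that only record their order, and each TsFile is assumed to split into two pieces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical outline of the sequential Load 1.0 flow; not the real LoadTsFileScheduler. */
final class SequentialLoadSketch {
    final List<String> trace = new ArrayList<>();

    void loadAll(List<String> tsFiles) {
        for (String file : tsFiles) {                 // one TsFile at a time
            List<String> pieces = split(file);        // split first (disk IO + CPU)
            for (String piece : pieces) send(piece);  // then send every piece (network IO)
            secondPhase(file);                        // second phase only after all sends finish
        }                                             // the next TsFile starts only here
    }

    // Placeholders that only record the order of steps.
    private List<String> split(String file) {
        trace.add("split " + file);
        return List.of(file + "#0", file + "#1");
    }

    private void send(String piece) {
        trace.add("send " + piece);
    }

    private void secondPhase(String file) {
        trace.add("commit " + file); // second phase of the two-phase commit
    }
}
```

Loading files a and b records split a, send a#0, send a#1, commit a, and only then the same steps for b, so no two steps ever overlap.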
Because splitting a TsFile may involve in-memory computation, disk IO capacity sits idle during that computation.
Splitting a single TsFile and sending it via Thrift happen serially, so the process waits on both disk IO and network IO.
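One common way to let disk IO and network IO overlap is a producer-consumer pipeline: a splitter thread puts pieces into a bounded queue while a sender thread drains it. The Java sketch below assumes this technique for illustration only; it is not the Load 2.0 design from this PR, and PipelinedLoadSketch, split and send are hypothetical. The queue bound also caps how many split pieces are buffered in memory at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical splitter/sender pipeline; not IoTDB code. */
final class PipelinedLoadSketch {
    private PipelinedLoadSketch() {}

    /** Splits every file on one thread and sends pieces on another; returns pieces in send order. */
    static List<String> run(List<String> tsFiles,
                            Function<String, List<String>> split,
                            Consumer<String> send) {
        // Bounded queue: the splitter blocks once 4 pieces are waiting, capping buffered memory.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(4);
        List<String> sent = new ArrayList<>(); // written only by the sender thread
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ExecutorCompletionService<Void> stages = new ExecutorCompletionService<>(pool);
        try {
            stages.submit(() -> { // producer: disk IO + CPU
                for (String file : tsFiles) {
                    for (String piece : split.apply(file)) queue.put(Optional.of(piece));
                }
                queue.put(Optional.empty()); // end-of-stream marker
                return null;
            });
            stages.submit(() -> { // consumer: network IO
                for (Optional<String> next = queue.take(); next.isPresent(); next = queue.take()) {
                    send.accept(next.get());
                    sent.add(next.get());
                }
                return null;
            });
            // Wait for whichever stage finishes first; get() rethrows that stage's failure.
            for (int i = 0; i < 2; i++) stages.take().get();
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("load stage failed", e.getCause());
        } finally {
            pool.shutdownNow(); // interrupts a stage still blocked on put or take
        }
    }
}
```

Because there is one splitter, one sender and a FIFO queue, pieces are sent in split order. Waiting on whichever stage finishes first means a failure in either stage is rethrown promptly, and shutdownNow then interrupts the other stage instead of leaving it blocked forever on put or take.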

Document link: https://apache-iotdb.feishu.cn/docx/UE9Od5caDoLoYJxt4Ptc4s0hnof

yschengzi added 3 commits December 13, 2023 15:55

refactor local package to one

8023969

add tsfile data and load tsfile manager

dfb4d43

add cache memory manager API

3af3a12

SteveYurongSu self-assigned this Dec 13, 2023

working on tsfile split worker

248562c

SteveYurongSu closed this Dec 28, 2023

yschengzi wants to merge 4 commits into apache:master from yschengzi:IOTDB-6267

yschengzi commented Dec 13, 2023


2 participants
