
[IOTDB-2723] Fix sequence inner space compaction lose data #5248

Merged
JackieTien97 merged 1 commit into apache:master from THUMarkLau:IOTDB-2723
Mar 15, 2022


@THUMarkLau THUMarkLau commented Mar 15, 2022

When executing inner space compaction for sequence files, the chunks of a series are read into memory one by one, and the program handles each chunk in one of three ways:

  1. If the chunk is small, or part of its data has been deleted, the chunk is deserialized into points and rewritten into the chunk writer. The following chunks are also written into the chunk writer until the chunk writer is large enough to flush.
  2. If the chunk is too large, the program flushes it to disk directly.
  3. If the chunk is neither too small nor too large, the program caches it in memory and merges it with the following chunk. The cached chunk is not flushed until it is large enough.
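The three cases above can be sketched roughly as a single decision function. This is only an illustration: the class name, the `Action` enum, and the two size thresholds are hypothetical and are not taken from the IoTDB code base, which may use different criteria (for example point counts as well as byte sizes).

```java
public class ChunkDecisionSketch {
    // Hypothetical names for the three ways of processing a chunk.
    enum Action { DESERIALIZE, FLUSH_DIRECTLY, CACHE_AND_MERGE }

    // smallLimit and largeLimit are assumed thresholds, not real IoTDB settings.
    static Action decide(long chunkSize, boolean hasDeletion,
                         long smallLimit, long largeLimit) {
        if (hasDeletion || chunkSize < smallLimit) {
            // Case 1: rewrite point by point into the chunk writer.
            return Action.DESERIALIZE;
        }
        if (chunkSize >= largeLimit) {
            // Case 2: large enough already, write it straight to disk.
            return Action.FLUSH_DIRECTLY;
        }
        // Case 3: keep it in memory and merge it with the next chunk.
        return Action.CACHE_AND_MERGE;
    }

    public static void main(String[] args) {
        System.out.println(decide(10, false, 100, 1000));
        System.out.println(decide(5000, false, 100, 1000));
        System.out.println(decide(500, false, 100, 1000));
        System.out.println(decide(500, true, 100, 1000));
    }
}
```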

Of course, these are rough descriptions. When the program reads a chunk that satisfies the condition for deserialization and there is already a cached chunk in memory, the program first deserializes the cached chunk into the chunk writer and then deserializes the freshly read chunk. Before deserializing the cached chunk, it calls the flip function on the cached chunk to make sure the chunk reader can read it correctly. However, in some cases the cached chunk is the first cached chunk, which means it was read directly from the TsFile using the readMemChunk function in TsFileSequenceReader and has not been merged with any other chunk yet. A chunk read by readMemChunk has already been flipped, while a chunk generated by mergeChunk has not. The program only needs to call the flip function for the latter. If the program calls the flip function for the former, flip is effectively called twice, which leaves the position and limit variables of the chunk's data buffer in a wrong state. Consequently, the chunk reader cannot read the data in the cached chunk correctly and the data is lost.
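The effect of the extra flip can be reproduced with a plain `java.nio.ByteBuffer`, assuming the chunk's data buffer behaves like one. The class and method names below are hypothetical and exist only for this demonstration; they are not IoTDB code.

```java
import java.nio.ByteBuffer;

public class DoubleFlipSketch {
    // Stands in for a chunk whose buffer has already been flipped once,
    // as described for chunks returned by readMemChunk.
    static ByteBuffer readyToReadBuffer() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(42L);
        buf.putLong(43L);
        buf.flip(); // limit = 16, position = 0: all 16 bytes are readable
        return buf;
    }

    // Flipping again sets limit to the current position (0), so nothing is readable.
    static int remainingAfterExtraFlip() {
        ByteBuffer buf = readyToReadBuffer();
        buf.flip(); // limit = 0, position = 0
        return buf.remaining();
    }

    public static void main(String[] args) {
        System.out.println(readyToReadBuffer().remaining()); // prints 16
        System.out.println(remainingAfterExtraFlip());       // prints 0
    }
}
```

A reader handed the twice-flipped buffer sees zero remaining bytes, which matches the described symptom of the cached chunk's data being silently dropped.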

This bug actually has nothing to do with deletion.

@coveralls

Coverage Status

Coverage decreased (-0.001%) to 65.705% when pulling 765c612 on THUMarkLau:IOTDB-2723 into c3d34b6 on apache:master.

@JackieTien97 JackieTien97 merged commit 9f04de9 into apache:master Mar 15, 2022
@THUMarkLau THUMarkLau deleted the IOTDB-2723 branch March 15, 2022 13:47
