
Feature(2.3): add compaction task for delta files #1945

Merged

merged 10 commits into LTS/2.3 from feature/lts_2.3/delta_compaction_v3 on Feb 1, 2024

Conversation

zipper-meng
Member

@zipper-meng zipper-meng commented Jan 19, 2024

Rationale for this change

Related #1244.

Conclusion

Others

  1. DataBlock - Add methods split_at(self, index) -> (DataBlock, DataBlock) and intersection(self, time_range) -> Option<DataBlock>.
  2. ColumnFile, LevelInfo - Implement std::fmt::Display; also add ColumnFiles<F: AsRef<ColumnFile>>(&[F]) and LevelInfos(&[LevelInfo]) wrappers to reduce boilerplate when we need to log them.
  3. TimeRange - Add function compact(time_ranges: &mut Vec<TimeRange>) to compact (merge overlapping) time ranges.
  4. TimeRanges - Add methods push(&mut self, time_range: TimeRange) and extend_from_slice(&mut self, time_ranges: &[TimeRange]) to modify a TimeRanges.
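
A sketch of what the `compact(time_ranges: &mut Vec<TimeRange>)` utility above could look like. This is an illustrative implementation, not the PR's actual code; the struct fields and the decision to also merge adjacent (not just overlapping) ranges are assumptions:

```rust
// Hypothetical TimeRange with inclusive bounds; field names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeRange {
    pub min_ts: i64,
    pub max_ts: i64,
}

/// Merge overlapping (and adjacent) time ranges in place.
pub fn compact(time_ranges: &mut Vec<TimeRange>) {
    if time_ranges.is_empty() {
        return;
    }
    // Sort by start so that mergeable ranges become neighbors.
    time_ranges.sort_by_key(|tr| tr.min_ts);
    let mut merged: Vec<TimeRange> = Vec::with_capacity(time_ranges.len());
    for tr in time_ranges.drain(..) {
        match merged.last_mut() {
            // Overlapping or directly adjacent: extend the previous range.
            Some(last) if tr.min_ts <= last.max_ts.saturating_add(1) => {
                last.max_ts = last.max_ts.max(tr.max_ts);
            }
            _ => merged.push(tr),
        }
    }
    *time_ranges = merged;
}
```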

Compaction

  • Change the compaction measurement labels from ("db", "ts_family", "level") to ("db", "ts_family", "in_level", "out_level").
  • Add file delta_compact.rs for delta compaction:
    1. Pick delta files and the out_level, and get the out_level_max_ts of the out_level.
    2. Given the picked delta files, for each field_id in these files, find the groups of blocks that should be merged together: Vec<CompactingBlockMetaGroup>.
    3. For each CompactingBlockMetaGroup, compute the merge-split blocks (only data in time_range 0..=out_level_max_ts is needed) and write them into files in the out_level; record these files in a VersionEdit.
    4. Write a temporary tsm-tombstone file for each delta file, named xxxxxxx.tombstone.compact.tmp, that includes all of its field_ids with time_range 0..=out_level_max_ts, because data in that time_range has already been merged into the out_level. The tombstones are compacted by compact(time_ranges: &mut Vec<TimeRange>) before being written.
    5. Write the VersionEdit into the summary, then rename the xxxxxxx.tombstone.compact.tmp files to xxxxxxx.tombstone.
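
Steps 4-5 describe a two-phase commit for the tombstone: write a temporary file first, then rename it only after the VersionEdit is persisted. A minimal sketch of the rename half, assuming the ".compact.tmp" suffix convention from the text (the function name is hypothetical):

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Rename "xxxxxxx.tombstone.compact.tmp" to "xxxxxxx.tombstone".
/// Rename is atomic on POSIX filesystems, so readers never observe a
/// partially written tombstone.
fn commit_tombstone(tmp: &Path) -> std::io::Result<PathBuf> {
    // Drop the ".compact.tmp" suffix to obtain the final file name.
    let final_name = tmp
        .file_name()
        .and_then(|n| n.to_str())
        .map(|n| n.trim_end_matches(".compact.tmp").to_string())
        .unwrap_or_default();
    let final_path = tmp.with_file_name(final_name);
    fs::rename(tmp, &final_path)?;
    Ok(final_path)
}
```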

Task definition

Some changes in CompactTask:

  • Normal (from the old CompactTask::Vnode) - Compact the files in the in_level into the out_level.
  • Cold (from the old CompactTask::ColdVnode) - Flush caches and then compact the files in the in_level into the out_level.
  • Delta - Compact the files in level-0 into the out_level.
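
The three variants above might be declared roughly as follows. This is a sketch from the description only; the payload type (a vnode/tseries-family id) is an assumption:

```rust
// Assumed identifier type for the vnode / tseries family.
pub type TseriesFamilyId = u32;

pub enum CompactTask {
    /// Compact the files in the in_level into the out_level
    /// (was CompactTask::Vnode).
    Normal(TseriesFamilyId),
    /// Flush caches and then compact in_level into out_level
    /// (was CompactTask::ColdVnode).
    Cold(TseriesFamilyId),
    /// Compact the files in level-0 into the out_level.
    Delta(TseriesFamilyId),
}
```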

Some changes in CompactReq:

  • Add field lv0_files: Option<Vec<Arc<ColumnFile>>> to store files in delta compaction.

Pick

  • Remove the trait Picker: Send + Sync + Debug and add two functions instead:
    • pick_level_compaction(compact_task: CompactTask, version: Arc<Version>) -> Option<CompactReq>
    • async pick_delta_compaction(compact_task: CompactTask, version: Arc<Version>) -> Option<CompactReq>
  • The function pick_delta_compaction picks delta files together with a tsm file so that data in the level-0 files can be partly merged into the destination level; the merged data leaves a tombstone file for the source level-0 files.
  • Improve unit test code.

Scheduler

  • Now schedule_compaction() not only checks whether a tseries family is cold but also checks the number of level-0 files against the config compact_trigger_file_num or the constant DEFAULT_COMPACT_TRIGGER_DETLA_FILE_NUM. (This change made the unit tests in summary.rs hang, so I fixed it by adjusting some unit test cases in summary.rs.)
  • schedule_compaction() holds a shared reference of Arc<TseriesFamily>, so we must now stop it manually via TseriesFamily::close(). The Drop implementation for Database calls that method on all of its tseries families.
  • The compact-job structure compaction::job::CompactProcessor is changed to hold a map from (TsFamilyId, IsDeltaCompaction) to ShouldFlushBeforeCompaction.
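
A sketch of the revised CompactProcessor bookkeeping described above: the key now also records whether the entry is a delta compaction, and the value records whether a flush is needed first. The method names and "sticky flush flag" merge policy are assumptions:

```rust
use std::collections::HashMap;

pub type TsFamilyId = u32;

#[derive(Default)]
pub struct CompactProcessor {
    // (ts_family, is_delta_compaction) -> should_flush_before_compaction
    waiting: HashMap<(TsFamilyId, bool), bool>,
}

impl CompactProcessor {
    /// Register a compaction request; once any request for the same key
    /// asked for a flush, the flag stays set.
    pub fn insert(&mut self, vnode_id: TsFamilyId, is_delta: bool, should_flush: bool) {
        let entry = self.waiting.entry((vnode_id, is_delta)).or_insert(false);
        *entry = *entry || should_flush;
    }

    pub fn should_flush(&self, vnode_id: TsFamilyId, is_delta: bool) -> Option<bool> {
        self.waiting.get(&(vnode_id, is_delta)).copied()
    }
}
```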

Tombstone

Add tombstone type ALL: a tombstone can now not only specify an excluded time range for a single field but can also specify a time range for all fields.

pub enum TombstoneField {
    One(FieldId), // 0x00, u64
    All,          // 0x01
}
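
As a sketch, the field part of a record might be serialized like this. The 0x00/0x01 type bytes come from the comments above; the big-endian encoding of the field id is an assumption, and the enum is restated here only so the snippet is self-contained:

```rust
pub type FieldId = u64;

pub enum TombstoneField {
    One(FieldId), // type byte 0x00 followed by the u64 field id
    All,          // a single type byte 0x01
}

/// Hypothetical encoding of the field prefix of a v2 tombstone record.
fn encode_field(field: &TombstoneField, buf: &mut Vec<u8>) {
    match field {
        TombstoneField::One(id) => {
            buf.push(0x00);
            buf.extend_from_slice(&id.to_be_bytes()); // endianness assumed
        }
        TombstoneField::All => buf.push(0x01),
    }
}
```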

As we know, the tombstone file is a record file, and each record looks like:

+--------------+--------------+-----------+------------+--------------+---------------+
| 0: 4 bytes   | 4: 1 byte    | 5: 1 byte | 6: 4 bytes | 10: 4 bytes  | 14: data_size |
+--------------+--------------+-----------+------------+--------------+---------------+
| magic_number | data_version | data_type | data_size  | crc32_number |     data      |
+--------------+--------------+-----------+------------+--------------+---------------+

We can control the decoding of tombstones via the data_type field in the record header.
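
Decoding the fixed 14-byte header from the table above might look like this. Offsets follow the table; the endianness and struct name are assumptions:

```rust
#[derive(Debug, PartialEq)]
pub struct RecordHeader {
    pub magic_number: u32,
    pub data_version: u8,
    pub data_type: u8,
    pub data_size: u32,
    pub crc32_number: u32,
}

/// Decode the 14-byte record header; big-endian is assumed.
pub fn decode_header(buf: &[u8; 14]) -> RecordHeader {
    RecordHeader {
        magic_number: u32::from_be_bytes(buf[0..4].try_into().unwrap()),
        data_version: buf[4],
        data_type: buf[5],
        data_size: u32::from_be_bytes(buf[6..10].try_into().unwrap()),
        crc32_number: u32::from_be_bytes(buf[10..14].try_into().unwrap()),
    }
}
```

The reader can then dispatch on `data_type` to choose the v1 or v2 tombstone decoder.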

Tombstone record v1

The v1 tombstone record is, I think, sparse: each record holds only a single field_id and one time range.

+------------+---------------+---------------+
| 0: 8 bytes | 8: 8 bytes    | 16: 8 bytes   |
+------------+---------------+---------------+
|  field_id  | min_timestamp | max_timestamp |
+------------+---------------+---------------+
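
Decoding one fixed 24-byte v1 record per the layout above is straightforward (big-endian assumed, function name hypothetical):

```rust
/// Decode a v1 tombstone record: (field_id, min_timestamp, max_timestamp).
pub fn decode_v1(buf: &[u8; 24]) -> (u64, i64, i64) {
    (
        u64::from_be_bytes(buf[0..8].try_into().unwrap()),
        i64::from_be_bytes(buf[8..16].try_into().unwrap()),
        i64::from_be_bytes(buf[16..24].try_into().unwrap()),
    )
}
```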
Tombstone record v2

Each record may contain many fields and time ranges.

# field_typ = FIELD_TYPE_ONE(0x00)
+-----------------+------------+-----------------+---------------+---------------+----
| 0: 1 byte       | 1: 8 bytes | 9: 4 bytes      | 13: 8 bytes   | 21: 8 bytes   | ...
+-----------------+------------+-----------------+---------------+---------------+----
| field_typ(0x00) |  field_id  | time_ranges_num | min_timestamp | max_timestamp | ...
+-----------------+------------+-----------------+---------------+---------------+----

# field_typ = FIELD_TYPE_ALL(0x01)
+-----------------+-----------------+---------------+---------------+----
| 0: 1 byte       | 1: 4 bytes      | 5: 8 bytes    | 13: 8 bytes   | ...
+-----------------+-----------------+---------------+---------------+----
| field_typ(0x01) | time_ranges_num | min_timestamp | max_timestamp | ...
+-----------------+-----------------+---------------+---------------+----
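
A hypothetical decoder for the v2 record body shown above: a type byte, an optional u64 field id, a u32 count, then that many (min_ts, max_ts) i64 pairs. Big-endian is an assumption, and for brevity the sketch panics on malformed input instead of returning an error:

```rust
/// Decode a v2 tombstone record body.
/// Returns (Some(field_id), ranges) for FIELD_TYPE_ONE(0x00)
/// or (None, ranges) for FIELD_TYPE_ALL(0x01).
pub fn decode_v2(buf: &[u8]) -> (Option<u64>, Vec<(i64, i64)>) {
    let mut pos = 0usize;
    let field_typ = buf[pos];
    pos += 1;
    let field_id = if field_typ == 0x00 {
        let id = u64::from_be_bytes(buf[pos..pos + 8].try_into().unwrap());
        pos += 8;
        Some(id)
    } else {
        None // FIELD_TYPE_ALL(0x01): the tombstone applies to all fields
    };
    let n = u32::from_be_bytes(buf[pos..pos + 4].try_into().unwrap()) as usize;
    pos += 4;
    let mut ranges = Vec::with_capacity(n);
    for _ in 0..n {
        let min = i64::from_be_bytes(buf[pos..pos + 8].try_into().unwrap());
        let max = i64::from_be_bytes(buf[pos + 8..pos + 16].try_into().unwrap());
        pos += 16;
        ranges.push((min, max));
    }
    (field_id, ranges)
}
```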

TODOs

Some debug logs may still need to be deleted.

Are there any user-facing changes?

No.

@zipper-meng zipper-meng changed the title Feature/lts 2.3/delta compaction v3 Feature(2.3): add compaction task for delta files Jan 19, 2024
@roseboy-liu
Contributor

  1. Generate a large number of delta files, verify the consistency of the merged data, and complete the merging of the delta files.
  2. Version 2.3.2 must stay compatible with existing tombstone files and summary files. There is a compatibility upgrade for the tombstone file format, which needs to be verified.

@roseboy-liu roseboy-liu force-pushed the feature/lts_2.3/delta_compaction_v3 branch from 7d8773e to 3a7a957 Compare January 24, 2024 02:48
@zipper-meng zipper-meng force-pushed the feature/lts_2.3/delta_compaction_v3 branch from 3a7a957 to c7e7ea8 Compare January 24, 2024 02:52
@roseboy-liu roseboy-liu force-pushed the feature/lts_2.3/delta_compaction_v3 branch from c7e7ea8 to e36f193 Compare January 24, 2024 03:16
@zipper-meng zipper-meng force-pushed the feature/lts_2.3/delta_compaction_v3 branch from 6b17266 to 14d5f0d Compare January 25, 2024 12:29
@zipper-meng zipper-meng force-pushed the feature/lts_2.3/delta_compaction_v3 branch from 14d5f0d to a5689ab Compare January 26, 2024 09:15
roseboy-liu
roseboy-liu previously approved these changes Jan 31, 2024
@roseboy-liu roseboy-liu merged commit 7f3c412 into LTS/2.3 Feb 1, 2024
7 checks passed
@roseboy-liu roseboy-liu deleted the feature/lts_2.3/delta_compaction_v3 branch February 1, 2024 12:02