Feature(2.3): add compaction task for delta files #1894

Closed
wants to merge 1 commit into from

Conversation

zipper-meng
Member

@zipper-meng zipper-meng commented Jan 4, 2024

Rationale for this change

Related #1244.

Conclusion

Others

  1. DataBlock - Add methods split_at(self, index) -> (DataBlock, DataBlock) and intersection(self, time_range) -> Option<DataBlock>.
  2. ColumnFile, LevelInfo - Implement std::fmt::Display, and add ColumnFiles<F: AsRef<ColumnFile>>(&[F]) and LevelInfos(&[LevelInfo]) to reduce the code needed to log them.
  3. TimeRange - Add function compact(time_ranges: &mut Vec<TimeRange>) to compact time ranges.
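
As an illustration of item 3, here is a minimal sketch of compacting time ranges with the classic sort-then-merge approach for overlapping intervals. The TimeRange struct below is a simplified stand-in with inclusive min_ts/max_ts bounds, and the merge rule (merge only ranges that overlap or share a timestamp) is an assumption; the function in this PR may behave differently.

```rust
/// Simplified stand-in for the PR's `TimeRange` (inclusive bounds, assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeRange {
    pub min_ts: i64,
    pub max_ts: i64,
}

impl TimeRange {
    pub fn new(min_ts: i64, max_ts: i64) -> Self {
        Self { min_ts, max_ts }
    }
}

/// Sort the ranges by start time, then fold every range that overlaps
/// the last kept range into it. Runs in O(n log n) because of the sort.
pub fn compact(time_ranges: &mut Vec<TimeRange>) {
    if time_ranges.len() < 2 {
        return;
    }
    time_ranges.sort_by_key(|tr| (tr.min_ts, tr.max_ts));

    let mut merged: Vec<TimeRange> = Vec::with_capacity(time_ranges.len());
    for tr in time_ranges.drain(..) {
        if let Some(last) = merged.last_mut() {
            if tr.min_ts <= last.max_ts {
                // Overlaps the previous range: extend it instead of pushing.
                last.max_ts = last.max_ts.max(tr.max_ts);
                continue;
            }
        }
        merged.push(tr);
    }
    *time_ranges = merged;
}
```

With this sketch, the ranges (5, 8), (1, 3), (2, 6) and (10, 12) collapse into (1, 8) and (10, 12).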

Compaction

  • Change the measurement of compaction from "db", "ts_family", "level" to "db", "ts_family", "in_level", "out_level".
  • Add file delta_compact.rs for delta compaction:
    1. Pick the delta files and the out_level, then get the out_level_max_ts of the out_level.
    2. For each field_id in the picked delta files, find the groups of blocks that should be merged together: Vec<CompactingBlockMetaGroup>.
    3. For each CompactingBlockMetaGroup, get the merged and split blocks (only data in time_range 0..=out_level_max_ts is needed), write them into files in the out_level, and store these files in the VersionEdit.
    4. Write a temporary tsm-tombstone file for each delta file, named xxxxxxx.tombstone.compact.tmp, that includes all its field_ids and the time_range 0..=out_level_max_ts, because data in that time_range has already been merged into the out_level. Tombstones are compacted by compact(time_ranges: &mut Vec<TimeRange>) before being written.
    5. Write the VersionEdit into the summary, and rename these xxxxxxx.tombstone.compact.tmp files to xxxxxxx.tombstone.
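
To make step 3 concrete, the sketch below splits one block at out_level_max_ts: the part with timestamps up to and including out_level_max_ts would be merged into the out_level, and the rest stays in the delta file. Block and split_for_out_level are hypothetical simplifications (a block here is just a timestamp-sorted list of points with non-negative timestamps), not the DataBlock type from this PR.

```rust
/// Hypothetical, simplified block: (timestamp, value) pairs sorted by timestamp.
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub points: Vec<(i64, f64)>,
}

impl Block {
    /// Split into (points before `index`, points from `index` onward),
    /// mirroring the shape of `split_at(self, index)` in the description.
    pub fn split_at(mut self, index: usize) -> (Block, Block) {
        let tail = self.points.split_off(index);
        (Block { points: self.points }, Block { points: tail })
    }
}

/// Returns (part to merge into the out_level, part that stays in the delta file).
/// Assumes `block.points` is sorted by timestamp.
pub fn split_for_out_level(block: Block, out_level_max_ts: i64) -> (Block, Block) {
    // Index of the first point whose timestamp is greater than out_level_max_ts.
    let index = block
        .points
        .partition_point(|&(ts, _)| ts <= out_level_max_ts);
    block.split_at(index)
}
```

The tombstone written in step 4 then covers exactly the range that the first half came from, so readers of the delta file skip data that already lives in the out_level.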

Task definition

Some changes in CompactTask:

  • Normal(from old CompactTask::Vnode) - Compact the files in the in_level into the out_level.
  • Cold(from old CompactTask::ColdVnode) - Flush memcaches and then compact the files in the in_level into the out_level.
  • Delta - Compact the files in level-0 into the out_level.
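
The three variants could be modeled roughly as follows. This is only a sketch: the TsFamilyId alias, the payload of each variant, and the helper methods are assumptions for illustration and may not match the actual definition in this PR.

```rust
/// Hypothetical alias; the real id type may differ.
pub type TsFamilyId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CompactTask {
    /// Compact the files in the in_level into the out_level.
    Normal(TsFamilyId),
    /// Flush memcaches, then compact the in_level into the out_level.
    Cold(TsFamilyId),
    /// Compact the files in level-0 into the out_level.
    Delta(TsFamilyId),
}

impl CompactTask {
    /// The tseries family this task applies to.
    pub fn ts_family_id(&self) -> TsFamilyId {
        match *self {
            CompactTask::Normal(id) | CompactTask::Cold(id) | CompactTask::Delta(id) => id,
        }
    }

    /// Only cold compaction flushes memcaches before compacting.
    pub fn should_flush_first(&self) -> bool {
        matches!(self, CompactTask::Cold(_))
    }

    /// Whether this task compacts delta (level-0) files.
    pub fn is_delta(&self) -> bool {
        matches!(self, CompactTask::Delta(_))
    }
}
```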

Pick

  • Remove trait Picker: Send + Sync + Debug and replace it with two functions:
    • pick_level_compaction(version: Arc<Version>) -> Option<CompactReq>
    • pick_delta_compaction(version: Arc<Version>) -> Option<CompactReq>
  • pick_delta_compaction only picks delta files. It partly merges the data in level-0 files into the destination level, and the merged data leaves a tombstone file for the source level-0 files.

Scheduler

  • schedule_compaction() now checks not only whether a tseries family is cold but also the number of level-0 files, against the config compact_trigger_file_num or the constant DEFAULT_COMPACT_TRIGGER_DETLA_FILE_NUM. (This change made the unit tests in summary.rs never stop, so I fixed them by changing the test cases in summary.rs.)
  • schedule_compaction() holds a shared reference to Arc<TseriesFamily>, so we must now stop it manually with TseriesFamily::close(). The Drop implementation for Database calls that method on all its tseries families.
  • The compact job structure compaction::job::CompactProcessor now holds a map of (TsFamilyId, IsDeltaCompaction) to ShouldFlushBeforeCompaction.
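
A minimal sketch of such a map, under stated assumptions: TsFamilyId is taken to be u32, the two flags are plain bools, and repeated requests for the same key are merged with a logical OR so that a pending flush request is never lost. The insert and take methods are hypothetical and only illustrate the data structure, not the actual CompactProcessor API.

```rust
use std::collections::HashMap;

/// Hypothetical alias; the real id type may differ.
pub type TsFamilyId = u32;

/// Maps (ts_family_id, is_delta_compaction) to should_flush_before_compaction.
#[derive(Debug, Default)]
pub struct CompactProcessor {
    tasks: HashMap<(TsFamilyId, bool), bool>,
}

impl CompactProcessor {
    /// Record a compaction request. If the same key is already queued,
    /// keep `should_flush = true` once any request has asked for it (assumed rule).
    pub fn insert(&mut self, ts_family_id: TsFamilyId, is_delta: bool, should_flush: bool) {
        let entry = self.tasks.entry((ts_family_id, is_delta)).or_insert(false);
        *entry |= should_flush;
    }

    /// Take all queued requests, leaving the processor empty.
    pub fn take(&mut self) -> HashMap<(TsFamilyId, bool), bool> {
        std::mem::take(&mut self.tasks)
    }
}
```

Keying on the pair rather than on the id alone lets a normal compaction and a delta compaction for the same tseries family be queued at the same time.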

TODOs

Some debug logs remain and may be deleted.

Are there any user-facing changes?

No.

@zipper-meng zipper-meng force-pushed the feature/lts_2.3/delta_compaction_v2 branch 2 times, most recently from 090d2fd to 565015f Compare January 4, 2024 07:48
@zipper-meng zipper-meng force-pushed the feature/lts_2.3/delta_compaction_v2 branch from 565015f to e189a94 Compare January 4, 2024 11:33
@@ -116,6 +116,41 @@ impl TimeRange {
self.min_ts = self.min_ts.min(other.min_ts);
self.max_ts = self.max_ts.max(other.max_ts);
}

Contributor

Try to optimize this with the merge-intervals algorithm.

.context(error::ReadTsmSnafu)?
}
};
if let Some((min_ts, _max_ts)) = data_block.time_range() {
Contributor

Simplify the code.

}
}

#[derive(Debug, Default, Clone, PartialEq, Eq, Ord, PartialOrd, Hash)]
Contributor

Define a compaction type enum to replace CompactTaskKey.

@Subsegment
Contributor

  1. Pick files instead of picking a level (to avoid time range overlap).
  2. Modify the max level ts before flush (to avoid time range overlap).

@zipper-meng
Member Author

Please see the new PR #1945.

@zipper-meng zipper-meng deleted the feature/lts_2.3/delta_compaction_v2 branch May 23, 2024 07:35
3 participants