feat(mito): enable inverted index #3158

Merged
merged 11 commits into GreptimeTeam:main from zhongzc/inverted-index-enable on Jan 15, 2024

@zhongzc zhongzc commented Jan 12, 2024

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

This PR enables the inverted index for the mito engine.

Main changes:

  • Introduced IntermediateManager with two objectives:
    1. Ensure intermediate files are read and written only on the local file system, so that index creation never accesses object storage services.
    2. Clean up residual intermediate files left behind when the greptimedb service exits abnormally. This requires IntermediateManager to be a singleton, because repeated deletions could cause errors.
  • Changed the path of IntermediateLocation. Previously, intermediate files were placed near the data files. With IntermediateManager, data files and intermediate files are completely isolated, so a dedicated path is used instead.
  • Introduced Indexer, embedded in ParquetWriter. Indexer creates the index and handles errors internally, exposing three methods to ParquetWriter that do not return errors: update, finish, and abort.
  • Added InvertedIndexConfig to MitoConfig, which includes the following parameters:
    • Toggles: create_on_flush, create_on_compaction, apply_on_query
    • intermediate_path: The file system path for intermediate files
    • mem_threshold_on_create: The memory threshold applied while creating the index
  • Modified MitoConfig::sanitize to take data_home as an input, because the default paths of both intermediate_path and experimental_write_cache_path depend on data_home.
  • ScanRegion skips applying the index during queries when apply_on_query disables it.
  • Added create_inverted_index and mem_threshold_index_create to SstWriteRequest. Flush and compaction decide whether to create an index by passing the corresponding settings from MitoConfig into these two fields.
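
The error-hiding shape of Indexer described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `IndexCreator` trait, its signatures, the `CountingCreator` toy type, and the `Option<usize>` return value of `finish` are all assumptions made for the example. Only the three method names (update, finish, abort) and the fact that they do not return errors come from the description.

```rust
/// Hypothetical stand-in for the real index creator; signatures are assumed.
trait IndexCreator {
    fn update(&mut self, batch: &[u64]) -> Result<(), String>;
    fn finish(&mut self) -> Result<usize, String>;
    fn abort(&mut self) -> Result<(), String>;
}

/// Wraps an optional creator and keeps every error away from the writer.
struct Indexer<C: IndexCreator> {
    creator: Option<C>,
}

impl<C: IndexCreator> Indexer<C> {
    fn new(creator: Option<C>) -> Self {
        Self { creator }
    }

    /// Feeds one batch to the creator. On failure, logs, aborts, and drops
    /// the creator so later calls become no-ops.
    fn update(&mut self, batch: &[u64]) {
        let Some(creator) = self.creator.as_mut() else {
            return;
        };
        if let Err(err) = creator.update(batch) {
            eprintln!("index update failed, skipping index: {err}");
            self.abort();
        }
    }

    /// Finishes the index. Returns the value reported by the creator, or
    /// `None` if indexing was disabled or failed at any point.
    fn finish(&mut self) -> Option<usize> {
        let mut creator = self.creator.take()?;
        match creator.finish() {
            Ok(rows) => Some(rows),
            Err(err) => {
                eprintln!("index finish failed, skipping index: {err}");
                let _ = creator.abort();
                None
            }
        }
    }

    /// Drops the creator and lets it clean up; failures are only logged.
    fn abort(&mut self) {
        if let Some(mut creator) = self.creator.take() {
            if let Err(err) = creator.abort() {
                eprintln!("index abort failed: {err}");
            }
        }
    }
}

/// Toy creator used only to exercise the sketch.
struct CountingCreator {
    rows: usize,
    fail_updates: bool,
}

impl IndexCreator for CountingCreator {
    fn update(&mut self, batch: &[u64]) -> Result<(), String> {
        if self.fail_updates {
            return Err("simulated failure".to_string());
        }
        self.rows += batch.len();
        Ok(())
    }

    fn finish(&mut self) -> Result<usize, String> {
        Ok(self.rows)
    }

    fn abort(&mut self) -> Result<(), String> {
        Ok(())
    }
}
```

With this shape, a failed index build degrades to "no index for this SST" instead of failing the write itself, which is presumably why the writer-facing methods do not return errors.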

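The configuration items listed in the description could be modeled along these lines. This is a hedged sketch under stated assumptions: the field types (plain `bool` toggles, `PathBuf`, `Option<usize>`), the default values, the `index_intermediate` directory name, and the `WriteReason` and `should_create_index` helpers are hypothetical and chosen only for illustration. The field names and the idea that sanitize derives a default path from data_home come from the description.

```rust
use std::path::{Path, PathBuf};

/// Illustrative model of the inverted index settings; types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct InvertedIndexConfig {
    create_on_flush: bool,
    create_on_compaction: bool,
    apply_on_query: bool,
    /// Empty means "derive the default from data_home" in this sketch.
    intermediate_path: PathBuf,
    /// Memory threshold in bytes; `None` means no explicit limit here.
    mem_threshold_on_create: Option<usize>,
}

impl Default for InvertedIndexConfig {
    fn default() -> Self {
        Self {
            create_on_flush: true,
            create_on_compaction: true,
            apply_on_query: true,
            intermediate_path: PathBuf::new(),
            mem_threshold_on_create: Some(64 * 1024 * 1024), // assumed value
        }
    }
}

impl InvertedIndexConfig {
    /// Fills in the default intermediate path under `data_home` when the
    /// user did not set one. The directory name is hypothetical.
    fn sanitize(&mut self, data_home: &Path) {
        if self.intermediate_path.as_os_str().is_empty() {
            self.intermediate_path = data_home.join("index_intermediate");
        }
    }
}

/// Why an SST is being written (hypothetical helper type).
#[derive(Debug, Clone, Copy)]
enum WriteReason {
    Flush,
    Compaction,
}

/// Picks the toggle that would be copied into a write request's
/// create_inverted_index flag for the given reason.
fn should_create_index(cfg: &InvertedIndexConfig, reason: WriteReason) -> bool {
    match reason {
        WriteReason::Flush => cfg.create_on_flush,
        WriteReason::Compaction => cfg.create_on_compaction,
    }
}
```

Passing data_home into sanitize keeps the default path logic in one place, and an explicitly configured intermediate_path is left untouched.
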
Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

#2705

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@github-actions bot added the "Doc update required" label (this change requires a document update on https://github.com/GreptimeTeam/docs) Jan 12, 2024
@zhongzc zhongzc self-assigned this Jan 12, 2024
… Engine


codecov bot commented Jan 12, 2024

Codecov Report

Attention: 47 lines in your changes are missing coverage. Please review.

Comparison is base (bf88b3b) 85.43% compared to head (448150d) 85.09%.
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3158      +/-   ##
==========================================
- Coverage   85.43%   85.09%   -0.34%     
==========================================
  Files         823      829       +6     
  Lines      134922   135714     +792     
==========================================
+ Hits       115268   115492     +224     
- Misses      19654    20222     +568     

src/mito2/src/test_util/scheduler_util.rs (outdated, resolved)
src/mito2/src/cache/write_cache.rs (outdated, resolved)
src/mito2/src/sst/index.rs (outdated, resolved)
src/mito2/src/sst/index/creator.rs (outdated, resolved)
src/mito2/src/sst/file_purger.rs (outdated, resolved)
src/mito2/src/read/scan_region.rs (resolved)
src/mito2/src/config.rs (resolved)
zhongzc and others added 2 commits January 15, 2024 14:00
Co-authored-by: Yingwen <realevenyag@gmail.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@zhongzc zhongzc force-pushed the zhongzc/inverted-index-enable branch from 82322c2 to 6403d01 Compare January 15, 2024 06:18
…to field of WriteCache

@github-actions github-actions bot added Size: XL and removed Size: L labels Jan 15, 2024
@zhongzc zhongzc requested a review from evenyag January 15, 2024 08:08

@fengjiachun fengjiachun left a comment

LGTM

@zhongzc zhongzc added this pull request to the merge queue Jan 15, 2024
Merged via the queue into GreptimeTeam:main with commit 6f07d69 Jan 15, 2024
21 checks passed
@zhongzc zhongzc deleted the zhongzc/inverted-index-enable branch January 15, 2024 09:18