
feat: Implements merge_mode region options #4208

Merged
18 commits merged into GreptimeTeam:main on Jun 27, 2024

Conversation

evenyag
Contributor

@evenyag evenyag commented Jun 25, 2024

I hereby agree to the terms of the GreptimeDB CLA.


What's changed and what's your intention?

This PR implements a new region option, merge_mode, which controls how existing rows with the same primary key are updated. Two merge modes are available:

  • LastRow: the latest row overwrites all fields of the existing row
  • LastNonNull: for each field, only null values are overwritten by the latest non-null value

Memtables have their own dedup implementations, so they cannot reuse the DedupReader. To avoid repeating the dedup logic in each memtable, this PR adds a new iterator, LastNonNullIter, that wraps the memtable iterator. LastNonNullIter splits batches and applies the LastNonNull dedup strategy to them. It only takes effect when merge_mode is LastNonNull.
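For illustration only, here is a minimal, self-contained Rust sketch (not the actual mito2 types) of how the two modes differ when a newer version of a row is merged with an older one:

/// Hypothetical, simplified row used only to illustrate the two merge modes.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    field0: Option<i64>,
    field1: Option<i64>,
}

#[derive(Clone, Copy)]
enum MergeMode {
    LastRow,
    LastNonNull,
}

/// Merges an older and a newer version of the same primary key and timestamp.
fn merge(older: &Row, newer: &Row, mode: MergeMode) -> Row {
    match mode {
        // The newest row wins wholesale, even if some of its fields are null.
        MergeMode::LastRow => newer.clone(),
        // Each field keeps the newest non-null value.
        MergeMode::LastNonNull => Row {
            field0: newer.field0.or(older.field0),
            field1: newer.field1.or(older.field1),
        },
    }
}

fn main() {
    let older = Row { field0: Some(1), field1: Some(2) };
    let newer = Row { field0: Some(10), field1: None };
    assert_eq!(merge(&older, &newer, MergeMode::LastRow), Row { field0: Some(10), field1: None });
    assert_eq!(merge(&older, &newer, MergeMode::LastNonNull), Row { field0: Some(10), field1: Some(2) });
}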

After this PR, I will expose the option through table options and add sqlness tests.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • New Features

    • Introduced MergeMode options for handling duplicate row merging in various components.
    • Added a new iterator, LastNonNullIter, for deduplication based on the LastNonNull strategy.
  • Enhancements

    • Updated time series operations and compaction processes to utilize the new MergeMode options.
    • Enhanced TimeSeriesMemtable, PartitionTreeConfig, and ScanInput to include merge_mode configuration.
  • Tests

    • Added comprehensive tests for new MergeMode options and append mode functionality.

Contributor

coderabbitai bot commented Jun 25, 2024

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The recent changes introduce a MergeMode option across various components of the mito2 module to handle time series data merging strategies during compaction, scanning, and memtable operations. Updated classes, structures, and functions now accommodate a merge_mode parameter, influencing the handling of duplicate rows. Additionally, new tests ensure the correct implementation of these features.

Changes

File(s) Change Summary
.../benches/memtable_bench.rs Introduced MergeMode::LastRow in the creation of TimeSeriesMemtable for benchmarks.
.../compaction.rs Added CompactionSstReaderBuilder with merge_mode for compaction tasks.
.../compaction/compactor.rs Replaced build_sst_reader with CompactionSstReaderBuilder and included the merge_mode parameter.
.../compaction/window.rs Added merge_mode field in the struct within tests.
.../engine.rs Added the merge_mode_test module.
.../engine/merge_mode_test.rs Included tests for merge_mode functionality.
.../memtable.rs, .../memtable/partition_tree.rs, .../memtable/time_series.rs Added MergeMode to configurations and memtable constructs.
.../read/dedup.rs Introduced LastNotNullIter for custom deduplication strategies.
.../read/scan_region.rs, .../read/seq_scan.rs Integrated MergeMode in ScanInput and adjusted DedupReader for new strategy.
.../region/opener.rs, .../region/options.rs Added merge_mode field to RegionOptions, updated memtable_builder.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Engine
    participant CompactionSstReaderBuilder
    participant ScanInput

    User ->> Engine: Start compaction
    Engine ->> CompactionSstReaderBuilder: Initialize with MergeMode
    Engine ->> CompactionSstReaderBuilder: Build SST reader
    CompactionSstReaderBuilder ->> Engine: Return SST reader
    Engine ->> ScanInput: Configure with MergeMode
    ScanInput ->> Engine: Initiate scanning
    Engine ->> User: Return compaction results

Poem

On data fields where time's in play,
A merge mode comes to save the day.
With LastRow set and columns bright,
Rows combine in seamless flight.
Rabbits coding, lines so neat,
Bugs defeated, a code complete! 🐇📜✨



@github-actions bot added the docs-not-required label (This change does not impact docs.) on Jun 25, 2024
@evenyag evenyag marked this pull request as ready for review June 25, 2024 08:49
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Outside diff range and nitpick comments (1)
src/mito2/src/compaction/window.rs (1)

Line range hint 212-263: Proper handling of UpdateMode in compaction window configuration

The addition of UpdateMode to the RegionOptions within the Version struct is a strategic enhancement that aligns the compaction behavior with the update strategies defined at the region level.

  • Alignment with Design: This change ensures that the compaction process respects the update mode settings, which is crucial for maintaining data integrity and expected behavior during the compaction.
  • Potential Improvement: It would be beneficial to add tests specifically targeting the interaction between the update mode and the compaction process to ensure that the integration is seamless and behaves as expected under various conditions.

Would you like to develop these targeted tests or need further clarification on potential edge cases that should be covered?

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1e815dd and 62dfe8d.

Files selected for processing (17)
  • src/mito2/benches/memtable_bench.rs (4 hunks)
  • src/mito2/src/compaction.rs (2 hunks)
  • src/mito2/src/compaction/compactor.rs (3 hunks)
  • src/mito2/src/compaction/window.rs (2 hunks)
  • src/mito2/src/engine.rs (1 hunks)
  • src/mito2/src/engine/append_mode_test.rs (1 hunks)
  • src/mito2/src/engine/update_mode_test.rs (1 hunks)
  • src/mito2/src/memtable.rs (4 hunks)
  • src/mito2/src/memtable/partition_tree.rs (3 hunks)
  • src/mito2/src/memtable/partition_tree/tree.rs (3 hunks)
  • src/mito2/src/memtable/time_series.rs (14 hunks)
  • src/mito2/src/read/dedup.rs (2 hunks)
  • src/mito2/src/read/scan_region.rs (5 hunks)
  • src/mito2/src/read/seq_scan.rs (2 hunks)
  • src/mito2/src/region/opener.rs (2 hunks)
  • src/mito2/src/region/options.rs (8 hunks)
  • src/mito2/src/test_util.rs (1 hunks)
Files skipped from review due to trivial changes (1)
  • src/mito2/src/engine/append_mode_test.rs
Additional comments not posted (40)
src/mito2/benches/memtable_bench.rs (1)

Line range hint 27-128: Update mode integration in benchmark tests

The benchmark tests for write_rows and full_scan functions have been updated to include the UpdateMode::LastRow. This is a crucial update as it ensures that the performance implications of different update modes are measured and understood.

  • Performance Consideration: It's essential to also benchmark the LastNotNull mode to compare its performance impact relative to LastRow. This would provide a more comprehensive understanding of how each mode affects performance.
  • Refinement: Consider parameterizing the benchmark tests to run with both LastRow and LastNotNull modes. This could be achieved by using a loop or parameterized benchmarks if supported by the Criterion library.

Would you like a script to verify the performance impact comparison between LastRow and LastNotNull modes?
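As a rough illustration of that suggestion, a parameterized Criterion benchmark could look like the sketch below; write_rows_with_mode and the MergeMode enum here are placeholders standing in for the real benchmark body and region option:

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

#[derive(Clone, Copy, Debug)]
enum MergeMode {
    LastRow,
    LastNotNull,
}

// Placeholder: the real benchmark would build a TimeSeriesMemtable configured
// with the given mode and write rows into it.
fn write_rows_with_mode(mode: MergeMode) {
    let _ = mode;
}

fn bench_write_rows(c: &mut Criterion) {
    let mut group = c.benchmark_group("write_rows");
    for mode in [MergeMode::LastRow, MergeMode::LastNotNull] {
        group.bench_with_input(BenchmarkId::from_parameter(format!("{mode:?}")), &mode, |b, &mode| {
            b.iter(|| write_rows_with_mode(mode));
        });
    }
    group.finish();
}

criterion_group!(benches, bench_write_rows);
criterion_main!(benches);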

src/mito2/src/memtable.rs (1)

Line range hint 36-286: Comprehensive integration of UpdateMode in Memtable configurations

The integration of UpdateMode in various memtable configurations is a significant enhancement, ensuring that different update strategies are supported at a fundamental level in the storage layer.

  • Consistency: The addition of UpdateMode to both TimeSeriesMemtableBuilder and PartitionTreeMemtableBuilder ensures that the update mode is consistently applied across different types of memtables.
  • Best Practices: Consider adding default cases in the match statements for MemtableBuilderProvider to handle unexpected MemtableOptions. This would prevent potential runtime errors and improve the robustness of the code.
  • Performance: The impact of different update modes on memtable performance should be analyzed, especially since memtables are critical for write performance.

Would you like assistance in analyzing the performance impact of different update modes on memtables?

src/mito2/src/compaction/compactor.rs (4)

31-31: Clarify import of CompactionSstReaderBuilder.

The import of CompactionSstReaderBuilder is crucial for the changes in this file. Ensure that this import is necessary and correctly placed in the context of the file's other imports.


144-144: Review the integration of update_mode in MemtableBuilderProvider.

The integration of update_mode in the MemtableBuilderProvider builder method is a significant change. It directly affects how memtables are built based on the update_mode setting. Ensure that this integration is tested and that the behavior aligns with the expected outcomes for different update modes.

Verification successful

Tests for update_mode integration found in update_mode_test.rs

The integration of update_mode within MemtableBuilderProvider appears to be tested in src/mito2/src/engine/update_mode_test.rs. Specifically, the following tests are present:

  • test_update_mode_write_query
  • test_update_mode_compaction

These tests suggest that different update_mode settings are being evaluated.

  • src/mito2/src/engine/update_mode_test.rs
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of `update_mode` in `MemtableBuilderProvider`.

# Test: Search for the method usage in tests. Expect: Tests covering different `update_mode` settings.
rg --type rust $'MemtableBuilderProvider::new.*update_mode'

Length of output: 60



Script:

#!/bin/bash
# Search for `update_mode` within the `MemtableBuilderProvider` context
rg --type rust 'MemtableBuilderProvider' | rg 'update_mode'

# Search for test files that might be testing `update_mode` indirectly
rg --type rust 'update_mode' --tests

Length of output: 526



Script:

#!/bin/bash
# Search for `update_mode` within the `MemtableBuilderProvider` context
rg --type rust 'MemtableBuilderProvider.*update_mode'

# Corrected command to search for test files that might be testing `update_mode` indirectly
rg --type rust 'update_mode' --glob '*test*'

Length of output: 457


284-293: Review the construction of CompactionSstReaderBuilder with update_mode.

The construction of CompactionSstReaderBuilder now includes the update_mode. This is crucial for ensuring that the SST reader behaves according to the specified update mode during the compaction process. It is important to verify that this integration is correct and that the CompactionSstReaderBuilder is properly utilizing the update_mode.

Verification successful

Review the construction of CompactionSstReaderBuilder with update_mode.

The CompactionSstReaderBuilder correctly includes and initializes the update_mode field. The update_mode is passed from compaction_region.current_version.options.update_mode and is integrated properly in the builder.

  • src/mito2/src/compaction/compactor.rs:
    • Line where update_mode is fetched: let update_mode = compaction_region.current_version.options.update_mode;
    • Line where update_mode is passed to the builder: update_mode,
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the correct usage of `update_mode` in `CompactionSstReaderBuilder`.

# Test: Search for the instantiation of `CompactionSstReaderBuilder` in the codebase. Expect: Correct initialization with `update_mode`.
rg --type rust $'CompactionSstReaderBuilder.*update_mode'

Length of output: 58



Script:

#!/bin/bash
# Find all instances of `CompactionSstReaderBuilder` in the codebase.
rg --type rust 'CompactionSstReaderBuilder'

Length of output: 366



Script:

#!/bin/bash
# Check the definition of `CompactionSstReaderBuilder` for `update_mode`.
rg --type rust 'update_mode' src/mito2/src/compaction.rs

Length of output: 134



Script:

#!/bin/bash
# Verify how `update_mode` is used in `compactor.rs`.
rg --type rust 'update_mode' src/mito2/src/compaction/compactor.rs

Length of output: 228


282-282: Ensure the propagation of update_mode to the compaction logic.

The propagation of update_mode from the RegionOptions to the compaction logic is a critical change. This ensures that the compaction process respects the update_mode settings. Verify that this change is consistent with the intended design and that it is covered by unit tests.

Verification successful

Ensure the propagation of update_mode to the compaction logic.

The propagation of update_mode from the RegionOptions to the compaction logic is verified. The update_mode is set and used appropriately in compactor.rs, and the test function test_update_mode_compaction covers the update_mode during the compaction process.

  • src/mito2/src/compaction/compactor.rs
  • src/mito2/src/engine/update_mode_test.rs
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the propagation of `update_mode` in the compaction logic.

# Test: Search for the usage of `update_mode` in the compaction logic. Expect: Proper handling based on `update_mode`.
rg --type rust $'update_mode.*compaction'

Length of output: 247



Script:

#!/bin/bash
# Search for all instances of `update_mode` in the `compactor.rs` file to verify its usage in the compaction logic.
rg 'update_mode' src/mito2/src/compaction/compactor.rs

Length of output: 216



Script:

#!/bin/bash
# Extract the content of the `test_update_mode_compaction` function from `update_mode_test.rs` to verify it covers `update_mode` in the compaction process.
ast-grep --lang rust --pattern 'async fn test_update_mode_compaction() {
  $$$
}' src/mito2/src/engine/update_mode_test.rs

Length of output: 9487

src/mito2/src/memtable/partition_tree/tree.rs (2)

43-45: Added imports for LastNotNullIter and Batch

These imports are necessary for the new functionality to handle different update modes. Ensure that these are used appropriately in the implementation to avoid any unused imports.


247-252: Ensure correct iterator wrapping based on update_mode

The conditional wrapping of the iterator based on the update_mode is a critical part of the functionality. This change appears correct and should work as intended. However, ensure that the LastNotNullIter is thoroughly tested to handle edge cases in data consistency and performance impacts.

src/mito2/src/region/opener.rs (2)

195-199: Correct implementation of memtable builder initialization with update_mode

This change correctly passes the update_mode to the memtable builder. It's crucial that this is tested extensively to ensure that the memtables are initialized properly based on the selected update mode, especially since this affects data integrity and query results.


329-329: Ensure update_mode is correctly passed to memtable builder during region opening

The addition of update_mode in the memtable builder during the region opening process is crucial for ensuring that the region adheres to the specified update strategies. This change looks good, but make sure to validate through integration tests that the update mode is respected in all operations post-opening.

src/mito2/src/region/options.rs (5)

37-47: Introduced new UpdateMode enum for handling duplicate rows.

The UpdateMode enum has been correctly defined with two variants LastRow and LastNotNull, which are crucial for the new functionality of handling duplicate rows based on the update mode. The use of #[default] on LastRow is appropriate as it provides a sensible default behavior.
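For reference, a declaration along these lines (a sketch based on the review text; the exact derives and serde attributes in the PR may differ) would make LastRow the default and expose snake_case option values:

use serde::{Deserialize, Serialize};

/// Mode to merge duplicate rows (sketch; the enum was later renamed MergeMode in this PR).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum UpdateMode {
    /// Keeps the whole latest row.
    #[default]
    LastRow,
    /// Keeps the latest non-null value for each field.
    LastNotNull,
}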


70-72: Added update_mode field to RegionOptions.

The new update_mode field in RegionOptions is well-integrated and includes a helpful comment explaining its effect in conjunction with append_mode. This addition aligns with the PR's objectives to manage how duplicate rows are updated.


116-116: Ensure update_mode is correctly assigned in RegionOptions conversion.

The assignment of update_mode in the conversion from a HashMap to RegionOptions is correctly implemented. This ensures that the update_mode setting is preserved when options are loaded from external configurations.


Line range hint 199-210: Added update_mode to RegionOptionsWithoutEnum.

The inclusion of update_mode in RegionOptionsWithoutEnum is necessary due to the limitations of #[serde(default)] with enums. This change correctly mirrors the structure of RegionOptions, ensuring that serialization processes handle the update_mode appropriately.


500-513: Test cases for update_mode handling in RegionOptions.

The test cases added for update_mode effectively cover the scenarios for valid modes (last_row, last_not_null) and an invalid mode (unknown). This thorough testing ensures the robustness of the update_mode functionality.
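A stripped-down sketch of what such parsing tests can look like; the real tests build a full RegionOptions from a key-value map, whereas this standalone version only parses the mode string:

use std::str::FromStr;

#[derive(Debug, PartialEq, Eq)]
enum UpdateMode {
    LastRow,
    LastNotNull,
}

impl FromStr for UpdateMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "last_row" => Ok(UpdateMode::LastRow),
            "last_not_null" => Ok(UpdateMode::LastNotNull),
            other => Err(format!("unknown update mode: {other}")),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parse_update_mode() {
        assert_eq!("last_row".parse::<UpdateMode>().unwrap(), UpdateMode::LastRow);
        assert_eq!("last_not_null".parse::<UpdateMode>().unwrap(), UpdateMode::LastNotNull);
        assert!("unknown".parse::<UpdateMode>().is_err());
    }
}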

src/mito2/src/engine.rs (1)

53-54: Added test module for update_mode.

The inclusion of the update_mode_test module is crucial for ensuring that the new functionality works as expected. It's good to see proactive testing being incorporated alongside new features.

src/mito2/src/compaction.rs (5)

56-56: Addition of UpdateMode import

The UpdateMode enum has been imported to be used in the compaction functionalities. This import is necessary due to the new compaction strategies introduced in this PR.


446-455: Introduction of CompactionSstReaderBuilder

This new structure is designed to build batch readers for compaction tasks. It includes various settings like append_mode, filter_deleted, and crucially, update_mode, which dictates how duplicates are handled during compaction.

  • Fields Review: The fields are well-structured and relevant for the compaction process. However, ensure that the cache field, which is optional, is handled properly in all scenarios where it might be None.
  • Use of 'update_mode': Proper inclusion and handling of the update_mode suggest good integration with the new functionality.

Overall, this structure appears well-designed for its purpose.
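A pared-down sketch of a builder holding these settings; only the fields called out in the review are shown, and their types (including the String placeholders) are assumptions rather than the actual definitions:

#[derive(Default)]
enum UpdateMode {
    #[default]
    LastRow,
    LastNotNull,
}

#[derive(Default)]
struct CompactionSstReaderBuilder {
    /// SST files fed into the compaction read (placeholder type).
    inputs: Vec<String>,
    /// Optional cache handle; callers must tolerate `None`.
    cache: Option<String>,
    /// Append-only regions skip deduplication entirely.
    append_mode: bool,
    /// Whether deleted rows are filtered out while reading.
    filter_deleted: bool,
    /// How duplicate rows are merged by the built reader.
    update_mode: UpdateMode,
}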


457-477: Review of build_sst_reader Method

This method constructs a BoxedBatchReader configured according to the CompactionSstReaderBuilder settings. The method correctly incorporates various settings like file inputs, append mode, and importantly, the update_mode.

  • Handling of update_mode: The method correctly adjusts the scan input to include the update_mode, which is critical for applying the correct duplication strategy.
  • Error Handling: The method includes robust error handling, particularly with file not found errors which are ignored during compaction—a sensible default given the context.

This method is well-implemented and aligns with the objectives of handling different update modes during compaction.


Line range hint 480-504: Review of time_range_to_predicate Function

This function converts a TimestampRange into a Predicate for filtering rows outside the specified range during scans. It handles different cases of range boundaries (start and/or end specified).

  • Correctness: The function uses datafusion_expr to create conditions based on the metadata's time index column. The use of ts_to_lit for converting timestamps to literals appears correct.
  • Error Handling: The function properly handles potential overflows when converting timestamps, which is crucial for avoiding runtime errors.

This utility function is well-implemented and crucial for ensuring that only relevant data is processed during compaction.


Line range hint 506-530: Review of get_expired_ssts Function

This function identifies SST files that have expired based on their TTL. It correctly calculates the expiration time and filters out files that are beyond this time.

  • Error Handling: The function handles potential errors during the subtraction of duration from the timestamp, which is critical for avoiding incorrect behavior in edge cases.
  • Efficiency: The function efficiently collects expired files using flat_map, which is suitable given the potentially large number of files in a compaction scenario.

This function is well-implemented and plays a crucial role in maintaining the health of the storage by removing outdated data.

src/mito2/src/memtable/partition_tree.rs (1)

84-86: Approved addition of the update_mode field in PartitionTreeConfig.

The addition aligns with the PR's objective to support different update strategies for handling duplicate rows. Marking it to skip deserialization is appropriate if this field is expected to be set programmatically rather than through configuration files.
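Skipping deserialization for a programmatically-set field typically looks like the sketch below (field names other than merge_mode are placeholders, not the actual PartitionTreeConfig fields):

use serde::{Deserialize, Serialize};

#[derive(Debug, Default, Clone, Copy, Serialize, Deserialize)]
enum MergeMode {
    #[default]
    LastRow,
    LastNonNull,
}

#[derive(Debug, Default, Serialize, Deserialize)]
struct PartitionTreeConfig {
    /// Placeholder for the config's other options.
    max_keys_per_shard: usize,
    /// Set from region options at runtime, never read from serialized config.
    #[serde(skip_deserializing)]
    merge_mode: MergeMode,
}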

src/mito2/src/read/seq_scan.rs (1)

214-223: Approved handling of update modes in SeqScan.

The implementation correctly handles the LastRow and LastNotNull update modes in the deduplication process. This is crucial for ensuring that the scan behavior aligns with the specified update mode, which is central to the functionality introduced in this PR.

src/mito2/src/read/scan_region.rs (4)

45-45: Addition of UpdateMode import:

The import of UpdateMode is necessary for the new functionality related to handling different update modes in scanning operations. This aligns with the PR's objectives to manage how duplicate rows are updated.


403-404: Documentation and clarity for update_mode in ScanInput:

While the implementation of update_mode in ScanInput is correct, adding a brief comment explaining its role, especially how it affects the deduplication process during scans, would enhance code readability and maintainability.

-    /// Mode to update duplicate rows.
+    /// Mode to update duplicate rows. Determines how duplicate entries are handled based on the specified update mode.

505-510: Proper implementation of with_update_mode method:

The method with_update_mode correctly sets the update_mode for ScanInput, ensuring that the scanning operations can utilize the specified update mode. This method is well-implemented and follows Rust's idiomatic use of builder patterns.
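The setter presumably follows the usual consuming-builder shape, roughly as in this standalone sketch (names mirror the review; the real ScanInput has many more fields, and the method was later renamed for merge_mode):

#[derive(Default, Clone, Copy)]
enum UpdateMode {
    #[default]
    LastRow,
    LastNotNull,
}

#[derive(Default)]
struct ScanInput {
    update_mode: UpdateMode,
}

impl ScanInput {
    /// Sets the update mode and returns the input, enabling call chaining.
    #[must_use]
    fn with_update_mode(mut self, update_mode: UpdateMode) -> Self {
        self.update_mode = update_mode;
        self
    }
}

fn main() {
    let input = ScanInput::default().with_update_mode(UpdateMode::LastNotNull);
    let _ = input.update_mode;
}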


299-300: Ensure correct handling of update_mode in ScanInput:

The addition of update_mode to the ScanInput struct is crucial for implementing the new deduplication strategies. This change is consistent with the PR objectives and should be tested thoroughly to ensure that it integrates well with the existing scanning logic.

Verification successful

Ensure correct handling of update_mode in ScanInput:

The update_mode is correctly integrated into the ScanInput struct and is used in the scanning operations as follows:

  • compaction.rs:
    • ScanInput is created and update_mode is set appropriately.
  • scan_region.rs:
    • ScanInput is created and update_mode is set appropriately.
  • unordered_scan.rs:
    • ScanInput is created and update_mode is set appropriately.
  • seq_scan.rs:
    • ScanInput is created and update_mode is set appropriately.

These integrations are consistent with the PR objectives and ensure that the new deduplication strategies are implemented correctly.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of update_mode in ScanInput with scanning operations.

# Test: Search for usage of ScanInput and ensure update_mode is considered.
rg --type rust $'ScanInput' --context 5

Length of output: 14602

src/mito2/src/read/dedup.rs (2)

471-564: Implementation of LastNotNullIter iterator

The newly added LastNotNullIter is crucial for handling the LastNotNull deduplication strategy. The iterator is well-structured and seems to correctly manage the batch processing based on the LastNotNull logic. However, there are a few points that need attention:

  1. Documentation and Comments: More detailed comments explaining the logic, especially around complex conditional flows in next_batch_for_merge and next, would enhance maintainability.
  2. Error Handling: Ensure that all possible errors from iter.next() and batch.slice() are appropriately handled and logged.
  3. Performance Considerations: Review if there are any potential performance bottlenecks, especially in loops and conditional checks within next_batch_for_merge. Consider optimizing the slicing logic to avoid unnecessary data copying.

Overall, the implementation aligns with the PR's objectives but could benefit from further documentation and potential optimizations.
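To make the described behavior concrete, here is a heavily simplified, self-contained analogue of such an iterator. It assumes rows arrive sorted by timestamp with the newest version of a duplicate timestamp first, and it works on toy (timestamp, fields) tuples rather than real columnar batches:

/// Toy stand-in for one row of a batch: (timestamp, field values).
type Row = (i64, Vec<Option<i64>>);

/// Simplified analogue of the wrapper: merges duplicate timestamps, keeping
/// the first (newest) non-null value of each field.
struct LastNotNullIter<I: Iterator<Item = Row>> {
    inner: std::iter::Peekable<I>,
}

impl<I: Iterator<Item = Row>> LastNotNullIter<I> {
    fn new(inner: I) -> Self {
        Self { inner: inner.peekable() }
    }
}

impl<I: Iterator<Item = Row>> Iterator for LastNotNullIter<I> {
    type Item = Row;

    fn next(&mut self) -> Option<Row> {
        let (ts, mut fields) = self.inner.next()?;
        // Fold older versions of the same timestamp into the newest row.
        while self.inner.peek().map(|(next_ts, _)| *next_ts == ts).unwrap_or(false) {
            let (_, older) = self.inner.next().unwrap();
            for (field, old) in fields.iter_mut().zip(older) {
                if field.is_none() {
                    *field = old;
                }
            }
        }
        Some((ts, fields))
    }
}

fn main() {
    // Two versions of ts = 1: the newer one (listed first) has a null second field.
    let rows = vec![
        (1, vec![Some(10), None]),
        (1, vec![Some(1), Some(2)]),
        (2, vec![None, Some(4)]),
    ];
    let merged: Vec<Row> = LastNotNullIter::new(rows.into_iter()).collect();
    assert_eq!(merged, vec![(1, vec![Some(10), Some(2)]), (2, vec![None, Some(4)])]);
}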


1081-1136: Unit Tests for LastNotNullIter

The unit tests provided for LastNotNullIter are comprehensive and cover a variety of scenarios including batch splitting and handling multiple batches. These tests are essential for ensuring the correct behavior of the deduplication strategy under different conditions. Some suggestions:

  1. Additional Test Scenarios: Consider adding tests for edge cases, such as batches with all rows having the same timestamp or batches that mix OpType::Put and OpType::Delete in more complex patterns.
  2. Performance Testing: While functional correctness is verified, adding performance benchmarks to understand the impact of this iterator on large datasets could be beneficial.

It's good to see thorough testing around this new feature, which is critical for maintaining data integrity in GreptimeDB.
[APPROVED]

src/mito2/src/memtable/time_series.rs (11)

47-49: Import of LastNotNullIter and UpdateMode

The import of LastNotNullIter and UpdateMode reflects the new functionality introduced for handling different update modes in time series memtables. This aligns with the PR objectives to support different update strategies.


60-60: Addition of update_mode field in TimeSeriesMemtableBuilder

Adding the update_mode field to the TimeSeriesMemtableBuilder struct is crucial for initializing the memtable with the desired update behavior. This change is necessary to support the new feature.


65-73: Constructor modification to include update_mode

The constructor of TimeSeriesMemtableBuilder has been appropriately modified to accept an update_mode parameter and initialize the corresponding field. This ensures that the builder pattern remains consistent with the new functionality.


85-85: Pass update_mode to TimeSeriesMemtable constructor

Passing the update_mode from the builder to the TimeSeriesMemtable instance is essential for ensuring that the memtable behaves according to the specified update mode. This is a critical link in the chain of changes required to support the new feature.


100-100: Addition of update_mode in TimeSeriesMemtable struct and constructor

The addition of the update_mode field in the TimeSeriesMemtable struct and its initialization in the constructor are key to applying the update mode logic throughout the memtable's lifecycle. This change correctly propagates the update mode setting from the configuration to the operational behavior of the memtable.

Also applies to: 109-109


118-122: Logic to disable deduplication based on update_mode

The conditional logic to disable deduplication when the update_mode is LastNotNull ensures that the memtable does not perform unnecessary deduplication operations that conflict with the intended update behavior. This is a smart use of conditional logic to optimize performance based on configuration.
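A tiny sketch of the kind of check this describes (the exact condition in the PR is not reproduced here, so treat the append_mode interaction as an assumption): dedup inside the memtable is only safe when whole rows win, since LastNotNull must keep every version so fields can be merged later.

#[derive(PartialEq, Eq, Clone, Copy)]
enum UpdateMode {
    LastRow,
    LastNotNull,
}

/// Whether the memtable may drop duplicate rows up front (sketch).
fn memtable_dedup_enabled(append_mode: bool, update_mode: UpdateMode) -> bool {
    !append_mode && update_mode == UpdateMode::LastRow
}

fn main() {
    assert!(memtable_dedup_enabled(false, UpdateMode::LastRow));
    assert!(!memtable_dedup_enabled(false, UpdateMode::LastNotNull));
    assert!(!memtable_dedup_enabled(true, UpdateMode::LastRow));
}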


132-132: Ensure update_mode is part of the memtable state

Storing update_mode as part of the TimeSeriesMemtable state is necessary for later use during operations like iteration, ensuring that the update behavior is consistently applied. This encapsulation of state within the memtable is well-implemented.


262-267: Conditional iterator creation based on update_mode

The conditional creation of LastNotNullIter based on the update_mode during iteration is a crucial implementation detail that ensures the correct handling of updates according to the specified mode. This dynamic behavior is essential for the flexibility of the update mechanism.


288-288: Include update_mode in TimeSeriesIterBuilder

Incorporating the update_mode into the TimeSeriesIterBuilder is necessary for constructing iterators that respect the specified update behavior. This change ensures that the iterator construction is aligned with the new update mode functionality.


337-337: Pass update_mode during memtable forking

Passing the update_mode during the forking of a TimeSeriesMemtable ensures that the new instance maintains the same update behavior as the original. This consistency is crucial for operations that involve replicating or splitting memtables.


874-874: Update mode handling in TimeSeriesIterBuilder

The inclusion of update_mode in the TimeSeriesIterBuilder and its use in the iterator construction process are well-implemented. This ensures that the iterator behavior is consistent with the specified update mode, providing flexibility in how updates are processed during iteration.

Also applies to: 884-890


codecov bot commented Jun 25, 2024

Codecov Report

Attention: Patch coverage is 97.17742% with 14 lines in your changes missing coverage. Please review.

Project coverage is 84.60%. Comparing base (948c869) to head (11936c4).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4208      +/-   ##
==========================================
- Coverage   84.85%   84.60%   -0.25%     
==========================================
  Files        1040     1042       +2     
  Lines      183023   183488     +465     
==========================================
- Hits       155297   155238      -59     
- Misses      27726    28250     +524     

@evenyag evenyag requested a review from WenyXu June 25, 2024 11:26
Collaborator

@fengjiachun fengjiachun left a comment


LGTM

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 62dfe8d and bbee9ca.

Files selected for processing (1)
  • src/mito2/src/read/dedup.rs (2 hunks)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between bbee9ca and 7c463f0.

Files selected for processing (2)
  • src/mito2/src/read/dedup.rs (2 hunks)
  • src/mito2/src/test_util.rs (1 hunks)
Additional comments not posted (4)
src/mito2/src/test_util.rs (1)

923-953: Enhance Documentation and Verify Data Type Handling

The function build_rows_with_fields is well-implemented for creating rows with optional fields. However, the documentation could be further improved for clarity and completeness:

  • The data type for the fields is specified as i64, but the conversion to f64 in the function suggests that these might represent floating-point numbers. Clarifying this in the documentation would help.
  • It would be beneficial to mention that the timestamp values are expected to be in seconds since they are multiplied by 1000 to convert to milliseconds.

Additionally, ensure that the data type conversions (i64 to f64) are safe and intended, as this could lead to precision issues or unexpected behavior if not handled correctly.

/// Build rows with schema (string, f64, f64, ts_millis).
/// - `key`: A string key that is common across all rows.
/// - `timestamps`: Array of timestamp values in seconds.
/// - `fields`: Array of tuples where each tuple contains two optional i64 values (interpreted as f64), representing two optional float fields.
/// Returns a vector of `Row` each containing the key, two optional float fields, and a timestamp.
Verification successful

Enhance Documentation and Verify Data Type Handling

The function build_rows_with_fields is well-implemented for creating rows with optional fields. However, the documentation could be further improved for clarity and completeness:

  • The data type for the fields is specified as i64, but the conversion to f64 in the function suggests that these might represent floating-point numbers. Clarifying this in the documentation would help.
  • It would be beneficial to mention that the timestamp values are expected to be in seconds since they are multiplied by 1000 to convert to milliseconds.

Additionally, ensure that the data type conversions (i64 to f64) are safe and intended, as this could lead to precision issues or unexpected behavior if not handled correctly.

The usages of build_rows_with_fields in the src/mito2/src/engine/update_mode_test.rs file show that the function is called with appropriate data types (timestamps in seconds and i64 values for fields).

  • Lines from src/mito2/src/engine/update_mode_test.rs:
    • let rows = build_rows_with_fields("a", &[2, 3], &[(Some(12), None), (Some(13), None)]);
    • let rows = build_rows_with_fields("a", &[1, 2], &[(Some(11), None), (Some(22), Some(222))]);
    • let rows = build_rows_with_fields("a", &[1, 2], &[(Some(21), None), (Some(22), None)]);

These lines confirm that the arguments passed to build_rows_with_fields are consistent with the expected data types.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify data type handling in the function `build_rows_with_fields`.

# Test: Search for other usages of `build_rows_with_fields` to ensure correct data types are passed.
rg --type rust $'build_rows_with_fields'

Length of output: 870



Script:

#!/bin/bash
# Extract specific lines from src/mito2/src/engine/update_mode_test.rs to inspect the arguments passed to build_rows_with_fields.
sed -n '2p;3p;4p' src/mito2/src/engine/update_mode_test.rs

Length of output: 198



Script:

#!/bin/bash
# Find the exact line numbers of `build_rows_with_fields` usages in `src/mito2/src/engine/update_mode_test.rs` and extract those lines.
rg --line-number 'build_rows_with_fields' src/mito2/src/engine/update_mode_test.rs | cut -d: -f1 | xargs -I{} sed -n '{}p' src/mito2/src/engine/update_mode_test.rs

Length of output: 651

src/mito2/src/read/dedup.rs (3)

471-484: Well-structured definition of LastNotNullIter.

The struct LastNotNullIter is well-defined with appropriate visibility and clear documentation. The use of Option for the iterator and batch ensures proper handling of end-of-iteration states, which is crucial for robustness.


486-496: Review of the new constructor in LastNotNullIter.

The constructor initializes all fields appropriately and sets the deduplication strategy to LastNotNull with filter_deleted as false, which is consistent with the stated use case for memtables. The use of DedupMetrics::default() for metrics initialization ensures that all counters start at zero.


511-544: Comprehensive review of batch processing methods in LastNotNullIter.

The methods next_batch_for_merge and next_batch are well-implemented. They handle batch slicing based on duplicate detection and apply the deduplication strategy correctly. The use of Option for state management and the clear separation of concerns between fetching and processing batches are commendable.

However, ensure that the method handles all possible return values from next_batch_for_merge correctly, especially in edge cases where batches might be empty or the iterator might be exhausted unexpectedly.

Verification successful

The next_batch method in src/mito2/src/read/dedup.rs relies on next_batch_for_merge to handle edge cases. The next_batch_for_merge function appears to manage the state of the iterator and handle empty or exhausted batches.

Verification of edge case handling in next_batch_for_merge:

  • The method checks if self.current_batch is None and attempts to fetch a new batch from the iterator.
  • If the iterator is exhausted, it sets self.iter to None and returns Ok(None).
  • If a batch is found, it checks for duplicates using Self::find_split_index.
  • If no duplicates are found, it returns the current batch.
  • If duplicates are found, it slices the batch and updates self.current_batch with the remaining rows.

This logic ensures that the method handles edge cases where the iterator is exhausted or the batch is empty.

Conclusion:

The next_batch_for_merge method correctly handles edge cases, ensuring the iterator state is managed appropriately and batches are processed correctly.

Actionable directive:

  • No further changes are needed for edge case handling in next_batch_for_merge.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify handling of edge cases in next_batch method.
# Test: Search for the handling of edge cases in the next_batch method. Expect: Specific checks for empty or exhausted states.

rg --type rust $'next_batch' ./src/mito2/src/read/dedup.rs

Length of output: 514

@killme2008
Contributor

I am not convinced that UpdateMode is an appropriate term. It is used during scanning to combine duplicate rows, yet it is called UpdateMode. Personally, I would prefer something like MergeMode or ScanMergeMode.

@fengjiachun
Collaborator

I am not convinced that UpdateMode is an appropriate term. It is used during scanning to combine duplicate rows, yet it is called UpdateMode. Personally, I would prefer something like MergeMode or ScanMergeMode.

UpdateMode seems to be viewed from the perspective of writing. cc @evenyag

@evenyag
Contributor Author

evenyag commented Jun 26, 2024

I am not convinced that UpdateMode is an appropriate term. It is used during scanning to combine duplicate rows, yet it is called UpdateMode. Personally, I would prefer something like MergeMode or ScanMergeMode.

UpdateMode seems to be viewed from the perspective of writing. cc @evenyag

Good idea. MergeMode may be more appropriate.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Outside diff range and nitpick comments (3)
src/mito2/src/read/seq_scan.rs (1)

Line range hint 38-223: Overall Assessment of SeqScan Class.

The SeqScan class is well-structured and effectively integrates the new MergeMode for deduplication. The class methods are well-organized, and the logic is clear and efficient. This integration enhances the flexibility and robustness of sequential scans in handling different data consistency scenarios.

To further improve, consider adding unit tests specifically targeting the behavior changes introduced by different merge modes, ensuring that each mode functions as expected under various conditions.

src/mito2/src/read/dedup.rs (2)

383-396: New Dedup Strategy Implementation: LastNonNull

The implementation of the LastNonNull struct introduces a new deduplication strategy focusing on maintaining the last non-null fields. The constructor and the overall structure look well-defined. However, consider adding more detailed documentation to explain the strategy's behavior, especially how it handles edge cases and interacts with different types of batches.


Line range hint 737-819: Comprehensive Testing for LastNonNull Strategy

The tests for the LastNonNull strategy are crucial for ensuring its correctness and robustness. The provided tests cover various scenarios, including handling empty batches and duplications. However, consider adding more tests for complex batch structures and large data volumes to ensure the strategy performs well under stress.

Would you like me to help in writing these additional tests?

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7c463f0 and d9d076a.

Files selected for processing (15)
  • src/mito2/benches/memtable_bench.rs (4 hunks)
  • src/mito2/src/compaction.rs (2 hunks)
  • src/mito2/src/compaction/compactor.rs (3 hunks)
  • src/mito2/src/compaction/window.rs (2 hunks)
  • src/mito2/src/engine.rs (1 hunks)
  • src/mito2/src/engine/merge_mode_test.rs (1 hunks)
  • src/mito2/src/memtable.rs (4 hunks)
  • src/mito2/src/memtable/partition_tree.rs (3 hunks)
  • src/mito2/src/memtable/partition_tree/tree.rs (3 hunks)
  • src/mito2/src/memtable/time_series.rs (14 hunks)
  • src/mito2/src/read/dedup.rs (22 hunks)
  • src/mito2/src/read/scan_region.rs (5 hunks)
  • src/mito2/src/read/seq_scan.rs (2 hunks)
  • src/mito2/src/region/opener.rs (2 hunks)
  • src/mito2/src/region/options.rs (8 hunks)
Files skipped from review as they are similar to previous changes (8)
  • src/mito2/benches/memtable_bench.rs
  • src/mito2/src/compaction/compactor.rs
  • src/mito2/src/compaction/window.rs
  • src/mito2/src/engine.rs
  • src/mito2/src/memtable/partition_tree.rs
  • src/mito2/src/memtable/partition_tree/tree.rs
  • src/mito2/src/read/scan_region.rs
  • src/mito2/src/region/opener.rs
Additional comments not posted (15)
src/mito2/src/engine/merge_mode_test.rs (2)

30-85: Well-structured test for merge mode behavior during write and query operations.

The function test_merge_mode_write_query effectively simulates the intended behavior of the LastNonNull merge mode. It properly initializes the testing environment, sets up the necessary configurations, performs data operations, and verifies the results against expected outcomes.


88-208: Comprehensive test for merge mode behavior during compaction.

The function test_merge_mode_compaction thoroughly tests the impact of the LastNonNull merge mode on data compaction. It includes detailed setup, execution, and validation steps that ensure the merge mode functions as expected during the compaction process.

src/mito2/src/memtable.rs (1)

37-37: Integration of MergeMode in memtable configurations is correctly implemented.

The addition of MergeMode to the MemtableOptions and its usage in the MemtableBuilderProvider are well-integrated. The changes ensure that the merge_mode is considered when building memtables, which is crucial for the new functionality introduced in this PR.

Also applies to: 254-291

src/mito2/src/region/options.rs (2)

Line range hint 37-72: Proper definition and integration of MergeMode in RegionOptions.

The MergeMode enum is well-defined with appropriate serialization properties. Its integration into RegionOptions allows for correct configuration of merge behaviors at the region level, which is a key part of the functionality introduced in this PR.

Also applies to: 116-116


500-513: Effective testing of MergeMode parsing in RegionOptions.

The tests ensure that the MergeMode can be correctly parsed from configuration strings, validating both correct inputs and handling invalid scenarios appropriately.

src/mito2/src/compaction.rs (1)

57-57: Approved import addition for MergeMode.

This import is necessary to support the new compaction strategies introduced in the PR. It allows the use of MergeMode in defining compaction behaviors.

src/mito2/src/read/seq_scan.rs (1)

38-38: Approved import addition and use of MergeMode in SeqScan.

The import and usage of MergeMode are essential for implementing the new deduplication strategies during sequential scans. It allows the SeqScan to configure its behavior based on the specified merge mode, enhancing flexibility in handling different data consistency requirements.

Also applies to: 44-44

src/mito2/src/read/dedup.rs (3)

Line range hint 407-452: Review of push_batch Method in LastNonNull

The push_batch method is crucial for the deduplication process. It handles the logic of merging batches based on keys and timestamps. The use of std::mem::replace for managing buffers is appropriate and ensures that the buffer is updated atomically. However, ensure that the method's error handling is robust, especially in scenarios where batch operations might fail, such as slicing or merging failures.

Verification successful

Robust Error Handling in push_batch Method Confirmed

The push_batch method in LastNonNull handles errors by propagating them using the ? operator, ensuring that any failure in batch operations like slicing or merging is correctly managed. This approach provides robust error handling as requested.

  • Lines 425, 432, and 452: Errors from merge_last_non_null are propagated using the ? operator.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify robust error handling in push_batch method.
# Test: Search for error handling patterns in the push_batch method. Expect: Specific error handling for batch operations.

rg --type rust $'push_batch' ./src/mito2/src/read/dedup.rs

Length of output: 613



Script:

#!/bin/bash
# Description: Verify robust error handling in push_batch method.
# Test: Search for error handling patterns in the push_batch method. Expect: Specific error handling for batch operations.

ast-grep --lang rust --pattern $'fn push_batch(&mut self, batch: Batch, metrics: &mut DedupMetrics) -> Result<Option<Batch>> {
  $$$
}' ./src/mito2/src/read/dedup.rs

Length of output: 3726


465-465: Ensure Correct Merging in LastNonNull.finish

The finish method in LastNonNull should ensure that any remaining data in the buffer is processed correctly. This is a critical point to prevent data loss or corruption at the end of the deduplication process. Consider adding checks to ensure that the buffer's handling is consistent with the strategy's requirements.

Verification successful

Buffer Handling in finish Method Verified

The finish method in the LastNonNull strategy correctly processes the buffer by calling merge_last_non_null, ensuring that any remaining data is handled according to the deduplication strategy. The implementation aligns with the strategy requirements, preventing data loss or corruption.

  • src/mito2/src/read/dedup.rs: finish method and merge_last_non_null method
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify buffer handling in LastNonNull.finish.
# Test: Search for buffer handling logic in the finish method. Expect: Consistent handling according to strategy requirements.

rg --type rust $'finish' ./src/mito2/src/read/dedup.rs

Length of output: 527



Script:

#!/bin/bash
# Search for the LastNonNull struct and its finish method implementation in the relevant files.

# Locate the implementation of the LastNonNull struct
ast-grep --lang rust --pattern 'struct LastNonNull { $$$ }'

# Locate the implementation of the finish method within LastNonNull
ast-grep --lang rust --pattern 'impl LastNonNull {
  $$$
  fn finish(&mut self, $_) -> $_ {
    $$$
  }
  $$$
}'

Length of output: 176



Script:

#!/bin/bash
# Search for the finish method implementations across the entire codebase
rg 'fn finish(&mut self, $_) -> $_ {' --type rust

# Search for the merge_last_non_null method to understand its context
rg 'fn merge_last_non_null' --type rust

Length of output: 277



Script:

#!/bin/bash
# Corrected search for the finish method implementations across the entire codebase
rg 'fn finish\(&mut self, [^)]*\) -> [^)]* \{' --type rust

# Print the merge_last_non_null method implementation for context
rg -A 10 'fn merge_last_non_null' src/mito2/src/read/dedup.rs

Length of output: 771



Script:

#!/bin/bash
# Print the complete implementations of the finish methods in src/mito2/src/read/dedup.rs
rg -A 20 'fn finish' src/mito2/src/read/dedup.rs

# Print the complete implementation of the merge_last_non_null method in src/mito2/src/read/dedup.rs
rg -A 20 'fn merge_last_non_null' src/mito2/src/read/dedup.rs

Length of output: 3047


Line range hint 932-946: Handling Null Values in LastNonNull.merge_null

The test test_last_non_null_merge_null checks the handling of null values in batches. This is an important aspect of the LastNonNull strategy. Ensure that the merging logic is correctly implemented and consider edge cases where all fields might be null.

Verification successful

To verify the handling of null values in the LastNonNull.merge_null method, we need to locate and examine its implementation. Additionally, we should check for other tests that cover edge cases involving all-null fields.


To verify the handling of null values in the LastNonNull.merge_null method, we need to correct the ast-grep pattern and locate the merge_null method implementation. Additionally, we should search for the LastNonNull struct and its methods, and re-run the search for test cases involving merge_null.


To verify the handling of null values in the LastNonNull.merge_null method, we need to refine the search pattern to locate the merge_null method specifically and re-run the search for test cases involving merge_null.


To verify the handling of null values in the LastNonNull strategy, we need to search for all methods within the LastNonNull implementation to identify where the null handling logic might be located.


To verify the handling of null values in the LastNonNull strategy, we need to broaden our search to include other parts of the dedup.rs file or related modules where the null handling logic might reside.


To verify the handling of null values in the LastNonNull strategy, we need to locate and examine the merge_last_non_null method implementation.


To verify the handling of null values in the LastNonNull strategy, we need to search for the LastFieldsBuilder struct and its methods to locate the merge_last_non_null method implementation.


The merge_last_non_null method in the LastFieldsBuilder implementation is responsible for merging the last non-null fields, including handling null values. This method appears to correctly handle null values and edge cases.

  • The method checks if last_fields is empty and handles the buffer accordingly.
  • It iterates over builders and last_fields to push non-null values.
  • It includes logic for cases where the buffer has only one row or needs to replace the last row.
  • It resets the builder state after merging.

Based on this review, the null handling logic in the merge_last_non_null method appears to be correctly implemented.

Verified handling of null values in LastNonNull.merge_null method.

  • The merge_last_non_null method in LastFieldsBuilder correctly handles null values and edge cases.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify handling of all-null fields in batches.
# Test: Search for handling of all-null fields in test cases. Expect: Specific handling for all-null scenarios.

rg --type rust $'merge_null' ./src/mito2/src/read/dedup.rs

Length of output: 104



Script:

#!/bin/bash
# Locate the implementation of the `merge_null` method in the `LastNonNull` class.
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ merge_null($$) { $$$ } $$$ }'

# Search for other test cases that might cover edge cases involving all-null fields.
rg --type rust 'merge_null' ./tests

Length of output: 596



Script:

#!/bin/bash
# Correct the pattern to locate the implementation of the `merge_null` method in the `LastNonNull` class.
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ merge_null($$) { $$$ } }'

# Search for the `LastNonNull` struct definition and its methods.
ast-grep --lang rust --pattern 'struct LastNonNull { $$$ }'
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ }'

# Re-run the search for other test cases that might cover edge cases involving all-null fields.
rg --type rust 'merge_null' ./tests

Length of output: 870



Script:

#!/bin/bash
# Refine the pattern to locate the `merge_null` method implementation in the `LastNonNull` class.
ast-grep --lang rust --pattern 'impl LastNonNull { fn merge_null($$) { $$$ } }'

# Re-run the search for other test cases that might cover edge cases involving all-null fields.
rg --type rust 'merge_null' ./tests

Length of output: 117



Script:

#!/bin/bash
# Search for all methods within the `LastNonNull` implementation.
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ }'

Length of output: 689



Script:

#!/bin/bash
# Broaden the search to include other parts of the `dedup.rs` file to locate the null handling logic.
rg --type rust 'merge' ./src/mito2/src/read/dedup.rs

Length of output: 1750



Script:

#!/bin/bash
# Locate the implementation of the `merge_last_non_null` method.
ast-grep --lang rust --pattern 'fn merge_last_non_null($$) { $$$ }'

Length of output: 69



Script:

#!/bin/bash
# Search for the `LastFieldsBuilder` struct and its methods.
ast-grep --lang rust --pattern 'struct LastFieldsBuilder { $$$ }'
ast-grep --lang rust --pattern 'impl LastFieldsBuilder { $$$ }'

Length of output: 10649

src/mito2/src/memtable/time_series.rs (5)

63-76: Ensure proper initialization of merge_mode in TimeSeriesMemtableBuilder.

The addition of merge_mode to TimeSeriesMemtableBuilder is correctly implemented. It is initialized in the constructor and passed to the TimeSeriesMemtable during construction. This change ensures that the merge behavior is configurable at the memtable level.


88-88: Validate the propagation of merge_mode during memtable construction.

The merge_mode is properly passed to the TimeSeriesMemtable constructor from TimeSeriesMemtableBuilder. This ensures consistency in behavior based on the configured merge mode.


Line range hint 103-135: Check conditional dedup logic based on merge_mode.

The conditional logic to disable dedup when merge_mode is set to LastNonNull is a smart way to optimize performance by avoiding unnecessary deduplication. However, ensure this logic aligns with the expected behavior and does not inadvertently affect other functionalities.


272-277: Ensure correct application of LastNonNullIter based on merge_mode.

The conditional application of LastNonNullIter in the iter method is correctly implemented. When merge_mode is LastNonNull, the custom iterator is used, ensuring that the deduplication strategy aligns with the specified mode.


298-298: Verify correct handling of merge_mode in ranges method.

The merge_mode is correctly passed to the TimeSeriesIterBuilder, ensuring that the iteration behavior is consistent with the configured merge strategy. This is crucial for maintaining the integrity of data retrieval based on the merge mode settings.

@killme2008
Contributor

@evenyag Don't forget to change the PR title and content.

Contributor

@killme2008 killme2008 left a comment


Good job. This is the final piece needed to support the InfluxDB line protocol.

@evenyag changed the title from "feat: Implements update_mode region options" to "feat: Implements merge_mode region options" on Jun 27, 2024
@evenyag
Contributor Author

evenyag commented Jun 27, 2024

@coderabbitai pause

Contributor

coderabbitai bot commented Jun 27, 2024

Actions performed

Reviews paused.

make merge mode optional and use default while it is None
Contributor

@killme2008 killme2008 left a comment


LGTM

@killme2008 killme2008 added this pull request to the merge queue Jun 27, 2024
Merged via the queue into GreptimeTeam:main with commit 10b7a3d Jun 27, 2024
49 checks passed
zyy17 pushed a commit to zyy17/greptimedb that referenced this pull request Jun 28, 2024
* feat: add update_mode to region options

* test: add test

* feat: last not null iter

* feat: time series last not null

* feat: partition tree update mode

* feat: partition tree

* fix: last not null iter slice

* test: add test for compaction

* test: use second resolution

* style: fix clippy

* chore: merge two lines

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>

* chore: address CR comments

* refactor: UpdateMode -> MergeMode

* refactor: LastNotNull -> LastNonNull

* chore: return None earlier

* feat: validate region options

make merge mode optional and use default while it is None

* test: fix tests

---------

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>