feat: Implements merge_mode region options #4208
Conversation
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Engine
    participant CompactionSstReaderBuilder
    participant ScanInput
    User ->> Engine: Start compaction
    Engine ->> CompactionSstReaderBuilder: Initialize with MergeMode
    Engine ->> CompactionSstReaderBuilder: Build SST reader
    CompactionSstReaderBuilder ->> Engine: Return SST reader
    Engine ->> ScanInput: Configure with MergeMode
    ScanInput ->> Engine: Initiate scanning
    Engine ->> User: Return compaction results
```
Actionable comments posted: 4
Outside diff range and nitpick comments (1)
src/mito2/src/compaction/window.rs (1)
Line range hint 212-263: Proper handling of `UpdateMode` in compaction window configuration

The addition of `UpdateMode` to the `RegionOptions` within the `Version` struct is a strategic enhancement that aligns the compaction behavior with the update strategies defined at the region level.
- Alignment with Design: This change ensures that the compaction process respects the update mode settings, which is crucial for maintaining data integrity and expected behavior during the compaction.
- Potential Improvement: It would be beneficial to add tests specifically targeting the interaction between the update mode and the compaction process to ensure that the integration is seamless and behaves as expected under various conditions.
Would you like to develop these targeted tests or need further clarification on potential edge cases that should be covered?
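Such a targeted test could exercise both modes against the same duplicated input. The sketch below is a self-contained simulation of the two merge semantics only; it does not use the mito2 engine API, and `merge_rows`, the row tuple shape, and the two-field layout are illustrative assumptions.

```rust
// Hypothetical stand-ins for the two update modes discussed in this PR.
#[derive(Clone, Copy, PartialEq)]
enum UpdateMode {
    LastRow,
    LastNotNull,
}

// A row is (timestamp, field0, field1); later entries in `rows` are newer writes.
type Row = (i64, Option<i64>, Option<i64>);

// Simulated merge of duplicate timestamps under each mode. This mirrors the
// intended semantics only; the real engine merges `Batch`es during compaction.
fn merge_rows(rows: &[Row], mode: UpdateMode) -> Vec<Row> {
    use std::collections::BTreeMap;
    let mut merged: BTreeMap<i64, (Option<i64>, Option<i64>)> = BTreeMap::new();
    for &(ts, f0, f1) in rows {
        match mode {
            // LastRow: the newest duplicate replaces the whole row, nulls included.
            UpdateMode::LastRow => {
                merged.insert(ts, (f0, f1));
            }
            // LastNotNull: each field keeps the newest non-null value.
            UpdateMode::LastNotNull => {
                let e = merged.entry(ts).or_insert((None, None));
                if f0.is_some() { e.0 = f0; }
                if f1.is_some() { e.1 = f1; }
            }
        }
    }
    merged.into_iter().map(|(ts, (f0, f1))| (ts, f0, f1)).collect()
}

fn main() {
    // Two writes to the same timestamp; the second write leaves field1 null.
    let rows = [(1, Some(11), Some(111)), (1, Some(21), None)];
    assert_eq!(merge_rows(&rows, UpdateMode::LastRow), vec![(1, Some(21), None)]);
    assert_eq!(merge_rows(&rows, UpdateMode::LastNotNull), vec![(1, Some(21), Some(111))]);
    println!("ok");
}
```

An edge case worth covering in the real tests is a newer row that is entirely null except for the timestamp, which should erase fields under `LastRow` but preserve them under `LastNotNull`.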
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (17)
- src/mito2/benches/memtable_bench.rs (4 hunks)
- src/mito2/src/compaction.rs (2 hunks)
- src/mito2/src/compaction/compactor.rs (3 hunks)
- src/mito2/src/compaction/window.rs (2 hunks)
- src/mito2/src/engine.rs (1 hunks)
- src/mito2/src/engine/append_mode_test.rs (1 hunks)
- src/mito2/src/engine/update_mode_test.rs (1 hunks)
- src/mito2/src/memtable.rs (4 hunks)
- src/mito2/src/memtable/partition_tree.rs (3 hunks)
- src/mito2/src/memtable/partition_tree/tree.rs (3 hunks)
- src/mito2/src/memtable/time_series.rs (14 hunks)
- src/mito2/src/read/dedup.rs (2 hunks)
- src/mito2/src/read/scan_region.rs (5 hunks)
- src/mito2/src/read/seq_scan.rs (2 hunks)
- src/mito2/src/region/opener.rs (2 hunks)
- src/mito2/src/region/options.rs (8 hunks)
- src/mito2/src/test_util.rs (1 hunks)
Files skipped from review due to trivial changes (1)
- src/mito2/src/engine/append_mode_test.rs
Additional comments not posted (40)
src/mito2/benches/memtable_bench.rs (1)
Line range hint 27-128: Update mode integration in benchmark tests

The benchmark tests for the `write_rows` and `full_scan` functions have been updated to include `UpdateMode::LastRow`. This is a crucial update as it ensures that the performance implications of different update modes are measured and understood.
- Performance Consideration: It's essential to also benchmark the `LastNotNull` mode to compare its performance impact relative to `LastRow`. This would provide a more comprehensive understanding of how each mode affects performance.
- Refinement: Consider parameterizing the benchmark tests to run with both `LastRow` and `LastNotNull` modes. This could be achieved by using a loop or parameterized benchmarks if supported by the Criterion library.

Would you like a script to verify the performance impact comparison between `LastRow` and `LastNotNull` modes?

src/mito2/src/memtable.rs (1)
Line range hint 36-286: Comprehensive integration of `UpdateMode` in memtable configurations

The integration of `UpdateMode` in various memtable configurations is a significant enhancement, ensuring that different update strategies are supported at a fundamental level in the storage layer.
- Consistency: The addition of `UpdateMode` to both `TimeSeriesMemtableBuilder` and `PartitionTreeMemtableBuilder` ensures that the update mode is consistently applied across different types of memtables.
- Best Practices: Consider adding default cases in the match statements for `MemtableBuilderProvider` to handle unexpected `MemtableOptions`. This would prevent potential runtime errors and improve the robustness of the code.
- Performance: The impact of different update modes on memtable performance should be analyzed, especially since memtables are critical for write performance.
Would you like assistance in analyzing the performance impact of different update modes on memtables?
src/mito2/src/compaction/compactor.rs (4)
31-31: Clarify import of `CompactionSstReaderBuilder`

The import of `CompactionSstReaderBuilder` is crucial for the changes in this file. Ensure that this import is necessary and correctly placed in the context of the file's other imports.
144-144: Review the integration of `update_mode` in `MemtableBuilderProvider`

The integration of `update_mode` in the `MemtableBuilderProvider` builder method is a significant change. It directly affects how memtables are built based on the `update_mode` setting. Ensure that this integration is tested and that the behavior aligns with the expected outcomes for different update modes.

Verification successful
Tests for `update_mode` integration found in `update_mode_test.rs`

The integration of `update_mode` within `MemtableBuilderProvider` appears to be tested in `src/mito2/src/engine/update_mode_test.rs`. Specifically, the following tests are present:
- `test_update_mode_write_query`
- `test_update_mode_compaction`

These tests suggest that different `update_mode` settings are being evaluated.

src/mito2/src/engine/update_mode_test.rs
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify the integration of `update_mode` in `MemtableBuilderProvider`.
# Test: Search for the method usage in tests. Expect: Tests covering different `update_mode` settings.
rg --type rust $'MemtableBuilderProvider::new.*update_mode'

Length of output: 60

Script:

#!/bin/bash
# Search for `update_mode` within the `MemtableBuilderProvider` context
rg --type rust 'MemtableBuilderProvider' | rg 'update_mode'
# Search for test files that might be testing `update_mode` indirectly
rg --type rust 'update_mode' --tests

Length of output: 526

Script:

#!/bin/bash
# Search for `update_mode` within the `MemtableBuilderProvider` context
rg --type rust 'MemtableBuilderProvider.*update_mode'
# Corrected command to search for test files that might be testing `update_mode` indirectly
rg --type rust 'update_mode' --glob '*test*'

Length of output: 457
284-293: Review the construction of `CompactionSstReaderBuilder` with `update_mode`

The construction of `CompactionSstReaderBuilder` now includes the `update_mode`. This is crucial for ensuring that the SST reader behaves according to the specified update mode during the compaction process. It is important to verify that this integration is correct and that the `CompactionSstReaderBuilder` is properly utilizing the `update_mode`.

Verification successful

The `CompactionSstReaderBuilder` correctly includes and initializes the `update_mode` field. The `update_mode` is passed from `compaction_region.current_version.options.update_mode` and is integrated properly in the builder.

src/mito2/src/compaction/compactor.rs:
- Line where `update_mode` is fetched: `let update_mode = compaction_region.current_version.options.update_mode;`
- Line where `update_mode` is passed to the builder: `update_mode,`
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify the correct usage of `update_mode` in `CompactionSstReaderBuilder`.
# Test: Search for the instantiation of `CompactionSstReaderBuilder` in the codebase. Expect: Correct initialization with `update_mode`.
rg --type rust $'CompactionSstReaderBuilder.*update_mode'

Length of output: 58

Script:

#!/bin/bash
# Find all instances of `CompactionSstReaderBuilder` in the codebase.
rg --type rust 'CompactionSstReaderBuilder'

Length of output: 366

Script:

#!/bin/bash
# Check the definition of `CompactionSstReaderBuilder` for `update_mode`.
rg --type rust 'update_mode' src/mito2/src/compaction.rs

Length of output: 134

Script:

#!/bin/bash
# Verify how `update_mode` is used in `compactor.rs`.
rg --type rust 'update_mode' src/mito2/src/compaction/compactor.rs

Length of output: 228
282-282: Ensure the propagation of `update_mode` to the compaction logic

The propagation of `update_mode` from the `RegionOptions` to the compaction logic is a critical change. This ensures that the compaction process respects the `update_mode` settings. Verify that this change is consistent with the intended design and that it is covered by unit tests.

Verification successful

The propagation of `update_mode` from the `RegionOptions` to the compaction logic is verified. The `update_mode` is set and used appropriately in `compactor.rs`, and the test function `test_update_mode_compaction` covers the `update_mode` during the compaction process.

src/mito2/src/compaction/compactor.rs
src/mito2/src/engine/update_mode_test.rs
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify the propagation of `update_mode` in the compaction logic.
# Test: Search for the usage of `update_mode` in the compaction logic. Expect: Proper handling based on `update_mode`.
rg --type rust $'update_mode.*compaction'

Length of output: 247

Script:

#!/bin/bash
# Search for all instances of `update_mode` in the `compactor.rs` file to verify its usage in the compaction logic.
rg 'update_mode' src/mito2/src/compaction/compactor.rs

Length of output: 216

Script:

#!/bin/bash
# Extract the content of the `test_update_mode_compaction` function from `update_mode_test.rs` to verify it covers `update_mode` in the compaction process.
ast-grep --lang rust --pattern 'async fn test_update_mode_compaction() { $$$ }' src/mito2/src/engine/update_mode_test.rs

Length of output: 9487
src/mito2/src/memtable/partition_tree/tree.rs (2)
43-45: Added imports for `LastNotNullIter` and `Batch`

These imports are necessary for the new functionality to handle different update modes. Ensure that these are used appropriately in the implementation to avoid any unused imports.

247-252: Ensure correct iterator wrapping based on `update_mode`

The conditional wrapping of the iterator based on the `update_mode` is a critical part of the functionality. This change appears correct and should work as intended. However, ensure that the `LastNotNullIter` is thoroughly tested to handle edge cases in data consistency and performance impacts.

src/mito2/src/region/opener.rs (2)
195-199: Correct implementation of memtable builder initialization with `update_mode`

This change correctly passes the `update_mode` to the memtable builder. It's crucial that this is tested extensively to ensure that the memtables are initialized properly based on the selected update mode, especially since this affects data integrity and query results.

329-329: Ensure `update_mode` is correctly passed to memtable builder during region opening

The addition of `update_mode` in the memtable builder during the region opening process is crucial for ensuring that the region adheres to the specified update strategies. This change looks good, but make sure to validate through integration tests that the update mode is respected in all operations post-opening.

src/mito2/src/region/options.rs (5)
37-47: Introduced new `UpdateMode` enum for handling duplicate rows

The `UpdateMode` enum has been correctly defined with two variants, `LastRow` and `LastNotNull`, which are crucial for the new functionality of handling duplicate rows based on the update mode. The use of `#[default]` on `LastRow` is appropriate as it provides a sensible default behavior.
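For readers following along, the shape being described can be sketched as below. This is a hypothetical, simplified stand-in (the actual definition in `src/mito2/src/region/options.rs` carries serde-related derives; the `FromStr` impl here only illustrates the `last_row`/`last_not_null`/invalid cases exercised by the tests mentioned later in this review):

```rust
use std::str::FromStr;

// Simplified stand-in for the UpdateMode enum described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum UpdateMode {
    // Keep the whole most recent row for a duplicate key/timestamp.
    #[default]
    LastRow,
    // Keep, per field, the most recent non-null value.
    LastNotNull,
}

impl FromStr for UpdateMode {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "last_row" => Ok(UpdateMode::LastRow),
            "last_not_null" => Ok(UpdateMode::LastNotNull),
            other => Err(format!("invalid update mode: {other}")),
        }
    }
}

fn main() {
    assert_eq!(UpdateMode::default(), UpdateMode::LastRow);
    assert_eq!("last_not_null".parse::<UpdateMode>(), Ok(UpdateMode::LastNotNull));
    assert!("unknown".parse::<UpdateMode>().is_err());
    println!("ok");
}
```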
70-72: Added `update_mode` field to `RegionOptions`

The new `update_mode` field in `RegionOptions` is well-integrated and includes a helpful comment explaining its effect in conjunction with `append_mode`. This addition aligns with the PR's objectives to manage how duplicate rows are updated.
116-116: Ensure `update_mode` is correctly assigned in `RegionOptions` conversion

The assignment of `update_mode` in the conversion from a `HashMap` to `RegionOptions` is correctly implemented. This ensures that the `update_mode` setting is preserved when options are loaded from external configurations.
Line range hint 199-210: Added `update_mode` to `RegionOptionsWithoutEnum`

The inclusion of `update_mode` in `RegionOptionsWithoutEnum` is necessary due to the limitations of `#[serde(default)]` with enums. This change correctly mirrors the structure of `RegionOptions`, ensuring that serialization processes handle the `update_mode` appropriately.
500-513: Test cases for `update_mode` handling in `RegionOptions`

The test cases added for `update_mode` effectively cover the scenarios for valid modes (`last_row`, `last_not_null`) and an invalid mode (`unknown`). This thorough testing ensures the robustness of the `update_mode` functionality.

src/mito2/src/engine.rs (1)
53-54: Added test module for `update_mode`

The inclusion of the `update_mode_test` module is crucial for ensuring that the new functionality works as expected. It's good to see proactive testing being incorporated alongside new features.

src/mito2/src/compaction.rs (5)
56-56: Addition of `UpdateMode` import

The `UpdateMode` enum has been imported to be used in the compaction functionalities. This import is necessary due to the new compaction strategies introduced in this PR.
446-455: Introduction of `CompactionSstReaderBuilder`

This new structure is designed to build batch readers for compaction tasks. It includes various settings like `append_mode`, `filter_deleted`, and crucially, `update_mode`, which dictates how duplicates are handled during compaction.
- Fields Review: The fields are well-structured and relevant for the compaction process. However, ensure that the `cache` field, which is optional, is handled properly in all scenarios where it might be `None`.
- Use of `update_mode`: Proper inclusion and handling of the `update_mode` suggest good integration with the new functionality.

Overall, this structure appears well-designed for its purpose.
457-477: Review of `build_sst_reader` method

This method constructs a `BoxedBatchReader` configured according to the `CompactionSstReaderBuilder` settings. The method correctly incorporates various settings like file inputs, append mode, and importantly, the `update_mode`.
- Handling of `update_mode`: The method correctly adjusts the scan input to include the `update_mode`, which is critical for applying the correct duplication strategy.
- Error Handling: The method includes robust error handling, particularly with file-not-found errors, which are ignored during compaction, a sensible default given the context.

This method is well-implemented and aligns with the objectives of handling different update modes during compaction.
Line range hint 480-504: Review of `time_range_to_predicate` function

This function converts a `TimestampRange` into a `Predicate` for filtering rows outside the specified range during scans. It handles different cases of range boundaries (start and/or end specified).
- Correctness: The function uses `datafusion_expr` to create conditions based on the metadata's time index column. The use of `ts_to_lit` for converting timestamps to literals appears correct.
- Error Handling: The function properly handles potential overflows when converting timestamps, which is crucial for avoiding runtime errors.

This utility function is well-implemented and crucial for ensuring that only relevant data is processed during compaction.
Line range hint 506-530: Review of `get_expired_ssts` function

This function identifies SST files that have expired based on their TTL. It correctly calculates the expiration time and filters out files that are beyond this time.
- Error Handling: The function handles potential errors during the subtraction of duration from the timestamp, which is critical for avoiding incorrect behavior in edge cases.
- Efficiency: The function efficiently collects expired files using `flat_map`, which is suitable given the potentially large number of files in a compaction scenario.

This function is well-implemented and plays a crucial role in maintaining the health of the storage by removing outdated data.
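The TTL rule being reviewed can be illustrated with a self-contained sketch. The `SstFile` struct, the `levels` layout, and millisecond timestamps are assumptions for the example; the real code operates on mito2's file handles:

```rust
use std::time::Duration;

// Hypothetical file metadata; the real file handles in mito2 carry much more state.
#[derive(Debug, Clone, PartialEq)]
struct SstFile {
    name: String,
    max_ts_ms: i64, // newest row timestamp in the file, in milliseconds
}

// Sketch of the TTL rule: a file is expired when its newest row is older than
// `now - ttl`. checked_sub mirrors the overflow handling the review praises;
// on overflow, nothing is treated as expired.
fn get_expired_ssts(levels: &[Vec<SstFile>], ttl: Duration, now_ms: i64) -> Vec<SstFile> {
    let Some(expire_before) = now_ms.checked_sub(ttl.as_millis() as i64) else {
        return Vec::new();
    };
    levels
        .iter()
        .flat_map(|level| level.iter().filter(|f| f.max_ts_ms < expire_before).cloned())
        .collect()
}

fn main() {
    let levels = vec![
        vec![SstFile { name: "old".into(), max_ts_ms: 1_000 }],
        vec![SstFile { name: "fresh".into(), max_ts_ms: 90_000 }],
    ];
    // With a 60 s TTL at t = 100 s, only the file whose newest row is older
    // than t - 60 s is expired.
    let expired = get_expired_ssts(&levels, Duration::from_secs(60), 100_000);
    assert_eq!(expired.len(), 1);
    assert_eq!(expired[0].name, "old");
    println!("ok");
}
```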
src/mito2/src/memtable/partition_tree.rs (1)
84-86: Approved addition of the `update_mode` field in `PartitionTreeConfig`

The addition aligns with the PR's objective to support different update strategies for handling duplicate rows. Marking it to skip deserialization is appropriate if this field is expected to be set programmatically rather than through configuration files.
src/mito2/src/read/seq_scan.rs (1)
214-223: Approved handling of update modes in `SeqScan`

The implementation correctly handles the `LastRow` and `LastNotNull` update modes in the deduplication process. This is crucial for ensuring that the scan behavior aligns with the specified update mode, which is central to the functionality introduced in this PR.

src/mito2/src/read/scan_region.rs (4)
45-45: Addition of `UpdateMode` import

The import of `UpdateMode` is necessary for the new functionality related to handling different update modes in scanning operations. This aligns with the PR's objectives to manage how duplicate rows are updated.
403-404: Documentation and clarity for `update_mode` in `ScanInput`

While the implementation of `update_mode` in `ScanInput` is correct, adding a brief comment explaining its role, especially how it affects the deduplication process during scans, would enhance code readability and maintainability.

- /// Mode to update duplicate rows.
+ /// Mode to update duplicate rows. Determines how duplicate entries are handled based on the specified update mode.
505-510: Proper implementation of `with_update_mode` method

The method `with_update_mode` correctly sets the `update_mode` for `ScanInput`, ensuring that the scanning operations can utilize the specified update mode. This method is well-implemented and follows Rust's idiomatic use of builder patterns.
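The consuming-builder idiom referred to here can be sketched as follows. This is a heavily reduced, hypothetical `ScanInput`; the real struct in `src/mito2/src/read/scan_region.rs` carries many more fields:

```rust
// Simplified stand-in for the update mode enum used in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum UpdateMode {
    #[default]
    LastRow,
    LastNotNull,
}

// Hypothetical, reduced ScanInput for illustration only.
#[derive(Debug, Default)]
struct ScanInput {
    update_mode: UpdateMode,
}

impl ScanInput {
    // Consuming builder setter in the idiom the review approves of:
    // takes self by value, mutates, and returns self for chaining.
    fn with_update_mode(mut self, mode: UpdateMode) -> Self {
        self.update_mode = mode;
        self
    }
}

fn main() {
    let input = ScanInput::default().with_update_mode(UpdateMode::LastNotNull);
    assert_eq!(input.update_mode, UpdateMode::LastNotNull);
    println!("ok");
}
```

Taking `self` by value keeps call sites chainable without requiring a separate finalizing `build()` step for a struct that is usable at every stage.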
299-300: Ensure correct handling of `update_mode` in `ScanInput`

The addition of `update_mode` to the `ScanInput` struct is crucial for implementing the new deduplication strategies. This change is consistent with the PR objectives and should be tested thoroughly to ensure that it integrates well with the existing scanning logic.

Verification successful

The `update_mode` is correctly integrated into the `ScanInput` struct and is used in the scanning operations as follows:
- `compaction.rs`: `ScanInput` is created and `update_mode` is set appropriately.
- `scan_region.rs`: `ScanInput` is created and `update_mode` is set appropriately.
- `unordered_scan.rs`: `ScanInput` is created and `update_mode` is set appropriately.
- `seq_scan.rs`: `ScanInput` is created and `update_mode` is set appropriately.

These integrations are consistent with the PR objectives and ensure that the new deduplication strategies are implemented correctly.
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify the integration of update_mode in ScanInput with scanning operations.
# Test: Search for usage of ScanInput and ensure update_mode is considered.
rg --type rust $'ScanInput' --context 5

Length of output: 14602
src/mito2/src/read/dedup.rs (2)
471-564: Implementation of `LastNotNullIter` iterator

The newly added `LastNotNullIter` is crucial for handling the `LastNotNull` deduplication strategy. The iterator is well-structured and seems to correctly manage the batch processing based on the `LastNotNull` logic. However, there are a few points that need attention:
- Documentation and Comments: More detailed comments explaining the logic, especially around complex conditional flows in `next_batch_for_merge` and `next`, would enhance maintainability.
- Error Handling: Ensure that all possible errors from `iter.next()` and `batch.slice()` are appropriately handled and logged.
- Performance Considerations: Review if there are any potential performance bottlenecks, especially in loops and conditional checks within `next_batch_for_merge`. Consider optimizing the slicing logic to avoid unnecessary data copying.

Overall, the implementation aligns with the PR's objectives but could benefit from further documentation and potential optimizations.
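The slicing idea under discussion can be shown on a plain slice of timestamps instead of a `Batch` (a hypothetical simplification; the real `find_split_index` works on batch internals): the batch is cut at the first adjacent duplicate, and the remainder, starting at the duplicated timestamp, is held back so it can be merged together with later input.

```rust
// Find the index just past the first adjacent duplicate in a sorted run of
// timestamps, or None when the run has no duplicates.
fn find_split_index(timestamps: &[i64]) -> Option<usize> {
    timestamps.windows(2).position(|w| w[0] == w[1]).map(|i| i + 1)
}

// Split into a prefix ending at the duplicate boundary and a remainder that
// begins with the duplicated timestamp; the remainder is what a LastNotNull-
// style iterator would carry into its next merge step.
fn split_at_duplicate(timestamps: &[i64]) -> (&[i64], &[i64]) {
    match find_split_index(timestamps) {
        Some(i) => timestamps.split_at(i),
        None => (timestamps, &[]),
    }
}

fn main() {
    assert_eq!(find_split_index(&[1, 2, 2, 3]), Some(2));
    assert_eq!(find_split_index(&[1, 2, 3]), None);
    let (head, tail) = split_at_duplicate(&[1, 2, 2, 3]);
    assert_eq!(head, &[1, 2]);
    assert_eq!(tail, &[2, 3]);
    println!("ok");
}
```

Note that the duplicated timestamp (2 above) straddles both halves, which is exactly why the carried-over rows must be merged rather than emitted directly.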
1081-1136: Unit tests for `LastNotNullIter`

The unit tests provided for `LastNotNullIter` are comprehensive and cover a variety of scenarios including batch splitting and handling multiple batches. These tests are essential for ensuring the correct behavior of the deduplication strategy under different conditions. Some suggestions:
- Additional Test Scenarios: Consider adding tests for edge cases, such as batches with all rows having the same timestamp or batches that mix `OpType::Put` and `OpType::Delete` in more complex patterns.
- Performance Testing: While functional correctness is verified, adding performance benchmarks to understand the impact of this iterator on large datasets could be beneficial.

It's good to see thorough testing around this new feature, which is critical for maintaining data integrity in GreptimeDB.

[APPROVED]

src/mito2/src/memtable/time_series.rs (11)
47-49: Import of `LastNotNullIter` and `UpdateMode`

The import of `LastNotNullIter` and `UpdateMode` reflects the new functionality introduced for handling different update modes in time series memtables. This aligns with the PR objectives to support different update strategies.

60-60: Addition of `update_mode` field in `TimeSeriesMemtableBuilder`

Adding the `update_mode` field to the `TimeSeriesMemtableBuilder` struct is crucial for initializing the memtable with the desired update behavior. This change is necessary to support the new feature.

65-73: Constructor modification to include `update_mode`

The constructor of `TimeSeriesMemtableBuilder` has been appropriately modified to accept an `update_mode` parameter and initialize the corresponding field. This ensures that the builder pattern remains consistent with the new functionality.

85-85: Pass `update_mode` to `TimeSeriesMemtable` constructor

Passing the `update_mode` from the builder to the `TimeSeriesMemtable` instance is essential for ensuring that the memtable behaves according to the specified update mode. This is a critical link in the chain of changes required to support the new feature.

100-100: Addition of `update_mode` in `TimeSeriesMemtable` struct and constructor

The addition of the `update_mode` field in the `TimeSeriesMemtable` struct and its initialization in the constructor are key to applying the update mode logic throughout the memtable's lifecycle. This change correctly propagates the update mode setting from the configuration to the operational behavior of the memtable.

Also applies to: 109-109
118-122: Logic to disable deduplication based on `update_mode`

The conditional logic to disable deduplication when the `update_mode` is `LastNotNull` ensures that the memtable does not perform unnecessary deduplication operations that conflict with the intended update behavior. This is a smart use of conditional logic to optimize performance based on configuration.
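The rule reduces to a small predicate. The sketch below is an assumed shape, not the exact mito2 code; the rationale in the comment reflects how `LastNotNull` is described in this review:

```rust
#[derive(Clone, Copy, PartialEq)]
enum UpdateMode {
    LastRow,
    LastNotNull,
}

// Under LastNotNull, older duplicate rows must stay in the memtable so null
// fields of newer rows can still be backfilled from them at read time, so
// eager deduplication is turned off for that mode.
fn memtable_dedup_enabled(update_mode: UpdateMode) -> bool {
    update_mode != UpdateMode::LastNotNull
}

fn main() {
    assert!(memtable_dedup_enabled(UpdateMode::LastRow));
    assert!(!memtable_dedup_enabled(UpdateMode::LastNotNull));
    println!("ok");
}
```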
132-132: Ensure `update_mode` is part of the memtable state

Storing `update_mode` as part of the `TimeSeriesMemtable` state is necessary for later use during operations like iteration, ensuring that the update behavior is consistently applied. This encapsulation of state within the memtable is well-implemented.

262-267: Conditional iterator creation based on `update_mode`

The conditional creation of `LastNotNullIter` based on the `update_mode` during iteration is a crucial implementation detail that ensures the correct handling of updates according to the specified mode. This dynamic behavior is essential for the flexibility of the update mechanism.

288-288: Include `update_mode` in `TimeSeriesIterBuilder`

Incorporating the `update_mode` into the `TimeSeriesIterBuilder` is necessary for constructing iterators that respect the specified update behavior. This change ensures that the iterator construction is aligned with the new update mode functionality.

337-337: Pass `update_mode` during memtable forking

Passing the `update_mode` during the forking of a `TimeSeriesMemtable` ensures that the new instance maintains the same update behavior as the original. This consistency is crucial for operations that involve replicating or splitting memtables.

874-874: Update mode handling in `TimeSeriesIterBuilder`

The inclusion of `update_mode` in the `TimeSeriesIterBuilder` and its use in the iterator construction process are well-implemented. This ensures that the iterator behavior is consistent with the specified update mode, providing flexibility in how updates are processed during iteration.

Also applies to: 884-890
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4208      +/-   ##
==========================================
- Coverage   84.85%   84.60%    -0.25%
==========================================
  Files        1040     1042        +2
  Lines      183023   183488      +465
==========================================
- Hits       155297   155238       -59
- Misses      27726    28250      +524
LGTM
Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
Actionable comments posted: 2
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/mito2/src/read/dedup.rs (2 hunks)
- src/mito2/src/test_util.rs (1 hunks)
Additional comments not posted (4)
src/mito2/src/test_util.rs (1)
923-953: Enhance documentation and verify data type handling

The function `build_rows_with_fields` is well-implemented for creating rows with optional fields. However, the documentation could be further improved for clarity and completeness:
- The data type for the fields is specified as `i64`, but the conversion to `f64` in the function suggests that these might represent floating-point numbers. Clarifying this in the documentation would help.
- It would be beneficial to mention that the timestamp values are expected to be in seconds since they are multiplied by 1000 to convert to milliseconds.

Additionally, ensure that the data type conversions (`i64` to `f64`) are safe and intended, as this could lead to precision issues or unexpected behavior if not handled correctly.

/// Build rows with schema (string, f64, f64, ts_millis).
/// - `key`: A string key that is common across all rows.
/// - `timestamps`: Array of timestamp values in seconds.
/// - `fields`: Array of tuples where each tuple contains two optional i64 values (interpreted as f64), representing two optional float fields.
/// Returns a vector of `Row` each containing the key, two optional float fields, and a timestamp.

Verification successful
The usages of `build_rows_with_fields` in the `src/mito2/src/engine/update_mode_test.rs` file show that the function is called with appropriate data types (timestamps in seconds and `i64` values for fields).

Lines from `src/mito2/src/engine/update_mode_test.rs`:

let rows = build_rows_with_fields("a", &[2, 3], &[(Some(12), None), (Some(13), None)]);
let rows = build_rows_with_fields("a", &[1, 2], &[(Some(11), None), (Some(22), Some(222))]);
let rows = build_rows_with_fields("a", &[1, 2], &[(Some(21), None), (Some(22), None)]);

These lines confirm that the arguments passed to `build_rows_with_fields` are consistent with the expected data types.

Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify data type handling in the function `build_rows_with_fields`.
# Test: Search for other usages of `build_rows_with_fields` to ensure correct data types are passed.
rg --type rust $'build_rows_with_fields'

Length of output: 870

Script:

#!/bin/bash
# Extract specific lines from src/mito2/src/engine/update_mode_test.rs to inspect the arguments passed to build_rows_with_fields.
sed -n '2p;3p;4p' src/mito2/src/engine/update_mode_test.rs

Length of output: 198

Script:

#!/bin/bash
# Find the exact line numbers of `build_rows_with_fields` usages in `src/mito2/src/engine/update_mode_test.rs` and extract those lines.
rg --line-number 'build_rows_with_fields' src/mito2/src/engine/update_mode_test.rs | cut -d: -f1 | xargs -I{} sed -n '{}p' src/mito2/src/engine/update_mode_test.rs

Length of output: 651
src/mito2/src/read/dedup.rs (3)
471-484: Well-structured definition of `LastNotNullIter`

The struct `LastNotNullIter` is well-defined with appropriate visibility and clear documentation. The use of `Option` for the iterator and batch ensures proper handling of end-of-iteration states, which is crucial for robustness.
486-496: Review of the `new` constructor in `LastNotNullIter`

The constructor initializes all fields appropriately and sets the deduplication strategy to `LastNotNull` with `filter_deleted` as `false`, which is consistent with the stated use case for memtables. The use of `DedupMetrics::default()` for metrics initialization ensures that all counters start at zero.
511-544: Comprehensive review of batch processing methods in `LastNotNullIter`

The methods `next_batch_for_merge` and `next_batch` are well-implemented. They handle batch slicing based on duplicate detection and apply the deduplication strategy correctly. The use of `Option` for state management and the clear separation of concerns between fetching and processing batches are commendable.

However, ensure that the method handles all possible return values from `next_batch_for_merge` correctly, especially in edge cases where batches might be empty or the iterator might be exhausted unexpectedly.

Verification successful

The `next_batch` method in `src/mito2/src/read/dedup.rs` relies on `next_batch_for_merge` to handle edge cases. The `next_batch_for_merge` function appears to manage the state of the iterator and handle empty or exhausted batches.

Verification of edge case handling in `next_batch_for_merge`:
- The method checks if `self.current_batch` is `None` and attempts to fetch a new batch from the iterator.
- If the iterator is exhausted, it sets `self.iter` to `None` and returns `Ok(None)`.
- If a batch is found, it checks for duplicates using `Self::find_split_index`.
- If no duplicates are found, it returns the current batch.
- If duplicates are found, it slices the batch and updates `self.current_batch` with the remaining rows.

This logic ensures that the method handles edge cases where the iterator is exhausted or the batch is empty.

Conclusion: The `next_batch_for_merge` method correctly handles edge cases, ensuring the iterator state is managed appropriately and batches are processed correctly.

Actionable directive:
- No further changes are needed for edge case handling in `next_batch_for_merge`.
.Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify handling of edge cases in next_batch method.
# Test: Search for the handling of edge cases in the next_batch method.
# Expect: Specific checks for empty or exhausted states.
rg --type rust $'next_batch' ./src/mito2/src/read/dedup.rs
```

Length of output: 514
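The split-at-first-duplicate behavior verified above can be sketched with a small, std-only helper. This is an illustrative sketch, not the actual `Self::find_split_index` from mito2: it scans a batch's sorted timestamps and returns the index at which to split so that rows sharing a timestamp can be merged across the boundary.

```rust
// Hypothetical sketch of the split-index idea: given the (sorted) timestamps
// of a batch, find the first position where a timestamp repeats and split
// there, so the iterator can merge the duplicated rows across the two slices.
fn find_split_index(timestamps: &[i64]) -> Option<usize> {
    timestamps
        .windows(2)
        .position(|pair| pair[0] == pair[1])
        .map(|i| i + 1) // split after the first row of the duplicated pair
}

fn main() {
    // No duplicate timestamps: the batch can be returned unchanged.
    assert_eq!(find_split_index(&[1, 2, 3]), None);
    // Timestamp 2 repeats: split after index 1 ([1, 2] and [2, 3]).
    assert_eq!(find_split_index(&[1, 2, 2, 3]), Some(2));
    println!("ok");
}
```

When the helper returns `Some(i)`, the behavior described in the review is to slice the batch at `i` and keep the remainder as `current_batch` so the dedup strategy can merge rows across the boundary.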
I am not convinced that

Good idea.
Actionable comments posted: 4
Outside diff range and nitpick comments (3)
src/mito2/src/read/seq_scan.rs (1)

Line range hint 38-223: Overall assessment of the `SeqScan` class.

The `SeqScan` class is well-structured and effectively integrates the new `MergeMode` for deduplication. The class methods are well-organized, and the logic is clear and efficient. This integration enhances the flexibility and robustness of sequential scans in handling different data consistency scenarios.

To further improve, consider adding unit tests specifically targeting the behavior changes introduced by different merge modes, ensuring that each mode functions as expected under various conditions.
src/mito2/src/read/dedup.rs (2)

383-396: New dedup strategy implementation: `LastNonNull`.

The implementation of the `LastNonNull` struct introduces a new deduplication strategy focusing on maintaining the last non-null fields. The constructor and the overall structure look well-defined. However, consider adding more detailed documentation to explain the strategy's behavior, especially how it handles edge cases and interacts with different types of batches.

Line range hint 737-819: Comprehensive testing for the `LastNonNull` strategy.

The tests for the `LastNonNull` strategy are crucial for ensuring its correctness and robustness. The provided tests cover various scenarios, including handling empty batches and duplications. However, consider adding more tests for complex batch structures and large data volumes to ensure the strategy performs well under stress.

Would you like me to help in writing these additional tests?
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (15)
- src/mito2/benches/memtable_bench.rs (4 hunks)
- src/mito2/src/compaction.rs (2 hunks)
- src/mito2/src/compaction/compactor.rs (3 hunks)
- src/mito2/src/compaction/window.rs (2 hunks)
- src/mito2/src/engine.rs (1 hunks)
- src/mito2/src/engine/merge_mode_test.rs (1 hunks)
- src/mito2/src/memtable.rs (4 hunks)
- src/mito2/src/memtable/partition_tree.rs (3 hunks)
- src/mito2/src/memtable/partition_tree/tree.rs (3 hunks)
- src/mito2/src/memtable/time_series.rs (14 hunks)
- src/mito2/src/read/dedup.rs (22 hunks)
- src/mito2/src/read/scan_region.rs (5 hunks)
- src/mito2/src/read/seq_scan.rs (2 hunks)
- src/mito2/src/region/opener.rs (2 hunks)
- src/mito2/src/region/options.rs (8 hunks)
Files skipped from review as they are similar to previous changes (8)
- src/mito2/benches/memtable_bench.rs
- src/mito2/src/compaction/compactor.rs
- src/mito2/src/compaction/window.rs
- src/mito2/src/engine.rs
- src/mito2/src/memtable/partition_tree.rs
- src/mito2/src/memtable/partition_tree/tree.rs
- src/mito2/src/read/scan_region.rs
- src/mito2/src/region/opener.rs
Additional comments not posted (15)
src/mito2/src/engine/merge_mode_test.rs (2)

30-85: Well-structured test for merge mode behavior during write and query operations.

The function `test_merge_mode_write_query` effectively simulates the intended behavior of the `LastNonNull` merge mode. It properly initializes the testing environment, sets up the necessary configurations, performs data operations, and verifies the results against expected outcomes.

88-208: Comprehensive test for merge mode behavior during compaction.

The function `test_merge_mode_compaction` thoroughly tests the impact of the `LastNonNull` merge mode on data compaction. It includes detailed setup, execution, and validation steps that ensure the merge mode functions as expected during the compaction process.

src/mito2/src/memtable.rs (1)

37-37: Integration of `MergeMode` in memtable configurations is correctly implemented.

The addition of `MergeMode` to `MemtableOptions` and its usage in `MemtableBuilderProvider` are well-integrated. The changes ensure that `merge_mode` is considered when building memtables, which is crucial for the new functionality introduced in this PR.

Also applies to: 254-291
src/mito2/src/region/options.rs (2)

Line range hint 37-72: Proper definition and integration of `MergeMode` in `RegionOptions`.

The `MergeMode` enum is well-defined with appropriate serialization properties. Its integration into `RegionOptions` allows for correct configuration of merge behaviors at the region level, which is a key part of the functionality introduced in this PR.

Also applies to: 116-116

500-513: Effective testing of `MergeMode` parsing in `RegionOptions`.

The tests ensure that `MergeMode` can be correctly parsed from configuration strings, validating both correct inputs and handling invalid scenarios appropriately.

src/mito2/src/compaction.rs (1)
57-57: Approved import addition for `MergeMode`.

This import is necessary to support the new compaction strategies introduced in the PR. It allows the use of `MergeMode` in defining compaction behaviors.

src/mito2/src/read/seq_scan.rs (1)

38-38: Approved import addition and use of `MergeMode` in `SeqScan`.

The import and usage of `MergeMode` are essential for implementing the new deduplication strategies during sequential scans. It allows `SeqScan` to configure its behavior based on the specified merge mode, enhancing flexibility in handling different data consistency requirements.

Also applies to: 44-44
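As a rough illustration of the option parsing discussed in the `region/options.rs` review above, here is a std-only sketch. The actual mito2 parser, the accepted value spellings, the error type, and the default are assumptions here and may differ:

```rust
// Hypothetical sketch of parsing a `merge_mode` region option string,
// assuming case-insensitive values and a LastRow default when absent.
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum MergeMode {
    #[default]
    LastRow,
    LastNonNull,
}

impl FromStr for MergeMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "last_row" => Ok(MergeMode::LastRow),
            "last_non_null" => Ok(MergeMode::LastNonNull),
            other => Err(format!("invalid merge mode: {other}")),
        }
    }
}

fn main() {
    assert_eq!("last_non_null".parse::<MergeMode>(), Ok(MergeMode::LastNonNull));
    assert!("unknown".parse::<MergeMode>().is_err());
    // A missing option falls back to the default mode.
    assert_eq!(MergeMode::default(), MergeMode::LastRow);
    println!("ok");
}
```

This mirrors the behavior validated by the tests the review mentions: valid strings parse to a variant, invalid strings produce an error, and an absent option uses the default.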
src/mito2/src/read/dedup.rs (3)

Line range hint 407-452: Review of the `push_batch` method in `LastNonNull`.

The `push_batch` method is crucial for the deduplication process. It handles the logic of merging batches based on keys and timestamps. The use of `std::mem::replace` for managing buffers is appropriate and ensures that the buffer is updated atomically. However, ensure that the method's error handling is robust, especially in scenarios where batch operations might fail, such as slicing or merging failures.

Verification successful
Robust Error Handling in
push_batch
Method ConfirmedThe
push_batch
method inLastNonNull
handles errors by propagating them using the?
operator, ensuring that any failure in batch operations like slicing or merging is correctly managed. This approach provides robust error handling as requested.
- Lines 425, 432, and 452: Errors from
merge_last_non_null
are propagated using the?
operator.Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify robust error handling in push_batch method.
# Test: Search for error handling patterns in the push_batch method.
# Expect: Specific error handling for batch operations.
rg --type rust $'push_batch' ./src/mito2/src/read/dedup.rs
```

Length of output: 613
Script:
```shell
#!/bin/bash
# Description: Verify robust error handling in push_batch method.
# Test: Search for error handling patterns in the push_batch method.
# Expect: Specific error handling for batch operations.
ast-grep --lang rust --pattern $'fn push_batch(&mut self, batch: Batch, metrics: &mut DedupMetrics) -> Result<Option<Batch>> { $$$ }' ./src/mito2/src/read/dedup.rs
```

Length of output: 3726
465-465: Ensure correct merging in `LastNonNull::finish`.

The `finish` method in `LastNonNull` should ensure that any remaining data in the buffer is processed correctly. This is a critical point to prevent data loss or corruption at the end of the deduplication process. Consider adding checks to ensure that the buffer's handling is consistent with the strategy's requirements.

Verification successful
finish
Method VerifiedThe
finish
method in theLastNonNull
strategy correctly processes the buffer by callingmerge_last_non_null
, ensuring that any remaining data is handled according to the deduplication strategy. The implementation aligns with the strategy requirements, preventing data loss or corruption.
src/mito2/src/read/dedup.rs
:finish
method andmerge_last_non_null
methodScripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify buffer handling in LastNonNull.finish.
# Test: Search for buffer handling logic in the finish method.
# Expect: Consistent handling according to strategy requirements.
rg --type rust $'finish' ./src/mito2/src/read/dedup.rs
```

Length of output: 527
Script:
```shell
#!/bin/bash
# Search for the LastNonNull struct and its finish method implementation in the relevant files.

# Locate the implementation of the LastNonNull struct
ast-grep --lang rust --pattern 'struct LastNonNull { $$$ }'

# Locate the implementation of the finish method within LastNonNull
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ fn finish(&mut self, $_) -> $_ { $$$ } $$$ }'
```

Length of output: 176
Script:
```shell
#!/bin/bash
# Search for the finish method implementations across the entire codebase
rg 'fn finish(&mut self, $_) -> $_ {' --type rust

# Search for the merge_last_non_null method to understand its context
rg 'fn merge_last_non_null' --type rust
```

Length of output: 277
Script:
```shell
#!/bin/bash
# Corrected search for the finish method implementations across the entire codebase
rg 'fn finish\(&mut self, [^)]*\) -> [^)]* \{' --type rust

# Print the merge_last_non_null method implementation for context
rg -A 10 'fn merge_last_non_null' src/mito2/src/read/dedup.rs
```

Length of output: 771
Script:
```shell
#!/bin/bash
# Print the complete implementations of the finish methods in src/mito2/src/read/dedup.rs
rg -A 20 'fn finish' src/mito2/src/read/dedup.rs

# Print the complete implementation of the merge_last_non_null method in src/mito2/src/read/dedup.rs
rg -A 20 'fn merge_last_non_null' src/mito2/src/read/dedup.rs
```

Length of output: 3047
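The scripts above trace how `finish` flushes the buffered data. The general pattern, sketched here with hypothetical names and plain tuples instead of mito2's batches, is that a merging strategy buffers the row it is currently working on, and `finish` must emit that buffered row so the final key is not lost:

```rust
// Illustrative "flush on finish" pattern, not the actual mito2 code.
struct LastValueMerger {
    buffer: Option<(i64, i64)>, // (timestamp, value) currently being merged
    output: Vec<(i64, i64)>,    // fully merged rows
}

impl LastValueMerger {
    fn push(&mut self, ts: i64, value: i64) {
        match self.buffer {
            // Same timestamp: keep merging into the buffered row.
            Some((buf_ts, _)) if buf_ts == ts => self.buffer = Some((ts, value)),
            // New timestamp: emit the buffered row, then start a new one.
            Some(prev) => {
                self.output.push(prev);
                self.buffer = Some((ts, value));
            }
            None => self.buffer = Some((ts, value)),
        }
    }

    // Without this flush, the last buffered row would be dropped.
    fn finish(&mut self) {
        if let Some(prev) = self.buffer.take() {
            self.output.push(prev);
        }
    }
}

fn main() {
    let mut m = LastValueMerger { buffer: None, output: Vec::new() };
    m.push(1, 10);
    m.push(1, 11); // overwrites the buffered value for timestamp 1
    m.push(2, 20);
    m.finish();
    assert_eq!(m.output, vec![(1, 11), (2, 20)]);
    println!("ok");
}
```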
Line range hint
932-946
: Handling Null Values in LastNonNull.merge_nullThe test
test_last_non_null_merge_null
checks the handling of null values in batches. This is an important aspect of theLastNonNull
strategy. Ensure that the merging logic is correctly implemented and consider edge cases where all fields might be null.Verification successful
To verify the handling of null values in the
LastNonNull.merge_null
method, we need to locate and examine its implementation. Additionally, we should check for other tests that cover edge cases involving all-null fields.
To verify the handling of null values in the
LastNonNull.merge_null
method, we need to correct theast-grep
pattern and locate themerge_null
method implementation. Additionally, we should search for theLastNonNull
struct and its methods, and re-run the search for test cases involvingmerge_null
.
To verify the handling of null values in the
LastNonNull.merge_null
method, we need to refine the search pattern to locate themerge_null
method specifically and re-run the search for test cases involvingmerge_null
.
To verify the handling of null values in the
LastNonNull
strategy, we need to search for all methods within theLastNonNull
implementation to identify where the null handling logic might be located.
To verify the handling of null values in the
LastNonNull
strategy, we need to broaden our search to include other parts of thededup.rs
file or related modules where the null handling logic might reside.
To verify the handling of null values in the
LastNonNull
strategy, we need to locate and examine themerge_last_non_null
method implementation.
To verify the handling of null values in the
LastNonNull
strategy, we need to search for theLastFieldsBuilder
struct and its methods to locate themerge_last_non_null
method implementation.
The
merge_last_non_null
method in theLastFieldsBuilder
implementation is responsible for merging the last non-null fields, including handling null values. This method appears to correctly handle null values and edge cases.
- The method checks if
last_fields
is empty and handles the buffer accordingly.- It iterates over
builders
andlast_fields
to push non-null values.- It includes logic for cases where the buffer has only one row or needs to replace the last row.
- It resets the builder state after merging.
Based on this review, the null handling logic in the
merge_last_non_null
method appears to be correctly implemented.Verified handling of null values in
LastNonNull.merge_null
method.
- The
merge_last_non_null
method inLastFieldsBuilder
correctly handles null values and edge cases.Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify handling of all-null fields in batches.
# Test: Search for handling of all-null fields in test cases.
# Expect: Specific handling for all-null scenarios.
rg --type rust $'merge_null' ./src/mito2/src/read/dedup.rs
```

Length of output: 104
Script:
```shell
#!/bin/bash
# Locate the implementation of the `merge_null` method in the `LastNonNull` class.
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ merge_null($$) { $$$ } $$$ }'

# Search for other test cases that might cover edge cases involving all-null fields.
rg --type rust 'merge_null' ./tests
```

Length of output: 596
Script:
```shell
#!/bin/bash
# Correct the pattern to locate the implementation of the `merge_null` method in the `LastNonNull` class.
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ merge_null($$) { $$$ } }'

# Search for the `LastNonNull` struct definition and its methods.
ast-grep --lang rust --pattern 'struct LastNonNull { $$$ }'
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ }'

# Re-run the search for other test cases that might cover edge cases involving all-null fields.
rg --type rust 'merge_null' ./tests
```

Length of output: 870
Script:
```shell
#!/bin/bash
# Refine the pattern to locate the `merge_null` method implementation in the `LastNonNull` class.
ast-grep --lang rust --pattern 'impl LastNonNull { fn merge_null($$) { $$$ } }'

# Re-run the search for other test cases that might cover edge cases involving all-null fields.
rg --type rust 'merge_null' ./tests
```

Length of output: 117
Script:
```shell
#!/bin/bash
# Search for all methods within the `LastNonNull` implementation.
ast-grep --lang rust --pattern 'impl LastNonNull { $$$ }'
```

Length of output: 689
Script:
```shell
#!/bin/bash
# Broaden the search to include other parts of the `dedup.rs` file to locate the null handling logic.
rg --type rust 'merge' ./src/mito2/src/read/dedup.rs
```

Length of output: 1750
Script:
```shell
#!/bin/bash
# Locate the implementation of the `merge_last_non_null` method.
ast-grep --lang rust --pattern 'fn merge_last_non_null($$) { $$$ }'
```

Length of output: 69
Script:
```shell
#!/bin/bash
# Search for the `LastFieldsBuilder` struct and its methods.
ast-grep --lang rust --pattern 'struct LastFieldsBuilder { $$$ }'
ast-grep --lang rust --pattern 'impl LastFieldsBuilder { $$$ }'
```

Length of output: 10649
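At the field level, the "last non-null" merge that the review above attributes to `LastFieldsBuilder` can be illustrated with a small sketch over plain `Option` values. This is illustrative only; the real builder operates on column builders and batches, not vectors:

```rust
// Sketch of last-non-null field merging: for each field, keep the newer
// row's value unless it is null, in which case fall back to the older value.
fn merge_last_non_null(newer: &[Option<i64>], older: &[Option<i64>]) -> Vec<Option<i64>> {
    newer
        .iter()
        .zip(older)
        .map(|(new_v, old_v)| new_v.or(*old_v))
        .collect()
}

fn main() {
    let older = vec![Some(1), Some(2), None];
    let newer = vec![None, Some(20), None];
    // Field 0 falls back to the older value; field 1 keeps the newer one;
    // field 2 stays null because both rows are null.
    assert_eq!(merge_last_non_null(&newer, &older), vec![Some(1), Some(20), None]);
    println!("ok");
}
```

The all-null edge case discussed in the review corresponds to both inputs being `None` for a field, which simply stays `None` after the merge.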
src/mito2/src/memtable/time_series.rs (5)

63-76: Ensure proper initialization of `merge_mode` in `TimeSeriesMemtableBuilder`.

The addition of `merge_mode` to `TimeSeriesMemtableBuilder` is correctly implemented. It is initialized in the constructor and passed to the `TimeSeriesMemtable` during construction. This change ensures that the merge behavior is configurable at the memtable level.

88-88: Validate the propagation of `merge_mode` during memtable construction.

The `merge_mode` is properly passed to the `TimeSeriesMemtable` constructor from `TimeSeriesMemtableBuilder`. This ensures consistency in behavior based on the configured merge mode.

Line range hint 103-135: Check conditional `dedup` logic based on `merge_mode`.

The conditional logic to disable `dedup` when `merge_mode` is set to `LastNonNull` is a smart way to optimize performance by avoiding unnecessary deduplication. However, ensure this logic aligns with the expected behavior and does not inadvertently affect other functionalities.

272-277: Ensure correct application of `LastNonNullIter` based on `merge_mode`.

The conditional application of `LastNonNullIter` in the `iter` method is correctly implemented. When `merge_mode` is `LastNonNull`, the custom iterator is used, ensuring that the deduplication strategy aligns with the specified mode.

298-298: Verify correct handling of `merge_mode` in the `ranges` method.

The `merge_mode` is correctly passed to the `TimeSeriesIterBuilder`, ensuring that the iteration behavior is consistent with the configured merge strategy. This is crucial for maintaining the integrity of data retrieval based on the merge mode settings.
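The conditional dedup behavior reviewed above for `time_series.rs` can be summarized in a small sketch. The function and type names here are assumptions for illustration, not the actual mito2 code:

```rust
// Sketch of the conditional: when the merge mode is LastNonNull, the
// memtable-level dedup flag is turned off, because the LastNonNull iterator
// wrapper performs the merge during iteration instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MergeMode {
    LastRow,
    LastNonNull,
}

fn effective_dedup(requested_dedup: bool, mode: MergeMode) -> bool {
    match mode {
        // Deduplication is deferred to the LastNonNull iterator wrapper.
        MergeMode::LastNonNull => false,
        MergeMode::LastRow => requested_dedup,
    }
}

fn main() {
    assert!(!effective_dedup(true, MergeMode::LastNonNull));
    assert!(effective_dedup(true, MergeMode::LastRow));
    println!("ok");
}
```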
@evenyag Don't forget to change the PR title and content.
Good job. The final piece to support InfluxDB line protocol.
Changed the title: `update_mode` region options → `merge_mode` region options
@coderabbitai pause
Actions performed

Reviews paused.
make merge mode optional and use default while it is None
LGTM
* feat: add update_mode to region options
* test: add test
* feat: last not null iter
* feat: time series last not null
* feat: partition tree update mode
* feat: partition tree
* fix: last not null iter slice
* test: add test for compaction
* test: use second resolution
* style: fix clippy
* chore: merge two lines

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>

* chore: address CR comments
* refactor: UpdateMode -> MergeMode
* refactor: LastNotNull -> LastNonNull
* chore: return None earlier
* feat: validate region options

  make merge mode optional and use default while it is None

* test: fix tests

---------

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR implements a new region option `update_mode` to control how to update existing rows by key. There are two update modes available:

- `LastRow`: overwrites all fields by the values in the latest row
- `LastNotNull`: only overwrites the null value by the latest value for each field

Memtables have individual dedup implementations so they can't reuse the `DedupReader`. To avoid repeating the dedup logic for each memtable, this PR implements a new iterator `LastNotNullIter` as a wrapper for the memtable iter. `LastNotNullIter` splits batches and invokes the `LastNotNull` dedup strategy to dedup them. The `LastNotNullIter` only takes effect when `update_mode` is `LastNotNull`.

After this PR, I will expose the option to table options and add some sqlness tests.
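The wrapper iterator described above can be approximated with a std-only, row-level sketch. The real `LastNotNullIter` wraps batch iterators and reuses the dedup strategy; the names and types here are simplified for illustration:

```rust
// Row-level sketch of the LastNotNull idea: wrap an iterator of
// (timestamp, fields) rows sorted by timestamp, and merge consecutive rows
// with the same timestamp, where a newer non-null value overwrites and a
// newer null value falls back to the older value.
use std::iter::Peekable;

struct LastNotNullIter<I: Iterator<Item = (i64, Vec<Option<i64>>)>> {
    inner: Peekable<I>,
}

impl<I: Iterator<Item = (i64, Vec<Option<i64>>)>> Iterator for LastNotNullIter<I> {
    type Item = (i64, Vec<Option<i64>>);

    fn next(&mut self) -> Option<Self::Item> {
        let (ts, mut fields) = self.inner.next()?;
        // Fold newer duplicates into the current row: the newest value wins
        // unless it is null.
        while self.inner.peek().map(|(t, _)| *t) == Some(ts) {
            let (_, newer) = self.inner.next().unwrap();
            for (slot, new_v) in fields.iter_mut().zip(newer) {
                *slot = new_v.or(*slot);
            }
        }
        Some((ts, fields))
    }
}

fn main() {
    // Rows ordered oldest-to-newest for the same key.
    let rows = vec![
        (1, vec![Some(1), None]),
        (1, vec![None, Some(2)]),
        (2, vec![Some(3), Some(4)]),
    ];
    let merged: Vec<_> = LastNotNullIter { inner: rows.into_iter().peekable() }.collect();
    assert_eq!(merged, vec![(1, vec![Some(1), Some(2)]), (2, vec![Some(3), Some(4)])]);
    println!("ok");
}
```

As the description notes, the real iterator only takes effect when the region's update mode is `LastNotNull`; in `LastRow` mode the latest row simply replaces all fields.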
Checklist
Summary by CodeRabbit

New Features
- `MergeMode` options for handling duplicate row merging in various components.
- `LastNotNullIter` for deduplication based on the `LastNotNull` strategy.

Enhancements
- Support for `MergeMode` options.
- `TimeSeriesMemtable`, `PartitionTreeConfig`, and `ScanInput` updated to include `merge_mode` configuration.

Tests
- Tests for `MergeMode` options and append mode functionality.