Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keeps nullness attributes of merged in comparison column values #10704

Merged

Conversation

egalpin
Copy link
Member

@egalpin egalpin commented May 1, 2023

Keeps nullness attributes of merged comparison column values to avoid IndexOutOfBounds exception on segment read.

@codecov-commenter
Copy link

codecov-commenter commented May 1, 2023

Codecov Report

Merging #10704 (284ae68) into master (ab79548) will increase coverage by 0.03%.
The diff coverage is 80.95%.

@@             Coverage Diff              @@
##             master   #10704      +/-   ##
============================================
+ Coverage     70.25%   70.28%   +0.03%     
+ Complexity     6429     5681     -748     
============================================
  Files          2112     2164      +52     
  Lines        113996   116347    +2351     
  Branches      17219    17597     +378     
============================================
+ Hits          80084    81776    +1692     
- Misses        28312    28857     +545     
- Partials       5600     5714     +114     
Flag Coverage Δ
integration1 24.07% <0.00%> (-0.24%) ⬇️
integration2 23.77% <0.00%> (-0.24%) ⬇️
unittests1 67.77% <80.95%> (-0.04%) ⬇️
unittests2 13.67% <0.00%> (-0.19%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...apache/pinot/segment/local/upsert/UpsertUtils.java 23.52% <0.00%> (-0.72%) ⬇️
.../pinot/segment/local/upsert/ComparisonColumns.java 96.77% <100.00%> (+3.02%) ⬆️
...not/segment/local/upsert/PartialUpsertHandler.java 100.00% <100.00%> (ø)

... and 286 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is incorrect.
When loading sealed segment, ComparisonColumns might have multiple values set, in which case we need to compare multiple values to determine which value is newer.
So the fix should be: when comparisonIndex is -1, loop over all values and check if it has any value larger than the value in another ComparisonColumns. There is no need to modify the value within the current ComparisonColumns because it should already have values set.

@@ -177,26 +179,35 @@ public void close()

public static class MultiComparisonColumnReader implements UpsertUtils.ComparisonColumnReader {
private final PinotSegmentColumnReader[] _comparisonColumnReaders;
private final NullValueVectorReader[] _comparisonColumnNullReaders;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. You may use PinotSegmentColumnReader.isNull() instead

@egalpin
Copy link
Member Author

egalpin commented May 2, 2023

So the fix should be: when comparisonIndex is -1, loop over all values and check if it has any value larger than the value in another ComparisonColumns. There is no need to modify the value within the current ComparisonColumns because it should already have values set.

One case I'm struggling to figure out how to support is when altering from one set of comparison columns to a new set. In such a case, I believe there would be a need to modify the value within the current ComparisonColumns (ex. adding 1 new column in addition to a set of existing comparison columns, or migrating from a single comparison column to a set of columns).

@Jackie-Jiang
Copy link
Contributor

One case I'm struggling to figure out how to support is when altering from one set of comparison columns to a new set. In such a case, I believe there would be a need to modify the value within the current ComparisonColumns (ex. adding 1 new column in addition to a set of existing comparison columns, or migrating from a single comparison column to a set of columns).

Are you referring to modifying the comparison columns after table is created? I don't think that is allowed even with single comparison column with partial upsert because we should not change the history of the records.
If the new comparison column is newly added, it should work because all of them should be backfilled with default value, and the new consuming record will be handled properly

@egalpin
Copy link
Member Author

egalpin commented May 3, 2023

Ya that makes sense, otherwise it would literally be changing the semantics of which docs should have been persisted in the first place. I do want to allow for support of adding a brand new column to the schema + comparisonColumns list though, which I agree should work.

When reloading sealed segments, is processing done in global order of docId? Or are segments loaded in parallel? If they're loaded strictly in global order of docId, I feel we might not need any comparison at all; effectively, any doc with a higher docId was persisted after a doc with a lower docId and therefore whatever the upsert comparison semantics were at the time when the doc was persisted must have been truthy (otherwise the doc would have been rejected). Therefore we could ensure to maintain whatever the prior upsert semantics were, then use the new semantics moving forward, by forcing compareTo to return 1 for every doc read from a sealed segment.

@egalpin egalpin marked this pull request as ready for review May 3, 2023 22:45
@Jackie-Jiang
Copy link
Contributor

Sealed segments are loaded in parallel without any global order (we don't really have global ordered doc ids)

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if we can add an unit test to verify the behavior of ComparisonColumns

@egalpin
Copy link
Member Author

egalpin commented May 26, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants