Support Deletes In Upserts by atris · Pull Request #10702 · apache/pinot

atris · 2023-04-29T09:15:25Z

Please refer to: https://docs.google.com/document/d/17WUvE6hZwauj3_gM5cPURWSjdE2e4q9au70VeFYYx4E/edit?usp=sharing

cc @xiangfu0 @yupeng9 @kishoreg @Jackie-Jiang

codecov-commenter · 2023-04-29T09:51:21Z

Codecov Report

Merging #10702 (3340656) into master (45eea26) will increase coverage by 38.30%.
The diff coverage is 71.11%.

@@              Coverage Diff              @@
##             master   #10702       +/-   ##
=============================================
+ Coverage     31.97%   70.27%   +38.30%     
- Complexity      453     6439     +5986     
=============================================
  Files          2111     2112        +1     
  Lines        113959   114036       +77     
  Branches      17196    17228       +32     
=============================================
+ Hits          36433    80144    +43711     
+ Misses        74234    28283    -45951     
- Partials       3292     5609     +2317

Flag	Coverage Δ
integration1	`24.30% <0.00%> (-0.03%)`	⬇️
integration2	`24.15% <0.00%> (?)`
unittests1	`67.80% <71.11%> (?)`
unittests2	`13.81% <0.00%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...cal/upsert/BasePartitionUpsertMetadataManager.java	`50.00% <0.00%> (+50.00%)`	⬆️
...not/segment/local/upsert/PartialUpsertHandler.java	`77.77% <50.00%> (+77.77%)`	⬆️
...local/indexsegment/mutable/MutableSegmentImpl.java	`57.53% <75.00%> (+57.53%)`	⬆️
...t/ConcurrentMapPartitionUpsertMetadataManager.java	`80.14% <78.12%> (+80.14%)`	⬆️

... and 1342 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Jackie-Jiang

This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count delete as removing it from the valid doc, the next event won't work as expected because we lost track of the delete event.

This is already a design (#10452) that covers this, so let's not duplicate the work.

@navina will post the implementation soon

navina · 2023-04-30T08:18:34Z

This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count delete as removing it from the valid doc, the next event won't work as expected because we lost track of the delete event.

This is precisely what we realized during our previous attempt (#10035) . Your solution isn't any different from what was attempted before.

@atris : You had reached out to me via DM (on slack) to collaborate and agreed to review the existing design. Yet, you went ahead and did your own work. This is duplicated effort and not setting the right standard for committers of the Apache Pinot project.

atris · 2023-04-30T08:27:36Z

This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count delete as removing it from the valid doc, the next event won't work as expected because we lost track of the delete event.

This is already a design (#10452) that covers this, so let's not duplicate the work.

@navina will post the implementation soon

@Jackie-Jiang and myself discussed offline, and we have agreed to iterate on this PR -- and look on solving the snapshot of validDocID map issues.

The main issue that I wish to avoid is the separate map for queryable docIDs. The three issues we face today with the tombstone approach are:

Losing the deleted docID when the consuming segment seals.
The delete tombstone record not loading when we snapshot validDocIDs.
Losing the deleted timestamp and hence not able to handle out of order events.

This PR solves 3. by maintaining the full record of deletion in the segment and the primary key index -- as is demonstrated in the added unit tests. For 1, it does not matter as long as we are able to rebuild the primary key index with the deleted primary key records present.

The main challenge is 2, and we need to think about it a bit more. Will keep the group posted.

atris · 2023-04-30T08:56:35Z

This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count delete as removing it from the valid doc, the next event won't work as expected because we lost track of the delete event.

This is precisely what we realized during our previous attempt (#10035) . Your solution isn't any different from what was attempted before.

There are many subtle changes from the last PR -- and this actually works for most cases except the snapshot load, which is easy to fix using the queryable map. The last PR was unable to handle out of order records and had no concept of denying partial updates on deleted records but allowing new records with the same primary key to be updated.

To answer your specific question, we are not losing the deleted event in this PR. That is clearly demonstrated in the unit tests.

For further details, please compare the two PRs.

@atris : You had reached out to me via DM (on slack) to collaborate and agreed to review the existing design. Yet, you went ahead and did your own work. This is duplicated effort and not setting the right standard for committers of the Apache Pinot project.

Thank you for highlighting this. I do not wish to go into details of our 1:1 conversation here -- but my understanding was that:

You were planning on working this post May 10th.
You mentioned that it would be hard for us to collaborate since we are in different timezones.
I had mentioned that I have a patch I have been playing with -- and have a hard timeline.

In all honesty, I feel it is against the spirit of OSS if we "block" issues that we have not yet written code for. A good example is the issue on adding support for GROUP BY for filter aggregations -- where the issue was assigned to me, and I had a half baked PR that I did not get around to finishing. Since Evan posted a PR, I was happy to let my patch take a step back.

In this case, I only finished my PR since you mentioned that you will work on it post May 10th and would find it hard to collaborate with me -- which led me to think that our internal timelines might not be matched.

If you have any further comments, I would be happy to talk 1:1. Also, please feel free to review the PR and leave any feedback -- I would be happy to address it.

Jackie-Jiang · 2023-04-30T22:04:25Z

@Jackie-Jiang and myself discussed offline, and we have agreed to iterate on this PR -- and look on solving the snapshot of validDocID map issues.

The main issue that I wish to avoid is the separate map for queryable docIDs.

This is not what I said.. I said not to directly close this PR and see if it is possible to solve the problems, and if @navina cannot get an implementation sticking with your timeline we can go with this one.
Since Navina already gets an implementation #10703 and I feel that can solve all the existing problem with the overhead of keeping 2 bitmaps per segment, I'm leaning towards that solution as I don't see a good way to solve the snapshot issue with one single bitmap. The main problem with one single bitmap is that it breaks the contract that all the keys in the upsert metadata should point to a valid doc in the segment, and if we break that contract a lot of things could go wrong.

atris · 2023-05-01T02:54:57Z

This is not what I said.. I said not to directly close this PR and see if it is possible to solve the problems, and if @navina cannot get an implementation sticking with your timeline we can go with this one.

Sure, I should have been more verbose, my bad.

Since Navina already gets an implementation #10703 and I feel that can solve all the existing problem with the overhead of keeping 2 bitmaps per segment, I'm leaning towards that solution as I don't see a good way to solve the snapshot issue with one single bitmap. The main problem with one single bitmap is that it breaks the contract that all the keys in the upsert metadata should point to a valid doc in the segment, and if we break that contract a lot of things could go wrong.

Could you please elaborate a bit more on where things could break if we had keys in upsert metadata but not in valid doc IDs?

My understanding is that the two places where things will break are:

Loading a segment because the deleted doc IDs will be ignored during segment reload and hence will no longer be queryable.
Deleting a segment because the snapshot will ignore the deleted docIDs.

I do not see anything breaking at the runtime because the access pattern defines that the upsert metadata dictates which keys to be looked up in valid doc IDs -- and since the upsert metadata has the superset of docIDs, it should not lose any data. I think this is fortified by the fact that this PR makes the change and there are no tests breaking.

Please provide your inputs

navina · 2023-05-01T06:24:06Z

Thank you for highlighting this. I do not wish to go into details of our 1:1 conversation here -- but my understanding was that:

You were planning on working this post May 10th.

You mentioned that it would be hard for us to collaborate since we are in different timezones.

I had mentioned that I have a patch I have been playing with -- and have a hard timeline.

While all those are true, I told you that my design was open for feedback and you agreed to review it.
Obviously, I also have a local patch that I was working based on my design. Per Apache Pinot's guidelines (https://docs.pinot.apache.org/developers/developers-and-contributors/contribution-guidelines), I was taking ownership of the GitHub issue and sharing design for feedback.

In this case, I only finished my PR since you mentioned that you will work on it post May 10th and would find it hard to collaborate with me -- which led me to think that our internal timelines might not be matched.

You didn't communicate your timeline in our conversation nor did you explain the use-case when I asked.

I am not aware of how you communicated the handover of your group by issue to another contributor and I don't think it is relevant to the tasks I own.

I have put in effort in this work and have shared my design ( https://docs.google.com/document/d/19s1AHCRjmqeVa0z_djYBTAt_meg6qWcUxwRO3y-G9O4/edit?usp=drivesdk ) and PR ( #10703 ) for review.
Thanks!

mayankshriv · 2023-05-01T13:55:51Z

Hi Team, reading with a neutral perspective, it seems like there were some miscommunications that led to duplicate effort on both sides. But I am 100% sure everyone here here:

Has the best interest of the Apache Pinot project before anything.
Would benefit from this feature being completed as soon as possible.

With above in mind, and the contribution guidelines, I do see that this design doc being reviewed.

My recommendation would be for all of us to collaborate on that design and any associated PRs to take this feature to completion as soon as possible.

Thank you all!
#One Team.

yupeng9 · 2023-05-01T15:32:47Z

Hi Team, reading with a neutral perspective, it seems like there were some miscommunications that led to duplicate effort on both sides. But I am 100% sure everyone here here:

Has the best interest of the Apache Pinot project before anything.

Would benefit from this feature being completed as soon as possible.

With above in mind, and the contribution guidelines, I do see that this design doc being reviewed.

My recommendation would be for all of us to collaborate on that design and any associated PRs to take this feature to completion as soon as possible.

Thank you all! #One Team.

+1 to #One Team

Also I hope we can agree on the direction. One key difference I can see between these two approaches is to reuse validDocId or not for deletion. I also left this comment on @navina's design. This is a key as it has implications on correctness/efficiency/complexity etc.

Could you provide a detailed example on reusing validDocId is insufficient to help this discussion. Also, I'm curious what's the behavior if there's an update to record after deletion, will we ignore it or revive the record?

navina · 2023-05-01T16:00:46Z

Also I hope we can agree on the direction. One key difference I can see between these two approaches is to reuse validDocId or not for deletion. I also left this comment on @navina's design. This is a key as it has implications on correctness/efficiency/complexity etc.
Could you provide a detailed example on reusing validDocId is insufficient to help this discussion.

I have added the reasoning in the design doc - https://docs.google.com/document/d/19s1AHCRjmqeVa0z_djYBTAt_meg6qWcUxwRO3y-G9O4/edit#heading=h.muy7y9j64ptn . can we continue the conversation in the design doc?

Also, I'm curious what's the behavior if there's an update to record after deletion, will we ignore it or revive the record?

That's a good question. Going by the current upsert guarantees, if the event containing the update to a record has newer comparison values, it will be considered as a new record. I think it should be fine to revive it. I am not seeing any issues on doing that. Or we could make that configurable. Will add a section in the doc to further discuss this if you have any concerns.

atris · 2023-05-01T16:02:50Z

Also I hope we can agree on the direction. One key difference I can see between these two approaches is to reuse validDocId or not for deletion. I also left this comment on @navina's design. This is a key as it has implications on correctness/efficiency/complexity etc.

Could you provide a detailed example on reusing validDocId is insufficient to help this discussion. Also, I'm curious what's the behavior if there's an update to record after deletion, will we ignore it or revive the record?

Please refer to the attached design document to this PR

atris · 2023-05-01T18:50:13Z

Closing this PR -- I have highlighted my concerns with the proposed design and have proposed an alternate in the attached design document. I will help land the other PR now

Atri Sharma added 4 commits April 28, 2023 14:45

Support deletion in full upsert

6a673e8

Second version with partial upsert

11edd5e

Change the column name

269ab38

More stuff

3340656

Jackie-Jiang requested changes Apr 30, 2023

View reviewed changes

atris closed this May 1, 2023

Conversation

atris commented Apr 29, 2023

Uh oh!

codecov-commenter commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Uh oh!

navina commented Apr 30, 2023

Uh oh!

atris commented Apr 30, 2023

Uh oh!

atris commented Apr 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Jackie-Jiang commented Apr 30, 2023

Uh oh!

atris commented May 1, 2023

Uh oh!

navina commented May 1, 2023

Uh oh!

mayankshriv commented May 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yupeng9 commented May 1, 2023

Uh oh!

navina commented May 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

atris commented May 1, 2023

Uh oh!

atris commented May 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

codecov-commenter commented Apr 29, 2023 •

edited

Loading

atris commented Apr 30, 2023 •

edited

Loading

mayankshriv commented May 1, 2023 •

edited

Loading

navina commented May 1, 2023 •

edited

Loading