Skip to content

Support Deletes In Upserts#10702

Closed
atris wants to merge 4 commits intoapache:masterfrom
atris:upsert_delete
Closed

Support Deletes In Upserts#10702
atris wants to merge 4 commits intoapache:masterfrom
atris:upsert_delete

Conversation

@atris
Copy link
Contributor

@atris atris commented Apr 29, 2023

@codecov-commenter
Copy link

codecov-commenter commented Apr 29, 2023

Codecov Report

Merging #10702 (3340656) into master (45eea26) will increase coverage by 38.30%.
The diff coverage is 71.11%.

@@              Coverage Diff              @@
##             master   #10702       +/-   ##
=============================================
+ Coverage     31.97%   70.27%   +38.30%     
- Complexity      453     6439     +5986     
=============================================
  Files          2111     2112        +1     
  Lines        113959   114036       +77     
  Branches      17196    17228       +32     
=============================================
+ Hits          36433    80144    +43711     
+ Misses        74234    28283    -45951     
- Partials       3292     5609     +2317     
Flag Coverage Δ
integration1 24.30% <0.00%> (-0.03%) ⬇️
integration2 24.15% <0.00%> (?)
unittests1 67.80% <71.11%> (?)
unittests2 13.81% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...cal/upsert/BasePartitionUpsertMetadataManager.java 50.00% <0.00%> (+50.00%) ⬆️
...not/segment/local/upsert/PartialUpsertHandler.java 77.77% <50.00%> (+77.77%) ⬆️
...local/indexsegment/mutable/MutableSegmentImpl.java 57.53% <75.00%> (+57.53%) ⬆️
...t/ConcurrentMapPartitionUpsertMetadataManager.java 80.14% <78.12%> (+80.14%) ⬆️

... and 1342 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count delete as removing it from the valid doc, the next event won't work as expected because we lost track of the delete event.

This is already a design (#10452) that covers this, so let's not duplicate the work.

@navina will post the implementation soon

@navina
Copy link
Contributor

navina commented Apr 30, 2023

This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count delete as removing it from the valid doc, the next event won't work as expected because we lost track of the delete event.

This is precisely what we realized during our previous attempt (#10035) . Your solution isn't any different from what was attempted before.

@atris : You had reached out to me via DM (on slack) to collaborate and agreed to review the existing design. Yet, you went ahead and did your own work. This is duplicated effort and not setting the right standard for committers of the Apache Pinot project.

@atris
Copy link
Contributor Author

atris commented Apr 30, 2023

This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count delete as removing it from the valid doc, the next event won't work as expected because we lost track of the delete event.

This is already a design (#10452) that covers this, so let's not duplicate the work.

@navina will post the implementation soon

@Jackie-Jiang and myself discussed offline, and we have agreed to iterate on this PR -- and look on solving the snapshot of validDocID map issues.

The main issue that I wish to avoid is the separate map for queryable docIDs. The three issues we face today with the tombstone approach are:

  1. Losing the deleted docID when the consuming segment seals.
  2. The delete tombstone record not loading when we snapshot validDocIDs.
  3. Losing the deleted timestamp and hence not able to handle out of order events.

This PR solves 3. by maintaining the full record of deletion in the segment and the primary key index -- as is demonstrated in the added unit tests. For 1, it does not matter as long as we are able to rebuild the primary key index with the deleted primary key records present.

The main challenge is 2, and we need to think about it a bit more. Will keep the group posted.

@atris
Copy link
Contributor Author

atris commented Apr 30, 2023

This won't work. We need to keep 2 bitmaps, one for valid docs and one for queryable docs. If we only keep valid docs, and count delete as removing it from the valid doc, the next event won't work as expected because we lost track of the delete event.

This is precisely what we realized during our previous attempt (#10035) . Your solution isn't any different from what was attempted before.

There are many subtle changes from the last PR -- and this actually works for most cases except the snapshot load, which is easy to fix using the queryable map. The last PR was unable to handle out of order records and had no concept of denying partial updates on deleted records but allowing new records with the same primary key to be updated.

To answer your specific question, we are not losing the deleted event in this PR. That is clearly demonstrated in the unit tests.

For further details, please compare the two PRs.

@atris : You had reached out to me via DM (on slack) to collaborate and agreed to review the existing design. Yet, you went ahead and did your own work. This is duplicated effort and not setting the right standard for committers of the Apache Pinot project.

Thank you for highlighting this. I do not wish to go into details of our 1:1 conversation here -- but my understanding was that:

  1. You were planning on working this post May 10th.
  2. You mentioned that it would be hard for us to collaborate since we are in different timezones.
  3. I had mentioned that I have a patch I have been playing with -- and have a hard timeline.

In all honesty, I feel it is against the spirit of OSS if we "block" issues that we have not yet written code for. A good example is the issue on adding support for GROUP BY for filter aggregations -- where the issue was assigned to me, and I had a half baked PR that I did not get around to finishing. Since Evan posted a PR, I was happy to let my patch take a step back.

In this case, I only finished my PR since you mentioned that you will work on it post May 10th and would find it hard to collaborate with me -- which led me to think that our internal timelines might not be matched.

If you have any further comments, I would be happy to talk 1:1. Also, please feel free to review the PR and leave any feedback -- I would be happy to address it.

@Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang and myself discussed offline, and we have agreed to iterate on this PR -- and look on solving the snapshot of validDocID map issues.

The main issue that I wish to avoid is the separate map for queryable docIDs.

This is not what I said.. I said not to directly close this PR and see if it is possible to solve the problems, and if @navina cannot get an implementation sticking with your timeline we can go with this one.
Since Navina already gets an implementation #10703 and I feel that can solve all the existing problem with the overhead of keeping 2 bitmaps per segment, I'm leaning towards that solution as I don't see a good way to solve the snapshot issue with one single bitmap. The main problem with one single bitmap is that it breaks the contract that all the keys in the upsert metadata should point to a valid doc in the segment, and if we break that contract a lot of things could go wrong.

@atris
Copy link
Contributor Author

atris commented May 1, 2023

This is not what I said.. I said not to directly close this PR and see if it is possible to solve the problems, and if @navina cannot get an implementation sticking with your timeline we can go with this one.

Sure, I should have been more verbose, my bad.

Since Navina already gets an implementation #10703 and I feel that can solve all the existing problem with the overhead of keeping 2 bitmaps per segment, I'm leaning towards that solution as I don't see a good way to solve the snapshot issue with one single bitmap. The main problem with one single bitmap is that it breaks the contract that all the keys in the upsert metadata should point to a valid doc in the segment, and if we break that contract a lot of things could go wrong.

Could you please elaborate a bit more on where things could break if we had keys in upsert metadata but not in valid doc IDs?

My understanding is that the two places where things will break are:

  1. Loading a segment because the deleted doc IDs will be ignored during segment reload and hence will no longer be queryable.
  2. Deleting a segment because the snapshot will ignore the deleted docIDs.

I do not see anything breaking at the runtime because the access pattern defines that the upsert metadata dictates which keys to be looked up in valid doc IDs -- and since the upsert metadata has the superset of docIDs, it should not lose any data. I think this is fortified by the fact that this PR makes the change and there are no tests breaking.

Please provide your inputs

@navina
Copy link
Contributor

navina commented May 1, 2023

Thank you for highlighting this. I do not wish to go into details of our 1:1 conversation here -- but my understanding was that:

  1. You were planning on working this post May 10th.
  2. You mentioned that it would be hard for us to collaborate since we are in different timezones.
  3. I had mentioned that I have a patch I have been playing with -- and have a hard timeline.

While all those are true, I told you that my design was open for feedback and you agreed to review it.
Obviously, I also have a local patch that I was working based on my design. Per Apache Pinot's guidelines (https://docs.pinot.apache.org/developers/developers-and-contributors/contribution-guidelines), I was taking ownership of the GitHub issue and sharing design for feedback.

In this case, I only finished my PR since you mentioned that you will work on it post May 10th and would find it hard to collaborate with me -- which led me to think that our internal timelines might not be matched.

You didn't communicate your timeline in our conversation nor did you explain the use-case when I asked.

I am not aware of how you communicated the handover of your group by issue to another contributor and I don't think it is relevant to the tasks I own.

I have put in effort in this work and have shared my design ( https://docs.google.com/document/d/19s1AHCRjmqeVa0z_djYBTAt_meg6qWcUxwRO3y-G9O4/edit?usp=drivesdk ) and PR ( #10703 ) for review.
Thanks!

@mayankshriv
Copy link
Contributor

mayankshriv commented May 1, 2023

Hi Team, reading with a neutral perspective, it seems like there were some miscommunications that led to duplicate effort on both sides. But I am 100% sure everyone here here:

  • Has the best interest of the Apache Pinot project before anything.
  • Would benefit from this feature being completed as soon as possible.

With above in mind, and the contribution guidelines, I do see that this design doc being reviewed.

My recommendation would be for all of us to collaborate on that design and any associated PRs to take this feature to completion as soon as possible.

Thank you all!
#One Team.

@yupeng9
Copy link
Contributor

yupeng9 commented May 1, 2023

Hi Team, reading with a neutral perspective, it seems like there were some miscommunications that led to duplicate effort on both sides. But I am 100% sure everyone here here:

  • Has the best interest of the Apache Pinot project before anything.
  • Would benefit from this feature being completed as soon as possible.

With above in mind, and the contribution guidelines, I do see that this design doc being reviewed.

My recommendation would be for all of us to collaborate on that design and any associated PRs to take this feature to completion as soon as possible.

Thank you all! #One Team.

+1 to #One Team

Also I hope we can agree on the direction. One key difference I can see between these two approaches is to reuse validDocId or not for deletion. I also left this comment on @navina's design. This is a key as it has implications on correctness/efficiency/complexity etc.

Could you provide a detailed example on reusing validDocId is insufficient to help this discussion. Also, I'm curious what's the behavior if there's an update to record after deletion, will we ignore it or revive the record?

@navina
Copy link
Contributor

navina commented May 1, 2023

Also I hope we can agree on the direction. One key difference I can see between these two approaches is to reuse validDocId or not for deletion. I also left this comment on @navina's design. This is a key as it has implications on correctness/efficiency/complexity etc.
Could you provide a detailed example on reusing validDocId is insufficient to help this discussion.

I have added the reasoning in the design doc - https://docs.google.com/document/d/19s1AHCRjmqeVa0z_djYBTAt_meg6qWcUxwRO3y-G9O4/edit#heading=h.muy7y9j64ptn . can we continue the conversation in the design doc?

Also, I'm curious what's the behavior if there's an update to record after deletion, will we ignore it or revive the record?

That's a good question. Going by the current upsert guarantees, if the event containing the update to a record has newer comparison values, it will be considered as a new record. I think it should be fine to revive it. I am not seeing any issues on doing that. Or we could make that configurable. Will add a section in the doc to further discuss this if you have any concerns.

@atris
Copy link
Contributor Author

atris commented May 1, 2023

Also I hope we can agree on the direction. One key difference I can see between these two approaches is to reuse validDocId or not for deletion. I also left this comment on @navina's design. This is a key as it has implications on correctness/efficiency/complexity etc.

Could you provide a detailed example on reusing validDocId is insufficient to help this discussion. Also, I'm curious what's the behavior if there's an update to record after deletion, will we ignore it or revive the record?

Please refer to the attached design document to this PR

@atris
Copy link
Contributor Author

atris commented May 1, 2023

Closing this PR -- I have highlighted my concerns with the proposed design and have proposed an alternate in the attached design document. I will help land the other PR now

@atris atris closed this May 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants