
Get deletes working in the realtime branch [LUCENE-2655] #3729

Closed
asfimport opened this issue Sep 20, 2010 · 28 comments

Comments

asfimport commented Sep 20, 2010

Deletes don't work anymore; a patch here will fix this.


Migrated from LUCENE-2655 by Jason Rutherglen, resolved Dec 09 2010
Attachments: LUCENE-2655.patch

asfimport commented Sep 20, 2010

Jason Rutherglen (migrated from JIRA)

Maybe we could reuse (factor out) TermsHashPerField's
custom hash here, for the buffered Terms? It efficiently maps a
BytesRef --> int.

I'm trying to get a feel for what kind of deletes we want
working in the flush-by-DWPT merge vis-à-vis the realtime branch
(ie, the release where we have the new realtime search/indexing
functionality).

Factoring out BytesHash and storing the terms in a byte block
pool will allow replacing the current hash map of terms and
likely conserve RAM. Will we need to replace the docid-upto
and instead use sequence ids?

We can additionally, for now, implement flushing deletes on
merges, like today, for the flush-by-DWPT merge to trunk. Then
implement live (aka foreground) deletes for the realtime search
branch merge to trunk.
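
A rough sketch of the buffering idea, assuming the BytesRefHash/ByteBlockPool API that this factoring-out would produce (the class and method names below are assumptions for illustration, not code from the branch):

    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.BytesRefHash;

    // Buffer deleted terms in a BytesRefHash instead of a HashMap<Term,...>: the hash
    // interns each term's bytes into a shared byte block pool and hands back a small
    // int ord, so per-term overhead stays low and RAM accounting is easy (the pool
    // grows in fixed-size blocks).
    class BufferedDeleteTermsSketch {
      private final BytesRefHash terms = new BytesRefHash();

      /** Returns the ord assigned to this term (the existing ord if it was already buffered). */
      int buffer(BytesRef term) {
        int ord = terms.add(term);
        return ord >= 0 ? ord : -ord - 1;  // a negative return means the term was already present
      }

      int numBufferedTerms() {
        return terms.size();
      }
    }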

Jason Rutherglen (migrated from JIRA)

A couple more things...

The BytesHash or some other aptly named class can implement the
quick sort of terms, again from TermsHashPerField, which will
replace the sorted terms map used in the RT branch for deletes.
The RT branch isn't yet calculating the RAM usage of deletes. By
using the byte block pool, calculating the RAM usage will be
trivial (as the BBP automatically records the num bytes used).

The RT branch has an implementation of delete using the min/max
sequence ids for a given segment. What else is needed?

Jason Rutherglen (migrated from JIRA)

Here's a basic patch with a test case showing delete-by-term not working. It's simply not finding the docs in the reader in applyDeletes; I'm guessing it's something basic that's wrong.

Jason Rutherglen (migrated from JIRA)

Maybe I was using the new flex API incorrectly: when I pass in the deleted docs to MultiFields.getTermDocsEnum, the test case passes.
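
Roughly what that resolution looks like with the flex API of that era (the Bits argument here is the deleted docs to skip; later releases flipped it to live docs, and the Deleter callback is a hypothetical stand-in for however the caller marks docs deleted):

    import java.io.IOException;
    import org.apache.lucene.index.DocsEnum;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.BytesRef;

    class ApplyDeleteTermSketch {
      interface Deleter { void delete(int docID); }  // hypothetical callback

      static void applyDeleteTerm(IndexReader reader, Bits delDocs, Term term, Deleter deleter)
          throws IOException {
        // Passing the reader's deleted docs skips already-deleted docs -- the missing piece above.
        DocsEnum docs = MultiFields.getTermDocsEnum(reader, delDocs, term.field(), new BytesRef(term.text()));
        if (docs == null) {
          return;  // the term does not occur in this reader
        }
        int doc;
        while ((doc = docs.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
          deleter.delete(doc);  // mark this doc deleted (eg in the segment's deleted-docs bit vector)
        }
      }
    }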

Jason Rutherglen (migrated from JIRA)

I'm only seeing this test from TestIndexWriterDelete fail:

[junit] junit.framework.AssertionFailedError: expected:<3> but was:<0>
[junit] at org.apache.lucene.index.TestIndexWriterDelete.testMaxBufferedDeletes(TestIndexWriterDelete.java:118)
[junit] at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)

Jason Rutherglen (migrated from JIRA)

There's this one as well, which I'll focus on; though it's an error in IR.isCurrent, it doesn't immediately appear to be related to deletes.

[junit] Testsuite: org.apache.lucene.index.TestIndexWriterReader
[junit] Testcase: testUpdateDocument(org.apache.lucene.index.TestIndexWriterReader):	FAILED
[junit] null
[junit] junit.framework.AssertionFailedError: null
[junit] 	at org.apache.lucene.index.TestIndexWriterReader.testUpdateDocument(TestIndexWriterReader.java:104)
[junit] 	at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)

asfimport commented Oct 1, 2010

Michael McCandless (@mikemccand) (migrated from JIRA)

Stepping back here...

There are two different goals mixed into the RT branch effort, I
think:

  1. Make thread states fully independent so flushing is no longer sync'd (plus it's a nice simplification, eg no more *PerThread in the indexing chain)
  2. Enable direct searching on the thread states' RAM buffer, for awesome NRT performance

It seems to me like the first one is not so far off? Ie we nearly
have it already (#3400)... it's just that we don't have the
deletes working?

Whereas the 2nd one is a much bigger change, and still iterating under
#3549.

Is it possible to decouple #1 from #2? Ie, bring it to a committable
state and land it on trunk and let it bake some?

Eg, on deletes, what if we simply have each thread state buffer its own
delete term -> thread's docID, privately? We know this approach
will "work" (it does today), right? It's just wasteful of RAM (though
a cutover to BytesRefHash should help a lot here), and makes
deletes somewhat slower since you must now enroll the del term
into each thread state....

It wouldn't actually be that wasteful of RAM, since the BytesRef instance
would be shared across all the maps. Also, if we wanted (separately)
we could make a more efficient buffer when the app deletes many terms
at once, or, many calls to delete-by-single-term with no adds in between
(much like how Amazon optimizes N one-click purchases in a row...).

I really want to re-run my 6-thread indexing test on beast and see the
indexing throughput double!!

Jason Rutherglen (migrated from JIRA)

Are you saying we should simply make deletes work (as is, no BytesRefHash conversion) then clean up the RT branch as a merge to trunk of the DWPT changes? I was thinking along those lines. I can spend time making the rest of the unit tests work on the existing RT revision, though this should probably happen in conjunction with a merge from trunk.

Or simply make the tests pass, and merge RT -> trunk afterwards?

Also, from what I've seen, deletes seem to work; I'm not sure what exactly Michael is referring to. I'll run the full 'suite' of unit tests and just make each one pass?

Michael McCandless (@mikemccand) (migrated from JIRA)

Are you saying we should simply make deletes work (as is, no BytesRefHash conversion) then clean up the RT branch as a merge to trunk of the DWPT changes?

Yes! Just keep using the Map we now use, but now it must be per-DWPT. And when IW.deleteDocs(Term/Query) is called, we must go to each DWPT, grab its current docID, and enroll Term/Query -> docID into that DWPT's pending deletes map.

Also, from what I've seen, deletes seem to work; I'm not sure what exactly Michael is referring to.

But this is using the long sequenceID right? (adds 8 bytes per buffered docID). I think we wanted to explore ways to reduce that? Or, if we can make it modal (you only spend these 8 bytes if IW is open in NRT mode), then that's OK?

I was hoping to cleanly separate the DWPT cutover (solves the nasty flush bottleneck) from the NRT improvements.
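
A minimal sketch of that per-DWPT buffering (class and field names here are illustrative only, not the branch's actual code):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.Term;

    // Each thread state keeps its own term -> docid-upto map; a delete is enrolled into
    // every thread state at that state's current buffered doc count.
    class ThreadStateSketch {
      int numDocsInRAM;  // docs buffered by this thread state so far
      final Map<Term, Integer> pendingDeleteTerms = new HashMap<Term, Integer>();

      synchronized void bufferDeleteTerm(Term term) {
        // docid-upto: the delete applies only to docs buffered here before the delete arrived
        pendingDeleteTerms.put(term, numDocsInRAM);
      }
    }

    class DeleteFanOutSketch {
      // Called from IndexWriter.deleteDocuments(Term): enroll the term into every live thread state.
      static void deleteDocuments(Iterable<ThreadStateSketch> threadStates, Term term) {
        for (ThreadStateSketch state : threadStates) {
          state.bufferDeleteTerm(term);
        }
      }
    }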

asfimport commented Oct 1, 2010

Jason Rutherglen (migrated from JIRA)

when IW.deleteDocs(Term/Query) is called, we must go to each DWPT, grab its current docID, and enroll Term/Query -> docID into that DWPT's pending deletes map.

Ok, that's the change you're referring to. In the current RT revision, the deletes are held in one map in DW; I guess we need to change that. However, if we do, why do we need to keep the seq id or docid as the value in the map? When the delete arrives into the DWPT, we know that any buffered docs with that term/query need to be deleted on flush? (ie, let's not worry about the RT search use case yet). ie2, we can simply add the terms/queries to a set and apply them on flush, ala #3753?

NRT improvements

We're referring to #2590 as NRT and #3388 as 'RT'. I'm guessing you mean RT?

asfimport commented Oct 1, 2010

Michael McCandless (@mikemccand) (migrated from JIRA)

I'm guessing you mean RT?

Duh sorry yes... I'll try to use the right name :)

In the current RT revision, the deletes are held in one map in DW; I guess we need to change that.

Right, one map that maps the del term to the long sequence ID, right? I'm thinking we revert just this part, and go back to how trunk buffers deletes (maps to docID), but per-DWPT.

However, if we do, why do we need to keep the seq id or docid as the value in the map? When the delete arrives into the DWPT, we know that any buffered docs with that term/query need to be deleted on flush? (ie, let's not worry about the RT search use case yet). ie2, we can simply add the terms/queries to a set and apply them on flush, ala #3753?

For the interleaved case. Ie, #3753 is optional – we still must handle the interleaved case correctly (and, I think, by default). But if an app uses the opto in #3753 then we only need a single Set.

Basically, the DWPT change, alone, is hugely valuable and (I think) decouple-able from the trickier/riskier/RAM-consuminger RT changes.
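
To make the interleaved case concrete, here's a minimal illustration (writer is an already-opened IndexWriter; docWithId is a helper made up for the example, using the classic Field constructor of that era):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    class InterleavedDeleteExample {
      static Document docWithId(String id) {
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.NO, Field.Index.NOT_ANALYZED));
        return doc;
      }

      static void run(IndexWriter writer) throws Exception {
        writer.addDocument(docWithId("1"));            // buffered as local doc 0 in some DWPT
        writer.deleteDocuments(new Term("id", "1"));   // must delete doc 0 only
        writer.addDocument(docWithId("1"));            // local doc 1: must survive the delete
        // A plain Set of buffered delete terms applied at flush would also delete doc 1.
        // Recording Term("id","1") -> docid-upto (the doc count at delete time) keeps the
        // ordering: only docs buffered before the delete are candidates.
      }
    }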

Jason Rutherglen (migrated from JIRA)

go back to how trunk buffers deletes (maps to docID), but per-DWPT.

I don't think we need seq ids for the flush-by-DWPT merge to trunk. I was getting confused about the docid being a start-from rather than a stop-at. I'll implement a map of query/term -> docid-upto per DWPT.

Basically, the DWPT change, alone, is hugely valuable and (I think) decouple-able from the trickier/riskier/RAM-consuminger RT changes.

Yes indeed!

I'll try to use the right name

NRT -> RT. The next one will need to be either R or T, shall we decide now or later?

Jason Rutherglen (migrated from JIRA)

We've implied an additional change to the way deletes are flushed: today they're flushed in applyDeletes when segments are merged, whereas with flush-by-DWPT we're applying deletes after flushing the DWPT segment.

Also, we'll have a globalesque buffered deletes structure, presumably located in IW, that buffers deletes for the existing segments, and these should [as today] be applied only when segments are merged or getReader is called?

Michael McCandless (@mikemccand) (migrated from JIRA)

We've implied an additional change to the way deletes are flushed: today they're flushed in applyDeletes when segments are merged, whereas with flush-by-DWPT we're applying deletes after flushing the DWPT segment.

Why is that change needed again? Deferring until merge kickoff is a win, eg for a non N/R/T (heh) app, it means 1/10th the reader open cost (w/ default mergeFactor=10)? Opening/closing readers can be costly for a large index.

Really, some day, we ought to only apply deletes to those segments about to be merged (and keep the buffer for the rest of the segments). Eg most merges are small... yet we'll pay huge cost opening that massive grandaddy segment every time these small merges kick off. But that's another issue...

Why can't we just do what we do today? Ie push the DWPT buffered deletes into the flushed deletes?

Jason Rutherglen (migrated from JIRA)

Ok, I have been stuck/excited about not having to use/understand the remap-docids method, because it's hard to debug. However, I see what you're saying, and why remap-docids exists. I'll push the DWPT buffered deletes to the flushed deletes.

we'll pay huge cost opening that massive grandaddy segment

This large cost is from loading the terms index and deleted docs? When those large segments are merged, though, the IO cost is so substantial that loading tii or del into RAM probably doesn't account for much of the aggregate IO; they're probably in the noise? Or are you referring to the NRT apply deletes flush, though that is on a presumably pooled reader? Or you're just saying that today we're applying deletes across the board to all segments prior to a merge, regardless of whether or not they're even involved in the merge? It seems like that is changeable?

asfimport commented Oct 2, 2010

Michael McCandless (@mikemccand) (migrated from JIRA)

Ok, I have been stuck/excited about not having to use/understand the remap-docids method, because it's hard to debug. However, I see what you're saying, and why remap-docids exists. I'll push the DWPT buffered deletes to the flushed deletes.

I think we still must remap, at least on the pushed (deletesFlushed) deletes?

On the buffered deletes for the DWPT (deletesInRAM), I think we can make these relative to the DWPT (ie start from 0), but on pushing them into flushed deletes we re-base them?

This large cost is from loading the terms index and deleted docs?

Yes. We don't (hopefully) load norms, field cache, etc.

When those large segments are merged, though, the IO cost is so substantial that loading tii or del into RAM probably doesn't account for much of the aggregate IO; they're probably in the noise?

Well, the applyDeletes method is sync'd, vs merging which is fully concurrent. (Also, merging doesn't load the tii).

Or are you referring to the NRT apply deletes flush, though that is on a presumably pooled reader?

Right, it would be pooled for the NRT case, so this is only a (sizable) perf problem for the non-nrt case.

Or you're just saying that today we're applying deletes across the board to all segments prior to a merge, regardless of whether or not they're even involved in the merge? It seems like that is changeable?

Right! That's what we do today (apply deletes to all segs) whereas it's really only necessary to apply them to the segments being merged. I opened #3754 to track this.

asfimport commented Oct 10, 2010

Jason Rutherglen (migrated from JIRA)

With per-DWPT deletes, we're maintaining a docid-upto/limit per term/query. When the DWPT deletes are transferred to IW's flushed deletes, there isn't a way to globalize the docid-upto/limit value that's been maintained per DWPT. Today we globalize the docid-upto/limit value because we're only flushing one segment at a time.

We need to implement per segment deletes queuing to make this patch work, basically implement #3754. I'll start in this direction.

Michael McCandless (@mikemccand) (migrated from JIRA)

With per-DWPT deletes, we're maintaining a docid-upto/limit per term/query. When the DWPT deletes are transferred to IW's flushed deletes, there isn't a way to globalize the docid-upto/limit value that's been maintained per DWPT. Today we globalize the docid-upto/limit value because we're only flushing one segment at a time.

I think we can still globalize?

The trick is it'd require doing the remapping in the same sync block that appends the new SegmentInfo into the SegmentInfos, right? Because it's at that moment that the newly flushed segment is assigned its position w/ the rest of the segments. And so, at that moment, it knows the sum of the maxDoc() of all prior segments, and it can globalize all of the docIDs mapped to the pending delete Term/Query?

Jason Rutherglen (migrated from JIRA)

When a new segment is flushed, then the docid-upto is reset to what's in the latest segment? Aren't we then losing point-in-timeness of the docid-uptos per DWPT by simply resetting the docid-upto to the highest value of the newest segment?

Michael McCandless (@mikemccand) (migrated from JIRA)

When a new segment is flushed, then the docid-upto is reset to what's in the latest segment?

Do you mean the flushedDocCount (in DocumentsWriter), by "docid-upto"? Ie, this is what we use to "globalize" the docid-upto stored for each delete Term/Query.

Aren't we then losing point-in-timeness of the docid-uptos per DWPT by simply resetting the docid-upto to the highest value of the newest segment?

I'm confused... that flushedDocCount is just an int, tracking the total doc count of all already-flushed segments. This shouldn't affect point-in-timeness....?
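
A rough sketch of that trunk-style globalization using flushedDocCount (field and method names are illustrative; this works because only one segment is flushed at a time, which is exactly what gets tricky with multiple DWPTs, as the following comments show):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.Term;

    class TrunkStyleBufferedDeletesSketch {
      int flushedDocCount;  // total doc count of all already-flushed segments
      final Map<Term, Integer> deletesFlushed = new HashMap<Term, Integer>();  // term -> global docid-upto

      // On flush: re-base each segment-relative docid-upto against all previously flushed docs,
      // then account for the docs of the segment just flushed.
      synchronized void pushDeletes(Map<Term, Integer> deletesInRAM, int maxDocOfFlushedSegment) {
        for (Map.Entry<Term, Integer> e : deletesInRAM.entrySet()) {
          deletesFlushed.put(e.getKey(), flushedDocCount + e.getValue());
        }
        flushedDocCount += maxDocOfFlushedSegment;
      }
    }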

Jason Rutherglen (migrated from JIRA)

If there are 2 DWPTs, and each has the following pending deletes:

DWPT #1 maxdoc: 50; term id:1 limit:25

DWPT #2 maxdoc: 45; term id:1 limit:30

Where limit is the docid-upto stored as an int value in the pending deletes map. If we flush one of these DWPTs without first applying the deletes to a bitvector, and we flush the DWPTs in order, then DWPT #1 would flush its deletes and the term's doc limit would be reset to segmentinfos maxdoc + 25. However, there's a 50 - 25 = 25 gap. When DWPT #2 is flushed, the limit for term id:1 is set to segmentinfos maxdoc + 50 + 30, eg, we're effectively losing DWPT #1's limit (26 - 50)?

asfimport commented Oct 13, 2010

Michael McCandless (@mikemccand) (migrated from JIRA)

Just to confirm: when a delete is done, we go and buffer that delete into each DWPT, right? (Mapped to the then-current docid-upto for that DWPT).

OK I see the problem. It's because we now have a single pool for deletesFlushed, right? Ie, DWPT #2 will overwrite the term id:1 entry.

But, I think the switch to generations of pending deletes (#3754) would fix this? Maybe we should go do that one first...

Ie, DWPT #1's flush would enter a new gen delete pool (maps term -> global docid-upto). Then DWPT #2's flush would also enter a new gen delete pool. Hmm, but not quite... the generations can't simply stack on top of one another. I think there's a graph structure somehow? Ie every DWPT that's flushed must record the segments that existed (were already flushed) when it was first created, because it's only those segments that should get the deleted term. Segments in the future obviously shouldn't get it. And segments from parallel DWPTs (ie that existed at the same time) should also not get it since they will separately track the deleted term.

BTW, I think this makes #3753 all the more important (the ability to delete such that the delete will only apply to already-committed segments), since this'd mean we only store the pending delete in a single map instead of a map per DWPT.

asfimport commented Oct 13, 2010

Jason Rutherglen (migrated from JIRA)

I think the switch to generations of pending deletes (#3754) would fix this? Maybe we should go do that one first.

Yes, let's work on that one as it directly relates to solving this issue. The generational system won't work for DWPT pending flushed deletes for the reasons described (ie, the docid-upto/limit will be inappropriately globalized).

I think there's a graph structure somehow? Ie every DWPT that's flushed must record the segments that existed (were already flushed) when it was first created, because it's only those segments that should get the deleted term

You're saying record a list of segments that existed at the time of flushing a DWPT's deletes? Let's get that data structure mapped out to start on #3754?

asfimport commented Oct 20, 2010

Michael McCandless (@mikemccand) (migrated from JIRA)

You're saying record a list of segments that existed at the time of flushing a DWPT's deletes?

Actually, I think it's simpler! I think the DWPT just records the index of the last segment in the index, as of when it is created (or re-inited after it's been flushed).

On flush of a given DWPT, its buffered deletes are recorded against that segment, and still carry over the lastSegmentIndex. This way, when we finally do resolve these deletes to docIDs, we 1) always apply the delete if the segment's index is <= lastSegmentIndex, or 2) apply it if the doc is in that newly flushed segment and its docID is <= the docid-upto. I think this'd mean we can keep the docid-upto as local docIDs, which is nice (no globalizing/shifting-on-merge/flush needed).

So, segments flushed in the current IW session will carry this private pool of pending deletes. But, pre-existing segments in the index don't need their own pool. Instead, when it's time to resolve the buffered deletes against them (because they are about to be merged), they must walk all of the per-segment pools, resolving the deletes from that pool if its segment index is <= the lastSegmentIndex of that pool. We should take care to efficiently handle dup'd terms, ie where the same del term is present in multiple pools. The most recent one "wins", and we should do only one delete (per segment) for that term.

These per-segment delete pools must be updated on merge. EG if the lastSegmentIndex of a pool gets merged, that's fine, but then on merge commit we must move that lastSegmentIndex "backwards" to the last segment before the merge, because any deletes necessary within the segment will have been resolved already.

When segments with del pools are merged, we obviously apply the deletes to the segments being merged, but, then, we have to coalesce those pools and move them into a single pool on the segment just before the merge. We could actually use a Set at this point since there is no more docid-upto for this pool (ie, it applies to all docs on that segment and in segments prior to it).

So I think this is much simpler than I first thought!

Let's get that data structure mapped out to start on #3754?

+1
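
A sketch of that resolution rule (all names are made up for illustration; the boundary conventions follow the description above):

    class DeletePoolResolutionSketch {
      /**
       * Does a delete buffered in the pool flushed with segment {@code poolSegIndex}
       * (whose DWPT recorded {@code lastSegmentIndex} when it was created/re-inited)
       * apply to doc {@code docID} in segment {@code targetSegIndex}?
       */
      static boolean deleteApplies(int targetSegIndex, int poolSegIndex, int lastSegmentIndex,
                                   int docID, int docIDUpto) {
        if (targetSegIndex <= lastSegmentIndex) {
          return true;               // segment already existed when the DWPT started: delete applies fully
        }
        if (targetSegIndex == poolSegIndex) {
          return docID <= docIDUpto; // the pool's own segment: honor the local docid-upto
        }
        return false;                // newer or parallel segments track the delete themselves
      }
    }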

asfimport commented Oct 20, 2010

Jason Rutherglen (migrated from JIRA)

Can you explain what you mean by "per-segment pools" compared with the flushed deletes data structure in there today, ie, why are they pools?

I can start on the simplified version of this for the non-DWPT use case, eg, #3754. That version'll implement the delete pool per segment only for those segments created after the last flush/commit?

asfimport commented Oct 20, 2010

Michael McCandless (@mikemccand) (migrated from JIRA)

Can you explain what you mean by "per-segment pools" compared with the flushed deletes data structure in there today, ie, why are they pools?

Sorry, by pool I mean what we have today. Ie a map from del term/query -> global docid-upto.

Except, with this change it will be per-segment, only for those segments flushed since IW was opened or since last commit.

I can start on the simplified version of this for the non-DWPT use case, eg, #3754.

Yes, please!

This approach works fine for the single-DWPT case (ie what we now have). It'll be good to get it first working on this simpler case, then port to the RT branch to get it working for multiple DWPTs.

That version'll implement the delete pool per segment only for those segments created after the last flush/commit?

Created after IW was opened or after last commit, yes.

I think we'll need an applyAllDeletes (called by commit) vs an applyDeletesToSegment (called by merge).
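
A skeleton of those two entry points with per-segment pools (the structure and names below are assumptions for illustration, not the eventual #3754 code):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.SegmentInfo;
    import org.apache.lucene.index.Term;

    class PerSegmentBufferedDeletesSketch {
      // one pool (term -> docid-upto) per segment flushed since IW was opened / since the last commit
      final Map<SegmentInfo, Map<Term, Integer>> pools = new HashMap<SegmentInfo, Map<Term, Integer>>();

      /** Called by commit: resolve every pool against every segment it covers, then clear the pools. */
      synchronized void applyAllDeletes() {
        // ... walk all pools, resolve their deletes, clear ...
      }

      /** Called by merge: resolve only the pools that cover the given segment. */
      synchronized void applyDeletesToSegment(SegmentInfo info) {
        // ... walk pools whose coverage includes `info`; the newest entry for a dup'd term wins ...
      }
    }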

asfimport commented Dec 9, 2010

Jason Rutherglen (migrated from JIRA)

Per-segment deletes were implemented as a part of #3754.
