Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9536: Optimize OrdinalMap when one segment contains all distinct values. #1948

Merged
merged 4 commits into from
Nov 2, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 25 additions & 11 deletions lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java
Original file line number Diff line number Diff line change
Expand Up @@ -172,10 +172,12 @@ public static OrdinalMap build(IndexReader.CacheKey owner, TermsEnum subs[], lon

/** Cache key of whoever asked for this awful thing */
public final IndexReader.CacheKey owner;
// number of global ordinals
final long valueCount;
// globalOrd -> (globalOrd - segmentOrd) where segmentOrd is the the ordinal in the first segment that contains this term
final PackedLongValues globalOrdDeltas;
final LongValues globalOrdDeltas;
// globalOrd -> first segment container
final PackedLongValues firstSegments;
final LongValues firstSegments;
// for every segment, segmentOrd -> globalOrd
final LongValues segmentToGlobalOrds[];
// the map from/to segment ids
Expand Down Expand Up @@ -271,13 +273,25 @@ protected boolean lessThan(TermsEnumIndex a, TermsEnumIndex b) {
globalOrd++;
}

this.firstSegments = firstSegments.build();
this.globalOrdDeltas = globalOrdDeltas.build();
long ramBytesUsed = BASE_RAM_BYTES_USED + segmentMap.ramBytesUsed();
this.valueCount = globalOrd;

// If the first segment contains all of the global ords, then we can apply a small optimization
// and hardcode the first segment indices and global ord deltas as all zeroes.
if (ordDeltaBits.length > 0 && ordDeltaBits[0] == 0L && ordDeltas[0].size() == this.valueCount) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm why only the first segment? Couldn't it be the 3rd segment, in addition, that matches the global ords?

Edit: ahh OK I understand now -- this opto is indeed specific to the first segment, so we can store this.firstSegments as all 0s. Good!

Do we (somewhere, couldn't find it here) pre-sort all segments by the cardinality descending? Then we could know all segments that meet this optimization are at the start of the segments list, and possibly building the ordinal map is faster (not sure). But then we would need to un-sort in the end to return the final OrdinalMap. But it might enable this opto to apply more often, except, I think we would then need an additional dereference on lookup, hmm.

Does our PackedLongValues.monotonicBuilder already optimize for the case where it is all 0s, for the case where another segment (not the first) has all the global values as well?

Copy link
Member Author

@jtibshirani jtibshirani Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we (somewhere, couldn't find it here) pre-sort all segments by the cardinality descending?

We do in fact -- the segments are sorted by 'weight', which in all call sites corresponds to the number of unique terms. This was added in LUCENE-5782.

Does our PackedLongValues.monotonicBuilder already optimize for the case where it is all 0s, for the case where another segment (not the first) has all the global values as well?

When constructing the individual PackedInts.Reader instances, we do identify the all 0s case and use the lightweight PackedInts.NullReader. It's great we optimize that case, but it does mean this PR doesn't make an enormous space difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we (somewhere, couldn't find it here) pre-sort all segments by the cardinality descending?

We do in fact -- the segments are sorted by 'weight', which in all call sites corresponds to the number of unique terms. This was added in LUCENE-5782.

Ahh, so then we know the first segment will indeed have the most unique terms, and therefore the highest chance of having "all 0s" ord deltas.

I think 2nd and 3rd segments also might have all 0s ord deltas? But we can try to optimize that in a followon issue ... progress not perfection!

Does our PackedLongValues.monotonicBuilder already optimize for the case where it is all 0s, for the case where another segment (not the first) has all the global values as well?

When constructing the individual PackedInts.Reader instances, we do identify the all 0s case and use the lightweight PackedInts.NullReader. It's great we optimize that case, but it does mean this PR doesn't make an enormous space difference.

Got it. Well, it's great that all these layers optimize :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 2nd and 3rd segments also might have all 0s ord deltas? But we can try to optimize that in a followon issue ... progress not perfection!

I wonder if you are confused here, the proposed changes optimize the mapping from global ordinals to the ordinals of one arbitrary segment. When a segment has all value, we can simplify by always picking this segment, but there is no need to optimize this for the 2nd or 3rd segments, since we only need to be able to translate global ordinals to the ordinal of a single segment. Or maybe I'm the one confused by what you were suggesting. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! Sorry, I was indeed confused ;)

This is to enable "retrieve BytesRef for this global ordinal" use-case, right? For that, we first pick a segment to use (the first one also containing that BytesRef), then map to its segment-local ordinal, then retrieve the BytesRef for that using the existing doc values API for that segment.

We do not (need to, nor) expose an API today to "retrieve segment N's ordinal corresponding to global ordinal M". Only the reverse direction (segment N's ordinal M maps to global ordinal O).

I think I understand now!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to enable "retrieve BytesRef for this global ordinal" use-case, right?

Right!

We do not (need to, nor) expose an API today to "retrieve segment N's ordinal corresponding to global ordinal M"

Correct.

this.firstSegments = LongValues.ZEROES;
this.globalOrdDeltas = LongValues.ZEROES;
} else {
PackedLongValues packedFirstSegments = firstSegments.build();
PackedLongValues packedGlobalOrdDeltas = globalOrdDeltas.build();
this.firstSegments = packedFirstSegments;
this.globalOrdDeltas = packedGlobalOrdDeltas;
ramBytesUsed += packedFirstSegments.ramBytesUsed() + packedGlobalOrdDeltas.ramBytesUsed();
}

// ordDeltas is typically the bottleneck, so let's see what we can do to make it faster
segmentToGlobalOrds = new LongValues[subs.length];
long ramBytesUsed = BASE_RAM_BYTES_USED + this.globalOrdDeltas.ramBytesUsed()
+ this.firstSegments.ramBytesUsed() + RamUsageEstimator.shallowSizeOf(segmentToGlobalOrds)
+ segmentMap.ramBytesUsed();
ramBytesUsed += RamUsageEstimator.shallowSizeOf(segmentToGlobalOrds);
for (int i = 0; i < ordDeltas.length; ++i) {
final PackedLongValues deltas = ordDeltas[i].build();
if (ordDeltaBits[i] == 0L) {
Expand Down Expand Up @@ -317,6 +331,7 @@ public long get(long ord) {
ramBytesUsed += RamUsageEstimator.shallowSizeOf(segmentToGlobalOrds[i]);
}
}

this.ramBytesUsed = ramBytesUsed;
}

Expand Down Expand Up @@ -348,7 +363,7 @@ public int getFirstSegmentNumber(long globalOrd) {
* Returns the total number of unique terms in global ord space.
*/
public long getValueCount() {
return globalOrdDeltas.size();
return valueCount;
}

@Override
Expand All @@ -359,10 +374,9 @@ public long ramBytesUsed() {
@Override
public Collection<Accountable> getChildResources() {
List<Accountable> resources = new ArrayList<>();
resources.add(Accountables.namedAccountable("global ord deltas", globalOrdDeltas));
resources.add(Accountables.namedAccountable("first segments", firstSegments));
resources.add(Accountables.namedAccountable("segment map", segmentMap));
// TODO: would be nice to return actual child segment deltas too, but the optimizations are confusing
// TODO: would be nice to return the ordinal and segment maps too, but it's not straightforward
// because of optimizations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be do something like if (firstSegments != LongValues.ZEROES) { resources.add(Accountables.namedAccountable("first segments", firstSegments)); }?

Copy link
Member Author

@jtibshirani jtibshirani Oct 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd need a cast here, since LongValues doesn't implement Accountable. Alternatively, we could consider a bigger change to have LongValues implement Accountable.

Update: I just saw LUCENE-9387, it probably doesn't make sense to increase usage of Accountable.

return resources;
}
}
59 changes: 54 additions & 5 deletions lucene/core/src/test/org/apache/lucene/index/TestOrdinalMap.java
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,6 @@
package org.apache.lucene.index;


import java.io.IOException;
import java.lang.reflect.Field;
import java.util.HashMap;

import org.apache.lucene.analysis.MockAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedDocValuesField;
Expand All @@ -32,6 +28,10 @@
import org.apache.lucene.util.RamUsageTester;
import org.apache.lucene.util.TestUtil;

import java.io.IOException;
import java.lang.reflect.Field;
import java.util.HashMap;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does (would) https://issues.apache.org/jira/browse/LUCENE-9564 enforce import ordering check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would avoid this sort of change (I use IntelliJ which autoformats java imports at the end).

public class TestOrdinalMap extends LuceneTestCase {

private static final Field ORDINAL_MAP_OWNER_FIELD;
Expand All @@ -46,7 +46,7 @@ public class TestOrdinalMap extends LuceneTestCase {
private static final RamUsageTester.Accumulator ORDINAL_MAP_ACCUMULATOR = new RamUsageTester.Accumulator() {

public long accumulateObject(Object o, long shallowSize, java.util.Map<Field,Object> fieldValues, java.util.Collection<Object> queue) {
if (o == LongValues.IDENTITY) {
if (o == LongValues.ZEROES || o == LongValues.IDENTITY) {
return 0L;
}
if (o instanceof OrdinalMap) {
Expand Down Expand Up @@ -95,4 +95,53 @@ public void testRamBytesUsed() throws IOException {
dir.close();
}

/**
* Tests the case where one segment contains all of the global ords. In this case, we apply a
* small optimization and hardcode the first segment indices and global ord deltas as all zeroes.
*/
public void testOneSegmentWithAllValues() throws IOException {
Directory dir = newDirectory();
IndexWriterConfig cfg = new IndexWriterConfig(new MockAnalyzer(random())).setCodec(
TestUtil.alwaysDocValuesFormat(TestUtil.getDefaultDocValuesFormat()));
RandomIndexWriter iw = new RandomIndexWriter(random(), dir, cfg);

int numTerms = 1000;
for (int i = 0; i < numTerms; ++i) {
Document d = new Document();
String term = String.valueOf(i);
d.add(new SortedDocValuesField("sdv", new BytesRef(term)));
iw.addDocument(d);
}
iw.forceMerge(1);

for (int i = 0; i < 10; ++i) {
Document d = new Document();
String term = String.valueOf(random().nextInt(numTerms));
d.add(new SortedDocValuesField("sdv", new BytesRef(term)));
iw.addDocument(d);
}
iw.commit();

DirectoryReader r = iw.getReader();
SortedDocValues sdv = MultiDocValues.getSortedValues(r, "sdv");
assertNotNull(sdv);
assertTrue(sdv instanceof MultiDocValues.MultiSortedDocValues);

// Check that the optimization kicks in.
OrdinalMap map = ((MultiDocValues.MultiSortedDocValues) sdv).mapping;
assertEquals(LongValues.ZEROES, map.firstSegments);
assertEquals(LongValues.ZEROES, map.globalOrdDeltas);

// Check the map's basic behavior.
assertEquals(numTerms, (int) map.getValueCount());
for (int i = 0; i < numTerms; i++) {
assertEquals(0, map.getFirstSegmentNumber(i));
assertEquals(i, map.getFirstSegmentOrd(i));
}

iw.close();
r.close();
dir.close();
}

}