2 changes: 2 additions & 0 deletions lucene/CHANGES.txt
@@ -69,6 +69,8 @@ Improvements

* GITHUB#15124: Use RamUsageEstimator to calculate size for non-accountable queries. (Sagar Upadhyaya)

* GITHUB#15453: Avoid unnecessary sorting and instantiations in readMapOfStrings. (Benjamin Lerer)
Member:
Can we call out the parts of the code where this change actually means something? e.g. segment attributes & diagnostics are now not ordered. Also, this means two segment infos that were previously equivalent will now have different toStrings (which is likely OK).

Author:
In SegmentInfo the maps were not ordered before the change, because the use of Map.copyOf in the constructor was discarding the TreeMap ordering. A similar thing was happening for the TreeSet: its elements were converted and added to a HashSet. Only the FieldInfo attributes were somehow retaining the order, as long as no new attributes were added. And of course that was only the case when fewer than 10 elements were put in the maps, since larger maps were read into a HashMap to begin with.
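As a small illustration (not from the patch): Map.copyOf always builds the JDK's hash-ordered immutable map, so whatever ordering its argument had, including a TreeMap's key order, is discarded:

    import java.util.Map;
    import java.util.TreeMap;

    public class OrderingCheck {
        public static void main(String[] args) {
            Map<String, String> sorted = new TreeMap<>(Map.of("c", "3", "a", "1", "b", "2"));
            // TreeMap iterates in key order:
            System.out.println(sorted);              // {a=1, b=2, c=3}
            // Map.copyOf has unspecified iteration order, so the sorting is lost and two
            // equal maps may well print (toString) differently:
            System.out.println(Map.copyOf(sorted));
        }
    }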


Optimizations
---------------------
* GITHUB#14011: Reduce allocation rate in HNSW concurrent merge. (Viliam Durina)
@@ -185,7 +185,6 @@ private DocValuesConsumer getInstance(FieldInfo field, boolean ignoreCurrentForm
}
final String formatName = format.getName();

field.putAttribute(PER_FIELD_FORMAT_KEY, formatName);
Integer suffix = null;

ConsumerAndSuffix consumer = formats.get(format);
@@ -229,8 +228,8 @@ private DocValuesConsumer getInstance(FieldInfo field, boolean ignoreCurrentForm
assert suffixes.containsKey(formatName);
suffix = consumer.suffix;
}

field.putAttribute(PER_FIELD_SUFFIX_KEY, Integer.toString(suffix));
field.putAttributes(
Map.of(PER_FIELD_FORMAT_KEY, formatName, PER_FIELD_SUFFIX_KEY, Integer.toString(suffix)));
// TODO: we should only provide the "slice" of FIS
// that this DVF actually sees ...
return consumer.consumer;
@@ -148,7 +148,6 @@ private KnnVectorsWriter getInstance(FieldInfo field) throws IOException {
}
final String formatName = format.getName();

field.putAttribute(PER_FIELD_FORMAT_KEY, formatName);
Integer suffix;

WriterAndSuffix writerAndSuffix = formats.get(format);
@@ -176,7 +175,8 @@ private KnnVectorsWriter getInstance(FieldInfo field) throws IOException {
assert suffixes.containsKey(formatName);
suffix = writerAndSuffix.suffix;
}
field.putAttribute(PER_FIELD_SUFFIX_KEY, Integer.toString(suffix));
field.putAttributes(
Map.of(PER_FIELD_FORMAT_KEY, formatName, PER_FIELD_SUFFIX_KEY, Integer.toString(suffix)));
return writerAndSuffix.writer;
}

@@ -245,8 +245,12 @@ private Map<PostingsFormat, FieldsGroup> buildFieldsGroupMapping(

groupBuilder.addField(field);

fieldInfo.putAttribute(PER_FIELD_FORMAT_KEY, formatName);
fieldInfo.putAttribute(PER_FIELD_SUFFIX_KEY, Integer.toString(groupBuilder.suffix));
fieldInfo.putAttributes(
Map.of(
PER_FIELD_FORMAT_KEY,
formatName,
PER_FIELD_SUFFIX_KEY,
Integer.toString(groupBuilder.suffix)));
}

Map<PostingsFormat, FieldsGroup> formatToGroups =
16 changes: 15 additions & 1 deletion lucene/core/src/java/org/apache/lucene/index/FieldInfo.java
@@ -113,7 +113,7 @@ public FieldInfo(
this.omitNorms = false;
}
this.dvGen = dvGen;
this.attributes = Objects.requireNonNull(attributes);
this.attributes = Map.copyOf(Objects.requireNonNull(attributes));
this.pointDimensionCount = pointDimensionCount;
this.pointIndexDimensionCount = pointIndexDimensionCount;
this.pointNumBytes = pointNumBytes;
@@ -679,6 +679,20 @@ public synchronized String putAttribute(String key, String value) {
return oldValue;
}

/**
* Puts some codec attribute values.
*
* <p>If multiple attributes need to be added, {@code putAttributes} is more efficient than
* calling {@code putAttribute} multiple times, as it avoids unnecessary copies and synchronization.
*/
public synchronized void putAttributes(Map<String, String> map) {
HashMap<String, String> newMap = new HashMap<>(attributes);
newMap.putAll(map);
// This needs to be thread-safe as multiple threads may be updating (different) attributes
// concurrently due to concurrent merging.
attributes = Collections.unmodifiableMap(newMap);
}

/** Returns internal codec attributes map. */
public synchronized Map<String, String> attributes() {
return attributes;
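For context, a minimal sketch (not part of the diff) of the call-site change the per-field format classes above make, where two synchronized copies of the attributes map collapse into one:

    // Before: each putAttribute call takes the lock and copies the attributes map once.
    field.putAttribute(PER_FIELD_FORMAT_KEY, formatName);
    field.putAttribute(PER_FIELD_SUFFIX_KEY, Integer.toString(suffix));

    // After: a single lock acquisition and a single copy for the whole batch.
    field.putAttributes(
        Map.of(PER_FIELD_FORMAT_KEY, formatName, PER_FIELD_SUFFIX_KEY, Integer.toString(suffix)));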
@@ -775,7 +775,7 @@ FieldInfo add(FieldInfo fi, long dvGen) {
if (curFi != null) {
curFi.verifySameSchema(fi);
if (fi.attributes() != null) {
fi.attributes().forEach((k, v) -> curFi.putAttribute(k, v));
curFi.putAttributes(fi.attributes());
}
if (fi.hasPayloads()) {
curFi.setStorePayloads();
Expand Down
@@ -362,6 +362,7 @@ public synchronized String putAttribute(String key, String value) {
// This needs to be thread-safe because multiple threads may be updating (different) attributes
// at the same time due to concurrent merging, plus some threads may be calling toString() on
// segment info while other threads are updating attributes.
// We use unmodifiableMap instead of Map.copyOf to avoid an unnecessary copy.
attributes = Collections.unmodifiableMap(newMap);
return oldValue;
}
29 changes: 12 additions & 17 deletions lucene/core/src/java/org/apache/lucene/store/DataInput.java
@@ -18,14 +18,9 @@

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import org.apache.lucene.util.BitUtil;

/**
@@ -248,17 +243,17 @@ public DataInput clone() {
public Map<String, String> readMapOfStrings() throws IOException {
int count = readVInt();
if (count == 0) {
return Collections.emptyMap();
return Map.of();
} else if (count == 1) {
return Collections.singletonMap(readString(), readString());
return Map.of(readString(), readString());
Contributor:
These changes only work if the input strings are non-null, which I'm not sure we ever enforced or documented, so it would be an incompatible change. This applies to both sets and maps.

import java.util.*;

public class Test {
	public static void main(String[] args) {
		String s = null;
		System.out.println("This works");
		Collections.singleton(s);
		System.out.println("This does not");
		Set.of(s);
	}
}

To play it safe, I'd limit the change to just removing the TreeSet (and leaving the HashSet). Then this can be backported with no problems. If you'd like to move to non-null arguments, then this needs to be documented in the migration notes, method javadocs, etc.
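For reference, a minimal sketch of that narrower variant (drop only the TreeSet, keep the unmodifiable HashSet so null handling and backporting stay safe):

    public Set<String> readSetOfStrings() throws IOException {
        int count = readVInt();
        if (count == 0) {
            return Collections.emptySet();
        } else if (count == 1) {
            return Collections.singleton(readString());
        } else {
            // Always a HashSet: the sorting work goes away, but the collection still
            // tolerates nulls and the result stays Collections.unmodifiableSet as before.
            Set<String> set = new HashSet<>();
            for (int i = 0; i < count; i++) {
                set.add(readString());
            }
            return Collections.unmodifiableSet(set);
        }
    }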

Author:
Currently the readString method cannot return a null String, and SegmentInfo converts the maps through Map.copyOf, which enforces the same check, just later on. Also, writeString does not support null values. Therefore, it does not look like null Strings are supported by the code. That being said, I understand the concern. The choice of using immutable maps (created through Map.of and Map.copyOf) and sets was to align with what is done within SegmentInfo and to avoid an extra instantiation, as Map.copyOf of an immutable map is a no-op.
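A quick check of that last point on a recent JDK (the identity shortcut is an implementation detail, but Map.copyOf is documented to generally not create a copy of an already-unmodifiable map):

    import java.util.Map;

    public class CopyOfCheck {
        public static void main(String[] args) {
            Map<String, String> m = Map.of("k", "v");
            // Copying a map that is already one of the JDK's immutable maps returns the
            // same instance, so SegmentInfo's Map.copyOf of the readMapOfStrings result
            // does not allocate anything new.
            System.out.println(Map.copyOf(m) == m);  // prints true on current JDKs
        }
    }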

} else {
Map<String, String> map = count > 10 ? new HashMap<>() : new TreeMap<>();
@SuppressWarnings("unchecked")
Map.Entry<String, String>[] entries =
(Map.Entry<String, String>[]) new Map.Entry<?, ?>[count];
for (int i = 0; i < count; i++) {
final String key = readString();
final String val = readString();
map.put(key, val);
entries[i] = Map.entry(readString(), readString());
}
return Collections.unmodifiableMap(map);
return Map.ofEntries(entries);
}
}

@@ -270,15 +265,15 @@ public Map<String, String> readMapOfStrings() throws IOException {
public Set<String> readSetOfStrings() throws IOException {
int count = readVInt();
if (count == 0) {
return Collections.emptySet();
return Set.of();
} else if (count == 1) {
return Collections.singleton(readString());
return Set.of(readString());
} else {
Set<String> set = count > 10 ? new HashSet<>() : new TreeSet<>();
String[] set = new String[count];
for (int i = 0; i < count; i++) {
set.add(readString());
set[i] = readString();
}
return Collections.unmodifiableSet(set);
return Set.of(set);
}
}
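A rough round-trip sketch of the new behavior, assuming ByteBuffersDataOutput and its toDataInput() helper (neither is touched by this patch); the collections read back are now the JDK's immutable implementations:

    import java.util.Set;
    import org.apache.lucene.store.ByteBuffersDataOutput;
    import org.apache.lucene.store.DataInput;

    public class RoundTrip {
        public static void main(String[] args) throws Exception {
            ByteBuffersDataOutput out = new ByteBuffersDataOutput();
            out.writeSetOfStrings(Set.of("a", "b", "c"));

            DataInput in = out.toDataInput();
            Set<String> back = in.readSetOfStrings();
            System.out.println(back);  // same three elements, unspecified iteration order
            // back.add("d");          // throws UnsupportedOperationException, as before
        }
    }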
