Skip to content

Conversation

@blerer
Copy link

@blerer blerer commented Nov 26, 2025

Description

The patch remove the use of TreeMap from readMapOfStrings and the use of TreeSet from readSetOfString and return ImmutableMaps and ImmutableSets instead of UnmodifiableMaps. As the use of Map.copyOf on ImmutableMap is a noop the change also avoid the creation of a new map instance in SegmentInfo.

Closes #15453


* GITHUB#15124: Use RamUsageEstimator to calculate size for non-accountable queries. (Sagar Upadhyaya)

* GITHUB#15453: Avoid unnecessary sorting and instantiations in readMapOfStrings. (Benjamin Lerer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call out the parts of the code that mean anything on this? e.g. segment attributes & diagnostics are now not ordered. Also, this will mean two segment infos that were previously equivalent will now have different toStrings (which is likely OK)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SegmentInfo the maps were not ordered before the change because te use of Map.copyOf, in the constructor, was discarding the TreeMap ordering. A similar thing was happening for the TreeSet. Its elements were converted and added to a HashSet. Only FieldInfo attributes was somehow retaining the order has long as no new attibutes were added. Of course that was only valid if you add less than 10 elements within the maps.

// This needs to be thread-safe as multiple threads may be updating (different) attributes
// concurrently due to concurrent merging.
attributes = Collections.unmodifiableMap(newMap);
attributes = Map.copyOf(newMap);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't replace unmodifiableMap with copyOf. There is no point in doing it, I think.

Copy link
Author

@blerer blerer Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to align the code of FieldInfo with the one of SegmentInfo. Currently SegmentInfo is using ImmutableCollections.AbstractImmutableMap (through Map.copyOf) whereas FieldInfo is using UnmodifiableMap (through Collections.unmodifiable). Originally SegmentInfo was doing some defensive copies using Collections.unmodifiableMap(new HashMap<>(Objects.requireNonNull(diagnostics))); but they have been replaced by Maps.copyOf in #649

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But a copyOf is going to be doing more copying, while unmodifiableMap is a thin wrapper. Is it possible for this code to be called a lot? If not, then the difference probably doesn't matter.

Copy link
Author

@blerer blerer Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good point. Effectively this method is often called twice in a row within some DocValuesProducer and in some loop within FieldInfos.Builder (which seems to be called during merges). So yes we should probably keep unmodifiableMap here. It would also probably make sense to have some putAttributes methods that perform the update in one go instead of creating a new Map and reacquiring the synchronize lock each time. What do you think?

return Map.of();
} else if (count == 1) {
return Collections.singletonMap(readString(), readString());
return Map.of(readString(), readString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes only work if input arguments are non-null, which I'm not sure we ever enforced or documented so it would be an incompatible change. Applies to both sets and maps.

import java.util.*;

public class Test {
	public static void main(String[] args) {
		String s = null;
		System.out.println("This works");
		Collections.singleton(s);
		System.out.println("This does not");
		Set.of(s);
	}
}

To play it safe, I'd limit the change to just removing the treeset (and leaving hash set). Then this can be backported with no problems. If you'd like to move to non-null arguments then this needs to be documented in the migration, method javadocs, etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the readString method cannot return a null String and SegmentInfo convert the Maps through Map.copyOf which enforce the same check, just later on. Also writeString do not support null values. Therefore, it does not look like null String are supported by the code. That being said I understand the concern. The choice of using immutable maps (created throught Map.of and Map.copyOf) and set was to align with what was done within SegmentInfo and avoid an extra instantiation as a Map.copyOf of a immutable map is a noop.

@blerer
Copy link
Author

blerer commented Dec 1, 2025

I am fine in simply removing the TreeMap and TreeSet alternatives. It just results in some unecessary copies along the way and some inconsistencies between SegmentInfo and FieldInfo as well as within SegmentInfo itself. :-)

@dweiss
Copy link
Contributor

dweiss commented Dec 1, 2025

Ok, maybe I'm being paranoid. I think it's fine for main if it passes tests. I wouldn't backport though as we don't know what out there may be relying on this order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reading SegmentInfo and FieldInfo Map and Set fields perform unecessary sorting and object creation

3 participants