-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Avoid unnecessary sorting and instantiations in readMapOfStrings #15457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
| * GITHUB#15124: Use RamUsageEstimator to calculate size for non-accountable queries. (Sagar Upadhyaya) | ||
|
|
||
| * GITHUB#15453: Avoid unnecessary sorting and instantiations in readMapOfStrings. (Benjamin Lerer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call out the parts of the code that mean anything on this? e.g. segment attributes & diagnostics are now not ordered. Also, this will mean two segment infos that were previously equivalent will now have different toStrings (which is likely OK)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In SegmentInfo the maps were not ordered before the change because te use of Map.copyOf, in the constructor, was discarding the TreeMap ordering. A similar thing was happening for the TreeSet. Its elements were converted and added to a HashSet. Only FieldInfo attributes was somehow retaining the order has long as no new attibutes were added. Of course that was only valid if you add less than 10 elements within the maps.
| // This needs to be thread-safe as multiple threads may be updating (different) attributes | ||
| // concurrently due to concurrent merging. | ||
| attributes = Collections.unmodifiableMap(newMap); | ||
| attributes = Map.copyOf(newMap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't replace unmodifiableMap with copyOf. There is no point in doing it, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to align the code of FieldInfo with the one of SegmentInfo. Currently SegmentInfo is using ImmutableCollections.AbstractImmutableMap (through Map.copyOf) whereas FieldInfo is using UnmodifiableMap (through Collections.unmodifiable). Originally SegmentInfo was doing some defensive copies using Collections.unmodifiableMap(new HashMap<>(Objects.requireNonNull(diagnostics))); but they have been replaced by Maps.copyOf in #649
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But a copyOf is going to be doing more copying, while unmodifiableMap is a thin wrapper. Is it possible for this code to be called a lot? If not, then the difference probably doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good point. Effectively this method is often called twice in a row within some DocValuesProducer and in some loop within FieldInfos.Builder (which seems to be called during merges). So yes we should probably keep unmodifiableMap here. It would also probably make sense to have some putAttributes methods that perform the update in one go instead of creating a new Map and reacquiring the synchronize lock each time. What do you think?
| return Map.of(); | ||
| } else if (count == 1) { | ||
| return Collections.singletonMap(readString(), readString()); | ||
| return Map.of(readString(), readString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes only work if input arguments are non-null, which I'm not sure we ever enforced or documented so it would be an incompatible change. Applies to both sets and maps.
import java.util.*;
public class Test {
public static void main(String[] args) {
String s = null;
System.out.println("This works");
Collections.singleton(s);
System.out.println("This does not");
Set.of(s);
}
}
To play it safe, I'd limit the change to just removing the treeset (and leaving hash set). Then this can be backported with no problems. If you'd like to move to non-null arguments then this needs to be documented in the migration, method javadocs, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the readString method cannot return a null String and SegmentInfo convert the Maps through Map.copyOf which enforce the same check, just later on. Also writeString do not support null values. Therefore, it does not look like null String are supported by the code. That being said I understand the concern. The choice of using immutable maps (created throught Map.of and Map.copyOf) and set was to align with what was done within SegmentInfo and avoid an extra instantiation as a Map.copyOf of a immutable map is a noop.
|
I am fine in simply removing the |
|
Ok, maybe I'm being paranoid. I think it's fine for main if it passes tests. I wouldn't backport though as we don't know what out there may be relying on this order. |
Description
The patch remove the use of
TreeMapfromreadMapOfStringsand the use ofTreeSetfromreadSetOfStringand returnImmutableMapsandImmutableSetsinstead ofUnmodifiableMaps. As the use ofMap.copyOfonImmutableMapis a noop the change also avoid the creation of a new map instance inSegmentInfo.Closes #15453