-
Notifications
You must be signed in to change notification settings - Fork 428
OAK-11510 - Performance improvements to IndexDefinition class #2108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…case-insensitive keys by always converting to lower case Strings used in get/put operations, by a TreeMap using a case-insensitive String comparator, thereby avoiding the need to convert to lower-case before put/get. This reduces significantly object allocation during indexing, because this map is queried for every node that is indexed. Add an additional field with the list of PropertyDefinitions, which is a copy of the values of the PropertyConfig map. It is faster to iterate directly over this list than to iterate over the values of the map. This should give a significant speedup to indexing operations, as they call very often these operations. Other minor cleanups.
…ds and return from method as soon as first matching rule is found.
oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/IndexDefinition.java
Outdated
Show resolved
Hide resolved
oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/IndexDefinition.java
Outdated
Show resolved
Hide resolved
| List<PropertyDefinition> functionRestrictions, | ||
| List<PropertyDefinition> syncProps, | ||
| List<PropertyDefinition> similarityProperties) { | ||
| TreeMap<String, PropertyDefinition> propDefns = new TreeMap<>(String.CASE_INSENSITIVE_ORDER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if TreeMap with CASE_INSENSITIVE_ORDER is faster than HashMap... I know, toLowerCase is not needed, but then, aren't most property names already lowercase? So toLowerCase() will just return "this".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have noticed this optimization opportunity while profiling a run of the indexing job, where the calls to toLowerCase() were responsible for a significant part of object allocation in that run, and also showed up on CPU profiling. Property names are usually Camel case, so most of them will in fact have to be converted to lower case as most consist of two or more logical words.
| } | ||
| } | ||
| ensureNodeTypeIndexingIsConsistent(propDefns, syncProps); | ||
| return Map.copyOf(propDefns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, we used to copy the map... I don't know why... But are you sure this is not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason why we must make a copy. Once this method finishes, there is no other reference to the map other than the one returned by the method, so there is no risk of the map being modified elsewhere. And the IndexDefinition class does not modify the map anywhere, just reads it. So I don't think it is necessary to make a copy or to make the map immutable. Maybe it was done just as a best-practice to ensure immutability. Or as an attempt at performance optimization, as an immutable map created by Map.copyOf() may be faster than a HashMap due to a more efficient internal representation. I am not sure, but in this case, avoiding the creation of lower case strings is a big gain, easily offsets any additional overhead of searching for a key on a tree as compared to search on a hashmap. Searching on a HashMap also requires computing the hash of the String, which can be expensive.
PropertyDefinitions, which is a copy of the values of thePropertyConfigmap. It is faster to iterate directly over this list than to iterate over the values of the map. This should give a significant speedup to indexing operations, as they call very often these operations.getApplicableIndexingRule()methods and return from method as soon as first matching rule is found.Other minor cleanups.