OAK-11510 - Performance improvements to IndexDefinition class #2108

nfsantos · 2025-02-21T11:29:34Z

Replace the HashMap holding the property configs, which emulated case-insensitive keys by converting every key to a lower-case String in get/put operations, with a TreeMap using a case-insensitive String comparator, so keys no longer need to be converted to lower case before each put/get. This significantly reduces object allocation during indexing, because this map is queried for every node that is indexed.
Add a field holding the list of PropertyDefinitions, a copy of the values of the PropertyConfig map. Iterating directly over this list is faster than iterating over the values of the map. This should give a significant speedup to indexing operations, as they call these operations very often.
Do not create an intermediate array in the getApplicableIndexingRule() methods, and return as soon as the first matching rule is found.
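A minimal Java sketch of the first two changes, under simplified assumptions: `PropDef`, `OldConfigs` and `NewConfigs` are hypothetical stand-ins, not Oak's real `PropertyDefinition`/`IndexDefinition` code, and `Locale.ENGLISH` is an assumption (the PR does not say which locale the old code used). The old style lower-cases every key on put and get; the new style lets `String.CASE_INSENSITIVE_ORDER` do the matching and copies the values into a `List` once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: PropDef, OldConfigs and NewConfigs are hypothetical stand-ins,
// not Oak's actual IndexDefinition / PropertyDefinition classes.
class PropertyConfigSketch {

    static final class PropDef {
        final String name;
        PropDef(String name) { this.name = name; }
    }

    // Before: keys normalized with toLowerCase() on every put and get.
    // For a name containing upper-case characters (e.g. "jcr:primaryType"),
    // each lookup allocates a new String and then hashes it.
    static final class OldConfigs {
        private final Map<String, PropDef> byName = new HashMap<>();

        OldConfigs(List<PropDef> defs) {
            for (PropDef d : defs) byName.put(d.name.toLowerCase(Locale.ENGLISH), d);
        }

        PropDef get(String name) { return byName.get(name.toLowerCase(Locale.ENGLISH)); }
    }

    // After: the comparator matches keys case-insensitively, so lookups use the
    // caller's String as-is. The values are copied once into a List so hot loops
    // can iterate it directly instead of walking the map.
    static final class NewConfigs {
        private final TreeMap<String, PropDef> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        private final List<PropDef> all;

        NewConfigs(List<PropDef> defs) {
            for (PropDef d : defs) byName.put(d.name, d);
            all = List.copyOf(byName.values());
        }

        PropDef get(String name) { return byName.get(name); }

        List<PropDef> all() { return all; }
    }

    public static void main(String[] args) {
        List<PropDef> defs = List.of(new PropDef("jcr:primaryType"), new PropDef("sling:resourceType"));
        OldConfigs oldCfg = new OldConfigs(defs);
        NewConfigs newCfg = new NewConfigs(defs);
        // Both resolve lookups regardless of case.
        System.out.println(oldCfg.get("JCR:PrimaryType").name); // jcr:primaryType
        System.out.println(newCfg.get("JCR:PrimaryType").name); // jcr:primaryType
        // Iterate the cached list rather than byName.values().
        for (PropDef d : newCfg.all()) System.out.println(d.name);
    }
}
```

The trade-off, raised in the review below, is that a `TreeMap` lookup costs O(log n) string comparisons instead of one hash probe; the PR's argument is that skipping the per-lookup allocation outweighs this.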

Other minor cleanups.
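The getApplicableIndexingRule() change can be illustrated with a small sketch. `Rule`, `firstMatchViaArray` and `firstMatch` are hypothetical; the "before" method is only a guess at the general shape of the old code, not a copy of it, and Oak's real rule-matching logic is not reproduced here. The point is the difference between collecting all matches into an intermediate array and stopping at the first hit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Rule and both lookup methods are illustrative only and
// do not reproduce Oak's IndexingRule or getApplicableIndexingRule() signatures.
class IndexingRuleLookupSketch {

    static final class Rule {
        final String nodeType;
        Rule(String nodeType) { this.nodeType = nodeType; }
        boolean appliesTo(String primaryType) { return nodeType.equals(primaryType); }
    }

    // Before (sketch): collect every match into an intermediate array, then take
    // the first one. Allocates on every call and always scans all rules.
    static Rule firstMatchViaArray(List<Rule> rules, String primaryType) {
        List<Rule> matches = new ArrayList<>();
        for (Rule r : rules) {
            if (r.appliesTo(primaryType)) matches.add(r);
        }
        Rule[] arr = matches.toArray(new Rule[0]);
        return arr.length > 0 ? arr[0] : null;
    }

    // After (sketch): no intermediate collection; return at the first match.
    static Rule firstMatch(List<Rule> rules, String primaryType) {
        for (Rule r : rules) {
            if (r.appliesTo(primaryType)) {
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(new Rule("nt:unstructured"), new Rule("dam:Asset"), new Rule("nt:unstructured"));
        System.out.println(firstMatch(rules, "dam:Asset").nodeType); // dam:Asset
        System.out.println(firstMatch(rules, "nt:file"));             // null
        // Same rule instance either way; only the cost differs.
        System.out.println(firstMatchViaArray(rules, "dam:Asset") == firstMatch(rules, "dam:Asset")); // true
    }
}
```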



thomasmueller · 2025-02-28T09:06:38Z

oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/IndexDefinition.java

+                                                                       List<PropertyDefinition> functionRestrictions,
+                                                                       List<PropertyDefinition> syncProps,
+                                                                       List<PropertyDefinition> similarityProperties) {
+            TreeMap<String, PropertyDefinition> propDefns = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);


I'm not sure a TreeMap with CASE_INSENSITIVE_ORDER is faster than a HashMap... I know toLowerCase() is no longer needed, but aren't most property names already lower case? In that case toLowerCase() just returns "this".

I noticed this optimization opportunity while profiling a run of the indexing job: the calls to toLowerCase() were responsible for a significant part of the object allocation in that run, and they also showed up in CPU profiling. Property names are usually camelCase, so most of them do have to be converted to lower case, as most consist of two or more logical words.
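To make the camelCase argument concrete, here is a small sketch that counts how many names would need a new lower-case String. The sample names are hypothetical examples, not data from the profiling run, and `Locale.ENGLISH` is an assumption. Note that `toLowerCase()` returning the same instance for an already lower-case string is OpenJDK implementation behavior, not something the `String` specification guarantees.

```java
import java.util.List;
import java.util.Locale;

// Sketch: counts property names that lower-casing would change.
// Sample names are hypothetical, not taken from the profiling run.
class LowerCaseAllocationSketch {

    // True if lower-casing changes the name. In OpenJDK, toLowerCase() returns
    // the same instance when nothing changes (an implementation detail), so
    // only names for which this returns true cause an allocation.
    static boolean needsNewString(String name) {
        return !name.equals(name.toLowerCase(Locale.ENGLISH));
    }

    static long countNeedingNewString(List<String> names) {
        return names.stream().filter(LowerCaseAllocationSketch::needsNewString).count();
    }

    public static void main(String[] args) {
        List<String> names = List.of("jcr:primaryType", "jcr:title", "sling:resourceType", "jcr:lastModified", "width");
        System.out.println(countNeedingNewString(names) + " of " + names.size()); // 3 of 5
    }
}
```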

thomasmueller · 2025-02-28T09:07:41Z

oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/IndexDefinition.java

                }
            }
            ensureNodeTypeIndexingIsConsistent(propDefns, syncProps);
-            return Map.copyOf(propDefns);


Hm, we used to copy the map... I don't know why... But are you sure this is not needed?

I don't see a reason why we must make a copy. Once this method returns, the only reference to the map is the one it returns, so there is no risk of the map being modified elsewhere. And the IndexDefinition class never modifies the map, it only reads it. So I don't think it is necessary to copy the map or make it immutable. Maybe it was done as a best practice to ensure immutability, or as a performance optimization, since an immutable map created by Map.copyOf() may be faster than a HashMap thanks to a more efficient internal representation. I am not sure, but in this case avoiding the creation of lower-case strings is a big gain, which easily offsets any extra cost of searching for a key in a tree compared with a hash map. Looking up a key in a HashMap also requires computing the hash of the String, which can be expensive, since each newly created lower-case String has no cached hash code.
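A related point the thread does not raise: once `propDefns` is a case-insensitive `TreeMap`, keeping `return Map.copyOf(propDefns);` would not be neutral, because `Map.copyOf` builds an unmodifiable map that matches keys with `equals()`/`hashCode()` and drops the comparator. The sketch below uses placeholder keys and values, not Oak code; it shows the effect, and shows `Collections.unmodifiableMap` as one way to get read-only access without losing case-insensitive lookups, should immutability ever be wanted.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Sketch: effect of copying a case-insensitive TreeMap with Map.copyOf.
// Keys and values are placeholders, not Oak property definitions.
class CopyOfComparatorSketch {

    static TreeMap<String, String> buildDefs() {
        TreeMap<String, String> defs = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        defs.put("jcr:primaryType", "def1");
        return defs;
    }

    public static void main(String[] args) {
        TreeMap<String, String> defs = buildDefs();

        // Map.copyOf returns an unmodifiable map that looks keys up with
        // equals()/hashCode(), so the case-insensitive comparator is lost.
        Map<String, String> copied = Map.copyOf(defs);

        // An unmodifiable view delegates to the TreeMap and keeps its comparator.
        Map<String, String> view = Collections.unmodifiableMap(defs);

        System.out.println(defs.get("JCR:PRIMARYTYPE"));   // def1
        System.out.println(copied.get("JCR:PRIMARYTYPE")); // null
        System.out.println(view.get("JCR:PRIMARYTYPE"));   // def1
    }
}
```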

nfsantos added 5 commits February 21, 2025 12:28

Merge remote-tracking branch 'upstream/trunk' into OAK-11510

cb57b4d

Merge remote-tracking branch 'upstream/trunk' into OAK-11510

9035ee1

Do not create intermediate array in getApplicableIndexingRule() methods and return from method as soon as first matching rule is found.

0e3f405

Merge remote-tracking branch 'upstream/trunk' into OAK-11510

46c6247

thomasmueller reviewed Feb 28, 2025


Incorporate review comments.

144a0f5

nfsantos requested a review from thomasmueller February 28, 2025 10:35

Merge remote-tracking branch 'upstream/trunk' into OAK-11510

fb2eaed

thomasmueller approved these changes Feb 28, 2025


nfsantos merged commit 1ceb9b1 into apache:trunk Feb 28, 2025
1 of 4 checks passed

nfsantos deleted the OAK-11510 branch February 28, 2025 12:42



