@nfsantos nfsantos commented Feb 21, 2025

  • Replace the HashMap holding the property configs with a TreeMap that uses a case-insensitive String comparator. The HashMap emulated case-insensitive keys by converting every String to lower case before each get/put operation; the TreeMap makes that conversion unnecessary. This significantly reduces object allocation during indexing, because the map is queried for every node that is indexed.
  • Add a field holding the list of PropertyDefinitions, which is a copy of the values of the PropertyConfig map. Iterating directly over this list is faster than iterating over the values of the map. This should give a significant speedup to indexing, which performs these iterations very often.
  • Do not create an intermediate array in the getApplicableIndexingRule() methods; return as soon as the first matching rule is found.
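
As a rough illustration of the first two points, here is a minimal sketch, not the code from this PR. The class `PropertyConfigSketch` and its String values are hypothetical stand-ins for the real property config map and PropertyDefinition objects.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the class that owns the property configs.
final class PropertyConfigSketch {
    // Case-insensitive keys: no toLowerCase() call is needed on put or get.
    private final Map<String, String> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    // Copy of the map values, built once, so hot loops iterate over a plain list.
    private final List<String> definitions;

    PropertyConfigSketch(List<String> propertyNames) {
        for (String name : propertyNames) {
            // The value is a placeholder for a real PropertyDefinition.
            byName.put(name, "definition of " + name);
        }
        definitions = List.copyOf(byName.values());
    }

    // Lookup works for any casing of the property name.
    String get(String name) {
        return byName.get(name);
    }

    List<String> definitions() {
        return definitions;
    }
}
```

With this layout, `get("JCR:PRIMARYTYPE")` and `get("jcr:primaryType")` resolve to the same entry without allocating a lower-cased key.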

Other minor cleanups.
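
The early return from the last bullet could look roughly like the sketch below. The `Rule` type and the `firstApplicable` method are hypothetical and only illustrate the pattern; they are not the actual getApplicableIndexingRule() signatures.

```java
import java.util.List;
import java.util.function.Predicate;

final class RuleLookupSketch {
    // Hypothetical rule type: a name plus a predicate deciding whether it applies.
    static final class Rule {
        final String name;
        final Predicate<String> applies;

        Rule(String name, Predicate<String> applies) {
            this.name = name;
            this.applies = applies;
        }
    }

    // Returns the first rule that applies to the node type, or null if none does.
    // No intermediate array or list of candidates is built, and later rules
    // are never evaluated once a match is found.
    static Rule firstApplicable(List<Rule> rules, String nodeType) {
        for (Rule rule : rules) {
            if (rule.applies.test(nodeType)) {
                return rule;
            }
        }
        return null;
    }
}
```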

List<PropertyDefinition> functionRestrictions,
List<PropertyDefinition> syncProps,
List<PropertyDefinition> similarityProperties) {
TreeMap<String, PropertyDefinition> propDefns = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
Member

I'm not sure a TreeMap with CASE_INSENSITIVE_ORDER is faster than a HashMap... I know toLowerCase() is no longer needed, but aren't most property names already lowercase? In that case toLowerCase() will just return "this".

Contributor Author

I noticed this optimization opportunity while profiling a run of the indexing job: the calls to toLowerCase() were responsible for a significant part of the object allocation in that run and also showed up in the CPU profile. Property names are usually camel case, so most of them do have to be converted to lower case, since they consist of two or more logical words.
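
To make the two positions concrete, here is a small sketch that checks whether lower-casing a name yields a new String object. It relies on the OpenJDK behaviour of returning the same instance when no character changes, which is an implementation detail rather than a documented guarantee. The helper name `allocatesNewString` is hypothetical.

```java
import java.util.Locale;

final class LowerCaseAllocationSketch {
    // True when lower-casing produces a different String object than the input,
    // that is, when the call had to allocate a new String.
    static boolean allocatesNewString(String name) {
        return name.toLowerCase(Locale.ROOT) != name;
    }
}
```

A camel-case name such as `jcr:primaryType` always needs a new String because its content changes, whereas an all-lowercase name such as `title` is returned as-is on OpenJDK.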

}
}
ensureNodeTypeIndexingIsConsistent(propDefns, syncProps);
return Map.copyOf(propDefns);
Member

Hm, we used to copy the map... I don't know why... But are you sure this is not needed?

Contributor Author

I don't see a reason why we must make a copy. Once this method finishes, the only reference to the map is the one returned by the method, so there is no risk of the map being modified elsewhere. The IndexDefinition class does not modify the map anywhere either; it only reads it. So I don't think it is necessary to copy the map or to make it immutable. Maybe the copy was made as a best practice to ensure immutability, or as an attempt at performance optimization, since an immutable map created by Map.copyOf() may be faster than a HashMap due to a more efficient internal representation. I am not sure, but in this case avoiding the creation of lower-case strings is a big gain that easily offsets any additional overhead of searching for a key in a tree compared to searching in a hash map. Searching in a HashMap also requires computing the hash of the String, which can be expensive.
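
One detail that may be relevant when weighing the copy: Map.copyOf() builds a hash-based immutable map that matches keys with equals(), so a copy of a case-insensitive TreeMap no longer supports case-insensitive lookups. The sketch below illustrates this with standard JDK calls only. The class and method names are hypothetical and the code is not taken from this PR.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class CopySemanticsSketch {
    // A case-insensitive map with one sample entry.
    static SortedMap<String, String> caseInsensitive() {
        SortedMap<String, String> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        map.put("jcr:primaryType", "definition");
        return map;
    }

    // Map.copyOf drops the comparator: lookups become case-sensitive.
    static Map<String, String> hashCopy(SortedMap<String, String> map) {
        return Map.copyOf(map);
    }

    // A read-only view keeps the underlying TreeMap and its comparator.
    static SortedMap<String, String> readOnlyView(SortedMap<String, String> map) {
        return Collections.unmodifiableSortedMap(map);
    }
}
```

If immutability were wanted in addition to case-insensitive keys, a read-only view such as the one above would be one way to get both, at the cost of an extra wrapper object.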

@nfsantos nfsantos merged commit 1ceb9b1 into apache:trunk Feb 28, 2025
1 of 4 checks passed
@nfsantos nfsantos deleted the OAK-11510 branch February 28, 2025 12:42