New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make lookup structures immutable. #7486
Conversation
This commit makes the lookup structures that are used for mappings immutable. When changes are required, a new instance is created while the current instance is left unmodified. This is done efficiently thanks to a hash table implementation based on a array hash trie, see org.elasticsearch.common.collect.CopyOnWriteHashMap. ManyMappingsBenchmark returns indexing times that are similar to the ones that can be observed in current master. Ultimately, I would like to see if we can make mappings completely immutable as well and updated atomically. This is not trivial however, eg. because of dynamic mappings. So here is a first baby step that should help move towards that direction.
/** | ||
* Return a new instance that has a different default analyzer. | ||
*/ | ||
public FieldNameAnalyzer setDefaultAnalyzer(Analyzer defaultAnalyzer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this copyWithDefaultAnalyzer
or something? set implies it is mutating the existing object. But is this really needed at all? If I have a FieldNameAnalyzer
named f
, then I can do:
new FieldNameAnalyzer(f.analyzers(), defaultAnalyzer)
. This method seems to just mask what is actually happening (creating a new FieldNameAnalyzer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
I just pushed a new commit that makes adding/removing mappers to FieldMappersLookup perform in logarithmic time instead of linear. |
return cowMap; | ||
} else { | ||
return new CopyOnWriteHashMap<K, V>().copyAndPutAll(map); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have copyAndPutAll()
as just the impl of a ctor which takes a map (maybe private)? I only see one other use of the function, and that looks like it could just call this copyOf
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make the API "complete" like java's Map also has put
and putAll
but agreed that if it's not really used, let's make it private. We will still be able to put it back if we need it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I tried to do it, but it felt weird to have copyAndRemoveAll but not copyAndPutAll so I finally left it and made it used by the set impl.
However, I addressed your comment to use copyOf instead of putAll in ObjectMapper constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds fine.
Looks ok overall. None of the comments I left are critical. I am, however, concerned about the number of allocations just to construct from a map. Perhaps there is a way to have a constructor that (1) iterates the source map and builds a list of the keys/values (2) sorts based on hash, and (3) build the nodes from the root down, allocating exactly what is needed? If this sounds too crazy just ignore me... |
@rjernst Thanks for the review, I pushed some more commits.
Writes can indeed be allocation intensive, as every write will allocate between 4 and 19 (small) objects. Your idea would work but I am a bit reluctant to do it as it not trivial to implement and doesn't seem necessary for now performance-wise. Maybe this is something we can think about in the future if/when we start having issues? |
Sure, that sounds fine. |
LGTM! |
This commit makes the lookup structures that are used for mappings immutable. When changes are required, a new instance is created while the current instance is left unmodified. This is done efficiently thanks to a hash table implementation based on a array hash trie, see org.elasticsearch.common.collect.CopyOnWriteHashMap. ManyMappingsBenchmark returns indexing times that are similar to the ones that can be observed in current master. Ultimately, I would like to see if we can make mappings completely immutable as well and updated atomically. This is not trivial however, eg. because of dynamic mappings. So here is a first baby step that should help move towards that direction. Close #7486
This commit makes the lookup structures that are used for mappings immutable.
When changes are required, a new instance is created while the current instance
is left unmodified. This is done efficiently thanks to a hash table
implementation based on a array hash trie, see
org.elasticsearch.common.collect.CopyOnWriteHashMap.
ManyMappingsBenchmark returns indexing times that are similar to the ones that
can be observed in current master.
Ultimately, I would like to see if we can make mappings completely immutable as
well and updated atomically. This is not trivial however, eg. because of dynamic
mappings. So here is a first baby step that should help move towards that
direction.