New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance for many new fields introduction in mapping #6707
Improve performance for many new fields introduction in mapping #6707
Conversation
I'm not sure how much I like introducing a mutable map here. Maybe we should also consider eg. using a persistent hash array mapped trie (although I never used one myself so I'm unsure about the performance impact) or batching updates to the immutable map by having a wrapper around a big immutable map and a smaller one that only carries the last updates? |
I am leaning towards what @jpountz said though. I think we should be careful with this mutability |
This change doesn't use ImmutableOpenMap where it was used for concurrency story (copy on write), it is still a mutable behavior on the class itself. I looked into doing paginated immutable open map, but the performance was not good because of the cloning (and maintaining the pages). At the end, with many fields, CHM is the best data structure when there are many fields. I like the idea that for the common case, with not many fields, we gain the concurrency story of copy on write. I like this change since it will fail if we misuse the data structure in mutating it concurrently, a protection we didn't have before. I have added the ability to externally set using settings the switch size, mainly for testings, so now its fully randomized so we test the switch case also when we have small number of fields. |
immutableMap = null; | ||
} | ||
|
||
public void finish() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having this custom finish method, should it implement Releasable? This would also allow for using the try-with syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense, will add!
I left a couple of comments but in general this looks good to me! |
When we have many new fields keep being introduced, the immutable open map we used becomes more and more expensive because of its clone characteristics, and we use it in several places. The usage semantics of it allows us to also use a CHM if we want to, but it would be nice to still maintain the concurrency aspects of volatile immutable map when the number of fields is sane. Introduce a new map like data structure, that can switch internally to CHM when a certain threshold is met. Also add a benchmark class to exploit the many new field mappings use case, which shows significant gains by using this change, to a level where mapping introduction is no longer a significant bottleneck. closes elastic#6707
@jpountz thanks!, applied the changes (had to force push, too many small ones accumulated) |
Just left a comment about the implementation of |
@jpountz didn't see the note on values since my last push, is it still relevant? I have removed the call to intern already.... |
That's weird, I couldn't see the changes yesterday but now I can... LGTM |
@@ -181,7 +182,7 @@ public Boolean paramAsBooleanOptional(String key, Boolean defaultValue) { | |||
|
|||
private ImmutableMap<String, FieldMappingMetaData> findFieldMappingsByType(DocumentMapper documentMapper, GetFieldMappingsIndexRequest request) throws ElasticsearchException { | |||
MapBuilder<String, FieldMappingMetaData> fieldMappings = new MapBuilder<>(); | |||
ImmutableList<FieldMapper> allFieldMappers = documentMapper.mappers().mappers(); | |||
List<FieldMapper> allFieldMappers = documentMapper.mappers().mappers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be final?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
|
||
public DocumentFieldMappers(DocumentMapper docMapper) { | ||
public DocumentFieldMappers(@Nullable @IndexSettings Settings settings, DocumentMapper docMapper) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the @Nullable
good for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats the existing contract of settings during parsing mapping, we might get null settings (mainly coming from unit tests if memory serves). didn't want to loose that semantic, and if we make it non nullable, we can remove it, but that would be in a different change.
I left a bunch of comments. I like the improvement though. |
@s1monw applied your comments, ready for another round |
LGTM |
When we have many new fields keep being introduced, the immutable open map we used becomes more and more expensive because of its clone characteristics, and we use it in several places. The usage semantics of it allows us to also use a CHM if we want to, but it would be nice to still maintain the concurrency aspects of volatile immutable map when the number of fields is sane. Introduce a new map like data structure, that can switch internally to CHM when a certain threshold is met. Also add a benchmark class to exploit the many new field mappings use case, which shows significant gains by using this change, to a level where mapping introduction is no longer a significant bottleneck. closes #6707
When we have many new fields keep being introduced, the immutable open map we used becomes more and more expensive because of its clone characteristics, and we use it in several places.
The usage semantics of it allows us to also use a CHM if we want to, but it would be nice to still maintain the concurrency aspects of volatile immutable map when the number of fields is sane.
Introduce a new map like data structure, that can switch internally to CHM when a certain threshold is met.
Also add a benchmark class to exploit the many new field mappings use case, which shows significant gains by using this change, to a level where mapping introduction is no longer a significant bottleneck.