Improve performance for many new fields introduction in mapping #6707

Closed


kimchy
Member

@kimchy kimchy commented Jul 3, 2014

When many new fields keep being introduced, the immutable open map we use becomes more and more expensive because of its clone characteristics, and we use it in several places.

Its usage semantics also allow us to use a ConcurrentHashMap (CHM) if we want to, but it would be nice to keep the concurrency aspects of a volatile immutable map when the number of fields is sane.

Introduce a new map-like data structure that can switch internally to a CHM when a certain threshold is met.

Also add a benchmark class to exercise the many-new-field-mappings use case; it shows significant gains from this change, to the point where mapping introduction is no longer a significant bottleneck.
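
For illustration only, a minimal sketch of the threshold-switching idea described above. It is not the class added in this PR: the name `SwitchingMap`, the `switchSize` parameter, and the use of a plain `HashMap` as the copy-on-write snapshot are all assumptions, and writers are assumed to be serialized by the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: copy-on-write below a threshold, ConcurrentHashMap above it.
 * Assumes a single writer at a time; readers never lock.
 */
final class SwitchingMap<K, V> {
    private final int switchSize; // assumed threshold, not a real setting name

    // Snapshot that is never mutated after publication; null once switched.
    private volatile Map<K, V> immutableMap = new HashMap<>();
    // Null until the threshold is reached.
    private volatile ConcurrentHashMap<K, V> concurrentMap;

    SwitchingMap(int switchSize) {
        this.switchSize = switchSize;
    }

    V get(K key) {
        // Read the snapshot first: if it is null, the writer has already
        // published concurrentMap (it is written before the snapshot is cleared).
        Map<K, V> snapshot = immutableMap;
        return snapshot != null ? snapshot.get(key) : concurrentMap.get(key);
    }

    void put(K key, V value) {
        Map<K, V> snapshot = immutableMap;
        if (snapshot == null) {
            concurrentMap.put(key, value); // already switched: mutate in place
            return;
        }
        if (snapshot.size() >= switchSize) {
            // Threshold met: move everything into a CHM and stop cloning.
            ConcurrentHashMap<K, V> chm = new ConcurrentHashMap<>(snapshot);
            chm.put(key, value);
            concurrentMap = chm;
            immutableMap = null;
            return;
        }
        // Common case: clone, add, and publish a fresh snapshot.
        Map<K, V> copy = new HashMap<>(snapshot);
        copy.put(key, value);
        immutableMap = copy;
    }

    int size() {
        Map<K, V> snapshot = immutableMap;
        return snapshot != null ? snapshot.size() : concurrentMap.size();
    }

    boolean isConcurrent() {
        return immutableMap == null;
    }
}
```

Below the threshold every `put` pays for a full copy, which is the cost the description refers to; above it, `put` is a plain CHM insert.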

@jpountz
Contributor

jpountz commented Jul 3, 2014

I'm not sure how much I like introducing a mutable map here. Maybe we should also consider, e.g., using a persistent hash array mapped trie (although I have never used one myself, so I'm unsure about the performance impact), or batching updates to the immutable map by having a wrapper around a big immutable map and a smaller one that only carries the latest updates?
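
A rough sketch of the batching alternative suggested above, under stated assumptions: the class name `BatchedCowMap`, the `batchSize` parameter, and the merge policy are hypothetical and were not part of this PR. Writers are assumed to be serialized by the caller, and null values are not supported.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a large base snapshot plus a small overlay of recent
 * updates. Only the small overlay is cloned on each put; the base is cloned
 * once per batch.
 */
final class BatchedCowMap<K, V> {
    /** Immutable pair published atomically through one volatile field. */
    private static final class State<K, V> {
        final Map<K, V> base;
        final Map<K, V> recent;

        State(Map<K, V> base, Map<K, V> recent) {
            this.base = base;
            this.recent = recent;
        }
    }

    private final int batchSize; // assumed knob: overlay size that triggers a merge
    private volatile State<K, V> state = new State<>(new HashMap<>(), new HashMap<>());

    BatchedCowMap(int batchSize) {
        this.batchSize = batchSize;
    }

    V get(K key) {
        State<K, V> s = state;
        V v = s.recent.get(key); // latest updates win
        return v != null ? v : s.base.get(key);
    }

    void put(K key, V value) {
        State<K, V> s = state;
        Map<K, V> recent = new HashMap<>(s.recent); // cheap: overlay is small
        recent.put(key, value);
        if (recent.size() < batchSize) {
            state = new State<>(s.base, recent);
        } else {
            // Overlay is full: pay for one big copy and start a new batch.
            Map<K, V> merged = new HashMap<>(s.base);
            merged.putAll(recent);
            state = new State<>(merged, new HashMap<>());
        }
    }

    int baseSize() {
        return state.base.size();
    }

    int recentSize() {
        return state.recent.size();
    }
}
```

The trade-off is that every lookup may touch two maps, and the large copy still happens once per batch rather than disappearing.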

@s1monw
Contributor

s1monw commented Jul 3, 2014

I am leaning towards what @jpountz said, though. I think we should be careful with this mutability.

@kimchy
Member Author

kimchy commented Jul 4, 2014

This change doesn't use ImmutableOpenMap where it was used for the concurrency story (copy on write); it is still mutable behavior on the class itself. I looked into doing a paginated immutable open map, but the performance was not good because of the cloning (and maintaining the pages). In the end, with many fields, CHM is the best data structure.

I like the idea that for the common case, with not many fields, we gain the concurrency story of copy on write. I like this change since it will fail if we misuse the data structure by mutating it concurrently, a protection we didn't have before.

I have added the ability to set the switch size externally through settings, mainly for testing, so now it's fully randomized and we also test the switch case when we have a small number of fields.

immutableMap = null;
}

public void finish() {
Contributor

Instead of having this custom finish method, should it implement Releasable? This would also allow for using the try-with-resources syntax.

Member Author

Makes sense, will add!
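
To illustrate the suggestion in this thread, here is a minimal sketch of a mutation scope that is closed with try-with-resources and that fails fast when a second mutation starts concurrently. It uses `java.lang.AutoCloseable` as a stand-in for Elasticsearch's `Releasable`; `GuardedMap`, `Mutator`, and their methods are hypothetical names, and what the real `finish()` does is not shown in this conversation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: one mutation scope at a time, published on close(). */
final class GuardedMap<K, V> {
    private final AtomicBoolean mutating = new AtomicBoolean();
    private volatile Map<K, V> current = new HashMap<>(); // never mutated after publication

    /** Opens a mutation scope; throws if another scope is still open. */
    Mutator mutator() {
        if (!mutating.compareAndSet(false, true)) {
            throw new IllegalStateException("map is already being mutated");
        }
        return new Mutator();
    }

    V get(K key) {
        return current.get(key);
    }

    final class Mutator implements AutoCloseable {
        private final Map<K, V> pending = new HashMap<>(current);

        void put(K key, V value) {
            pending.put(key, value);
        }

        /** Plays the role of the custom finish() method: publish and release. */
        @Override
        public void close() {
            current = pending;
            mutating.set(false);
        }
    }
}
```

With this shape a caller writes `try (GuardedMap<String, Integer>.Mutator mu = map.mutator()) { mu.put("field", 1); }`, and the scope is released even if the body throws.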

@jpountz
Contributor

jpountz commented Jul 4, 2014

I left a couple of comments but in general this looks good to me!

@kimchy
Member Author

kimchy commented Jul 4, 2014

@jpountz thanks! Applied the changes (had to force push, too many small ones accumulated).

@jpountz
Contributor

jpountz commented Jul 4, 2014

Just left a comment about the implementation of values(). Could you also move the removal of the calls to intern() to another PR?

@kimchy
Member Author

kimchy commented Jul 5, 2014

@jpountz didn't see the note on values since my last push, is it still relevant? I have removed the call to intern already....

@jpountz
Contributor

jpountz commented Jul 5, 2014

That's weird, I couldn't see the changes yesterday but now I can... LGTM

@@ -181,7 +182,7 @@ public Boolean paramAsBooleanOptional(String key, Boolean defaultValue) {

private ImmutableMap<String, FieldMappingMetaData> findFieldMappingsByType(DocumentMapper documentMapper, GetFieldMappingsIndexRequest request) throws ElasticsearchException {
MapBuilder<String, FieldMappingMetaData> fieldMappings = new MapBuilder<>();
- ImmutableList<FieldMapper> allFieldMappers = documentMapper.mappers().mappers();
+ List<FieldMapper> allFieldMappers = documentMapper.mappers().mappers();
Contributor

Can this be final?

Member Author

Will do.


- public DocumentFieldMappers(DocumentMapper docMapper) {
+ public DocumentFieldMappers(@Nullable @IndexSettings Settings settings, DocumentMapper docMapper) {
Contributor

What is the @Nullable good for?

Member Author

That's the existing contract of settings during mapping parsing: we might get null settings (mainly coming from unit tests, if memory serves). I didn't want to lose that semantic. If we make settings non-nullable we can remove the annotation, but that would be in a different change.

@s1monw
Contributor

s1monw commented Jul 5, 2014

I left a bunch of comments. I like the improvement though.

@kimchy
Member Author

kimchy commented Jul 5, 2014

@s1monw applied your comments, ready for another round.

@s1monw
Contributor

s1monw commented Jul 5, 2014

LGTM

@kimchy kimchy closed this in c8e5530 Jul 5, 2014
kimchy added a commit that referenced this pull request Jul 5, 2014
closes #6707
@kimchy kimchy deleted the effectient_list_copy_on_new_mappings branch July 5, 2014 15:40
@jpountz jpountz removed the review label Jul 16, 2014
@clintongormley clintongormley changed the title Improve performance for many new fields introduction in mapping Mapping: Improve performance for many new fields introduction in mapping Jul 16, 2014
@clintongormley clintongormley added the :Search/Mapping Index mappings, including merging and defining field types label Jun 7, 2015
@clintongormley clintongormley changed the title Mapping: Improve performance for many new fields introduction in mapping Improve performance for many new fields introduction in mapping Jun 7, 2015
Labels
>enhancement :Search/Mapping Index mappings, including merging and defining field types v1.3.0 v2.0.0-beta1

4 participants