Make lookup structures immutable. #7486

Closed
wants to merge 8 commits into from

jpountz
Contributor

@jpountz jpountz commented Aug 27, 2014

This commit makes the lookup structures that are used for mappings immutable.
When changes are required, a new instance is created while the current instance
is left unmodified. This is done efficiently thanks to a hash table
implementation based on an array hash trie, see
org.elasticsearch.common.collect.CopyOnWriteHashMap.

ManyMappingsBenchmark returns indexing times that are similar to the ones that
can be observed in current master.

Ultimately, I would like to see if we can make mappings completely immutable as
well and updated atomically. This is not trivial, however, e.g. because of dynamic
mappings. So here is a first baby step that should help move towards that
direction.
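
To make the copy-on-write semantics described above concrete, here is a minimal sketch. It is not the CopyOnWriteHashMap from this pull request: `NaiveCowMap` is a hypothetical stand-in that copies a whole `java.util.HashMap` on every write (linear time), whereas the description says the real class relies on a trie to keep writes cheap. The method name `copyAndPut` is an assumption modeled on the `copyAndPutAll` naming discussed in the review.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in; NOT the CopyOnWriteHashMap from this pull request. */
final class NaiveCowMap<K, V> {
    private final Map<K, V> entries;

    NaiveCowMap() {
        this.entries = Collections.emptyMap();
    }

    private NaiveCowMap(Map<K, V> entries) {
        this.entries = entries;
    }

    /** Returns a new map that also contains the given entry; this instance is left unmodified. */
    NaiveCowMap<K, V> copyAndPut(K key, V value) {
        Map<K, V> copy = new HashMap<>(entries); // full O(n) copy, which a trie would avoid
        copy.put(key, value);
        return new NaiveCowMap<>(copy);
    }

    V get(K key) {
        return entries.get(key);
    }

    int size() {
        return entries.size();
    }
}

class NaiveCowMapDemo {
    public static void main(String[] args) {
        NaiveCowMap<String, String> v1 = new NaiveCowMap<String, String>().copyAndPut("title", "string");
        NaiveCowMap<String, String> v2 = v1.copyAndPut("views", "long");
        // A reader holding v1 never observes the second write.
        System.out.println(v1.size() + " " + v2.size()); // prints "1 2"
    }
}
```

Because no instance is ever modified after construction, readers can use a reference without synchronization, and an update becomes visible only once the new reference is published.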

@jpountz jpountz added the review label Aug 27, 2014
/**
 * Return a new instance that has a different default analyzer.
 */
public FieldNameAnalyzer setDefaultAnalyzer(Analyzer defaultAnalyzer) {
Member

Maybe call this copyWithDefaultAnalyzer or something? set implies it is mutating the existing object. But is this really needed at all? If I have a FieldNameAnalyzer named f, then I can do:
new FieldNameAnalyzer(f.analyzers(), defaultAnalyzer). This method seems to just mask what is actually happening (creating a new FieldNameAnalyzer).
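
The two options in this comment can be sketched side by side. `FieldAnalyzersSketch` below is a hypothetical stand-in for FieldNameAnalyzer, with plain strings in place of Lucene `Analyzer` instances; only the name `copyWithDefaultAnalyzer` and the constructor-call shape come from the comment, everything else is assumed for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for FieldNameAnalyzer; analyzers are plain strings here. */
final class FieldAnalyzersSketch {
    private final Map<String, String> analyzers;
    private final String defaultAnalyzer;

    FieldAnalyzersSketch(Map<String, String> analyzers, String defaultAnalyzer) {
        // Defensive copy so that later changes to the caller's map cannot leak in.
        this.analyzers = Collections.unmodifiableMap(new HashMap<>(analyzers));
        this.defaultAnalyzer = defaultAnalyzer;
    }

    Map<String, String> analyzers() {
        return analyzers;
    }

    String defaultAnalyzer() {
        return defaultAnalyzer;
    }

    /** Option 1: a method whose name makes it explicit that a copy is returned. */
    FieldAnalyzersSketch copyWithDefaultAnalyzer(String newDefault) {
        return new FieldAnalyzersSketch(analyzers, newDefault);
    }
}

class FieldAnalyzersSketchDemo {
    public static void main(String[] args) {
        Map<String, String> perField = new HashMap<>();
        perField.put("title", "english");
        FieldAnalyzersSketch f = new FieldAnalyzersSketch(perField, "standard");

        // Option 1: the copy method.
        FieldAnalyzersSketch a = f.copyWithDefaultAnalyzer("whitespace");
        // Option 2: call the constructor directly, as suggested in the comment.
        FieldAnalyzersSketch b = new FieldAnalyzersSketch(f.analyzers(), "whitespace");

        System.out.println(f.defaultAnalyzer()); // prints "standard": f itself is unchanged
        System.out.println(a.defaultAnalyzer().equals(b.defaultAnalyzer())); // prints "true"
    }
}
```

Both options yield an equivalent new object; the difference is only whether the allocation is visible at the call site.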

Contributor Author

Agreed

@jpountz
Contributor Author

jpountz commented Aug 28, 2014

I just pushed a new commit that makes adding/removing mappers to FieldMappersLookup perform in logarithmic time instead of linear.

    return cowMap;
} else {
    return new CopyOnWriteHashMap<K, V>().copyAndPutAll(map);
}
Member

Why not have copyAndPutAll() as just the impl of a ctor which takes a map (maybe private)? I only see one other use of the function, and that looks like it could just call this copyOf method?

Contributor Author

I wanted to make the API "complete", the same way Java's Map has both put and putAll, but agreed: if it's not really used, let's make it private. We will still be able to put it back if we need it.

Contributor Author

Hmm, I tried to do it, but it felt weird to have copyAndRemoveAll but not copyAndPutAll, so in the end I left it and made the set impl use it.

However, I addressed your comment to use copyOf instead of putAll in the ObjectMapper constructor.

Member

Ok, sounds fine.

@rjernst
Member

rjernst commented Sep 5, 2014

Looks ok overall. None of the comments I left are critical.

I am, however, concerned about the number of allocations just to construct from a map. Perhaps there is a way to have a constructor that (1) iterates the source map and builds a list of the keys/values (2) sorts based on hash, and (3) build the nodes from the root down, allocating exactly what is needed? If this sounds too crazy just ignore me...

@jpountz
Contributor Author

jpountz commented Sep 5, 2014

@rjernst Thanks for the review, I pushed some more commits.

> I am, however, concerned about the number of allocations just to construct from a map. Perhaps there is a way to have a constructor that (1) iterates the source map and builds a list of the keys/values (2) sorts based on hash, and (3) build the nodes from the root down, allocating exactly what is needed? If this sounds too crazy just ignore me...

Writes can indeed be allocation intensive, as every write will allocate between 4 and 19 (small) objects. Your idea would work, but I am a bit reluctant to do it as it is not trivial to implement and doesn't seem necessary for now performance-wise. Maybe this is something we can think about in the future if/when we start having issues?
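
For readers wondering why a single write allocates several small objects, here is a rough sketch of path copying in a fixed-depth trie, the general technique behind immutable trie updates. It is an illustration under simplifying assumptions, not the code from this pull request: `PathCopyTrie`, its fan-out of 4, its depth of 3, and the restriction to integer keys 0..63 are all hypothetical, and the object counts it produces differ from the 4 to 19 mentioned above.

```java
/** Hypothetical fixed-depth trie for int keys in [0, 63]; illustrates path copying only. */
final class PathCopyTrie {
    static final int BITS = 2;           // key bits consumed per level
    static final int FANOUT = 1 << BITS; // 4 slots per node
    static final int DEPTH = 3;          // 3 levels * 2 bits = 6 bits of key

    static final class Node {
        final Object[] slots; // inner levels hold child Nodes, the last level holds values

        Node(Object[] slots) {
            this.slots = slots;
        }
    }

    final Node root;

    PathCopyTrie() {
        this.root = new Node(new Object[FANOUT]);
    }

    private PathCopyTrie(Node root) {
        this.root = root;
    }

    /** Returns a new trie; only the nodes on the path to the key are re-allocated. */
    PathCopyTrie copyAndPut(int key, String value) {
        return new PathCopyTrie(put(root, key, value, DEPTH - 1));
    }

    private static Node put(Node node, int key, String value, int level) {
        // Copy this node's slot array; untouched slots keep pointing at shared subtrees.
        Object[] slots = node == null ? new Object[FANOUT] : node.slots.clone();
        int index = (key >>> (level * BITS)) & (FANOUT - 1);
        if (level == 0) {
            slots[index] = value;
        } else {
            slots[index] = put((Node) slots[index], key, value, level - 1);
        }
        return new Node(slots);
    }

    String get(int key) {
        Node node = root;
        for (int level = DEPTH - 1; level > 0 && node != null; level--) {
            node = (Node) node.slots[(key >>> (level * BITS)) & (FANOUT - 1)];
        }
        return node == null ? null : (String) node.slots[key & (FANOUT - 1)];
    }
}

class PathCopyTrieDemo {
    public static void main(String[] args) {
        PathCopyTrie v1 = new PathCopyTrie().copyAndPut(5, "a");
        PathCopyTrie v2 = v1.copyAndPut(60, "b");
        System.out.println(v1.get(60)); // prints "null": v1 is unchanged
        System.out.println(v2.get(60)); // prints "b"
        // The subtree holding key 5 is shared between both versions, not copied.
        System.out.println(v1.root.slots[0] == v2.root.slots[0]); // prints "true"
    }
}
```

In this sketch each write allocates one node and one array per level plus a new wrapper, while all other subtrees are shared with the previous version. That is the trade-off under discussion: cheap updates, at the cost of a handful of short-lived allocations per write.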

@rjernst
Member

rjernst commented Sep 5, 2014

> Maybe this is something we can think about in the future if/when we start having issues?

Sure, that sounds fine.

@rjernst
Member

rjernst commented Sep 5, 2014

LGTM!

@jpountz jpountz self-assigned this Oct 2, 2014
@jpountz jpountz closed this in 3b38db1 Oct 2, 2014
jpountz added a commit that referenced this pull request Oct 2, 2014
Close #7486
@jpountz jpountz deleted the fix/mappings_cleanup branch October 2, 2014 11:43
@clintongormley clintongormley added the :Search/Mapping Index mappings, including merging and defining field types label Mar 19, 2015
@clintongormley clintongormley changed the title from "Mappings: Make lookup structures immutable." to "Make lookup structures immutable." Jun 7, 2015
Labels
>enhancement :Search/Mapping Index mappings, including merging and defining field types v1.5.0 v2.0.0-beta1
3 participants