More resource efficient analysis wrapping usage
Today, we take great care to share the same analyzer instances across shards and indices (global analyzers). The idea is that a shared analyzer's thread local resources are not allocated per analyzer instance per thread. The problem is that AnalyzerWrapper keeps its resources in its own per thread storage, and with the per field reuse strategy this causes token stream components to be held per field per thread. This is very evident with the StandardTokenizer that uses a buffer... This came out of a test with "many fields", where the majority of a 1GB heap was consumed by StandardTokenizer instances... closes #6714
Showing 5 changed files with 73 additions and 104 deletions.
77 changes: 0 additions & 77 deletions
src/main/java/org/apache/lucene/analysis/CustomAnalyzerWrapper.java
This file was deleted.
66 changes: 66 additions & 0 deletions
src/main/java/org/apache/lucene/analysis/SimpleAnalyzerWrapper.java
@@ -0,0 +1,66 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.lucene.analysis;

import java.io.Reader;

/**
 * A simple analyzer wrapper that does not allow wrapping components or readers. Because wrapping
 * is disallowed, the thread local resources are delegated to the wrapped analyzer and are not
 * also allocated on this analyzer.
 *
 * This solves the problem of the per field analyzer wrapper, which also maintains thread local
 * per field token stream components, although it can safely delegate to the wrapped analyzer and
 * avoid holding these data structures, which can become expensive memory wise.
 */
public abstract class SimpleAnalyzerWrapper extends AnalyzerWrapper {

    public SimpleAnalyzerWrapper() {
        super(new DelegatingReuseStrategy());
        ((DelegatingReuseStrategy) getReuseStrategy()).wrapper = this;
    }

    @Override
    protected final TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
        return super.wrapComponents(fieldName, components);
    }

    @Override
    protected final Reader wrapReader(String fieldName, Reader reader) {
        return super.wrapReader(fieldName, reader);
    }

    private static class DelegatingReuseStrategy extends ReuseStrategy {

        AnalyzerWrapper wrapper;

        @Override
        public TokenStreamComponents getReusableComponents(Analyzer analyzer, String fieldName) {
            Analyzer wrappedAnalyzer = wrapper.getWrappedAnalyzer(fieldName);
            return wrappedAnalyzer.getReuseStrategy().getReusableComponents(wrappedAnalyzer, fieldName);
        }

        @Override
        public void setReusableComponents(Analyzer analyzer, String fieldName, TokenStreamComponents components) {
            Analyzer wrappedAnalyzer = wrapper.getWrappedAnalyzer(fieldName);
            wrappedAnalyzer.getReuseStrategy().setReusableComponents(wrappedAnalyzer, fieldName, components);
        }
    }
}
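To illustrate why delegating the reuse storage matters, here is a minimal toy model, not the Lucene API. All names in it (`DelegatingReuseSketch`, `ToyAnalyzer`, `StoringWrapper`, `DelegatingWrapper`, `Components`) are hypothetical stand-ins, and it is single-threaded, so one plain field stands in for a thread-local slot. It only counts how many component objects get allocated when several wrappers share one analyzer, under the assumption that the shared analyzer reuses a single set of components.

```java
import java.util.HashMap;
import java.util.Map;

public class DelegatingReuseSketch {

    /** Hypothetical stand-in for TokenStreamComponents; counts every allocation. */
    static final class Components {
        static int allocated = 0;
        Components() { allocated++; }
    }

    /** Hypothetical shared analyzer that reuses one set of components for all fields. */
    static final class ToyAnalyzer {
        private Components stored; // in real code this slot would be per thread

        Components components(String field) {
            if (stored == null) {
                stored = new Components();
            }
            return stored;
        }
    }

    interface ToyWrapper {
        Components components(String field);
    }

    /** Models a wrapper that keeps its own per-field storage. */
    static final class StoringWrapper implements ToyWrapper {
        private final Map<String, Components> perField = new HashMap<>();

        @Override
        public Components components(String field) {
            return perField.computeIfAbsent(field, f -> new Components());
        }
    }

    /** Models a wrapper that holds nothing and defers to the wrapped analyzer's storage. */
    static final class DelegatingWrapper implements ToyWrapper {
        private final ToyAnalyzer wrapped;

        DelegatingWrapper(ToyAnalyzer wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public Components components(String field) {
            return wrapped.components(field);
        }
    }

    /** Requests components for every field through every wrapper and returns the allocation count. */
    public static int countAllocations(boolean delegate, int wrappers, int fields) {
        Components.allocated = 0;
        ToyAnalyzer shared = new ToyAnalyzer();
        for (int w = 0; w < wrappers; w++) {
            ToyWrapper wrapper = delegate ? new DelegatingWrapper(shared) : new StoringWrapper();
            for (int f = 0; f < fields; f++) {
                wrapper.components("field" + f);
            }
        }
        return Components.allocated;
    }

    public static void main(String[] args) {
        System.out.println("own storage: " + countAllocations(false, 3, 4));
        System.out.println("delegated:   " + countAllocations(true, 3, 4));
    }
}
```

In this model the storing wrappers allocate one object per wrapper per field, while the delegating wrappers add nothing beyond what the shared analyzer already holds. In the real class the saving applies per thread as well, since the storage being delegated is thread local.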