Allows to use custom analysers in ES or Solr #223
|
Committer of one or more commits is not listed as a CLA signer, either individual or as a member of an organization. |
|
It doesn't look like support has been added for setting the custom analyzer, just leveraging it in queries. I think you could support setting the custom analyzer by adding a new "analyzer" parameter to ParameterType.java and then updating ElasticSearchIndex.java#register to support same (see example). If nothing else this would allow a cleaner test implementation. Though it would also probably require some documentation updates to show an example of setting the custom analyzer. But I think adding this would make a more complete implementation. What do you think? |
|
You are right, this PR only allows leveraging a custom analyser in queries for the Equals, NotEquals and Contains predicates. But it is a first step. What do you think? |
|
The idea of supporting an external mapping is interesting, I'll have to look over #233. Regarding this PR, can the custom analyzer be set during initial mapping creation? If so I think there'd still be value in supporting this using the existing machinery within JanusGraph (e.g. ParameterType through ElasticSearchIndex.java#register). For one, I think this would make the feature more accessible for users who don't want to go deeper into ES mapping details. Also we could eventually look at adding corresponding support to the other indexing backends. I don't think updating the analyzer needs to be supported (the same is already true of field type, etc. defined in ElasticSearchIndex.java#register). But can you explain more on why this would require a separate index instead of type for each property? I'm just seeing something like the following added to the
String analyzer = (String) ParameterType.ANALYZER.findParameter(information.getParameters(), null);
if (analyzer != null) mapping.field("analyzer", analyzer); |
|
To use a custom analyzer like https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html, you need to declare it in the index settings. |
|
How about just adding support for setting the built-in analyzer to use (e.g. |
ddf240b to 2fc3dd0
|
I made the change. I can implement it in ES and Solr (it is not possible on Lucene). |
2fc3dd0 to e07d174
sjudeng
left a comment
Thanks for addressing the questions. I think this is much better. Some more minor feedback is below.
docs/elasticsearch.txt
Outdated
|
|
||
| * Please refer to the https://www.elastic.co[Elasticsearch homepage] and available documentation for more information on Elasticsearch and how to setup an Elasticsearch cluster. | ||
|
|
||
| === Custom Analyzer |
I wouldn't include this section here since you already provide this in the textsearch page below and this doesn't seem to fit here. Instead how about adding a bullet "Analyzer" or "Text Analyzer" or something, with a sentence or two description, to the list at the top of this page ("Full Text", "Geo", etc.)?
docs/solr.txt
Outdated
| /////////// | ||
|
|
||
|
|
||
| ==== Custom Analyzer |
Same comment as above ... consider adding a bullet to list at the top instead of this section here.
docs/textsearch.txt
Outdated
|
|
||
| ==== Custom Analyser | ||
|
|
||
| By default, JanusGraph will use the default analyzer from the indexing backend for properties with mapping.TEXT, and no analyzer for properties with mapping.STRING. If one wants to use another analyzer, it can be explicitly specified through a parameter : ParameterType.TEXT_ANALYZER for mapping.TEXT and parameterType.STRING_ANALYZER for mapping.STRING. |
Capitalization fixes: Mapping.STRING, Mapping.TEXT, ParameterType.STRING_ANALYZER, etc.
|
|
||
| ===== For Elasticsearch | ||
|
|
||
| The name of the analyzer must be set as parameter value. |
Include a "more information" link to ES documentation on analyzers?
| "which typically only happens the first time JanusGraph is started on top of ES. If the index JanusGraph is " + | ||
| "configured to use already exists, then this setting has no effect.", ConfigOption.Type.MASKABLE, 200L); | ||
|
|
||
Remove whitespace
| try { | ||
| ((Constructor<Tokenizer>) ClassLoader.getSystemClassLoader().loadClass(analyzer) | ||
| .getConstructor()).newInstance(); | ||
| } catch (InstantiationException | IllegalAccessException | IllegalArgumentException |
Simplify by using base ReflectiveOperationException (e.g. catch (ReflectiveOperationException | IllegalArgumentException | SecurityException e))?
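To illustrate the suggestion, here is a minimal, self-contained sketch of the multi-catch using the base `ReflectiveOperationException`. The `ReflectiveFactory` class and its `newInstance` method are hypothetical names used only for this example and are not part of JanusGraph; the PR's actual code loads a tokenizer class, which is not reproduced here.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper (not JanusGraph code): instantiate a class by name
// through its public no-argument constructor.
final class ReflectiveFactory {
    private ReflectiveFactory() {}

    static Object newInstance(String className) {
        try {
            Constructor<?> ctor = Class.forName(className).getConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException | IllegalArgumentException | SecurityException e) {
            // ReflectiveOperationException is the common superclass of
            // ClassNotFoundException, NoSuchMethodException, InstantiationException,
            // IllegalAccessException and InvocationTargetException.
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

Because all the checked reflection exceptions share that superclass, one catch clause replaces the longer list without changing which failures are handled.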
| try { | ||
| ((Constructor<Tokenizer>) ClassLoader.getSystemClassLoader().loadClass(analyzer) | ||
| .getConstructor()).newInstance(); | ||
| } catch (InstantiationException | IllegalAccessException | IllegalArgumentException |
catch (ReflectiveOperationException | IllegalArgumentException | SecurityException e)
| terms.add(termAtt.getBytesRef().utf8ToString()); | ||
| } | ||
| return terms; | ||
| } catch (InstantiationException | IllegalAccessException | IllegalArgumentException | InvocationTargetException |
catch (ReflectiveOperationException | IllegalArgumentException | SecurityException e)
| } | ||
| } | ||
| //Since all data types must be defined in the schema.xml, pre-registering a type does not work | ||
| //But we check Analyse feature |
Is this check here necessary since it looks like errors would be handled in your customTokenize method?
The goal is to fail during property registration rather than during search, so that the index never holds a property with a bad configuration.
I think if the configuration is wrong, JanusGraph should fail as soon as possible.
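The fail-fast argument can be sketched as follows. This is an illustration under stated assumptions, not the PR's Solr code: `FailFastRegistry`, `register` and `isRegistered` are hypothetical names, and the check only verifies that the named class exists and has a public no-argument constructor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index backend's register step (not JanusGraph code).
final class FailFastRegistry {
    private final Map<String, String> analyzerByKey = new HashMap<>();

    // Validate the analyzer class name when the property is registered,
    // so a bad configuration is rejected before anything is stored.
    void register(String key, String analyzerClassName) {
        if (analyzerClassName != null) {
            try {
                Class.forName(analyzerClassName).getConstructor();
            } catch (ReflectiveOperationException | SecurityException e) {
                throw new IllegalArgumentException(
                    "Invalid analyzer for key " + key + ": " + analyzerClassName, e);
            }
        }
        analyzerByKey.put(key, analyzerClassName);
    }

    boolean isRegistered(String key) {
        return analyzerByKey.containsKey(key);
    }
}
```

With this ordering, a misconfigured key never reaches the registry, so later searches cannot hit it.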
| put(TEXT,new StandardKeyInformation(String.class, Cardinality.SINGLE, new Parameter("mapping", | ||
| indexFeatures.supportsStringMapping(Mapping.TEXT)?Mapping.TEXT:Mapping.TEXTSTRING))); | ||
| put(TEXT,new StandardKeyInformation(String.class, Cardinality.SINGLE, | ||
| indexFeatures.supportsStringMapping(Mapping.TEXT)?Mapping.TEXT.asParameter():Mapping.TEXTSTRING.asParameter())); |
I like how you cleaned these statements up. Going a little further how about (indexFeatures.supportsStringMapping(Mapping.TEXT)?Mapping.TEXT:Mapping.TEXTSTRING).asParameter())?
minor, is it OK to glue text to ?: ?
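For reference, a small sketch of the two forms discussed in this thread, written with spaces around `?` and `:`. `MiniMapping` and `TernaryDemo` are hypothetical stand-ins for illustration only; the real `Mapping.asParameter()` returns a JanusGraph `Parameter`, not a `String`.

```java
// Hypothetical miniature of a mapping enum with an asParameter() helper.
enum MiniMapping {
    TEXT, TEXTSTRING;

    String asParameter() {
        return "mapping=" + name();
    }
}

final class TernaryDemo {
    private TernaryDemo() {}

    // asParameter() is called in each branch of the conditional.
    static String perBranch(boolean supportsText) {
        return supportsText ? MiniMapping.TEXT.asParameter() : MiniMapping.TEXTSTRING.asParameter();
    }

    // The conditional selects the enum constant first; asParameter() is called once.
    static String hoisted(boolean supportsText) {
        return (supportsText ? MiniMapping.TEXT : MiniMapping.TEXTSTRING).asParameter();
    }
}
```

Both methods return the same value for either input; the second form only removes the repeated call.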
3f01044 to aece87e
amcp
left a comment
minor comments
| this.defaultStringMapping = defaultMap; | ||
| this.supportedStringMappings = supportedMap; | ||
| this.wildcardField = wildcardField; | ||
| this.supportedCardinaities = supportedCardinaities; |
please fix spelling: supportedCardinalities
| private final ImmutableSet<Mapping> supportedStringMappings; | ||
| private final String wildcardField; | ||
| private final boolean supportsNanoseconds; | ||
| private final boolean supportCustomAnalyser; |
two minor comments here
- please use the same naming convention as supportsNanoseconds
- Elasticsearch spelling here and elsewhere: supportsCustomAnalyzer
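A minimal sketch of the naming being requested, with the flag, its accessor and the builder method all following the `supportsNanoseconds` pattern. `MiniIndexFeatures` is a hypothetical, cut-down class written for this illustration and omits everything else the real `IndexFeatures` carries.

```java
// Hypothetical, reduced version of a features class and its builder.
final class MiniIndexFeatures {
    private final boolean supportsNanoseconds;
    private final boolean supportsCustomAnalyzer;

    private MiniIndexFeatures(boolean supportsNanoseconds, boolean supportsCustomAnalyzer) {
        this.supportsNanoseconds = supportsNanoseconds;
        this.supportsCustomAnalyzer = supportsCustomAnalyzer;
    }

    boolean supportsNanoseconds() {
        return supportsNanoseconds;
    }

    boolean supportsCustomAnalyzer() {
        return supportsCustomAnalyzer;
    }

    static final class Builder {
        private boolean supportsNanoseconds;
        private boolean supportsCustomAnalyzer;

        Builder supportsNanoseconds() {
            supportsNanoseconds = true;
            return this;
        }

        // Same "supports..." prefix and "Analyzer" spelling as requested above.
        Builder supportsCustomAnalyzer() {
            supportsCustomAnalyzer = true;
            return this;
        }

        MiniIndexFeatures build() {
            return new MiniIndexFeatures(supportsNanoseconds, supportsCustomAnalyzer);
        }
    }
}
```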
| return this; | ||
| } | ||
|
|
||
| public Builder supportCustomAnalyser() { |
supportsCustomAnalyzer
| private Set<Cardinality> supportedCardinalities = Sets.newHashSet(); | ||
| private String wildcardField = "*"; | ||
| private boolean supportsNanoseconds; | ||
| private boolean supportCustomAnalyser; |
supportsCustomAnalyzer
| this.wildcardField = wildcardField; | ||
| this.supportedCardinaities = supportedCardinaities; | ||
| this.supportsNanoseconds = supportsNanoseconds; | ||
| this.supportCustomAnalyser = supportCustomAnalyser; |
supportsCustomAnalyzer
| case STRING: | ||
| mapping.field("index","not_analyzed"); | ||
| if (stringAnalyzer != null) { | ||
| mapping.field("analyzer", stringAnalyzer); |
use constant
| break; | ||
| case TEXTSTRING: | ||
| if (textAnalyzer != null) { | ||
| mapping.field("analyzer",textAnalyzer); |
use constant
| mapping.field("type", "string"); | ||
| mapping.field("index","not_analyzed"); | ||
| if (stringAnalyzer != null) { | ||
| mapping.field("analyzer", stringAnalyzer); |
use constant
| if (stringAnalyzer != null) { | ||
| mapping.field("analyzer", stringAnalyzer); | ||
| }else{ | ||
| mapping.field("index","not_analyzed"); |
externalize index to a string constant
| if (stringAnalyzer != null) { | ||
| mapping.field("analyzer", stringAnalyzer); | ||
| } else { | ||
| mapping.field("index","not_analyzed"); |
use constant
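A sketch of what the "use constant" requests amount to for the STRING case shown in the snippets above: the field names become named constants, and the analyzer is set when present, with the field otherwise left not analyzed. `StringMappingSketch` and `stringField` are hypothetical names, and a plain `Map` stands in for the Elasticsearch mapping builder, so the merged code may well differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; a Map stands in for the ES mapping builder.
final class StringMappingSketch {
    static final String TYPE = "type";
    static final String ANALYZER = "analyzer";
    static final String INDEX = "index";
    static final String NOT_ANALYZED = "not_analyzed";

    private StringMappingSketch() {}

    // Build the field mapping for a Mapping.STRING-style property.
    static Map<String, String> stringField(String stringAnalyzer) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put(TYPE, "string");
        if (stringAnalyzer != null) {
            mapping.put(ANALYZER, stringAnalyzer);
        } else {
            mapping.put(INDEX, NOT_ANALYZED);
        }
        return mapping;
    }
}
```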
aece87e to 8288d7e
Custom analyzers can be set through new ParameterType. Signed-off-by: David Clement <david.clement90@laposte.net>
|
@sjudeng checking now |
amcp
left a comment
a few more minor changes please
| } | ||
|
|
||
| public Builder supportsCustomAnalyser() { | ||
| supportsCustomAnalyser = true; |
use ES spelling: supportsCustomAnalyzer
| return this; | ||
| } | ||
|
|
||
| public Builder supportsCustomAnalyser() { |
use ES spelling: supportsCustomAnalyzer
| private Set<Cardinality> supportedCardinalities = Sets.newHashSet(); | ||
| private String wildcardField = "*"; | ||
| private boolean supportsNanoseconds; | ||
| private boolean supportsCustomAnalyser; |
use ES spelling: supportsCustomAnalyzer
|
|
||
| private static final IndexFeatures ES_FEATURES = new IndexFeatures.Builder() | ||
| .setDefaultStringMapping(Mapping.TEXT).supportedStringMappings(Mapping.TEXT, Mapping.TEXTSTRING, Mapping.STRING).setWildcardField("_all").supportsCardinality(Cardinality.SINGLE).supportsCardinality(Cardinality.LIST).supportsCardinality(Cardinality.SET).supportsNanoseconds().build(); | ||
| .setDefaultStringMapping(Mapping.TEXT).supportedStringMappings(Mapping.TEXT, Mapping.TEXTSTRING, Mapping.STRING).setWildcardField("_all").supportsCardinality(Cardinality.SINGLE).supportsCardinality(Cardinality.LIST).supportsCardinality(Cardinality.SET).supportsNanoseconds().supportsCustomAnalyser().build(); |
use ES spelling: supportsCustomAnalyzer
| mapping.field("index","not_analyzed"); | ||
| if (stringAnalyzer != null) { | ||
| mapping.field(ANALYZER, stringAnalyzer); | ||
| }else{ |
spacing should be } else {
| //Since all data types must be defined in the schema.xml, pre-registering a type does not work | ||
| //But we check Analyse feature | ||
| String analyzer = (String) ParameterType.STRING_ANALYZER.findParameter(information.getParameters(), null); | ||
| if (analyzer !=null) { |
add a space after !=
| } | ||
| if (janusgraphPredicate == Text.PREFIX || janusgraphPredicate == Text.CONTAINS_PREFIX) { | ||
| return tokenize(informations, value, key, janusgraphPredicate, (String) ParameterType.TEXT_ANALYZER.findParameter(informations.get(key).getParameters(), null)); | ||
| }else if (janusgraphPredicate == Text.PREFIX || janusgraphPredicate == Text.CONTAINS_PREFIX) { |
add a space after the {
| String tokenizer = (String) ParameterType.STRING_ANALYZER.findParameter(informations.get(key).getParameters(), null); | ||
| if(tokenizer != null){ | ||
| return tokenize(informations, value, key, janusgraphPredicate,tokenizer); | ||
| }else{ |
fix spacing please } else {
| List<String> terms; | ||
| if(tokenizer != null){ | ||
| terms = customTokenize(tokenizer, (String) value); | ||
| }else{ |
fix spacing please } else {
| return terms; | ||
| } catch ( ReflectiveOperationException | IOException e) { | ||
| throw new IllegalArgumentException(e.getMessage(),e); | ||
| } finally{ |
add a space after finally
|
@sjudeng I found a few more things to fix, so one more round. Thanks. |
| b.must(QueryBuilders.termQuery(fieldName, term)); | ||
| } | ||
| return b; | ||
| if (janusgraphPredicate == Text.CONTAINS ||janusgraphPredicate == Cmp.EQUAL ) { |
Add space after || and remove space before )
| return QueryBuilders.termQuery(fieldName, (String) value); | ||
| } else if (janusgraphPredicate == Cmp.NOT_EQUAL) { | ||
| return QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery(fieldName, (String) value)); | ||
| return QueryBuilders.boolQuery().mustNot(QueryBuilders.matchQuery(fieldName, value).operator(Operator.AND)); |
Remove extra spaces
Signed-off-by: sjudeng <sjudeng@users.noreply.github.com>
8f715d1 to 0f822c7
|
@davidclement90 I pushed a commit to your branch with the requested code style updates. @amcp Can you check this over again when you have time? I'm working on a separate update to ElasticSearchIndex which this PR is blocking because of conflicts, so I'd like to get it merged if possible. |
|
@sjudeng merged this in, you are good to work on the other PR now. |
…atch Allows to use custom analysers in ES or Solr
Issue : #222
Signed-off-by: David Clement david.clement90@laposte.net