Elasticsearch fails to find Hunspell dictionaries #34761

Closed
Bherzet opened this issue Oct 23, 2018 · 6 comments
Labels
:Search/Analysis How text is split into tokens

Comments

@Bherzet

Bherzet commented Oct 23, 2018

Hi.

I'm migrating a codebase from Elasticsearch 2.3.2 to 6.4.2. When running tests, we used to start Elasticsearch as follows:

/usr/share/elasticsearch/bin/elasticsearch -d \
    --cluster.name=test-$2 \
    --path.conf=`pwd`/$1/config \
    --path.data=$1/data \
    --path.logs=$1/logs \
    --http.port=$2 \
    --transport.tcp.port=$3

I changed this to:

ES_TMPDIR=`pwd`/$1/tmp \
ES_PATH_CONF=`pwd`/$1/config \
/usr/share/elasticsearch/bin/elasticsearch -d \
    -Ecluster.name=test-$2 \
    -Epath.data=`pwd`/$1/data \
    -Epath.logs=`pwd`/$1/logs \
    -Ehttp.port=$2 \
    -Etransport.tcp.port=$3

(where $1 is, in this case, tmp-elasticsearch)

Elasticsearch starts as expected, but it fails to find the Hunspell dictionaries located under tmp-elasticsearch/config, which is laid out as follows:

tmp-elasticsearch/
├── config
│   └── hunspell
│       ├── cs_CZ
│       │   ├── cs_CZ.aff
│       │   └── cs_CZ.dic
│       ├── de_DE
│       │   ├── de_DE.aff
│       │   └── de_DE.dic
│       ├── en_US
│       │   ├── en_US.aff
│       │   └── en_US.dic
│       ├── pl_PL
│       │   ├── pl_PL.aff
│       │   └── pl_PL.dic
│       └── sk_SK
│           ├── sk_SK.aff
│           └── sk_SK.dic

When I attempt to create an index whose mapping uses a Hunspell token filter, the following exception is thrown:

java.lang.IllegalStateException: failed to load hunspell dictionary for locale: pl_PL
        at org.elasticsearch.indices.analysis.HunspellService.lambda$new$0(HunspellService.java:101) ~[elasticsearch-6.4.2.jar:6.4.2]
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660) ~[?:1.8.0_171]
        at org.elasticsearch.indices.analysis.HunspellService.getDictionary(HunspellService.java:118) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.index.analysis.HunspellTokenFilterFactory.<init>(HunspellTokenFilterFactory.java:44) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.indices.analysis.AnalysisModule.lambda$setupTokenFilters$0(AnalysisModule.java:121) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.plugins.AnalysisPlugin$1.get(AnalysisPlugin.java:148) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:358) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:177) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:159) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.index.IndexService.<init>(IndexService.java:162) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:383) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:475) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:429) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$IndexCreationTask.execute(MetaDataCreateIndexService.java:456) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:639) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:268) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:198) [elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:133) [elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [elasticsearch-6.4.2.jar:6.4.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: org.elasticsearch.ElasticsearchException: Could not find hunspell dictionary [pl_PL]
        at org.elasticsearch.indices.analysis.HunspellService.loadDictionary(HunspellService.java:168) ~[elasticsearch-6.4.2.jar:6.4.2]
        at org.elasticsearch.indices.analysis.HunspellService.lambda$new$0(HunspellService.java:99) ~[elasticsearch-6.4.2.jar:6.4.2]
        ... 26 more

The problem is that I have no reasonable way of knowing where exactly Elasticsearch is looking for the Hunspell dictionaries, or whether ES_PATH_CONF has any effect at all. Since I also have Hunspell dictionaries in /etc/elasticsearch/hunspell, I tried removing ES_PATH_CONF altogether, but got the same result. When I set ES_PATH_CONF to a nonexistent directory, no error occurred either, and Elasticsearch, surprisingly, started normally.
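Since the error only names the locale, one quick way to compare candidate directories is a helper like the one below. It is a sketch that assumes the <path.conf>/hunspell/<locale>/ layout shown above; list_hunspell_locales is a hypothetical name, not part of Elasticsearch. If pl_PL is absent from the output for the directory Elasticsearch actually uses, that would likely explain the "Could not find hunspell dictionary" error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each locale directory under <conf_dir>/hunspell
# that contains at least one .aff file and at least one .dic file.
# It only checks that the files exist; it does not validate their contents.
list_hunspell_locales() (
  shopt -s nullglob                  # unmatched globs expand to nothing
  conf_dir="$1"
  for locale_dir in "$conf_dir"/hunspell/*/; do
    aff=( "$locale_dir"*.aff )
    dic=( "$locale_dir"*.dic )
    if (( ${#aff[@]} > 0 && ${#dic[@]} > 0 )); then
      basename "$locale_dir"         # ".../hunspell/pl_PL/" -> "pl_PL"
    fi
  done
)

# Compare the directory you expect to be used with the package default.
list_hunspell_locales "$(pwd)/tmp-elasticsearch/config"
list_hunspell_locales /etc/elasticsearch
```

The function body runs in a subshell so that shopt -s nullglob does not leak into the calling script.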

I was a little unsure whether environment variables are taken into account at all, but at least ES_TMPDIR seems to be: I checked tmp-elasticsearch/tmp and found signs of activity there.

This is happening with Elasticsearch 6.4.2 on Debian testing (Linux desktop 4.16.0-1-amd64 #1 SMP Debian 4.16.5-1 (2018-04-29) x86_64 GNU/Linux) with OpenJDK 1.8.0_171.

Thanks for your help.

@Bherzet
Author

Bherzet commented Oct 23, 2018

OK. I just found out that it is, indeed, looking for Hunspell dictionaries under /etc/elasticsearch/hunspell, but pl_PL was missing there. So the good news is that I have at least a workaround.

But the question remains: why doesn't Elasticsearch look for Hunspell dictionaries under ES_PATH_CONF? The documentation states:

Hunspell dictionaries will be picked up from a dedicated hunspell directory on the filesystem (<path.conf>/hunspell).

What is meant by <path.conf>, if not ES_PATH_CONF? For testing purposes, I'd still rather keep the dictionaries directly under the project root.

colings86 added the :Search/Search Search-related issues that do not fall into other categories label Oct 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

colings86 added :Search/Analysis How text is split into tokens and removed :Search/Search Search-related issues that do not fall into other categories labels Oct 24, 2018
@rjernst
Member

rjernst commented Oct 24, 2018

ES_PATH_CONF must point to the entire Elasticsearch configuration directory. Do you have your Elasticsearch log file? This is almost certainly an environmental issue, but we should be printing enough information in the log to find where Hunspell is looking.
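One way to follow this advice in a test setup is to copy the full package configuration into the test directory and put the project's dictionaries on top. A minimal sketch, assuming the package configuration lives in /etc/elasticsearch and the project keeps its dictionaries in its own hunspell/ directory; both paths and the helper name seed_test_config are hypothetical. As the rest of this thread shows, the package's startup scripts may still override ES_PATH_CONF, so this alone may not be sufficient.

```shell
#!/usr/bin/env bash
# Sketch: build a complete, per-test configuration directory for ES_PATH_CONF.
# seed_test_config is a hypothetical helper; all three paths are assumptions.
seed_test_config() {
  local base_conf="$1"         # e.g. /etc/elasticsearch (package defaults)
  local project_hunspell="$2"  # e.g. the project's own hunspell/ directory
  local target="$3"            # e.g. "$(pwd)/tmp-elasticsearch/config"

  mkdir -p "$target" || return 1
  # Copy every file from the base config (jvm.options, elasticsearch.yml, ...).
  cp -R "$base_conf"/. "$target"/ || return 1
  # Swap in the project's dictionaries in place of any copied hunspell/ dir.
  rm -rf "$target/hunspell"
  cp -R "$project_hunspell" "$target/hunspell"
}

# Example (reading /etc/elasticsearch usually requires elevated permissions):
# seed_test_config /etc/elasticsearch "$(pwd)/hunspell" "$(pwd)/tmp-elasticsearch/config"
```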

@Bherzet
Author

Bherzet commented Oct 25, 2018

I see. I grepped the log file for "/etc/elasticsearch" and found that Elasticsearch is actually being started with -Des.path.conf=/etc/elasticsearch. My apologies for not noticing earlier.
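The same check can be narrowed to the exact setting. This is a sketch that assumes the JVM arguments appear in the log as a comma-separated list with an entry like -Des.path.conf=/etc/elasticsearch, as reported here; the exact log format and the file name in the example are assumptions, and find_path_conf is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the distinct es.path.conf values found in an Elasticsearch log.
# Assumes the JVM arguments are logged as a comma-separated list containing
# an entry like "-Des.path.conf=/etc/elasticsearch" (format not guaranteed).
find_path_conf() {
  local log_file="$1"
  # Match up to the next comma, closing bracket, or whitespace.
  grep -o -- '-Des\.path\.conf=[^],[:space:]]*' "$log_file" \
    | sed 's/^-Des\.path\.conf=//' \
    | sort -u
}

# Example (hypothetical file name; look under the configured path.logs):
# find_path_conf "$(pwd)/tmp-elasticsearch/logs/test-9200.log"
```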

I dug a bit deeper into why this happens. Right at the beginning of /usr/share/elasticsearch/bin/elasticsearch is the following line:

source "`dirname "$0"`"/elasticsearch-env

In /usr/share/elasticsearch/bin/elasticsearch-env:

source /etc/default/elasticsearch

if [ -z "$ES_PATH_CONF" ]; then
  echo "ES_PATH_CONF must be set to the configuration path"
  exit 1
fi

And finally, in /etc/default/elasticsearch:

ES_PATH_CONF=/etc/elasticsearch

I tried commenting this line out, and now Elasticsearch does use my ES_PATH_CONF: it complains that the directory does not contain all the required configuration files, such as jvm.options.
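A pre-flight check can catch an incomplete configuration directory before the node is started. A minimal sketch: jvm.options is the file named in the error above, while elasticsearch.yml and log4j2.properties are assumed to be the other usual files in a 6.x config directory; check_conf_dir is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: report which expected files are missing from a candidate config dir.
# jvm.options is the file named in the error; the other two names are
# assumed to be the usual files shipped in a 6.x config directory.
check_conf_dir() {
  local conf_dir="$1" missing=0 f
  for f in jvm.options elasticsearch.yml log4j2.properties; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 if all files are present, 1 otherwise
}

# Example: fail fast before starting the test node.
# check_conf_dir "$(pwd)/tmp-elasticsearch/config" || exit 1
```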

So, unless I am missing something, ES_PATH_CONF cannot actually be set "from the outside", because it gets overwritten every time. A fix (not very clean, but working) would be to set this variable only if it is not already set, for example by replacing the line posted above with something like:

if [ -z "$ES_PATH_CONF" ]; then
    ES_PATH_CONF=/etc/elasticsearch
fi
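The override can be reproduced without Elasticsearch. The sketch below sources a throwaway defaults file (not the real /etc/default/elasticsearch) the way elasticsearch-env does and prints the resulting ES_PATH_CONF; demo_defaults is a hypothetical helper. It also uses ES_PATH_CONF=${ES_PATH_CONF:-/etc/elasticsearch}, a one-line equivalent of the if block above, since both treat an empty value as unset.

```shell
#!/usr/bin/env bash
# Sketch: show whether a sourced defaults file overrides the caller's value.
# $1 = the single line written to a throwaway defaults file
# $2 = the ES_PATH_CONF value the caller tried to pass in
demo_defaults() {
  local defaults_file
  defaults_file=$(mktemp)
  printf '%s\n' "$1" > "$defaults_file"
  (
    ES_PATH_CONF="$2"
    source "$defaults_file"    # mimics elasticsearch-env sourcing the file
    echo "$ES_PATH_CONF"
  )
  rm -f "$defaults_file"
}

demo_defaults 'ES_PATH_CONF=/etc/elasticsearch' /tmp/my-conf
demo_defaults 'ES_PATH_CONF=${ES_PATH_CONF:-/etc/elasticsearch}' /tmp/my-conf
demo_defaults 'ES_PATH_CONF=${ES_PATH_CONF:-/etc/elasticsearch}' ''
```

The first call prints /etc/elasticsearch (the caller's value is lost), the second prints /tmp/my-conf, and the third falls back to /etc/elasticsearch. One caveat I have not verified for this package: on systemd installs the same file may also be read through EnvironmentFile=, which does not execute shell code, so a guard like this might need to live in the startup script instead.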

I would consider this a bug, but that largely depends on whether or not you want to allow users to run Elasticsearch in this manner (i.e. specifying the configuration directory through an environment variable).

Although this is documented (I have to admit I overlooked it), the current behavior doesn't seem very intuitive to me. But that might be personal bias: I expected ES_PATH_CONF in 6.4.x to be a drop-in replacement for --path.conf in 2.3.x, which it is not.

I believe the file I have in /etc/default/elasticsearch comes from this repository file. Should I make a pull request with this little change?

@rjernst
Member

rjernst commented Oct 29, 2018

I believe this is a duplicate of #28619. Also see the discussion in #24867. Closing this in favor of the first issue.

rjernst closed this as completed Oct 29, 2018
@Bherzet
Author

Bherzet commented Oct 29, 2018

Agreed. Thanks for your time.

4 participants