Error cannot initialize Tokenizer: /usr/local/share/tokenizer/dicts #106

Closed
nntruong02069999 opened this issue Jun 30, 2021 · 2 comments

Comments

@nntruong02069999

Hello,
I am running Elasticsearch with Docker. I have installed coccoc-tokenizer and tested its command successfully.
This is the error I get when I run the image:

Cannot open file for reading /usr/local/share/tokenizer/dicts/multiterm_trie.dump
es01     | {"type": "server", "timestamp": "2021-06-30T03:08:14,237Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "docker-cluster", "node.name": "7bd45b102d38", "message": "path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}", "cluster.uuid": "3KJAmJtuSSmhztSpxzZpJA", "node.id": "l8P14CGCQmmwoSCvFqyD7A" , 
es01     | "stacktrace": ["java.lang.RuntimeException: Cannot initialize Tokenizer: /usr/local/share/tokenizer/dicts",
es01     | "at com.coccoc.Tokenizer.<init>(Tokenizer.java:44) ~[?:?]",
es01     | "at org.apache.lucene.analysis.vi.VietnameseTokenizerImpl.lambda$new$0(VietnameseTokenizerImpl.java:54) ~[?:?]",
es01     | "at java.security.AccessController.doPrivileged(AccessController.java:312) ~[?:?]",
es01     | "at org.apache.lucene.analysis.vi.VietnameseTokenizerImpl.<init>(VietnameseTokenizerImpl.java:53) ~[?:?]",
es01     | "at org.apache.lucene.analysis.vi.VietnameseTokenizer.<init>(VietnameseTokenizer.java:45) ~[?:?]",
es01     | "at org.apache.lucene.analysis.vi.VietnameseAnalyzer.createComponents(VietnameseAnalyzer.java:88) ~[?:?]",
es01     | "at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:136) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]",
es01     | "at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:199) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]",
es01     | "at org.elasticsearch.index.analysis.AnalysisRegistry.checkVersions(AnalysisRegistry.java:637) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.index.analysis.AnalysisRegistry.produceAnalyzer(AnalysisRegistry.java:601) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:520) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:207) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:431) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:663) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:566) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateTemplate(MetadataIndexTemplateService.java:1199) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.access$300(MetadataIndexTemplateService.java:80) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$6.execute(MetadataIndexTemplateService.java:714) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:48) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:691) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:313) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:208) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:62) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:140) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:139) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:177) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:241) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:204) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
es01     | "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
es01     | "at java.lang.Thread.run(Thread.java:831) [?:?]"] }

This is the Dockerfile I use to build the image:

# Dockerfile
FROM elasticsearch:7.12.1

# Plugin archive, installed from a local file below
COPY elasticsearch-analysis-vietnamese-7.12.1.zip /usr/share/elasticsearch/

# JNI library for coccoc-tokenizer
COPY libcoccoc_tokenizer_jni.so /usr/lib64/

# Install the Vietnamese analysis plugin and the ICU analysis plugin
RUN cd /usr/share/elasticsearch && \
    bin/elasticsearch-plugin install file:///usr/share/elasticsearch/elasticsearch-analysis-vietnamese-7.12.1.zip && \
    bin/elasticsearch-plugin install analysis-icu
    
@beantoan

Copy the following files into the /usr/local/share/tokenizer/dicts directory of the container:

acronyms alphabetic chemical_comp d_and_gi.txt Freq2NontoneUniFile i_and_y.txt keyword.freq multiterm_trie.dump nontone_pair_freq nontone_pair_freq_map.dump numeric special_token.strong special_token.weak syllable_trie.dump vndic_multiterm
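
The copy step above could be sketched as a Dockerfile fragment like the one below. It assumes the dictionary files listed above have been collected into a local `dicts/` directory in the build context; that directory name is hypothetical and not from the original report. The `RUN test` line is an optional check that makes the build fail early if the file named in the error message is still missing.

```dockerfile
FROM elasticsearch:7.12.1

# Assumption: ./dicts (hypothetical local path) holds the coccoc-tokenizer
# dictionary files listed above.
COPY dicts/ /usr/local/share/tokenizer/dicts/

# Optional sanity check: stop the build if the file from the error is not readable.
RUN test -r /usr/local/share/tokenizer/dicts/multiterm_trie.dump
```

The files also need to be readable by the user that runs Elasticsearch inside the container, so it may be worth checking their permissions if the error persists.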

@beantoan

There are some redundant files but I am not sure which ones.

@duydo closed this as completed Nov 30, 2021