KimbaLtStemmer plugin for ElasticSearch

To build the KimbaLtStemmer plugin, clone this Git repository and run a clean Maven package build: mvn clean package.

The plugin is still in development; no official release exists yet.

Examples:

------------------------------------------
| Word                  | KimbaLtStemmer |
------------------------------------------
| taves (singular)      | tav            |
------------------------------------------
| mūsų (plural)         | mus            |
------------------------------------------
| namas (singular)      | nam            |
------------------------------------------
| giedraičiai (plural)  | giedraic       |
------------------------------------------
| geriausias (singular) | geriaus        |
------------------------------------------
| didysis (singular)    | didys          |
------------------------------------------

To install the latest version of the plugin, run:

sudo bin/plugin -url file:elasticsearch-kimba-ltstemmer-0.0.1.zip -install kimba-ltstemmer

Example usage

Creating index

curl -XPUT http://localhost:9200/test_lt -d '{
  "settings":{
    "analysis":{
      "analyzer":{
        "lt_analyzer":{
          "type":"custom",
          "tokenizer":"standard",
          "filter": ["icu_folding", "stem_lt"]
        }
      },
      "filter": {
        "stem_lt": {
          "type": "KimbaLtStemmer"
        }
      }
    }
  }
}'
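
For scripted setups, the same index can be created from Python. This is a minimal sketch, not part of the plugin: the function names build_index_settings and create_index are hypothetical, and it assumes a node is reachable at http://localhost:9200 as in the curl example. The settings body is identical to the one above; only the transport differs.

```python
import json
import urllib.request


def build_index_settings():
    """Return the same analysis settings as the curl example."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "lt_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Fold first, then stem: the stemmer expects folded input.
                        "filter": ["icu_folding", "stem_lt"],
                    }
                },
                "filter": {"stem_lt": {"type": "KimbaLtStemmer"}},
            }
        }
    }


def create_index(base_url="http://localhost:9200", index="test_lt"):
    """PUT the settings to a running node (assumed reachable at base_url)."""
    body = json.dumps(build_index_settings()).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling create_index() requires a running node with the plugin installed; build_index_settings() can be inspected without one.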

Testing analyzer

curl -XGET 'http://localhost:9200/test_lt/_analyze?analyzer=lt_analyzer&text=Giedraičiai&pretty'

And you should get:

tokens: [{
    token: giedraic
    start_offset: 0
    end_offset: 11
    type: <ALPHANUM>
    position: 1
}]
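
When the analyze request is issued from a script, the text parameter should be percent-encoded, since words such as Giedraičiai contain non-ASCII letters. The sketch below builds the same URL as the curl example and pulls the token strings out of a response shaped like the one shown above. The helper names build_analyze_url and extract_tokens are hypothetical, and the response layout is assumed to match the output above.

```python
from urllib.parse import urlencode


def build_analyze_url(text, base_url="http://localhost:9200",
                      index="test_lt", analyzer="lt_analyzer"):
    """Build the _analyze URL with the text safely percent-encoded."""
    query = urlencode({"analyzer": analyzer, "text": text, "pretty": "true"})
    return f"{base_url}/{index}/_analyze?{query}"


def extract_tokens(response):
    """Return only the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response["tokens"]]


print(build_analyze_url("Giedraičiai"))
# http://localhost:9200/test_lt/_analyze?analyzer=lt_analyzer&text=Giedrai%C4%8Diai&pretty=true
```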

Deleting test index

curl -XDELETE http://localhost:9200/test_lt

Example usage YML configuration

index:
  analysis:
    analyzer:
      lt_analyzer:
        type: custom
        tokenizer: standard
        filter: icu_folding, stem_lt
    filter:
      stem_lt:
        type: KimbaLtStemmer

Warning

Input is expected to be case-folded for Lithuanian, with diacritics removed. This can be achieved by placing the icu_folding filter before the stemmer in the filter chain.
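
To illustrate what the stemmer expects as input, here is a rough Python approximation of folding for Lithuanian letters. It is only a sketch: it is not the ICU implementation, the function name fold is hypothetical, and it relies on Unicode decomposition, which covers the Lithuanian diacritics but not everything icu_folding handles.

```python
import unicodedata


def fold(text):
    """Lowercase the text and strip combining diacritical marks."""
    # NFD splits letters such as "č" into "c" plus a combining caron.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold("Giedraičiai"))  # giedraiciai
print(fold("mūsų"))         # musu
```

The stems in the examples table (giedraic, mus) are consistent with the stemmer receiving input folded in this way.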
