Add Swedish language support
Payam Yavari committed Aug 30, 2019
1 parent cbd2a8d commit 0dd5e0e
Showing 4 changed files with 25 additions and 2 deletions.
1 change: 1 addition & 0 deletions ElasticSearch/Dockerfile
@@ -9,6 +9,7 @@ COPY elasticsearch.yml ./config/elasticsearch.yml
RUN bin/elasticsearch-plugin install http://dl.bintray.com/content/imotov/elasticsearch-plugins/org/elasticsearch/elasticsearch-analysis-morphology/5.6.3/elasticsearch-analysis-morphology-5.6.3.zip
RUN bin/elasticsearch-plugin install analysis-stempel
RUN bin/elasticsearch-plugin install analysis-smartcn
+RUN bin/elasticsearch-plugin install analysis-icu

CMD ["elasticsearch"]

1 change: 1 addition & 0 deletions Pipeline/Dockerfile
@@ -17,6 +17,7 @@ RUN apt-get update && apt-get install -y \
tesseract-ocr-spa \
tesseract-ocr-nld \
tesseract-ocr-pol \
+tesseract-ocr-swe \
default-jre \
default-jdk \
readpst
4 changes: 2 additions & 2 deletions README.md
@@ -28,7 +28,7 @@ Ambar defines a new way to implement a full-text document search into yor workfl
* Search By Size (size>1M)
* Search By Tags (tags:ocr)
* Search As You Type
-* Supported language analyzers: English `ambar_en`, Russian `ambar_ru`, German `ambar_de`, Italian `ambar_it`, Polish `ambar_pl`, Chinese `ambar_cn`, CJK `ambar_cjk`
+* Supported language analyzers: English `ambar_en`, Russian `ambar_ru`, German `ambar_de`, Italian `ambar_it`, Polish `ambar_pl`, Chinese `ambar_cn`, CJK `ambar_cjk`, Swedish `ambar_sv`

### Crawling

@@ -44,7 +44,7 @@ Crawling is automatic, no schedule is needed since the crawler monitors fs event
* OCR over images
* Email messages with attachments
* Adobe PDF (with OCR)
-* OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
+* OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld, Swe
* OpenOffice documents
* RTF, Plaintext
* HTML / XHTML
21 changes: 21 additions & 0 deletions ServiceApi/src/services/EsProxy/AmbarFileDataMapping.json
@@ -92,6 +92,18 @@
"german_stemmer": {
"type": "stemmer",
"language": "light_german"
+},
+"swedish_stemmer": {
+"type": "stemmer",
+"language": "swedish"
+},
+"swedish_stop": {
+"type": "stop",
+"stopwords": "_swedish_"
+},
+"swedish_icu_folding": {
+"type": "icu_folding",
+"unicodeSetFilter": "[^åäöÅÄÖ]"
}
},
"analyzer": {
@@ -164,6 +176,15 @@
"filter": [
"lowercase"
]
+},
+"ambar_sv": {
+"tokenizer": "standard",
+"filter": [
+"lowercase",
+"swedish_icu_folding",
+"swedish_stop",
+"swedish_stemmer"
+]
}
}
}
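
Note on `swedish_icu_folding`: the `unicodeSetFilter` value `[^åäöÅÄÖ]` restricts folding to characters outside that set, so å, ä and ö keep their diacritics while other accented letters are reduced to their base form. The Python sketch below only approximates that one behaviour with the standard `unicodedata` module. It is not the ICU implementation (real `icu_folding` also performs case folding and other normalizations), and the names `PROTECTED` and `fold_except_swedish` are hypothetical, introduced here purely for illustration.

```python
import unicodedata

# Characters left untouched, i.e. the complement of the "[^åäöÅÄÖ]" set
# used in the mapping. (Illustrative constant, not part of the commit.)
PROTECTED = set("åäöÅÄÖ")


def fold_except_swedish(text: str) -> str:
    """Strip combining marks from every character except å, ä, ö (either case).

    Rough approximation of icu_folding restricted by unicodeSetFilter;
    it handles diacritics only.
    """
    out = []
    # Compose first so that decomposed input such as "a" + U+030A is
    # recognized as the single protected character "å".
    for ch in unicodedata.normalize("NFC", text):
        if ch in PROTECTED:
            out.append(ch)
            continue
        # Decompose the character and drop its combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


if __name__ == "__main__":
    for word in ["café", "Malmö", "über", "Ångström"]:
        print(word, "->", fold_except_swedish(word))
```

In the `ambar_sv` analyzer this filter runs after `lowercase` and before `swedish_stop` and `swedish_stemmer`, so the stop-word and stemming steps still see å, ä and ö as distinct letters.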