🔍 Ambar: Document Search Engine
fpd4444 Merge pull request #209 from temberature/master
feat: add OCR language Chi-sim.
Latest commit ecb0294 Dec 27, 2018
Ambar Search

Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar defines a new way to bring full-text document search into your workflow:

  • Easily deploy Ambar with a single docker-compose file
  • Perform a Google-like search through the contents of your documents and images
  • Ambar supports all popular document formats and performs OCR if needed
  • Tag your documents
  • Use a simple REST API to integrate Ambar into your workflow



Tutorial: Mastering Ambar Search Queries

  • Fuzzy Search (John~3)
  • Phrase Search ("John Smith")
  • Search By Author (author:John)
  • Search By File Path (filename:*.txt)
  • Search By Date (when: yesterday, today, lastweek, etc)
  • Search By Size (size>1M)
  • Search By Tags (tags:ocr)
  • Search As You Type
  • Supported language analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk
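The query forms listed above are plain strings, so they can be composed programmatically before being typed or sent to Ambar. The following is a minimal sketch in Python, not part of Ambar itself: the helper names (`fuzzy`, `phrase`, `field`, `size_over`) are hypothetical, and writing the date filter as `when:yesterday` with no space after the colon is an assumption made by analogy with the other field queries.

```python
# Hypothetical helpers that build query strings in the syntax listed above.
# They only format text; nothing here talks to Ambar.

def fuzzy(term: str, distance: int) -> str:
    """Fuzzy search, e.g. John~3."""
    return f"{term}~{distance}"


def phrase(text: str) -> str:
    """Phrase search, e.g. "John Smith"."""
    return f'"{text}"'


def field(name: str, value: str) -> str:
    """Field search, e.g. author:John, filename:*.txt, tags:ocr.

    Assumption: no space between the colon and the value.
    """
    return f"{name}:{value}"


def size_over(limit: str) -> str:
    """Size filter, e.g. size>1M."""
    return f"size>{limit}"


if __name__ == "__main__":
    for query in (
        fuzzy("John", 3),
        phrase("John Smith"),
        field("author", "John"),
        field("filename", "*.txt"),
        field("when", "yesterday"),
        size_over("1M"),
        field("tags", "ocr"),
    ):
        print(query)
```

Keeping each form in its own small function makes it easy to check the generated text against the tutorial before relying on it.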


Ambar 2.0 only supports local fs crawling. If you need to crawl an SMB share or an FTP location, just mount it using standard Linux tools. Crawling is automatic: no schedule is needed, since the crawler monitors fs events and automatically processes new files.
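To illustrate the idea of picking up new files without a schedule, here is a small Python sketch. It is not Ambar's crawler and it does not use fs events: it swaps in a simpler technique, comparing two directory snapshots, purely to show what "detect new files under a mounted folder" means. The function names `snapshot` and `new_files` are hypothetical.

```python
from __future__ import annotations

import os


def snapshot(root: str) -> set[str]:
    """Return the set of all file paths currently found under root."""
    found: set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found


def new_files(before: set[str], after: set[str]) -> list[str]:
    """Return paths present in the later snapshot but not the earlier one."""
    return sorted(after - before)
```

A caller would take one snapshot, wait, take another, and pass both to `new_files`. An event-based watcher avoids the repeated directory walk, which matters on large shares.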

Content Extraction

  • Ambar supports large files (>30MB)
  • ZIP archives
  • Mail archives (PST)
  • MS Office documents (Word, Excel, PowerPoint, Visio, Publisher)
  • OCR over images
  • Email messages with attachments
  • Adobe PDF (with OCR)
  • OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
  • OpenOffice documents
  • RTF, Plaintext
  • Multithread processing


Notice: Ambar requires Docker and cannot run without it.

Just follow the installation instructions.

Docker images can be found on Docker Hub


Ambar is fully open-source and free to use. However, you can get dedicated support from our team for a fee:

  • Install & Configure Ambar on your machine - 999$
  • Mount external data source - 99$
  • Add automatic tagging rule - 299$
  • Add password protection to Ambar UI - 299$
  • Add custom file extractor - 599$
  • Dedicated support - 199$/hour
  • Custom features development - 299$/hour


Is it open-source?

Yes, it's fully open-source now.

Is it free?

Yes, it is forever free.

Does it perform OCR?

Yes, it performs OCR on images (jpg, tiff, bmp, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract. We tuned it to achieve the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.

Which languages are supported for OCR?

Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld. If your language is missing, please contact us on

Does it support tagging?


What about searching in PDF?

Yes, it can search through any PDF, even badly encoded ones or those with scans inside. We did our best to make search over any kind of PDF document smooth.

What is the maximum file size it can handle?

It's limited by the amount of RAM on your machine; typically it's 500MB. This is an awesome result, since typical document management systems cap processed files at 30MB.
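If you want to check a folder against that figure before crawling it, a short script can list the files that exceed a chosen limit. This is a sketch under assumptions, not an Ambar feature: the function name `oversized` is hypothetical, and treating 500MB as 500 * 1024 * 1024 bytes is an assumption, since the real ceiling depends on the RAM available.

```python
from __future__ import annotations

import os

# Assumption: the "typical" 500MB figure, taken as binary megabytes.
DEFAULT_LIMIT_BYTES = 500 * 1024 * 1024


def oversized(root: str, limit: int = DEFAULT_LIMIT_BYTES) -> list[tuple[str, int]]:
    """Return (path, size_in_bytes) for every file under root larger than limit."""
    hits: list[tuple[str, int]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                hits.append((path, size))
    return sorted(hits)
```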

I have a problem. What should I do?

Request a dedicated support session by mailing us on


Change Log

Privacy Policy


MIT License
