angelbeshirov/DuoSearch

DuoSearch

A novel search engine for historical newspapers utilizing ElasticSearch and machine learning methods. Code for the paper https://arxiv.org/abs/2305.19392

Purpose

The purpose of this research is to build a proof-of-concept search engine that addresses two issues: OCR errors, and orthographic variation introduced by the Bulgarian language reforms between the 1850s and 1945.
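To make the two issues concrete, here is a minimal sketch of how a query term might be handled; it is not the repository's implementation. It assumes a hypothetical, heavily simplified letter mapping (ѣ to е, ѫ to ъ, word-final ъ/ь dropped) for the orthographic variation, and relies on the standard ElasticSearch `match` query with `fuzziness` to tolerate OCR-style character errors. The field name `content` and the helper names `normalize` and `build_query` are illustrative assumptions.

```python
import re

# Hypothetical, simplified mapping of pre-1945 letters to modern ones.
# Assumption: the real reflex of yat (ѣ) is "е" or "я" depending on the word;
# mapping it to "е" everywhere is a deliberate simplification.
LETTER_MAP = str.maketrans({"ѣ": "е", "ѫ": "ъ", "Ѣ": "Е", "Ѫ": "Ъ"})

# Word-final er letters (ъ, ь) that older spelling wrote at the end of words.
FINAL_ER = re.compile(r"(?<=\w)[ъь]\b")


def normalize(term: str) -> str:
    """Map a historical spelling to an approximate modern form."""
    return FINAL_ER.sub("", term.translate(LETTER_MAP))


def build_query(term: str, field: str = "content") -> dict:
    """Build an ElasticSearch query body that searches for the term as typed
    and for its normalized form, each with fuzzy matching enabled."""
    variants = sorted({term, normalize(term)})
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": v, "fuzziness": "AUTO"}}}
                    for v in variants
                ],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(normalize("вѣстникъ"))
    print(build_query("вѣстникъ"))
```

The returned dictionary could be passed as the request body of a search call. A dictionary-based approach, as mentioned under Scope, would replace the fixed letter mapping with lookups of known variants.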

Scope

This is a PoC version and can be used for collections of digitised historical documents from the same time span. The tool uses dictionaries for Bulgarian, but it can be adapted to other languages.

Target audience

This research is useful for anyone interested in search tools for collections of historical documents or newspapers that contain errors and/or linguistic variation. The target user of the engine is a library in Bulgaria, but it can be adapted for and used by external users as well.

Architecture

[Figure: architecture diagram]

About

A search engine for historical documents that uses ElasticSearch and deep neural networks to handle OCR errors and orthographic variation.
