This project was part of my Master's thesis project in spring 2023.
The system takes an image of a handwritten document page as input, segments it into lines and words, and aligns the segmentation to a ground-truth transcription. If no ground truth is available, the segmented text can be transcribed manually. Where the segmentation fails to detect text, the bounding boxes can be corrected manually during the process. Bayesian optimisation is used to set reasonable parameter values automatically. The report from the thesis project can be found at:
http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-506000
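This README does not say which parameters are tuned, which score is optimised, or which library the project uses. As a hedged illustration of the general technique only, the sketch below runs Bayesian optimisation (a Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition function) over one hypothetical parameter scaled to [0, 1]. The names `bayes_opt` and `toy_score`, and all default values, are assumptions for illustration and are not taken from the project's code.

```python
# Hypothetical illustration of Bayesian optimisation; not the project's implementation.
import math
import random


def rbf(a, b, length_scale=0.2):
    """Squared-exponential (RBF) kernel between two scalar parameter values."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(K):
    """Return lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def back_sub_transposed(L, b):
    """Solve L^T x = b, reading the transpose directly from the lower factor L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best score observed so far (maximisation)."""
    if sigma <= 1e-12:
        return 0.0
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def bayes_opt(objective, n_init=3, n_iter=15, seed=0, jitter=1e-6):
    """Maximise objective(x) for x in [0, 1] using a GP surrogate and EI."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]  # random initial samples
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate parameter values
    for _ in range(n_iter):
        y_mean = sum(ys) / len(ys)  # centre targets; the GP prior mean is zero
        K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        L = cholesky(K)
        alpha = back_sub_transposed(L, forward_sub(L, [y - y_mean for y in ys]))
        best = max(ys)

        def acquisition(x):
            k_star = [rbf(x, p) for p in xs]
            mu = y_mean + sum(k * a for k, a in zip(k_star, alpha))  # posterior mean
            v = forward_sub(L, k_star)
            var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)  # posterior variance
            return expected_improvement(mu, math.sqrt(var), best)

        candidates = [g for g in grid if g not in xs]  # skip already-sampled values
        x_next = max(candidates, key=acquisition)
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


def toy_score(x):
    """Hypothetical stand-in for a segmentation-quality score, peaking at x = 0.3."""
    return -(x - 0.3) ** 2
```

Calling `bayes_opt(toy_score)` returns the best sampled parameter and its score. In a real pipeline, `toy_score` would presumably be replaced by a function that runs the segmentation with the candidate parameter and scores the result; that mapping is an assumption here. Scanning a fixed grid of candidates keeps the sketch dependency-free at the cost of only handling one parameter.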
The packages required to run the full pipeline are listed in requirements.txt.
- Clone this repository by using:
> git clone https://github.com/PhilipMacCormack/Text_alignment_and_segmentation
- cd into Text_alignment_and_segmentation:
> cd Text_alignment_and_segmentation
- Install the packages from requirements.txt:
> pip install -r requirements.txt
- Open main.py
- Set the input parameters in the script
- Run main.py:
> python main.py
- Follow the prompts in the terminal until the process finishes
The output from the algorithm is saved in Text_alignment_and_segmentation/Results/{file}. It consists of several images saved during the process and an XML file containing the final alignment of the document. The individual line and word images from the segmentation are saved in results/{file}/lines.
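To inspect the results programmatically, a small helper along these lines could group the files in one output folder by kind. `summarize_results` is hypothetical, the set of image extensions is an assumption (the README does not state which formats are written), and the helper assumes the lines folder sits inside the same Results/{file} directory.

```python
# Hypothetical helper for browsing one output folder; not part of the repository.
from pathlib import Path

# Assumed image formats; adjust to whatever the pipeline actually writes.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def summarize_results(result_dir):
    """Group files in a Results/{file} folder into XML, images, and line/word images."""
    result_dir = Path(result_dir)
    lines_dir = result_dir / "lines"  # assumed location of line and word crops
    summary = {"xml": [], "images": [], "line_images": []}
    for path in sorted(result_dir.iterdir()):
        if not path.is_file():
            continue  # subfolders such as lines/ are handled separately
        suffix = path.suffix.lower()
        if suffix == ".xml":
            summary["xml"].append(path.name)
        elif suffix in IMAGE_SUFFIXES:
            summary["images"].append(path.name)
    if lines_dir.is_dir():
        # rglob also picks up word images stored in nested subfolders, if any
        summary["line_images"] = sorted(
            p.relative_to(lines_dir).as_posix()
            for p in lines_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return summary
```

For example, `summarize_results("Text_alignment_and_segmentation/Results/{file}")`, with `{file}` replaced by the processed document's folder name, would list the alignment XML alongside the saved images.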
This algorithm is partly based on work from: