Handwriting recognition of Swedish handwritten text

Project in Embedded Systems (15 hp), Uppsala University

Co-author: Christian Böhme

Supervisor: Ping Wu

The code is compatible with TensorFlow 2 and Python 3.9.

A Handwritten Text Recognition (HTR) system implemented in Python with TensorFlow and Keras. The system uses a CRNN (convolutional recurrent neural network) model with CTC (connectionist temporal classification) loss and is trained on the English IAM dataset and a self-made Swedish dataset. The model takes segmented word images as input and outputs the predicted text. For clearly handwritten text, 80-90% of each test sentence is correctly recognized, and approximately 78% of the words in the validation dataset are correctly recognized by the model.
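To make the CTC output and the word-level accuracy figure more concrete, here is a minimal sketch of best-path (greedy) CTC decoding and an exact-match word accuracy in plain Python. It is an illustration only and not the code in src: the blank index of 0, the function names and the tiny charset are assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank; the project may use another index


def ctc_greedy_decode(frame_scores, charset, blank=BLANK):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: one list of class scores per time step.
    charset: string where charset[i - 1] is the character for class i.
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    collapsed = [index for index, _ in groupby(best_path)]
    return "".join(charset[index - 1] for index in collapsed if index != blank)


def word_accuracy(predictions, targets):
    """Fraction of words whose predicted text matches the label exactly."""
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def one_hot(index, size):
    """Helper for the demo: a score vector with a single winning class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# The blank between the two "l" frames is what allows a double letter.
frames = [one_hot(i, 4) for i in (1, 1, 2, 3, 3, 0, 3)]
print(ctc_greedy_decode(frames, "til"))  # prints "till"
```

Greedy decoding is the simplest CTC decoder; a beam search or a dictionary-based decoder could give better results but is not shown here.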

The ScrabbleGAN program from https://github.com/Nikolai10/scrabble-gan is used to create our own Swedish dataset. In this project, we have modified random_words.txt and inference.py and updated the char vector: Swedish words are added to random_words.txt, inference.py is changed to generate Swedish words, and the char vector now includes the letters å, ä, ö, Å, Ä and Ö.
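As an illustration of what extending a char vector with the Swedish letters involves, the following sketch maps words to integer labels and back. The alphabet used here (ASCII letters plus å, ä, ö, Å, Ä, Ö) and all names are assumptions for the example; the actual char vector in scrabble-gan and src may contain other symbols in a different order.

```python
import string
import unicodedata

# Assumed alphabet for this sketch: ASCII letters followed by the Swedish letters.
SWEDISH_LETTERS = "åäöÅÄÖ"
CHAR_VECTOR = string.ascii_letters + SWEDISH_LETTERS
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHAR_VECTOR)}


def encode_label(word):
    """Convert a word to a list of integer class indices."""
    # NFC turns "a" + combining ring into the single code point "å",
    # so both spellings of the same letter map to one class.
    word = unicodedata.normalize("NFC", word)
    unknown = sorted({ch for ch in word if ch not in CHAR_TO_INDEX})
    if unknown:
        raise ValueError(f"characters not in char vector: {unknown}")
    return [CHAR_TO_INDEX[ch] for ch in word]


def decode_label(indices):
    """Convert a list of integer class indices back to a word."""
    return "".join(CHAR_VECTOR[i] for i in indices)


print(encode_label("Östersjön"))
print(decode_label(encode_label("Östersjön")))
```

Raising an error on unknown characters makes it easy to spot labels that the model could never predict, for example words containing digits or punctuation that are missing from the char vector.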

Data preprocessing through image analysis is implemented in MATLAB. All MATLAB files for preprocessing and data augmentation are found in the matlab folder.
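The MATLAB scripts are not reproduced here. As a rough idea of how a dilation size (called dil_size in the instructions below) can control word segmentation, the following Python sketch dilates a one-dimensional ink profile of a text line and cuts it at the remaining gaps. This is a simplified assumption of the general technique, not a port of swe_segmentation.m, which presumably works on the full two-dimensional image.

```python
def dilate_1d(mask, size):
    """Binary dilation of a 1D mask: every set cell spreads `size` cells to each side."""
    n = len(mask)
    out = [False] * n
    for i, on in enumerate(mask):
        if on:
            for j in range(max(0, i - size), min(n, i + size + 1)):
                out[j] = True
    return out


def segment_words(ink_columns, dil_size):
    """Return (start, end) column ranges of words, with end exclusive.

    ink_columns[x] is True if column x of the text line contains ink.
    Gaps of at most 2 * dil_size columns are bridged by the dilation,
    so letters of one word merge while wider gaps still split words.
    """
    dilated = dilate_1d(ink_columns, dil_size)
    segments = []
    start = None
    for x, on in enumerate(dilated + [False]):  # sentinel closes the last run
        if on and start is None:
            start = x
        elif not on and start is not None:
            # Trim the dilated run back to the columns that really contain ink.
            ink = [c for c in range(start, x) if ink_columns[c]]
            segments.append((ink[0], ink[-1] + 1))
            start = None
    return segments


profile = [c == "#" for c in "##.##....###.#"]
print(segment_words(profile, 0))
print(segment_words(profile, 1))
print(segment_words(profile, 2))
```

A value that is too small splits words into single letters, and a value that is too large merges neighbouring words, which is why the dilation size has to be tuned per image.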

Instructions

swe_segmentation.m segments words from an image with handwritten text. Specify the images to be segmented and set dil_size (the dilation factor) to a suitable value.
createWords_swe.py creates correct labels for the segmented words. These labels are to be put into words.txt.
Download the IAM dataset, preprocess the images with IAM_preprocessing.m and put them in a folder named words in the data folder.
New data is segmented and preprocessed with samples_preprocessing.m and put in the words folder as well.
main.py uses the labels in words.txt as well as the preprocessed images in the words folder.
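The last step pairs each label with its image. The sketch below shows one way this could look, assuming that words.txt follows the line layout of the IAM words.txt file (word id, segmentation status, gray level, four bounding box values, grammatical tag, transcription) and that images are stored as data/words/<word id>.png. Both are assumptions; the layout actually read by main.py and written by createWords_swe.py may differ.

```python
from pathlib import Path


def parse_words_file(text, image_dir="data/words"):
    """Return a list of (image_path, label) pairs from IAM-style label text.

    Assumed line layout (IAM words.txt):
        <id> <ok|err> <graylevel> <x> <y> <w> <h> <tag> <transcription>
    Lines starting with '#' are comments.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(" ")
        if len(fields) < 9:
            continue  # skip malformed lines instead of guessing their meaning
        word_id = fields[0]
        label = " ".join(fields[8:])  # everything after the tag is the text
        # Assumption: a flat folder with one PNG file per word id.
        samples.append((Path(image_dir) / f"{word_id}.png", label))
    return samples


example = (
    "# example lines, not taken from the real dataset\n"
    "a01-000u-00-00 ok 154 408 768 27 51 AT A\n"
    "swe-0001 ok 160 0 0 40 30 NN försök\n"
)
for image_path, label in parse_words_file(example):
    print(image_path.as_posix(), label)
```

The function takes the file content as a string, so it can be used as parse_words_file(Path("words.txt").read_text(encoding="utf-8")) with whatever location words.txt has in your checkout; reading with UTF-8 matters for the letters å, ä and ö.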

Repository: adinapersson/HTR_CRNN