Skip to content

Check your modified Ground Truth files with visual support!

License

Notifications You must be signed in to change notification settings

UB-Mannheim/GTCheck

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GTCheck

Python 3.6 license

Overview

Check changes in your OCR Ground Truth

If the Ground Truth data is version controlled via git repository, you can use "GTCheck" to validate and commit your modification. Therefore GTCheck will display for any line the original text, the modified version as well as the corresponding image. A virtual keyboard supports character replacements and the transcription of missing text.

Installation

This installation is tested with Ubuntu and we expect that it should work for other similar environments.

1. Requirements

  • Python> 3.6

2. Copy this repository

git clone https://github.com/UB-Mannheim/GTCheck.git
cd GTCheck

3. Installation into a Python Virtual Environment

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools
pip install -r requirements.txt
python setup.py install

Process steps

Start the server

gtcheck run-server {parameters..}

Add repo

gtcheck add-repo path/to/git-repo {parameters..}

Start single instance

gtcheck run-single path/to/git-repo {parameters..}

Working with page-xml files

Extract page-xml to gt-linepairs The working directory is either the path to a mets.xml file or the folder with page-xml files. It is possible to pass a mets.xml if it already exists, else a temporary mets-file will be created.

gtcheck extract-page path/to/working-directory {-m path/to/mets.xml}  -I {INPUTGROUPNAME} -O {OUTPUTGROUNAME} {parameters...}

Add to server and check! (see above)

Update page-xml files

gtcheck update-page path/to/repo  -I {./ GROUPNAME} {parameters...}

Setup page

In the first page you can set up your git credentials and select the branch or create a new branch for committing the modifications. Setup page

GTCheck page

In this page you can see and edit the modifications and the original text.

The modifications can be committed (with the commit message), skipped (if not clear what to do), added to the stage mode and later can be committed all at once or can be stashed (keep the original version). Edit page

A virtual keyboard supports character replacements and the transcription of missing text Vkeys

FAQ

TIFF images
Not all browser support tif images.
The workaround atm is installing browser extension.
E.g. Firefox you can find tiff viewer: https://addons.mozilla.org/de/firefox/addon/

UTF-8 Foldername:
git config core.quotepath off

Spellchecking:
This app uses the browser spellchecking.
E.g. Firefox:
https://support.mozilla.org/en-US/kb/how-do-i-use-firefox-spell-checker
https://addons.mozilla.org/de/firefox/language-tools/

Copyright and License

Copyright (c) 2020 Universitätsbibliothek Mannheim

Author:

GTCheck is Free Software. You may use it under the terms of the Apache 2.0 License. See LICENSE for details.

About

Check your modified Ground Truth files with visual support!

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages