Transcription use case: displaying positional information #17

Open
tristanr-cogapp opened this issue Jul 21, 2015 · 1 comment

Comments

@tristanr-cogapp

For many typewritten documents, transcriptions are produced by OCR and are available in a format that records positional information alongside the extracted text (the format I am most familiar with is ALTO).
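
As a rough illustration of what this positional information looks like, here is a sketch that pulls word boxes out of an ALTO file. It assumes ALTO `String` elements carrying `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes, matches tags by local name so the namespace version doesn't matter, and assumes pixel coordinates (ALTO's `MeasurementUnit` can also be `mm10` or `inch1200`). The sample document and the `Word`/`alto_words` names are made up for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word and its bounding box in ALTO page units."""
    text: str
    x: float
    y: float
    width: float
    height: float


def local_name(tag: str) -> str:
    # ElementTree tags look like "{namespace}String"; drop the namespace
    # so the same code handles different ALTO schema versions.
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_text: str) -> list[Word]:
    """Collect every ALTO <String> element as a Word, in document order."""
    root = ET.fromstring(xml_text)
    words = []
    for el in root.iter():
        if local_name(el.tag) == "String":
            words.append(Word(
                text=el.get("CONTENT", ""),
                x=float(el.get("HPOS", 0)),
                y=float(el.get("VPOS", 0)),
                width=float(el.get("WIDTH", 0)),
                height=float(el.get("HEIGHT", 0)),
            ))
    return words


# Hypothetical, heavily trimmed ALTO sample (pixel coordinates assumed).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page ID="P1" WIDTH="2000" HEIGHT="3000"><PrintSpace>
    <TextBlock ID="B1"><TextLine ID="TL1" HPOS="100" VPOS="200" WIDTH="350" HEIGHT="40">
      <String CONTENT="Pacific" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
      <SP/>
      <String CONTENT="Ocean" HPOS="300" VPOS="200" WIDTH="150" HEIGHT="40"/>
    </TextLine></TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for w in alto_words(SAMPLE):
        print(w)
```

Everything a viewer needs for layout, overlays or highlighting is in those boxes; the rest of the ALTO hierarchy (blocks, lines) is useful for grouping but not strictly required.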

It would be good to support this or similar formats so that transcriptions can be shown laid out like the source document. For example, these screenshots of an item from the Qatar Digital Library at http://www.qdl.qa/en/archive/81055/vdc_100023722174.0x00000b#transcription show the transcription after clicking the 'Apply Page Layout' button in the Transcription section, alongside the original image:

[screenshot 2015-07-21 12:55:08]
[screenshot 2015-07-21 12:55:27]
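
One way to get a layout like this (I don't know how QDL actually implements it) is to absolutely position each word using its box, expressed as percentages of the page so it stays aligned at any display size. A minimal sketch, assuming word boxes as plain `(text, hpos, vpos, width, height)` tuples in the same units as the page size; `layout_html` is a hypothetical name, and font sizing is left out.

```python
from html import escape


def layout_html(words, page_width, page_height):
    """Render word boxes as absolutely positioned spans.

    words: iterable of (text, hpos, vpos, width, height) in the same
    units as page_width/page_height (e.g. ALTO pixel coordinates).
    """
    def pct(value, total):
        # Percentages keep the layout aligned at any display size.
        return f"{100 * value / total:.2f}%"

    spans = []
    for text, x, y, w, h in words:
        style = (f"position:absolute;left:{pct(x, page_width)};"
                 f"top:{pct(y, page_height)};width:{pct(w, page_width)};"
                 f"height:{pct(h, page_height)};white-space:nowrap")
        # Escape OCR text so characters like "&" or "<" render literally.
        spans.append(f'<span style="{style}">{escape(text)}</span>')
    # The container keeps the page's aspect ratio so percentages map
    # onto the same proportions as the source image.
    return (f'<div style="position:relative;width:100%;'
            f'aspect-ratio:{page_width}/{page_height}">'
            + "".join(spans) + "</div>")


print(layout_html([("Pacific", 100, 200, 180, 40),
                   ("A&B", 300, 200, 150, 40)], 2000, 3000))
```

Because the output is plain HTML text rather than something drawn on a canvas, the transcription also stays selectable and readable by other tools.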

This positional information could also be used to display the transcription as an overlay, or to highlight search terms, e.g. in this screenshot from the Wellcome Player for http://wellcomelibrary.org/player/b18024130#?asi=0&ai=1&z=0.0588%2C0.5962%2C0.944%2C0.5102, where I have searched for the word "Pacific":

[screenshot 2015-07-21 12:58:15]
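
For search-term highlighting, the same word boxes are enough: find the words matching the query and turn their boxes into overlay rectangles. A minimal sketch, assuming single-word, case-insensitive matching with surrounding punctuation stripped; the `Box` type and `highlight_boxes` function are hypothetical, and this isn't a claim about how the Wellcome Player does it.

```python
from dataclasses import dataclass

# Characters ignored at the start/end of a word when matching.
PUNCTUATION = ".,;:!?\"'()[]"


@dataclass
class Box:
    """A word and its bounding box in page units (e.g. ALTO pixels)."""
    text: str
    x: float
    y: float
    width: float
    height: float


def highlight_boxes(words, term, page_width, page_height):
    """Return percentage rectangles for every word equal to `term`.

    Matching is case-insensitive and ignores leading/trailing
    punctuation, so "Pacific," matches a search for "pacific".
    """
    needle = term.strip(PUNCTUATION).casefold()
    hits = []
    for w in words:
        if w.text.strip(PUNCTUATION).casefold() == needle:
            # Percentages let the overlay scale with the displayed image.
            hits.append({
                "left": 100 * w.x / page_width,
                "top": 100 * w.y / page_height,
                "width": 100 * w.width / page_width,
                "height": 100 * w.height / page_height,
            })
    return hits


words = [Box("Pacific,", 100, 200, 180, 40), Box("Ocean", 300, 200, 150, 40)]
print(highlight_boxes(words, "Pacific", 2000, 3000))
```

Multi-word phrases would need matching across consecutive words (and possibly across line breaks), which this sketch doesn't attempt.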

@edsu

edsu commented Jul 21, 2015

Perhaps this should be a separate requirement, but it would be super nice to have an option to make sure the text of the transcription is somehow made available to search engine bots that execute JavaScript.
