Add hOCR functionality #1006
Co-authored-by: Rosie Le Faive <lefaive@gmail.com>
Is this limited to File and jp2 and tiff? Wondering if we can use this with Image and jpg as well. Is there any config we have to adjust to do that?
@Natkeeran I think if you create an Original File that is a JPEG, the hOCR derivative task should run. Right now the config just looks for Service File, so either that would need to be changed in the manifest view, or you would need to make sure there is a service file with the same size as the original. (I don't know if we ever got around to fixing the problems with an item having both Original File and Service File tags.)
Made 2 changes. Added a null check that @seth-shaw-asu found was necessary. Also fixed a problem with the ISLE-DC branch I noted above for testing this PR: it turns out you can't tell composer to check out a branch on GitHub when using the drupal/ namespace, so I just made make hocr_test delete the islandora repo and check it out from git. This was making it unpredictable whether the testing branch of islandora or an older release tag was checked out.
Went to test again and it appears to be getting the hOCR:
{
"@type": "sc:Manifest",
"@id": "/node/1/book-manifest",
"label": "IIIF Manifest",
"@context": "http://iiif.io/api/presentation/2/context.json",
"sequences": [
{
"@context": "http://iiif.io/api/presentation/2/context.json",
"@id": "https://islandora.traefik.me/node/1/sequence/normal",
"@type": "sc:Sequence",
"canvases": [
{
"@id": "https://islandora.traefik.me/node/1/canvas/4",
"@type": "sc:Canvas",
"label": "7a9230aa-8-Service File.jpg",
"height": 1920,
"width": 1080,
"images": [
{
"@id": "https://islandora.traefik.me/node/1/annotation/4",
"@type": "oa:Annotation",
"motivation": "sc:painting",
"resource": {
"@id": "https://islandora.traefik.me/cantaloupe/iiif/2/https%3A%2F%2Fislandora.traefik.me%2Fsites%2Fdefault%2Ffiles%2F2024-04%2F7a9230aa-8-Service%2520File.jpg/full/full/0/default.jpg",
"@type": "dctypes:Image",
"format": "image/jpeg",
"height": 1920,
"width": 1080,
"service": {
"@id": "https://islandora.traefik.me/cantaloupe/iiif/2/https%3A%2F%2Fislandora.traefik.me%2Fsites%2Fdefault%2Ffiles%2F2024-04%2F7a9230aa-8-Service%2520File.jpg",
"@context": "http://iiif.io/api/image/2/context.json",
"profile": "http://iiif.io/api/image/2/profiles/level2.json"
}
},
"on": "https://islandora.traefik.me/node/1/canvas/4"
}
],
"seeAlso": {
"@id": "https://islandora.traefik.me/sites/default/files/hocr/2024-04/2-hOCR.html",
"format": "text/vnd.hocr+html",
"profile": "http://kba.cloud/hocr-spec",
"label": "hOCR embedded text"
}
},
...
],
"service": [
{
"@context": "http://iiif.io/api/search/0/context.json",
"@id": "https://islandora.traefik.me/paged-content-search/1",
"profile": "http://iiif.io/api/search/0/search",
"label": "Search inside this work"
}
]
}
The search icon appears, but none of my searches show any results... Side note: the text does appear in the regular Solr search and I can see it when I open the hOCR files themselves, so the text is there, just not in the Mirador search.
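For anyone repeating this check, the manual inspection of the manifest above could be scripted. The following is only a minimal sketch, assuming a IIIF Presentation 2 manifest shaped like the one pasted here has already been loaded into a Python dict; the function name `iter_hocr_links` is hypothetical and not part of Islandora.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

# Format string taken from the "seeAlso" block in the manifest above.
HOCR_FORMAT = "text/vnd.hocr+html"


def iter_hocr_links(
    manifest: Dict[str, Any],
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (canvas @id, hOCR URL) pairs found in a Presentation 2 manifest.

    Hypothetical helper for testing; it only reads the dict it is given.
    """
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            see_also = canvas.get("seeAlso", [])
            # "seeAlso" may be a single object (as in the paste above) or a list.
            if isinstance(see_also, (dict, str)):
                see_also = [see_also]
            for entry in see_also:
                if isinstance(entry, dict) and entry.get("format") == HOCR_FORMAT:
                    yield canvas.get("@id"), entry.get("@id")
```

Calling `list(iter_hocr_links(manifest))` on a parsed manifest lists which canvases carry an hOCR link, so a canvas that is missing from the result has no hOCR `seeAlso` entry.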
Hi @seth-shaw-asu, thanks for testing. I'll look into why you might be having problems with searching, but since this PR is just about getting the hOCR into the manifest, I am happy to see that working.
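One way to narrow down the search problem outside Mirador is to query the search service from the manifest directly. The sketch below only builds the request URL; it assumes the endpoint accepts the `q` query parameter defined by the IIIF Content Search API, and the helper name `search_url` is hypothetical.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def search_url(manifest: Dict[str, Any], term: str) -> Optional[str]:
    """Build a Content Search request URL from the manifest's "service" block.

    Hypothetical helper; assumes the service @id has no query string of its own.
    """
    services = manifest.get("service", [])
    # "service" may be a single object or a list of objects.
    if isinstance(services, dict):
        services = [services]
    for service in services:
        if not isinstance(service, dict):
            continue
        profile = str(service.get("profile", ""))
        if "iiif.io/api/search" in profile and service.get("@id"):
            return f"{service['@id']}?{urlencode({'q': term})}"
    return None
```

Opening the resulting URL in a browser or with curl shows whether the endpoint itself returns matches, which separates a viewer problem from a search backend problem.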
GitHub Issue: (link)
Pulls the relevant hOCR functionality out of #983, which is a PR focused on TIFF width/height calculations
What does this Pull Request do?
A brief description of what the intended result of the PR will be and/or what
problem it solves.
What's new?
An in-depth description of the changes made by this PR. Technical details and
possible side effects.
(i.e. Regeneration activity, etc.)?
How should this be tested?
A description of what steps someone could take to:
Documentation Status
Additional Notes:
Any additional information that you think would be helpful when reviewing this
PR.
Interested parties
Tag (@ mention) interested parties or, if unsure, @Islandora/committers