Handle JP2 files natively when performing OCR #31

Closed
wants to merge 3 commits into from
@dltj dltj commented Nov 27, 2013

dltj added 3 commits June 20, 2013 17:37
Tesseract does not natively handle JP2 files, so if the OBJ datastream
is a JP2 we must create a TIFF from that JP2 to pass into tesseract.
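
The conversion step described in the commit message could look roughly like the sketch below. It is not taken from the pull request: the function names are hypothetical, and it assumes Kakadu's `kdu_expand` binary is installed and on the PATH (the diff in this PR refers to Kakadu, but the exact invocation used by the module is not shown here).

```php
<?php

/**
 * Hypothetical helper (not from the pull request): builds the shell command
 * that expands a JP2 into a TIFF. Assumes kdu_expand is on the PATH.
 */
function example_jp2_to_tiff_command($jp2_path, $tiff_path) {
  return sprintf(
    'kdu_expand -i %s -o %s',
    escapeshellarg($jp2_path),
    escapeshellarg($tiff_path)
  );
}

/**
 * Hypothetical helper: returns a file path that Tesseract can read,
 * converting first only when the source is a JP2.
 */
function example_prepare_image_for_ocr($source_path, $mime_type) {
  if ($mime_type !== 'image/jp2') {
    // Other formats are passed through unchanged.
    return $source_path;
  }
  $tiff_path = $source_path . '.tif';
  exec(example_jp2_to_tiff_command($source_path, $tiff_path), $output, $status);
  if ($status !== 0) {
    throw new RuntimeException("kdu_expand failed with exit status $status");
  }
  return $tiff_path;
}
```

The returned path would then be handed to Tesseract in place of the original OBJ file, and the temporary TIFF should be deleted once OCR finishes.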
@ruebot
Member

ruebot commented Jan 11, 2014

👍 for merge.

See my Large Image PR #86 comment

@ruebot
Member

ruebot commented Jan 12, 2014

One item to note: I'm not seeing anything show up in Solr. That might be because things are still going through GSearch, but it has been almost a day now, so I'm not quite sure what's going on.

@dltj
Author

dltj commented Jan 12, 2014

I wouldn't think it would take that long. Are the derivatives created in the object as one would expect?

@ruebot
Member

ruebot commented Jan 12, 2014

Yep. It looks like it.

[Image: screenshot from 2014-01-12 15 41 55]

@dltj
Author

dltj commented Jan 12, 2014

Hmmm -- well, the OCR/HOCR derivative creation code isn't anything special. It commits those datastreams back to the Fedora repository, where GSearch should pick up the message that the object has changed and go get the content. Are you certain that GSearch is configured to pull content from the OCR/HOCR datastreams? Are there any error messages from GSearch?

I find it odd that there are 5 versions of the OCR and HOCR datastreams. Can you account for that?

@ruebot
Member

ruebot commented Jan 12, 2014

I'm not seeing anything out of the ordinary in fedoragsearch.daily.log.

The 5 revisions really threw me too. I have no idea what that is all about.

@ruebot
Member

ruebot commented Jan 13, 2014

After some discussion in #islandora, I've come to the conclusion that the Solr/GSearch issue is a nuance of my particular installation, and not from the two related pull requests.

👍 for merge.

$jp2_file = islandora_ocr_get_uploaded_file($datastream);

// Create JP2 with kakadu.
module_load_include('inc', 'islandora_large_image', 'includes/utilities');
Member

Dependency should be noted in the .info file
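
For reference, a Drupal 7 module declares a dependency with a `dependencies[]` line in its `.info` file. A minimal fragment, assuming the module being loaded is named `islandora_large_image` (as in the `module_load_include()` call above) and leaving the rest of the file unchanged:

```ini
; Assumed addition to the OCR module's .info file.
dependencies[] = islandora_large_image
```

With this in place, Drupal refuses to enable the OCR module unless the Large Image module is also available, so the `module_load_include()` call cannot silently fail to find its file.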

@ruebot
Member

ruebot commented Dec 9, 2014

Sometime, somewhere this already made it into the code base. So, I'll close this.

@ruebot ruebot closed this Dec 9, 2014
3 participants