Handle JP2 files natively when performing OCR #31

dltj · 2013-11-27T21:14:08Z

Depends on Islandora/islandora_solution_pack_large_image#86

Tesseract does not natively handle JP2 files, so if the OBJ datastream is a JP2 we must create a TIFF from that JP2 to pass into tesseract.
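
To illustrate the idea, here is a minimal sketch of the branching described above. It is not the code in this pull request. The function name `sketch_ocr_commands()` is hypothetical, the `image/jp2` MIME type check is an assumption about how the OBJ datastream is identified, and it assumes Kakadu's `kdu_expand` (with its `-i`/`-o` options) does the decoding. It only builds the command strings, so it can be read and run without either tool installed.

```php
<?php

/**
 * Hypothetical helper: build the shell commands needed to OCR one image.
 *
 * Assumption: a JP2 source must be decoded to a temporary TIFF before
 * tesseract can read it. Any other source is passed to tesseract as-is.
 */
function sketch_ocr_commands($source_path, $mime_type, $output_base) {
  $commands = array();
  $ocr_input = $source_path;

  if ($mime_type == 'image/jp2') {
    // Tesseract cannot read JP2, so decode to a TIFF first.
    $ocr_input = $output_base . '.tif';
    $commands[] = 'kdu_expand -i ' . escapeshellarg($source_path)
      . ' -o ' . escapeshellarg($ocr_input);
  }

  // tesseract takes the input image followed by the output base name.
  $commands[] = 'tesseract ' . escapeshellarg($ocr_input)
    . ' ' . escapeshellarg($output_base);

  return $commands;
}
```

With a JP2 source the sketch returns two commands (decode, then OCR); with a TIFF source it returns only the tesseract command. The temporary TIFF would need to be deleted once OCR finishes.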

ruebot · 2014-01-11T16:21:44Z

👍 for merge.

See my Large Image PR #86 comment

ruebot · 2014-01-12T15:46:38Z

One item to note: I'm not seeing anything show up in Solr. That might be because things are still going through GSearch, but it has been almost a day now, so I'm not quite sure what's going on.

dltj · 2014-01-12T20:34:15Z

I wouldn't think it would take that long. Are the derivatives created in the object as one would expect?

ruebot · 2014-01-12T20:43:31Z

Yep. It looks like it.

dltj · 2014-01-12T22:23:03Z

Hmmm -- well, the OCR/HOCR derivative creation code isn't anything special. It commits those datastreams back to the Fedora repository, where GSearch should pick up the message that the object has changed and go get the content. Are you certain that GSearch is configured to pull content from the OCR/HOCR datastreams? Are there any error messages from GSearch?

I find it odd that there are 5 versions of the OCR and HOCR datastreams. Can you account for that?

ruebot · 2014-01-12T22:45:20Z

I'm not seeing anything out of the ordinary in fedoragsearch.daily.log.

The 5 revisions really threw me too. I have no idea what that is all about.

ruebot · 2014-01-13T19:52:30Z

After some discussion in #islandora, I've come to the conclusion that the Solr/GSearch issue is a quirk of my particular installation and is not caused by the two related pull requests.

👍 for merge.

nigelgbanks · 2014-05-22T19:01:48Z

includes/derivatives.inc

+  $jp2_file = islandora_ocr_get_uploaded_file($datastream);
+
+  // Create JP2 with kakadu.
+  module_load_include('inc', 'islandora_large_image', 'includes/utilities');


Dependency should be noted in the .info file
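
In a Drupal 7 module this is normally done with a `dependencies[]` line. A minimal sketch, assuming the file in question is `islandora_ocr.info` and that the machine name matches the `islandora_large_image` passed to `module_load_include()` in the hunk above:

```ini
; islandora_ocr.info (filename assumed)
dependencies[] = islandora_large_image
```

With that line in place, Drupal would refuse to enable the OCR module unless the Large Image solution pack is also available.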

ruebot · 2014-12-09T16:41:29Z

Sometime, somewhere this already made it into the code base. So, I'll close this.

dltj added 3 commits June 20, 2013 17:37

Create temporary TIFF derivative if OBJ DS is JP2

b321512

Tesseract does not natively handle JP2 files, so if the OBJ datastream is a JP2 we must create a TIFF from that JP2 to pass into tesseract.

Merge commit 'b321512' into 7.x-native-jp2

552a2e3

Modify old code to meet coding standards.

b0f8aaa

nigelgbanks reviewed May 22, 2014

ruebot mentioned this pull request Sep 11, 2014

Allow solution pack to accept JP2 files natively Islandora/islandora_solution_pack_large_image#86

Closed

ruebot closed this Dec 9, 2014
