Merge pull request #93 from IU-Libraries-Joint-Development/HPT-989_oc…
…r_jp2

HPT-989 Add OCR generation to JP2 derivatives creation code
andjsmit committed Feb 16, 2017
2 parents e2dff82 + ed6367b commit 9a5bc48
Showing 5 changed files with 407 additions and 1 deletion.
33 changes: 33 additions & 0 deletions .install_tesseract
@@ -0,0 +1,33 @@
wget "https://github.com/uclouvain/openjpeg/archive/version.2.1.zip"
unzip version.2.1.zip
cd openjpeg-version.2.1
mkdir build
cd build
cmake ..
make
sudo make install
sudo make clean
cd ..
cd ..
wget "http://www.leptonica.com/source/leptonica-1.73.tar.gz"
tar xzf leptonica-1.73.tar.gz
cd leptonica-1.73
sed -i 's/#define HAVE_LIBJP2K 0/#define HAVE_LIBJP2K 1/g' ./src/environ.h
sed -i 's/-ltiff -ljpeg -lpng -lz -lm/-ltiff -ljpeg -lpng -lz -lm -lopenjp2/g' ./prog/makefile.static
./configure
make
sudo make install
cd ..
wget "https://github.com/tesseract-ocr/tesseract/archive/3.04.00.zip"
unzip 3.04.00.zip
cd tesseract-3.04.00
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
cd ..
git clone https://github.com/tesseract-ocr/tessdata.git
sudo cp tessdata/eng.* /usr/local/share/tessdata/
sudo cp tessdata/ita* /usr/local/share/tessdata
tesseract -v
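The script ends with `tesseract -v`, which confirms that the binary runs but not that the `eng` and `ita` data files were actually copied. A minimal Ruby sketch of a stricter check follows. `missing_languages` is a hypothetical helper, not part of this commit, and it assumes the usual `tesseract --list-langs` output: one header line followed by one language code per line.

```ruby
# Hypothetical helper (not part of this commit): given the text printed by
# `tesseract --list-langs`, return the required language codes that are absent.
def missing_languages(list_langs_output, required)
  available = list_langs_output.lines.map(&:strip)
  # Drop the header ("List of available languages (N):") and blank lines.
  available = available.reject { |line| line.empty? || line.start_with?('List of') }
  required - available
end

# Sample text standing in for real command output (an assumption for the demo).
sample = <<~OUT
  List of available languages (2):
  eng
  ita
OUT

puts missing_languages(sample, %w[eng ita]).inspect
puts missing_languages(sample, %w[eng fra]).inspect
```

In a CI step the real output could be passed in instead of `sample`. Some Tesseract 3.x builds appear to print the language list to stderr, so capturing both streams is the safer assumption.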
1 change: 1 addition & 0 deletions app/models/file_set.rb
@@ -47,6 +47,7 @@ def create_derivatives(filename)
       dst = derivative_path('intermediate_file')
       FileUtils.mkdir_p(File.dirname(dst))
       FileUtils.cp(filename, dst)
+      RunOCRJob.perform_later(id) if Plum.config[:store_original_files]
     end
     super
   end
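The hunk above enqueues `RunOCRJob` but the diff does not show the job itself. As a rough sketch of the kind of command such a job might run, the hypothetical `ocr_command` helper below builds a Tesseract invocation that requests hOCR output. The helper name, the paths, and the default language are assumptions for illustration, not code from this commit.

```ruby
# Hypothetical helper (not from this commit): build the argument list for a
# Tesseract run that reads one image and writes hOCR output at `output_base`.
# CLI shape: tesseract IMAGE OUTPUTBASE [-l LANG] [CONFIGFILE...]
def ocr_command(image_path, output_base, lang: 'eng')
  ['tesseract', image_path, output_base, '-l', lang, 'hocr']
end

# Illustrative paths only; the real job would derive them from the file set.
cmd = ocr_command('/tmp/intermediate_file.jp2', '/tmp/ocr', lang: 'ita')
puts cmd.join(' ')
```

Returning an array rather than a single string lets a caller hand it to `system(*cmd)`, which bypasses the shell and avoids quoting problems with unusual file names.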
3 changes: 2 additions & 1 deletion circle.yml
@@ -11,7 +11,8 @@ dependencies:
     - kakadu
   pre:
     - npm install -g eslint
-    - sudo apt-get install libmagickwand-dev imagemagick redis-server tesseract-ocr tesseract-ocr-ita tesseract-ocr-eng sqlite3 libsqlite3-dev
+    - sudo apt-get install libmagickwand-dev imagemagick redis-server sqlite3 libsqlite3-dev
+    - bash ./.install_tesseract
   post:
     - sudo sh bin/ci_kakadu_install.sh
     - bundle exec rake rubocop
292 changes: 292 additions & 0 deletions spec/fixtures/contentdm_xml/Irish_People_Short.xml

Large diffs are not rendered by default.

79 changes: 79 additions & 0 deletions spec/fixtures/contentdm_xml/Irish_People_Short.yml
@@ -0,0 +1,79 @@
---
:resource: MultiVolumeWork
:attributes:
:default:
state: final_review
viewing_direction: left-to-right
rights_statement: http://rightsstatements.org/vocab/NKC/1.0/
visibility: open
:local:
source_metadata_identifier: Irish People <br> http://indiamond6.ulib.iupui.edu/cdm/search/collection/IP
viewing_direction: left-to-right
:source_metadata:
:thumbnail_path: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/thumbnail.exe?CISOROOT=/IP&CISOPTR=5044
:collections: []
:volumes:
- :title:
- 1974-11-16 Irish People
:structure:
:nodes:
- :label: page1
:proxy: page1
- :label: page2
:proxy: page2
- :label: page3
:proxy: page3
:files:
- :id: page1
:mime_type: image/jp2
:attributes:
:title:
- page1
:thumbnail: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/thumbnail.exe?CISOROOT=/IP&CISOPTR=5044
:path: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/showfile.exe?CISOROOT=/IP&CISOPTR=5044
:file_opts: {}
- :id: page2
:mime_type: image/jp2
:attributes:
:title:
- page2
:thumbnail: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/thumbnail.exe?CISOROOT=/IP&CISOPTR=5045
:path: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/showfile.exe?CISOROOT=/IP&CISOPTR=5045
:file_opts: {}
- :id: page3
:mime_type: image/jp2
:attributes:
:title:
- page3
:thumbnail: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/thumbnail.exe?CISOROOT=/IP&CISOPTR=5046
:path: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/showfile.exe?CISOROOT=/IP&CISOPTR=5046
:file_opts: {}
- :title:
- 1982-03-27 Irish People
:structure:
:nodes:
- :label: page1
:proxy: page1
- :label: page2
:proxy: page2
:files:
- :id: page1
:mime_type: image/jp2
:attributes:
:title:
- page1
:thumbnail: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/thumbnail.exe?CISOROOT=/IP&CISOPTR=8153
:path: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/showfile.exe?CISOROOT=/IP&CISOPTR=8153
:file_opts: {}
- :id: page2
:mime_type: image/jp2
:attributes:
:title:
- page2
:thumbnail: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/thumbnail.exe?CISOROOT=/IP&CISOPTR=8154
:path: http://indiamond6.ulib.iupui.edu:2012/cgi-bin/showfile.exe?CISOROOT=/IP&CISOPTR=8154
:file_opts: {}
:sources:
- :title:
- Contentdm XML
:file: spec/fixtures/contentdm_xml/Irish_People_Short.xml
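
The fixture uses symbol-style keys such as `:resource:` and `:volumes:`, which matters when loading it from Ruby. The sketch below shows one way to read such a document with the standard `yaml` library. The embedded sample is a trimmed-down guess at the fixture's nesting (indentation was lost in this listing), so treat its shape as an assumption rather than a copy of the real file.

```ruby
require 'yaml'

# Assumed nesting: this trimmed sample is a guess at the fixture's shape,
# not a copy of spec/fixtures/contentdm_xml/Irish_People_Short.yml.
sample = <<~DOC
  ---
  :resource: MultiVolumeWork
  :volumes:
  - :title:
    - 1974-11-16 Irish People
    :files:
    - :id: page1
      :mime_type: image/jp2
    - :id: page2
      :mime_type: image/jp2
  - :title:
    - 1982-03-27 Irish People
    :files:
    - :id: page1
      :mime_type: image/jp2
DOC

# Keys written as ":resource" load as Ruby symbols, so Symbol must be
# permitted explicitly (the keyword form needs Ruby 2.6 or newer).
data = YAML.safe_load(sample, permitted_classes: [Symbol])

data[:volumes].each do |volume|
  puts "#{volume[:title].first}: #{volume[:files].length} file(s)"
end
```

Using `safe_load` with an explicit class list keeps the loader from instantiating arbitrary objects, which is a reasonable default for fixtures that reference remote URLs.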
