This project provides an interface (driven by a JSON API) for managing review and pagination of jobs prior to their import into VuDL.
Workflow / Assumptions
This software assumes a two-tiered set of directories, in which the top tier represents categories and the second tier represents jobs.
Each category folder is expected to contain a batch-params.ini file that controls parameters for the category. For example:
[collection] ; ID of holding area object in Fedora 3: destination = vudl:5 [ocr] ; Should we perform OCR on the jobs in this category? ocr = 'true' [pdf] ; Should we generate a PDF if none is already in the folder? generate = 'true'
Each job folder is expected to contain TIFF images of a multi-page item. For example:
/usr/local/holding /category1 /batch-params.ini /job1 0001.TIF 0002.TIF
The software provides functionality for automatically generating JPEG derivatives of these TIFFs as well as assigning labels to the pages within the jobs. When all of this work has been completed, the finished data can be published to a repository. (Currently, this is designed for the VuDL Fedora3-based repository, but additional options will be added eventually).
$ bundle install $ rake db:migrate
Then edit config/vudl.yml and config/initializers/omniauth.rb as needed.
Most of the functionality of the system can be accessed through the project's web interface – spin this up as you would any other Rails application.
This application uses Resque for queueing; be sure to run
$ rake resque:work
to start a worker process to build derivatives.
A custom rake task has been defined to ingest jobs published from this system into the repository; to run this process:
$ rake ingest