This module extends the Islandora Batch framework to facilitate the ingestion of a ZIP archive or directory containing one or more PDFs and associated XML metadata files into paged content and individual page objects.
The ingest is a two-step process:
- Preprocessing: The data is scanned and a number of entries are created in the Drupal database. Only minimal processing is done at this point, so preprocessing can be completed outside of a batch process.
- Ingest: The data is actually processed and ingested. This happens inside of a Drupal batch.
This module requires the following modules/libraries:
Install as usual; see the Drupal documentation on installing modules for further information.
The base ZIP preprocessor can be called as a drush script (see drush help islandora_paged_content_pdf_batch_preprocess for additional parameters):
Drush made the target parameter reserved as of Drush 7. To allow for backwards compatibility, --target is preserved for Drush 6 and below, while Drush 7 and above use --scan_target instead.
The target option (--scan_target or --target) requires the full path to your archive from the root directory, e.g. /var/www/drupal/sites/archive.zip
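Because the target option needs an absolute path, a relative path can be resolved before calling drush. This is a minimal bash sketch; the function name absolute_archive_path is hypothetical and not part of this module.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this module): resolve an archive path to
# the absolute path that the target option expects.
set -euo pipefail

absolute_archive_path() {
  local path="$1"
  if [ ! -e "$path" ]; then
    echo "error: $path does not exist" >&2
    return 1
  fi
  local dir base
  # Enter the parent directory and print its physical path (pwd -P), so the
  # result is absolute without depending on GNU realpath.
  dir=$(cd "$(dirname "$path")" && pwd -P)
  base=$(basename "$path")
  # Strip a trailing slash so an archive in / does not become //archive.zip.
  printf '%s/%s\n' "${dir%/}" "$base"
}
```

For example: --scan_target="$(absolute_archive_path ./archive.zip)".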
Drush 7 and above:
drush -v -u 1 --uri=http://localhost islandora_paged_content_pdf_batch_preprocess --scan_target=/path/to/archive.zip --content_model=islandora:bookCModel --parent=islandora:bookCollection
Drush 6 and below:
drush -v -u 1 --uri=http://localhost islandora_paged_content_pdf_batch_preprocess --target=/path/to/archive.zip --content_model=islandora:bookCModel --parent=islandora:bookCollection
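Scripts that must run against both older and newer Drush releases could choose the option name from the Drush major version, following the two commands above. This is a sketch under the assumption that the caller already knows the major version; the helper target_option is hypothetical, and parsing drush's own version output is deliberately left out because that output differs between releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the archive option for a given Drush major
# version (--scan_target for Drush 7 and above, --target for Drush 6 and
# below). The major version is supplied by the caller.
set -euo pipefail

target_option() {
  local drush_major="$1" archive="$2"
  if [ "$drush_major" -ge 7 ]; then
    printf '%s\n' "--scan_target=$archive"
  else
    printf '%s\n' "--target=$archive"
  fi
}
```

For example: drush -v -u 1 --uri=http://localhost islandora_paged_content_pdf_batch_preprocess "$(target_option 8 /path/to/archive.zip)" --content_model=islandora:bookCModel --parent=islandora:bookCollection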
This will populate the queue (stored in the Drupal database) with base entries.
The queue of preprocessed items can then be processed:
drush -v -u 1 --uri=http://localhost islandora_batch_ingest
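The two steps can be chained in one script so that ingest only runs after preprocessing succeeds. This is a minimal sketch using the Drush 7+ syntax shown above; ingest_archive is a hypothetical function name, and the DRUSH and SITE_URI environment variables are illustrative overrides, not options of this module.

```shell
#!/usr/bin/env bash
# Sketch of the two-step workflow (Drush 7+ syntax): queue the archive
# (preprocess), then process the queue (ingest). DRUSH and SITE_URI are
# hypothetical overrides for the drush binary and the site URI.
set -euo pipefail

ingest_archive() {
  local archive="$1"
  local cmodel="${2:-islandora:bookCModel}"
  local parent="${3:-islandora:bookCollection}"
  local drush="${DRUSH:-drush}"
  local site="${SITE_URI:-http://localhost}"

  # Step 1: preprocess; if it fails, return early so the ingest step is not run.
  "$drush" -v -u 1 --uri="$site" islandora_paged_content_pdf_batch_preprocess \
    --scan_target="$archive" --content_model="$cmodel" --parent="$parent" || return 1

  # Step 2: process the queue of preprocessed items.
  "$drush" -v -u 1 --uri="$site" islandora_batch_ingest
}
```

For example: ingest_archive /var/www/drupal/sites/archive.zip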
Custom ingests can be written by extending any of the existing preprocessors and batch object implementations.
Having problems or solved a problem? Contact discoverygarden.
This project has been sponsored by: