A Quick Path to Web Deposit via the Workbench

gregjan edited this page Feb 24, 2012 · 6 revisions

Overview

The idea is to create simple web forms for purposes such as self-deposit of scholarly works, separating form presentation and validation from the concerns of XML and field-specific encoding. We can map form fields to XML via the workbench, and in some cases, such as date fields, we can also normalize their encoding. The web form therefore needs to handle only authentication, required fields and pick lists. Each form will act as its own drop box for submissions.
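To illustrate the split between form and mapping, here is a minimal Python sketch of a crosswalk that turns one submitted row into XML and normalizes a date field along the way. The crosswalk itself (column names mapped to hypothetical elements such as givenName and dateIssued) is invented for illustration, as is the assumption that the form submits dates as year/month/day with slashes; in practice this mapping would live in the workbench, not in this code.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical crosswalk: form column name -> XML element name.
# The real mapping would be defined in the workbench; these names are illustrative.
CROSSWALK = {
    "ColumnName1": "givenName",
    "ColumnName2": "familyName",
    "ColumnName3": "title",
    "ColumnName4": "dateIssued",
}

# Columns whose values are normalized as dates.
DATE_FIELDS = {"ColumnName4"}


def normalize_date(value: str) -> str:
    """Convert '2012/01/01' to ISO 8601 '2012-01-01'.

    Assumes year/month/day with slashes; other layouts need more patterns."""
    return datetime.strptime(value.strip(), "%Y/%m/%d").date().isoformat()


def row_to_xml(row: dict) -> ET.Element:
    """Map one submitted form row to a flat XML record using CROSSWALK."""
    record = ET.Element("record")
    for column, element_name in CROSSWALK.items():
        value = row.get(column, "")
        if not value:
            continue  # skip empty optional fields
        if column in DATE_FIELDS:
            value = normalize_date(value)
        ET.SubElement(record, element_name).text = value
    return record


if __name__ == "__main__":
    row = {"ColumnName1": "John", "ColumnName2": "Doe",
           "ColumnName3": "Bird Calls of the Mid-Atlantic",
           "ColumnName4": "2012/01/01"}
    print(ET.tostring(row_to_xml(row), encoding="unicode"))
```

Because the form only collects raw strings, changing the target XML means editing the crosswalk, not the form.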

Goals

  • Facilitate deposit of files with metadata through simple web forms.
  • Support an arbitrary range of metadata fields through crosswalks.
  • Define a clean way to record every submission so it can be mapped and batch-ingested later. (Delimiters? Multi-value fields?)
  • Support periodic reuse of the same crosswalk to submit batches from a form. (#17 Templates)
  • Support aggregate works and distinguish between main and supporting files.
  • For simplicity, have the workbench distinguish single-file works from aggregates.
  • Generate checksums as soon as possible.
  • Have the workbench verify checksums.
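One way to generate checksums as early as possible is to hash an upload in the same pass that writes it to disk. The sketch below uses Python's hashlib with MD5, chosen only because the example bag uses a manifest-md5.txt; the function names are hypothetical.

```python
import hashlib
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # read in 64 KiB chunks to keep memory use small


def copy_with_md5(src: BinaryIO, dst: BinaryIO) -> str:
    """Copy an upload stream to its destination and return its MD5 hex digest.

    Hashing during the copy records the checksum at the earliest point,
    before any later process has touched the stored file."""
    digest = hashlib.md5()
    for chunk in iter(lambda: src.read(CHUNK_SIZE), b""):
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Recompute a stored file's MD5, e.g. when the workbench verifies it."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Verification then amounts to comparing md5_of_file(path) with the digest recorded at upload time.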

Data Layout

We can use the BagIt folder structure, unzipped, as that will make web forms more reusable and may allow us to use third-party software. Each submission of a web form can add data to the same bag. Later, someone can import the bag into the workbench and process it. (Bag-It Layout)

mydepositform-bag/
|-- data
|   |-- submission-A-uuid
|   |   |-- Bird Calls of the Mid-Atlantic.doc
|   |   \-- supplemental
|   |       \-- bird-call-examples.mp3
|   \-- submission-B-uuid
|       \-- Shark attack findings.pdf 
|-- manifest-md5.txt
|     49afbd86a1ca9f34b677a3f09655eae9 data/submission-A-uuid/Bird Calls of the Mid-Atlantic.doc
|     408ad21d50cef31da4df6d9ed81b01a7 data/submission-A-uuid/supplemental/bird-call-examples.mp3
|     408ad21d50cef31da4df6d9ed81b01a7 data/submission-B-uuid/Shark attack findings.pdf 
|-- bagit.txt
|     BagIt-Version: 0.96
|     Tag-File-Character-Encoding: UTF-8
\-- form-data.txt (must be UTF-8, as per bagit.txt)
      "uuid","ColumnName1","ColumnName2","ColumnName3","ColumnName4"
      "submission-A-uuid","John","Doe","Bird Calls of the Mid-Atlantic","2012/01/01"
      "submission-B-uuid","Jane","Doe","Shark Attack Findings","2011/02/12"
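The deposit side can build this layout one submission at a time. The following Python sketch creates the bag skeleton if needed, copies files into data/<uuid>/, appends MD5 lines to manifest-md5.txt and appends a quoted row to form-data.txt. The function names, the COLUMNS list and the use of random UUIDs for submission folders are assumptions for illustration, and the sketch assumes only one submission is written at a time (concurrent deposits would need file locking).

```python
import csv
import hashlib
import os
import shutil
import uuid
from typing import Dict, List, Tuple

BAGIT_TXT = "BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n"
# Hypothetical form columns, mirroring the example form-data.txt header.
COLUMNS = ["uuid", "ColumnName1", "ColumnName2", "ColumnName3", "ColumnName4"]


def _md5(path: str) -> str:
    """MD5 of a file, read in chunks so large uploads need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _writer(fh):
    # Quote every field, as in the example form-data.txt.
    return csv.writer(fh, quoting=csv.QUOTE_ALL, lineterminator="\n")


def ensure_bag(bag_dir: str) -> None:
    """Create the bag skeleton if it is missing (e.g. the curator took the last one)."""
    os.makedirs(os.path.join(bag_dir, "data"), exist_ok=True)
    bagit_path = os.path.join(bag_dir, "bagit.txt")
    if not os.path.exists(bagit_path):
        with open(bagit_path, "w", encoding="utf-8") as fh:
            fh.write(BAGIT_TXT)
    form_path = os.path.join(bag_dir, "form-data.txt")
    if not os.path.exists(form_path):
        with open(form_path, "w", encoding="utf-8", newline="") as fh:
            _writer(fh).writerow(COLUMNS)


def add_submission(bag_dir: str, fields: Dict[str, str],
                   files: List[Tuple[str, str]]) -> str:
    """Add one submission and return its folder name under data/.

    files holds (source_path, relative_destination) pairs, where the
    destination uses forward slashes, e.g. "supplemental/calls.mp3"."""
    ensure_bag(bag_dir)
    sub_id = str(uuid.uuid4())
    manifest_lines = []
    for src, rel_dest in files:
        parts = rel_dest.split("/")
        dest = os.path.join(bag_dir, "data", sub_id, *parts)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(src, dest)
        # Manifest paths are relative to the bag root and use forward slashes.
        bag_path = "/".join(["data", sub_id] + parts)
        manifest_lines.append(f"{_md5(dest)} {bag_path}\n")
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "a", encoding="utf-8") as fh:
        fh.writelines(manifest_lines)
    with open(os.path.join(bag_dir, "form-data.txt"), "a", encoding="utf-8", newline="") as fh:
        _writer(fh).writerow([sub_id] + [fields.get(c, "") for c in COLUMNS[1:]])
    return sub_id
```

Passing a main file plus a "supplemental/..." destination reproduces the shape of submission-A-uuid above, while a single file reproduces submission-B-uuid.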

It would be convenient if the curator could simply move the bag folder (mydepositform-bag in our example) out of the drop box for processing. The deposit form software would then start a new bag with the next deposit.
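On the curator's side, processing could begin by moving the bag out of the drop box and checking it. This Python sketch is hypothetical throughout: it renames the folder with a timestamp, verifies every entry in manifest-md5.txt, and labels each submission as a single-file work or an aggregate by counting the files under its data/<uuid>/ folder, which is only one possible rule for the distinction named in the goals.

```python
import csv
import hashlib
import os
import time
from typing import Dict, List


def take_bag_for_processing(bag_dir: str, processing_dir: str) -> str:
    """Move the live bag out of the drop box so the form starts a new one.

    os.rename is atomic only within one filesystem; this assumes no
    submission is being written while the bag is moved."""
    os.makedirs(processing_dir, exist_ok=True)
    name = os.path.basename(os.path.normpath(bag_dir))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(processing_dir, f"{name}-{stamp}")
    if os.path.exists(dest):
        raise FileExistsError(dest)
    os.rename(bag_dir, dest)
    return dest


def _md5(path: str) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir: str) -> List[str]:
    """Return bag paths that are missing or whose MD5 differs from the manifest."""
    problems = []
    with open(os.path.join(bag_dir, "manifest-md5.txt"), encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue
            # Split once: checksums contain no spaces, but file names may.
            expected, bag_path = line.split(None, 1)
            full_path = os.path.join(bag_dir, *bag_path.split("/"))
            if not os.path.isfile(full_path) or _md5(full_path) != expected.lower():
                problems.append(bag_path)
    return problems


def read_submissions(bag_dir: str) -> List[Dict[str, str]]:
    """Read form-data.txt and tag each row by how many files it deposited."""
    with open(os.path.join(bag_dir, "form-data.txt"), encoding="utf-8", newline="") as fh:
        rows = list(csv.DictReader(fh))
    for row in rows:
        sub_dir = os.path.join(bag_dir, "data", row["uuid"])
        count = sum(len(files) for _, _, files in os.walk(sub_dir))
        row["kind"] = {0: "empty", 1: "single"}.get(count, "aggregate")
    return rows
```

An empty list from verify_manifest means every file still matches the checksum recorded at deposit time.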