Manifests

David Moles edited this page Jul 18, 2018 · 6 revisions

There are a variety of ways to submit digital objects to Merritt; the method you choose will depend on the nature of the digital objects and how many objects you have to submit. One option is to use a manifest to add either complex objects consisting of many files or a large batch of objects. A manifest is a simple pipe-delimited text file containing basic information about the files being submitted; just post the objects to a web server and use the Add Object page to submit the manifest that points to them.

CDL has created Excel macros that will transform an Excel worksheet into a properly formatted checkm manifest file. The merrittManifest.xls file contains the macros for generating a manifest as well as worksheets with sample data.

This guide will show you how to prepare manifest files to submit objects to Merritt, either individually or in batches. The guide assumes that you are using MS Excel 2007 to record information about the objects you're submitting, and running a macro to produce a manifest.

You can also use a text editor to create manifest files if you prefer; details are included in the Tips for Creating Text-based Manifests section of this guide.

Why Use a Manifest?

IF YOU HAVE: Just a few simple objects, each consisting of a single file
THE BEST SUBMISSION OPTION IS: Upload directly from the Add Object page.

IF YOU HAVE: Just a few complex objects, each consisting of multiple files (these may include metadata files)
THE BEST SUBMISSION OPTION IS: Create a container file (.zip or .tar) for each object, then upload each one directly from the Add Object page. When you upload a .zip or .tar file, the object's component files will be extracted and made accessible through the Display Object and Display Version pages.
OR
Create an object manifest file, then upload the manifest from the Add Object page. When you use an object manifest, every component file must be posted on a web server; the manifest must include each file's URL.

IF YOU HAVE: A large number of either simple or complex objects
THE BEST SUBMISSION OPTION IS: Create a batch manifest file, then upload that file from the Add Object page.

A batch manifest can point to single-file simple objects, to container files, or to object manifest files, but a single batch manifest cannot mix these types. If you have all three, you will need to create three manifests, one for each type of object.

All of the files in your manifest must be posted on a web server; the manifest must include each file's URL.

For a single object: In most cases it will be easiest to submit single objects via the Merritt user interface rather than using an object manifest. There are two reasons you might prefer to use an object manifest to submit a single object to Merritt:

  1. You have checksum values for every file in a complex object, and you would like each file-level checksum to be verified by Merritt upon ingest.
  2. You have a complex object consisting of many files, but do not have access to a utility to create a .zip or .tar container file.

For a batch of objects: If you have a large number of objects to submit, it may be more efficient to use a batch manifest. If each object consists of many files, you can create container (.zip or .tar) files for each object, post those container files to a web server, then create a batch manifest with URLs for the container files. You will also be able to supply metadata about each object in the batch manifest.

A batch manifest will be an especially good option if you already have information about the objects available in a spreadsheet format, or can easily export a spreadsheet report from another system.

You can also submit a batch manifest that points to object manifests for complex objects. This is a good option if you have in-house scripting expertise and a way to derive object metadata. This is also the option to choose if you have checksum values for every individual component file of every object, and you want Merritt to validate each checksum upon ingest.

The Excel Macro

The merrittManifest.xls Excel file contains four macros that allow you to create manifests for individual objects and batches of single files, container files or manifest files. This spreadsheet also includes sample worksheets showing the information needed for the batch and object manifests.

When you open the merrittManifest.xls file, you may see a security warning to let you know that the file contains macros, and that they have been disabled. Click Options, select Enable this content, then click OK.

-- Screen shot 1 --

-- Screen shot 2 --

You can use this Excel file directly and simply create a new worksheet for each object or each batch. It's important that each worksheet includes exactly the same columns in exactly the same order, so you may want to copy and paste the column headings from a sample worksheet into your own worksheet. (If you are only entering the bare minimum elements, you will still need headers for the columns you don't use.) Note that the macro will ignore any text that is bold, so the header row must be in bold, and no other text should be bold. The headings for required columns are highlighted in orange.

Workflow: A Single Object

  1. Place the object file(s) on a web server.
  2. Download the merrittManifest.xls file.
  3. Create a new worksheet for the object manifest. There can be only one object per object manifest.
  4. Provide a URL and file name for each file in the object. You may optionally provide checksum and file size information; see the Object Manifest section of this guide for details.
  5. Choose View > Macros > View Macros > ThisWorkbook.CreateObjectManifest > Run
  6. Name your object manifest file. A file extension of .checkm will automatically be added to the file.
  7. Log in to Merritt. If you work with multiple collections, choose the collection this object will be submitted to.
  8. Choose Add Object.
  9. Choose Select File, and browse to select the object manifest you just created.
  10. You will receive an email to acknowledge that the object submission has been received. You will receive another email to tell you whether the object was added successfully or not.

Workflow: A Batch of Container Files

  1. If your digital objects are composed of many files, create a container (.zip or .tar) file for each object.
  2. Place the container files on a web server.
  3. Download the merrittManifest.xls file.
  4. Create a new worksheet for your batch manifest.
  5. Provide a URL and file name for each container file in the batch. Provide any optional metadata (details in The Batch Manifest section).
  6. Choose View > Macros > View Macros > ThisWorkbook.BatchOfContainerFiles > Run
  7. Name your batch manifest file. A file extension of .checkm will automatically be added to the file.
  8. Log in to Merritt. If you work with multiple collections, choose the collection this batch will be submitted to.
  9. Choose Add Object.
  10. Choose Select File, and browse to select the batch manifest you just created.
  11. You will receive an email to acknowledge that the batch submission has been received. You will receive another email to tell you whether the object was added successfully or not.

Workflow: A Batch of Single Files

  1. Place the files on a web server.
  2. Download the merrittManifest.xls file.
  3. Create a new worksheet for your batch manifest.
  4. Provide a URL and file name for each file in the batch. Provide any optional metadata (details in the Batch Manifest section).
  5. Choose View > Macros > View Macros > ThisWorkbook.BatchOfSingleFiles > Run
  6. Name your batch manifest file. A file extension of .checkm will automatically be added to the file.
  7. Log in to Merritt. If you work with multiple collections, choose the collection this batch will be submitted to.
  8. Choose Add Object.
  9. Choose Select File, and browse to select the batch manifest you just created.
  10. You will receive an email to acknowledge that the batch submission has been received. You will receive another email to tell you whether the object was added successfully or not.

Workflow: A Batch of Object Manifest Files

  1. Place the object files on a web server.
  2. Download the merrittManifest.xls file.
  3. Create a new worksheet for each object manifest. There can be only one object per object manifest.
  4. Provide a URL and file name for each file in the object. You may optionally provide checksum and file size information; see the Object Manifest section for details.
  5. Choose View > Macros > View Macros > ThisWorkbook.CreateObjectManifest > Run
  6. Name your object manifest file. A file extension of .checkm will automatically be added to the file.
  7. Post each resulting object manifest to a web server.
  8. Create a new worksheet for your batch manifest.
  9. Provide a URL and file name for each object manifest you created. Provide any optional metadata (details in the Batch Manifest section).
  10. Choose View > Macros > View Macros > ThisWorkbook.BatchOfManifestFiles > Run
  11. Name your batch manifest file. A file extension of .checkm will automatically be added to the file.
  12. Log in to Merritt. If you work with multiple collections, choose the collection this batch will be submitted to.
  13. Choose Add Object.
  14. Choose Select File, and browse to select the batch manifest you just created.
  15. You will receive an email to acknowledge that the batch submission has been received. You will receive another email when the submission has been processed. You will receive another email to tell you whether the object was added successfully or not.

The Object Manifest

The object manifest contains a separate row for each file that is considered part of a single object. Each object worksheet or manifest should only include information about one object. Object components can include files of metadata pertaining to the object in any format (METS, MARC, etc.). The Object Manifest Specification for Merritt is available in a plain text file (to make columns and rows clearer).

The information you can provide in an object manifest is: fileUrl | hashAlgorithm | hashValue | fileSize | fileName

Only fileUrl and fileName are required.

The hashAlgorithm column specifies what kind of checksum you are providing, if you have a checksum value for a component file in the object. (Accepted checksum algorithms are: Adler-32, CRC-32, MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512.) If you provide a hashAlgorithm, you must also provide a hashValue, and vice versa. Merritt will validate every checksum value you provide. If a value you provide does not match the value that Merritt calculates, the object submission will fail, and you will be notified by email that the object was not submitted because it did not pass a fixity check.
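
If you do not already have checksum values, a short script can compute them before you fill in the hashValue column. The following is a minimal Python sketch and is not part of Merritt or the Excel macros. The function name `file_checksum` is made up for this example, and the lowercase hexadecimal output is an assumption; check the Object Manifest Specification for the exact value format Merritt expects.

```python
import hashlib
import zlib

# Assumed mapping from the algorithm names listed in this guide to Python
# implementations. MD2 is left out because Python's standard library does
# not guarantee that it is available.
_HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}
_ZLIB_FUNCS = {"Adler-32": zlib.adler32, "CRC-32": zlib.crc32}


def file_checksum(path, algorithm="SHA-256", chunk_size=1 << 20):
    """Return the checksum of the file at `path` as a lowercase hex string."""
    with open(path, "rb") as handle:
        if algorithm in _HASHLIB_NAMES:
            digest = hashlib.new(_HASHLIB_NAMES[algorithm])
            # Read in chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
            return digest.hexdigest()
        if algorithm in _ZLIB_FUNCS:
            func = _ZLIB_FUNCS[algorithm]
            value = 1 if algorithm == "Adler-32" else 0  # zlib's starting values
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                value = func(chunk, value)
            return format(value & 0xFFFFFFFF, "08x")
    raise ValueError(f"Unsupported algorithm: {algorithm}")
```

Calling `file_checksum("page001.tif", "MD5")` on a local copy of a file (the file name here is only a placeholder) would give the value to record next to a hashAlgorithm of MD5.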

The fileSize column contains the file size in bytes, and can be left blank.
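
If you generate object manifest rows with a script, it can help to check them against the rules above before submitting. The sketch below is one possible check, written in Python. `validate_object_row` and `ACCEPTED_ALGORITHMS` are hypothetical names, and the sketch assumes that algorithm names are spelled exactly as they are listed in this guide, which the manifest specification may or may not require.

```python
# Algorithm names as listed in this guide (the exact spelling is an assumption).
ACCEPTED_ALGORITHMS = {
    "Adler-32", "CRC-32", "MD2", "MD5",
    "SHA-1", "SHA-256", "SHA-384", "SHA-512",
}


def validate_object_row(row):
    """Return a list of problems found in one object manifest row.

    `row` is a dict keyed by the column names used in this guide.
    Missing keys, None, and empty strings are all treated as blank cells.
    """
    def cell(name):
        value = row.get(name)
        return "" if value is None else str(value).strip()

    problems = []

    # Only fileUrl and fileName are required.
    for required in ("fileUrl", "fileName"):
        if not cell(required):
            problems.append(f"{required} is required")

    # hashAlgorithm and hashValue must be provided together or not at all.
    algorithm, value = cell("hashAlgorithm"), cell("hashValue")
    if bool(algorithm) != bool(value):
        problems.append("hashAlgorithm and hashValue must be given together")
    if algorithm and algorithm not in ACCEPTED_ALGORITHMS:
        problems.append(f"unrecognized hashAlgorithm: {algorithm}")

    # fileSize is optional, but when present it is a byte count.
    size = cell("fileSize")
    if size and not size.isdigit():
        problems.append("fileSize must be a whole number of bytes")

    return problems
```

An empty list means the row passed these checks; it does not guarantee that Merritt will accept the submission.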

There are no columns for object-level metadata such as title, creator etc. These can be supplied when you upload the manifest by filling out the form on the Add Object screen, or by also submitting a batch manifest.

The Batch Manifest

The batch manifest contains a separate row for each object, and can contain only one row for any complex, multi-file object. Rows for complex objects should point either to container (.zip or .tar) files or to object manifest files. The Batch Manifest Specification for Merritt is available in a plain text file (to make columns and rows clearer).

The information you can provide in a batch manifest is:

fileUrl | hashAlgorithm | hashValue | fileSize | fileName | primaryIdentifier | localIdentifier | creator | title | date

Only fileUrl and fileName are required.

The hashAlgorithm and hashValue columns refer to the checksum of whatever file is referenced in the fileUrl column. If you provide URLs pointing to object manifest files, the hashValue would be the checksum of the object manifest itself. The hashAlgorithm column specifies what kind of checksum you are providing, if you have a checksum value for the referenced file. (Accepted checksum algorithms are: Adler-32, CRC-32, MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512.) If you provide a hashAlgorithm, you must also provide a hashValue, and vice versa. Merritt will validate every checksum value you provide. If a value you provide does not match the value that Merritt calculates, the object submission will fail, and you will be notified by email that the object did not pass a fixity check.

The fileSize column is optional and is expressed in bytes.

primaryIdentifier is the identifier that Merritt uses to track the object. If you are submitting new objects, you will very likely not have a primary identifier. Primary identifiers must be ARK format identifiers. You will generally only use this column if you are using a manifest to edit an existing object. You will be able to see the Merritt-supplied primary identifier for any object in Merritt when you display it.

localIdentifier is any identifier you already use to refer to the object. You can provide multiple local identifiers by separating them with a semicolon. The contents of this column will be searchable in Merritt. You will also be able to edit the object by referring to the local identifier in a manifest. This identifier must be unique among all of the objects in all of your collections.

creator is the author or creator of the object itself. There are no format requirements for expressing named persons or entities. Merritt will display the creator exactly as entered. This column will be searchable in Merritt.

title is the title of the object itself. Merritt will display the title exactly as entered. This column will be searchable in Merritt.

date is the publication date of the object itself. If you provide the date in a standard Excel format, it will be submitted in Web UTC datetime format. If you enter a non-standard date format, the date will be submitted as plain text. Merritt will display the date exactly as entered. This column will be searchable in Merritt.
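
If you build batch manifests from data exported by another system, a small record type can keep the columns in the right order. The following Python sketch is only an illustration. `BatchRow` and its attribute names are invented for this example, and the column order is the spreadsheet order listed in this section, which does not include the extra placeholder columns that hand-typed manifests need.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BatchRow:
    """One object in a batch manifest, using the columns named in this guide."""
    file_url: str                      # required
    file_name: str                     # required
    hash_algorithm: str = ""
    hash_value: str = ""
    file_size: str = ""                # bytes, optional
    primary_identifier: str = ""       # ARK; normally blank for new objects
    local_identifiers: List[str] = field(default_factory=list)
    creator: str = ""
    title: str = ""
    date: str = ""

    def as_cells(self):
        """Return the cell values in the spreadsheet's column order."""
        return [
            self.file_url,
            self.hash_algorithm,
            self.hash_value,
            self.file_size,
            self.file_name,
            self.primary_identifier,
            # Multiple local identifiers are separated by semicolons.
            ";".join(self.local_identifiers),
            self.creator,
            self.title,
            self.date,
        ]
```

For example, `BatchRow(file_url="http://example.org/obj1.zip", file_name="obj1.zip", local_identifiers=["box12", "item7"])` (placeholder values) yields a list of ten cells with `box12;item7` in the localIdentifier position.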

Tips for Creating Text-based Manifests

  • You do not have to use Excel to create a manifest file; you can also use a text editor or create a script to write manifest files. You can use the sample batch and object manifests and simply edit the area for conveying rows of object information.
  • The contents of the manifest are listed on separate lines, with each column delimited by " | " [space] [pipe] [space]. Empty columns are indicated by a space between column delimiters: " | | ". (There may be one or two blank spaces in the column).
  • The column heading text in the Excel spreadsheet is slightly different from that in the manifest files. The meaning and order of the columns are the same, but the headings in the text of a manifest file will begin with either "nfo:" or "mrt:" (example: nfo:fileUrl).
  • Merritt manifests contain placeholders for two columns that do not appear in the spreadsheet described above: nfo:fileLastModified and mrt:mimetype. These columns are not yet implemented in Merritt, but they do need to be included in any manifest that you type by hand, with the columns left empty.
  • Batch manifests must be identified as a batch of containers, batch of single files or a batch of object manifests. The profile line of the manifest identifies the type of batch:

Batch of container files:

#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-container-batch-manifest

Batch of single files:

#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-single-file-manifest

Batch of object manifests:

#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-batch-manifest

When you've finished editing a sample manifest, be sure to save it with a unique name and a ".checkm" file extension. In any batch manifest, empty cells up to the nfo:fileName column must be identified with " | | ". After the nfo:fileName column, empty cells should be identified ONLY if they are followed by another cell that has a value. Examples:
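
As a rough illustration of the empty-cell rule, the Python sketch below joins a list of cell values into one manifest line. `format_row` is a hypothetical helper, not part of Merritt. The position of the nfo:fileName column is passed in as `file_name_index` because this guide does not say where the nfo:fileLastModified and mrt:mimetype placeholder columns fall; the example call assumes the spreadsheet column order, so adjust it to match the sample manifest you are editing. Values are assumed to contain no pipe characters.

```python
DELIMITER = " | "  # [space] [pipe] [space]


def format_row(cells, file_name_index):
    """Join one row of cell values into a manifest line.

    Cells up to and including `file_name_index` are always written, even
    when empty. After that position, trailing empty cells are dropped, so
    an empty cell is written only when a later cell has a value.
    """
    cells = ["" if c is None else str(c).strip() for c in cells]
    keep = len(cells)
    while keep > file_name_index + 1 and not cells[keep - 1]:
        keep -= 1
    # An empty cell becomes two spaces between pipes, which this guide allows.
    return DELIMITER.join(cells[:keep])


# Spreadsheet order assumed: fileUrl, hashAlgorithm, hashValue, fileSize,
# fileName, primaryIdentifier, localIdentifier, creator, title, date.
line = format_row(
    ["http://example.org/a.zip", "", "", "", "a.zip", "", "", "Doe, Jane", "", ""],
    file_name_index=4,
)
print(line)
# http://example.org/a.zip |  |  |  | a.zip |  |  | Doe, Jane
```

The empty primaryIdentifier and localIdentifier cells are kept because creator follows them, while the empty title and date cells at the end are dropped.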

Special Considerations

  • If you are using the Excel spreadsheet, any text in bold will be ignored. Make sure that all of your content is in plain text.
  • If you are using the Excel spreadsheet, and if you need to delete a row, use the [right click] [delete] approach. You need to delete the entire row, rather than just the text in the row cells.
  • If you are using either the spreadsheet or a text editor to produce a manifest, you cannot use a pipe (vertical bar) character "|" in any of your fields. If you need a pipe character to appear anywhere in an object record, replace it with: %7C
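
When field values come from another system, the pipe replacement can be applied automatically before the manifest is written. This is a minimal Python sketch; `escape_pipes` and `escape_record` are names chosen for illustration only.

```python
def escape_pipes(value):
    """Replace each pipe character with its percent-encoded form, %7C."""
    return value.replace("|", "%7C")


def escape_record(record):
    """Apply escape_pipes to every string value in a dict of field values."""
    return {
        name: escape_pipes(value) if isinstance(value, str) else value
        for name, value in record.items()
    }
```

For instance, a title entered as `Maps | Charts` would be written as `Maps %7C Charts`.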

Optimizing the Macro: Create a Trusted Location (Optional for PC)

If you plan to create a lot of Merritt manifests, you may want to create a trusted location for the merrittManifest.xls file on your computer. This will allow you to run the macro without having to choose Enable this content when you open the Excel file. It will also enable you to run the macro from other Excel files as long as you have the merrittManifest.xls file open. The following instructions are taken from Microsoft documentation for Office 2007 users (http://office.microsoft.com/en-us/excel-help/create-remove-or-change-a-trusted-location-for-your-files-HA010031999.aspx).

  1. Open MS Excel.
  2. Click the Microsoft Office Button, and then click Excel Options.
  3. Click Trust Center, click Trust Center Settings, and then click Trusted Locations.
  4. Click Add new location. Important: We recommend that you do not make your entire Documents or My Documents folder a trusted location; doing so increases your security risk. Create a subfolder within Documents or My Documents, and make only that folder a trusted location.
  5. In the Path box, type the name of the folder that you want to use as a trusted location, or click Browse to locate the folder.
  6. If you want to include subfolders as trusted locations, select the Subfolders of this location are also trusted check box.
  7. In the Description box, type what you want to describe the purpose of the trusted location.
  8. Click OK.
  9. Store the merrittManifest.xls spreadsheet in this directory.
  10. When you open the merrittManifest.xls file, the macros will be accessible to any other Excel file you open in the trusted directory.