1. METS documentation for OpenEdition

CarolineTerrier edited this page Jul 4, 2019 · 17 revisions

METS / TEI import for OpenEdition Journals and OpenEdition Books

Errata

Revision 11/13/2018: correction on description of the METS file describing the book.

Description of the ZIP archive containing the files

At its root, the ZIP archive must contain:

  • a “files” directory containing the illustrations used in the articles/chapters, as well as the cover image of the volume.
    Note: for images, the formats accepted by Lodel are JPG and PNG. The cover images must also have a resolution of 300 dpi and be at least 1400 pixels wide;
  • a “sources” directory containing the files of each article/chapter in XML/TEI format (or Word, if applicable) and in PDF. These PDFs (named “facsimiles” in Lodel) must be in low definition if the volume has been produced by digitisation; in other cases, they must be publisher PDFs in text mode. This directory can also contain the complete PDF of the volume;
  • a METS file named “MANIFEST.xml”.
    The rest of this documentation presents the contents of the METS file.

Description of the METS file describing the book

  1. Three sections are required:
  • <structMap>
  • <fileSec>
  • <dmdSec>
  2. One section is optional: <amdSec>

Statement of the METS and MODS schemas in the root element

<?xml version="1.0" encoding="utf-8"?>
<mets:mets
    xsi:schemaLocation="http://www.loc.gov/METS/ http://lodel.org/ns/mets/mets.openedition.1.3/mets.openedition.1.3.xsd
                        http://www.loc.gov/mods/v3 http://lodel.org/ns/mods/mods.openedition.1.2/mods.openedition.1.2.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:mods="http://www.loc.gov/mods/v3"
    xmlns:mets="http://www.loc.gov/METS/"
    xmlns:marcrel="http://www.loc.gov/loc.terms/relators">

<structMap>

The <structMap> describes the tree structure of the volume.

I. The <mets:div> tags must be nested to describe the volume/subparts/text.

  1. For each <mets:div> tag, the required attributes are:
  • TYPE: see the list of allowed types below (the METS XML schema restricts the use of possible types);
  • ORDER: sequence number, which must represent the order of appearance of the element in its parent element (volume or subpart).
  2. For each <mets:div> tag, the optional attributes are:
  • DMDID: must be equal to the identifier of the <dmdSec> describing the element, i.e. /mets:mets/mets:dmdSec/@ID (if no <dmdSec> element relates to a <div> element, this attribute must be omitted);
  • LABEL: title of the document (optional but very useful to facilitate the reading of the <structMap> and to spot errors).
  3. The different versions of the documents (.xml, .pdf, .doc) must be described in the <structMap> with the tag <mets:fptr>. The only mandatory attribute of <mets:fptr> is:
  • FILEID: identifier of the file used in the section /mets:mets/mets:fileSec/mets:fileGrp/mets:file/@ID.
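As an illustration of the rules above, here is a minimal sketch of a nested <structMap> for a book. All identifiers, labels and file IDs are hypothetical; the types used (“livre”, “souspartie”, “chapitre”) come from the list of authorised types in section III, and the namespace prefixes are assumed to be declared on the root element. As in the other examples on this page, no ORDER is given on the volume-level <mets:div>.

```xml
<!-- Sketch only: every ID, LABEL and FILEID below is a hypothetical placeholder. -->
<mets:structMap>
  <!-- Volume level: DMDID points to the dmdSec describing the book -->
  <mets:div TYPE="livre" DMDID="book1">
    <!-- Subpart with its own dmdSec, so DMDID is given -->
    <mets:div TYPE="souspartie" ORDER="1" LABEL="Part one" DMDID="book1-part1">
      <!-- Chapter without a dmdSec: DMDID is omitted -->
      <mets:div TYPE="chapitre" ORDER="1" LABEL="First chapter">
        <!-- One fptr per version of the document (TEI, PDF) -->
        <mets:fptr FILEID="book1-ch1-tei"/>
        <mets:fptr FILEID="book1-ch1-pdf"/>
      </mets:div>
      <!-- ORDER restarts inside each parent element -->
      <mets:div TYPE="chapitre" ORDER="2" LABEL="Second chapter">
        <mets:fptr FILEID="book1-ch2-tei"/>
        <mets:fptr FILEID="book1-ch2-pdf"/>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>
```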

II. The <structMap> must reflect the complete structure of the volume

This is not an encoding of the table of contents.

Volumes often have inconsistencies between the table of contents and the body of the book. For example: for chapters/articles, titles in the table of contents may differ from those listed at the beginning of chapters/articles. You must therefore use the titles of the documents and sections available in the body of the volume, and not those in the table of contents. The table of contents is used to understand the structure of the volume, but in the event of an inconsistency between the table of contents and the contents of the volume, the organisation of the contents of the volume must be maintained. Some degree of interpretation may be needed to make choices and describe this structure correctly.

III. Authorised types in the <structMap>

A. For OpenEdition Journals

1. “textes” class types

  • editorial
  • article
  • compterendu (this type is used for reviews)
  • chronique (this type is used for columns)

2. “fichiers” class types

  • couverture1 (this type is used for the front cover)
  • imageaccroche (this type is used for the snapshot image)
  • facsimile (this type is used for facsimiles)

Note: at the volume level, OpenEdition Journals allows the types “couverture1”, “imageaccroche” and “facsimile”. At the level of editorial units, OpenEdition Journals allows the types “imageaccroche” and “facsimile”.
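The following sketch shows how these types might be combined for a journal issue, with a “couverture1” element at the volume level alongside the texts. The identifiers, labels and ORDER values are hypothetical and only illustrate the nesting; whether files and texts share a single ORDER sequence is an assumption made here for readability.

```xml
<!-- Sketch only: IDs, labels and ORDER values are hypothetical. -->
<mets:structMap>
  <mets:div TYPE="numero" DMDID="issue1">
    <!-- "fichiers" class at volume level: described by its own dmdSec -->
    <mets:div TYPE="couverture1" ORDER="1" LABEL="Cover" DMDID="issue1-cover">
      <mets:fptr FILEID="issue1-cover-img"/>
    </mets:div>
    <!-- "textes" class directly under the issue -->
    <mets:div TYPE="editorial" ORDER="2" LABEL="Editorial">
      <mets:fptr FILEID="issue1-edito-tei"/>
    </mets:div>
    <!-- "publications" class: a subpart grouping several texts -->
    <mets:div TYPE="souspartie" ORDER="3" LABEL="Reviews" DMDID="issue1-reviews">
      <mets:div TYPE="compterendu" ORDER="1" LABEL="Review of a book">
        <mets:fptr FILEID="issue1-review1-tei"/>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>
```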

3. “publications” class types

  • numero (this type is used for issues)
  • souspartie (this type is used for subparts)

B. For OpenEdition Books

1. “textes” class types

  • avantpropos (this type is used for forewords)
  • preface
  • chapitre (this type is used for chapters)
  • source
  • postface (this type is used for afterwords)
  • bibliographie (this type is used for bibliographies)
  • index
  • annexe (this type is used for appendices)
  • adressebibliographique (this type is used for the bibliographic address)
  • pageliminaire (this type is used for prelim pages)

Note: the types “index” and “adressebibliographique” are reserved for books produced by digitisation.
Note: the “adressebibliographique” type does not appear in HTML in the parts of OpenEdition Books sites that are accessible to the public. On the other hand, this type is not retrieved in the detachable formats (ePub and PDF) generated by OpenEdition.

2. “fichiers” class types

  • couverture1 (this type is used for the front cover)
  • couverture4 (this type is used for the back cover)
  • facsimile
  • tdm (this type is used for the table of contents)

Note: the type “tdm” is reserved for works produced by digitisation.
Note: at the volume level, OpenEdition Books allows the types “couverture1”, “couverture4”, “tdm” and “facsimile”. At the level of editorial units, OpenEdition Books allows the type “facsimile”.

3. “publications” class types

  • livre (this type is used for books)
  • souspartie (this type is used for subsections)

<fileSec>

Each file contained in the ZIP archive must be described in this section in a <mets:file> tag.

1. The mandatory attributes of <mets:file> are:

  • ID: a unique identifier;
  • MIMETYPE: the MIME type of the file;
  • GROUPID: an identifier common to all the files representing the same document and to all the images contained in this document (for example: the same GROUPID for the PDF, Word and XML/TEI versions, as well as for the images referred to in the XML/TEI file).

2. The optional attributes are:

  • CHECKSUM: the value of the MD5 checksum of the file;
  • CHECKSUMTYPE: MD5.

3. Each <mets:file> tag must contain a <mets:FLocat> tag pointing to the file. Two attributes are required:

  • LOCTYPE: URL;
  • xlink:href: the relative path to the file in the ZIP archive.
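To make the role of GROUPID concrete, the sketch below declares the TEI file, the PDF and one illustration of the same chapter, all sharing one GROUPID. The file names, identifiers, the “tei_files” group name and the “text/xml” MIME type for TEI are assumptions for illustration; the optional CHECKSUM attributes are left out.

```xml
<!-- Sketch only: file names, IDs and the GROUPID value are hypothetical. -->
<mets:fileSec>
  <mets:fileGrp ID="tei_files">
    <mets:file ID="ch1-tei" MIMETYPE="text/xml" GROUPID="ch1">
      <mets:FLocat LOCTYPE="URL" xlink:href="sources/ch1.xml"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp ID="pdf_files">
    <!-- Same GROUPID: this PDF is another version of the same document -->
    <mets:file ID="ch1-pdf" MIMETYPE="application/pdf" GROUPID="ch1">
      <mets:FLocat LOCTYPE="URL" xlink:href="sources/ch1.pdf"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp ID="img_files">
    <!-- Same GROUPID: this image is referred to in the TEI file -->
    <mets:file ID="ch1-img1" MIMETYPE="image/jpeg" GROUPID="ch1">
      <mets:FLocat LOCTYPE="URL" xlink:href="files/ch1-fig1.jpg"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>
```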

Example: facsimile of an article (for OpenEdition Journals):

<mets:fileSec>
  <mets:fileGrp ID="pdf_files">
    <mets:file ID="ID7-24-pdf1" MIMETYPE="application/pdf" GROUPID="7_24" CHECKSUM="0780cf4909d721838276a2e812859cf9" CHECKSUMTYPE="MD5">
      <mets:FLocat LOCTYPE="URL" xlink:href="sources/7-24.pdf"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>

The file described here is a PDF named “7-24”. Its identifier in the <fileSec> (i.e. “ID7-24-pdf1”) is referenced in the <structMap>:

<mets:structMap>
  <mets:div TYPE="numero" DMDID="ID_2001_05_1">
    <mets:div TYPE="souspartie" ORDER="2" LABEL="title of subpart" DMDID="ID_2001_05_1-section1">
      <mets:div TYPE="article" ORDER="1" LABEL="article title">
        <mets:fptr FILEID="ID7-24-tei1"/>
        <mets:fptr FILEID="ID7-24-pdf1"/>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>

<amdSec>

You have to provide an <amdSec> tag if you want to indicate the source used for encoding. This source is either “ocr” or any other value (which indicates that it is not OCR).

The following must be added:

in the first case:

<mets:amdSec>
  <mets:digiprovMD ID="AMDID_XXX">
    <mets:mdWrap MDTYPE="MODS" MIMETYPE="text/xml">
      <mets:xmlData>
        <mods:note type="sourcetype">ocr</mods:note>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:digiprovMD>
</mets:amdSec>

or in the second case:

<mets:amdSec>
  <mets:digiprovMD ID="AMDID_XXX">
    <mets:mdWrap MDTYPE="MODS" MIMETYPE="text/xml">
      <mets:xmlData>
        <mods:note type="sourcetype">publisher pdf</mods:note>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:digiprovMD>
</mets:amdSec>

The <dmdSec> of this volume should reference, using the ADMID attribute, the ID specified on the <mets:digiprovMD> element of the <amdSec>.

Example:
<mets:dmdSec ID="IDBOOK" ADMID="AMDID_XXX">
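Put together, the link between the two sections could look like the sketch below. The volume title is a hypothetical placeholder, and the IDs reuse the placeholders from the examples above; the <dmdSec> is placed before the <amdSec>, following the element order of the standard METS schema.

```xml
<!-- Sketch only: the title is a placeholder; IDs reuse those shown above. -->
<mets:dmdSec ID="IDBOOK" ADMID="AMDID_XXX">
  <mets:mdWrap MDTYPE="MODS" MIMETYPE="text/xml">
    <mets:xmlData>
      <mods:titleInfo>
        <mods:title>Volume title</mods:title>
      </mods:titleInfo>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

<mets:amdSec>
  <!-- The ADMID above must match this ID exactly -->
  <mets:digiprovMD ID="AMDID_XXX">
    <mets:mdWrap MDTYPE="MODS" MIMETYPE="text/xml">
      <mets:xmlData>
        <mods:note type="sourcetype">ocr</mods:note>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:digiprovMD>
</mets:amdSec>
```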


<dmdSec>

I) Presentation

Each <div> element used in the <structMap> can be described in a <dmdSec> in MODS (http://www.loc.gov/standards/mods).

The tag <mets:dmdSec> must contain a mandatory attribute:

  • ID: a unique identifier corresponding to the DMDID attribute used in the <structMap>

The <dmdSec> elements are necessary for:

  • all objects in the “fichiers” class: types “imageaccroche”, “couverture1” and “facsimile” for OpenEdition Journals / “couverture1”, “couverture4”, “facsimile”, “tdm” for OpenEdition Books
  • and all publications: “numero” and “souspartie” for OpenEdition Journals / “livre” and “souspartie” for OpenEdition Books

which are present in <structMap> and to which metadata are associated.

In contrast, the <dmdSec> elements may be omitted for elements of the “textes” class (“editorial”, “article”, “compterendu”, “chronique” types for OpenEdition Journals / “chapitre”, “preface”, etc. for OpenEdition Books) or for objects without metadata.

All descriptive information about the volume must be presented in MODS format in the following METS element: /mets:mets/mets:dmdSec[@ID="IDNUMERO"]/mets:mdWrap/mets:xmlData

Example: the facsimile of the volume for OpenEdition Books:

<mets:dmdSec ID="book1-facsimile">
  <mets:mdWrap MDTYPE="MODS" MIMETYPE="text/xml">
    <mets:xmlData>
      <mods:titleInfo>
        <mods:title>Fac-simile title</mods:title>
      </mods:titleInfo>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

The ID of this section (here “book1-facsimile”) is the value referenced by the DMDID attribute of the corresponding <mets:div> in the <structMap>.

For example:

<mets:structMap>
  <mets:div TYPE="facsimile" ORDER="2" LABEL="fac-simile" DMDID="book1-facsimile">
    <mets:fptr FILEID="book1-pdf"/>
  </mets:div>
</mets:structMap>

The identifier of this PDF (FILEID="book1-pdf") is also found in <mets:fileSec>, as the ID of the corresponding <mets:file>:

<mets:fileSec>
  <mets:fileGrp ID="pdf_files">
    <mets:file ID="book1-pdf" MIMETYPE="application/pdf" GROUPID="book1" CHECKSUM="527373ff4d089fdf38d4d2794ecd8787" CHECKSUMTYPE="MD5">
      <mets:FLocat LOCTYPE="URL" xlink:href="book1.pdf" />
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>

II) Elements of the MODS format

All the descriptive elements of the volume must be placed in MODS format in <mets:dmdSec>.

The following rules apply to items common to volumes on both OpenEdition Books and OpenEdition Journals:

Volume title

xpath: ./mods:titleInfo[not(@type)]/mods:title
To allow XML validation of the METS file, the HTML code must be placed in a CDATA. Example: <![CDATA[ <p>Lorem <em>Ipsum</em></p> ]]>

Volume subtitle

xpath: ./mods:titleInfo/mods:subTitle
To allow XML validation of the METS file, the HTML code must be placed in a CDATA. Example: <![CDATA[ <p>Lorem <em>Ipsum</em></p> ]]>

Volume translated title(s)

xpath: ./mods:titleInfo[@type="translated"]/mods:title[@xml:lang="LANG-ISO639-1"]
Where LANG-ISO639-1 is the ISO 639-1 code of the language of the translated title (fr, en, de, etc.). The element <mods:title> can be repeated with different xml:lang values if the title is translated into multiple languages.
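The three title rules above can be sketched as follows, inside the <mets:xmlData> of the volume's <dmdSec>. The titles are placeholder text, and the choice of “en” and “de” for the translations is arbitrary.

```xml
<!-- Sketch only: all titles are placeholder text. -->
<!-- Main title and subtitle: titleInfo without a type attribute -->
<mods:titleInfo>
  <!-- HTML markup is wrapped in CDATA so that the METS file stays valid XML -->
  <mods:title><![CDATA[ <p>Lorem <em>Ipsum</em></p> ]]></mods:title>
  <mods:subTitle>Dolor sit amet</mods:subTitle>
</mods:titleInfo>
<!-- Translated titles: one mods:title per language -->
<mods:titleInfo type="translated">
  <mods:title xml:lang="en">Title in English</mods:title>
  <mods:title xml:lang="de">Titel auf Deutsch</mods:title>
</mods:titleInfo>
```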

ISBN

Rarely used on OpenEdition Books: for books, the ISBN is registered in the Freemium space and not on Lodel.
xpath: ./mods:identifier[@type="isbn"]
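A minimal sketch of the element matched by this xpath; the ISBN value is deliberately a non-real placeholder.

```xml
<!-- Sketch only: replace the placeholder with the actual ISBN. -->
<mods:identifier type="isbn">978-X-XXXX-XXXX-X</mods:identifier>
```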

Director(s) of the publication

  • Surname:
    xpath: ./mods:name[mods:role/mods:roleTerm/text()="director"]/mods:namePart[@type="family"]
  • First name:
    xpath: ./mods:name[mods:role/mods:roleTerm/text()="director"]/mods:namePart[@type="given"]
  • Description:
    xpath: ./mods:name[mods:role/mods:roleTerm/text()="director"]/mods:description
    This field can contain plain text or simple HTML elements (<p>, <em>, <strong>, <br/>, <i>, <sub>, <sup>, <span style="font-variant:small-caps;">). To allow XML validation of the METS file, the HTML code must be placed in a CDATA. Example: <![CDATA[ <p>Lorem <em>Ipsum</em></p> ]]>
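The three xpaths above all select children of the same <mods:name> element, identified by its role. A sketch for one director follows; the person's name and description are hypothetical, and the element is repeated once per director.

```xml
<!-- Sketch only: the name and description are hypothetical. -->
<mods:name>
  <mods:namePart type="family">Dupont</mods:namePart>
  <mods:namePart type="given">Marie</mods:namePart>
  <!-- Simple HTML is allowed here, wrapped in CDATA -->
  <mods:description><![CDATA[ <p>Lorem <em>Ipsum</em></p> ]]></mods:description>
  <!-- The role term is what the xpath predicates test for -->
  <mods:role>
    <mods:roleTerm>director</mods:roleTerm>
  </mods:role>
</mods:name>
```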

Physical description of the work

Contains the physical description of the print edition.
xpath: ./mets:xmlData/mods:physicalDescription/mods:note

Note: On OpenEdition Books, this information is not displayed.
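For instance, the physical description might be encoded as in this sketch; the page count and dimensions are invented for illustration.

```xml
<!-- Sketch only: the description text is hypothetical. -->
<mods:physicalDescription>
  <mods:note>256 p., 16 x 24 cm</mods:note>
</mods:physicalDescription>
```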

Indexing

Indexing elements must be placed in <mods:subject> tags, distinguished by an “authority” attribute and sometimes an “xml:lang” attribute.

  • Keywords in French:
    xpath: ./mods:subject[@authority="motsclesfr"]/mods:topic
  • Keywords:
    xpath: ./mods:subject[@authority="motsclesen"]/mods:topic
  • Parole chiave:
    xpath: ./mods:subject[@authority="motsclesit"]/mods:topic
  • Schlagwortindex:
    xpath: ./mods:subject[@authority="motsclesde"]/mods:topic
  • Palabras claves:
    xpath: ./mods:subject[@authority="motscleses"]/mods:topic
  • Palavras chaves:
    xpath: ./mods:subject[@authority="motsclespt"]/mods:topic
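A sketch of two keyword indexes, French and English, follows. The keywords are hypothetical, and grouping several <mods:topic> elements under one <mods:subject> is an assumption: the xpaths above would equally match one <mods:subject> per keyword.

```xml
<!-- Sketch only: keywords are hypothetical examples. -->
<mods:subject authority="motsclesfr">
  <mods:topic>histoire</mods:topic>
  <mods:topic>édition numérique</mods:topic>
</mods:subject>
<mods:subject authority="motsclesen">
  <mods:topic>history</mods:topic>
  <mods:topic>digital publishing</mods:topic>
</mods:subject>
```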

Note: on OpenEdition Journals, the index entries do not appear in the issue table of contents, but they are visible on the index pages.

Metadata of editorial units (article, chapter, etc.)

They are not necessary because they are already included in the TEI files of these editorial units.

Subparts

Metadata for elements of type “souspartie” (subpart) are required but minimal. In most cases there will only be the title.
xpath: //mods:titleInfo/mods:title

For example, for a subpart in a journal:

<mets:dmdSec ID="Z">
  <mets:mdWrap MDTYPE="MODS" MIMETYPE="text/xml">
    <mets:xmlData>
      <mods:titleInfo>
        <mods:title>Subpart title</mods:title>
      </mods:titleInfo>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

This <dmdSec> corresponds to an element of TYPE “souspartie” declared in the <structMap>:

<mets:structMap>
  <mets:div TYPE="numero" DMDID="X">
    <mets:div TYPE="souspartie" ORDER="1" LABEL="Subpart title" DMDID="Z">
      <mets:div TYPE="article" ORDER="1" LABEL="Title of first article in subpart">
        <mets:fptr FILEID="Y-tei1"/>
        <mets:fptr FILEID="Y-pdf1"/>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>

Note: On OpenEdition Journals, a subpart can have its own introduction (see the MODS documentation specific to OpenEdition Journals). On the other hand, on OpenEdition Books, any untitled text element at the beginning of a subpart must be treated as a different documentary unit with an ad hoc title (the type of document recommended is “avantpropos”).

Annex files: front cover, back cover, table of contents, snapshot image and facsimile

The metadata for the types “couverture1”, “couverture4”, “tdm”, “imageaccroche” and “facsimile” are required but minimal. In most cases there will only be the title.

xpath: //mods:titleInfo/mods:title

For example:

For the cover:

<mets:dmdSec ID="X">
  <mets:mdWrap MDTYPE="MODS" MIMETYPE="text/xml">
    <mets:xmlData>
      <mods:titleInfo>
        <mods:title>Cover image title</mods:title>
      </mods:titleInfo>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>

This <dmdSec> corresponds to an element of TYPE “couverture1” declared in the <structMap>:

<mets:structMap>
  ...
  <mets:div TYPE="couverture1" ORDER="Z" LABEL="Title of cover image" DMDID="X">
    <mets:fptr FILEID="Y" />
  </mets:div>
  ...
</mets:structMap>

The file it points to (FILEID="Y") is also declared in the <mets:fileSec>:

<mets:fileSec>
  ...
  <mets:fileGrp ID="img_files">
    <mets:file ID="Y" MIMETYPE="image/png" CHECKSUM="W" CHECKSUMTYPE="WW">
      <mets:FLocat LOCTYPE="URL" xlink:href="files/Y.png" />
    </mets:file>
  </mets:fileGrp>
  ...
</mets:fileSec>