A script that uses pandoc to convert markdown to TEI-Lite conforming XML.
markdown2tei.sh
postprocess.xsl
readme.md
tei-lite.template
tei.lua

markdown2tei

This is, essentially, a bash wrapper around a custom pandoc writer and template, a simple regular expression (using sed), and an XSL script. It converts a markdown file to an XML file conforming to the TEI Lite standard. Issues and pull requests are welcome.
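
The pipeline described above might look roughly like the following sketch. It is not the contents of markdown2tei.sh: the function name, the choice of xsltproc as the XSLT processor, and the sed expression (a placeholder that only strips trailing whitespace) are assumptions made for illustration.

```shell
# Hypothetical sketch of the three stages; the real markdown2tei.sh may differ.
markdown2tei_sketch() {
  input="$1"
  # 1. pandoc renders the markdown through the custom Lua writer and template.
  # 2. sed applies a regular expression (placeholder: strip trailing whitespace).
  # 3. An XSLT processor (xsltproc is assumed here) applies postprocess.xsl.
  pandoc --to=tei.lua --template=tei-lite.template "$input" \
    | sed -e 's/[[:space:]]*$//' \
    | xsltproc postprocess.xsl -
}
```

Calling `markdown2tei_sketch input.md > output.xml` would then write the result to a file, provided pandoc and xsltproc are on the PATH and the repository files are in the current directory.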

Requirements

In order to run, this script depends on:

Header Fields

For now, this script recognizes a limited subset of elements for a TEI header. Each is translated into a field in the tei-lite.template file using the pandoc template system. (Links below will take one to the documentation for TEI Lite.) The fields currently implemented privilege metadata related to document transcription: they cover the author and title of the electronic file, a bibliographic citation of its source, a list of editors, and information about sources.

Currently, it requires only:

  • title: A title for the document. (For the titleStmt.)
  • author: At least one author. Each author's name is stored as two variables: forename and surname. (For the titleStmt.)

It also recognizes the optional fields:

  • editor: One or more "editors."
  • publicationStmt: Some prose describing the publication/distribution, contained in the publicationStmt. If no publicationStmt is provided, the template simply inserts "Generated by pandoc."
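
As an illustration of the fields above, the following shell snippet writes a small markdown file whose YAML header sets the two required fields and both optional ones. The file name sample.md, the names, and the prose are invented. The exact YAML shape (a list of forename/surname pairs for author and editor) is an assumption inferred from the description, not copied from tei-lite.template.

```shell
# Write a sample markdown file with a YAML header (all values are made up).
cat > sample.md <<'EOF'
---
title: A Sample Transcription
# Assumed shape: each author is a forename/surname pair.
author:
  - forename: Jane
    surname: Doe
# Assumed to follow the same shape as author.
editor:
  - forename: John
    surname: Smith
publicationStmt: Transcribed and distributed for teaching purposes.
---

The body of the document, written in ordinary markdown.
EOF
```

Leaving out the publicationStmt line should make the template fall back to its default text.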

The following (optional) fields are all stored as part of a bibliographic entry (bibl) under the source description (sourceDesc).

  • citation.title: Stored as <title level='a'>, that is, as an analytic title.
  • citation.container-title: For works (essays, articles, etc.) that originally appeared as part of a larger work, container-title contains the name of the larger work. It is stored in the TEI header as <title>.
  • citation.date: A date, presumably of publication. Format is not specified.
  • citation.publisher: A publisher.
  • citation.publisher-place: Place of publication, stored as pubPlace.
  • citation.page: A page range, stored as biblScope.
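
The dotted names above follow the pandoc template convention for nested metadata, so in the YAML header they would presumably sit under a single citation key. The snippet below is a sketch under that assumption. The file name citation-sample.md and every value in it are invented, and the author shape is the same assumed forename/surname pair.

```shell
# Write a sample file whose header carries a bibliographic citation.
# All values are placeholders; the nesting under "citation" is an assumption.
cat > citation-sample.md <<'EOF'
---
title: A Sample Transcription
author:
  - forename: Jane
    surname: Doe
citation:
  title: An Essay on Examples
  container-title: The Journal of Illustrations
  date: "1901"
  publisher: Example Press
  publisher-place: London
  page: 12-34
---

Body text.
EOF
```

The date is quoted so that YAML treats it as text, which matches the note that no date format is specified.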

Any sources used for a document or transcription can be described as one or more sources. These will be stored in a list.

Finally, one can describe the source for a document in unstructured prose in the citation.note field, which is converted to a <p> under the sourceDesc.

Additional metadata fields in the YAML header will simply be ignored. There is currently no validation done on the header, so invalid field names or other problems will simply be passed over (unless they generate a YAML error). In principle, anything possible in a TEI Lite header should be capable of being represented in YAML.