Paging in TEI and similar media types #34

@jonathanrobie

TEI is a primary data type for much of our data. Some of our data comes from traditional print books that have pages with page numbers, and a server may provide the data from such a book one page at a time. When working with larger documents, many clients will choose to set an upper limit on the size of the data returned in any one request. Both cases call for paging.

What is a page?

In TEI, page breaks in the printed source are often marked with pb milestone elements, which may not align with the boundaries of a document or div. As a result, the content between two pb milestones may not be well-formed XML. In the following example, the page break falls in the middle of a div2:

<div2 type="chapter" n="14">
   <docAuthor lang="la">
      <persName lang="la" key="tlg-0261">
         <foreign lang="la">simonides</foreign>
      </persName>
   </docAuthor>
   <label>ἐπὶ ἑξαμέτρῳ πεντάμετρον καὶ δύο τρίμετροι, εἶτα ἑξάμετρον</label>
   <l n="1">Ἀργεῖος Δάνδης σταδιοδρόμος ἐνθάδε κεῖται,</l>
   <l n="2">νίκαις ἱππόβοτον πατρίδʼ ἐπευκλεΐσας,</l>
   <pb id="v.5.p.10"/>
   <l n="3">Ὀλυμπίᾳ δίς, ἐν δὲ Πυθῶνι τρία,</l>
   <l n="4">δύω δʼ ἐν Ἰσθμῷ, πεντεκαίδεκʼ ἐν Νεμέᾳ·</l>
   <l n="5">τὰς δʼ ἄλλας νίκας οὐκ εὐμαρές ἐστʼ ἀριθμῆσαι.</l>
</div2>

Similarly, if a client sets an upper limit on the page size, a cut made at that limit may not fall on the boundary of any well-formed XML structure.
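To make the size-limit problem concrete, here is a minimal Python sketch. The document string, the `LIMIT` value, and the helper `is_well_formed` are assumptions for illustration only; no real client or server is implied to cut data this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a TEI division.
DOC = '<div2 type="chapter" n="14"><l n="1">line one</l><l n="2">line two</l></div2>'

def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML element."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

LIMIT = 40           # hypothetical upper limit, in characters
chunk = DOC[:LIMIT]  # naive cut at the size limit

print(is_well_formed(DOC))
print(is_well_formed(chunk))
```

The naive cut leaves div2 and l unclosed, so the chunk cannot be parsed on its own.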

How does the server choose what to provide on each page? To what extent does well-formedness determine this? Should the server transform such data in any way before returning the response?
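One possible answer to the transformation question, sketched in Python purely as an illustration: split the children of a division at each pb and wrap every group in a shallow copy of the enclosing element, so that each page is well-formed on its own. The function name `split_at_pb` and the shortened sample are hypothetical, and the sketch only handles pb elements that are direct children of the division; it is not a proposal for what the server must do.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the example above.
SAMPLE = """<div2 type="chapter" n="14">
   <l n="1">line one</l>
   <l n="2">line two</l>
   <pb id="v.5.p.10"/>
   <l n="3">line three</l>
</div2>"""

def split_at_pb(container):
    """Split a container's direct children into per-page fragments.

    Each fragment is a new element with the container's tag and
    attributes, holding only the children that belong to one page.
    Returns a list of (page_id, fragment) pairs; page_id is None for
    content that precedes the first <pb> inside the container.
    """
    pages = []
    page_id = None
    current = ET.Element(container.tag, dict(container.attrib))
    for child in container:
        if child.tag == "pb":
            # Close the current page and start a new one at the milestone.
            pages.append((page_id, current))
            page_id = child.get("id")
            current = ET.Element(container.tag, dict(container.attrib))
        # The milestone itself is kept as the first child of its page.
        current.append(copy.deepcopy(child))
    pages.append((page_id, current))
    return pages

pages = split_at_pb(ET.fromstring(SAMPLE))
for page_id, fragment in pages:
    print(page_id, ET.tostring(fragment, encoding="unicode"))
```

Repeating the div2 wrapper on both pages changes the data: a client can no longer tell from one page alone that the chapter continues on the next. Whether that trade-off is acceptable is exactly the open question.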

How do we provide paging link relations?

TEI documents are unlikely to contain links that correspond to the paging required here. How do we provide these links? One possibility is to use the paging strategy of the GitHub API, which is based on the Link response header:

Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=15>; rel="next",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=1>; rel="first",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=13>; rel="prev"
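As a rough sketch of how this strategy could carry over, the Python below builds a Link header value of the same shape on the server side and reads it back on the client side. The base URL, the `page` query parameter, and the function names `paging_link_header` and `parse_link_header` are assumptions for illustration; the parser only covers the simple `<url>; rel="name"` form shown above, not the full Link header grammar.

```python
import re
from urllib.parse import urlencode

def paging_link_header(base_url, page, last_page):
    """Build a Link header value for page `page` out of `last_page` pages.

    "next" and "last" are omitted on the last page; "first" and "prev"
    are omitted on the first page.
    """
    targets = {}
    if page < last_page:
        targets["next"] = page + 1
        targets["last"] = last_page
    if page > 1:
        targets["first"] = 1
        targets["prev"] = page - 1
    links = []
    for rel, number in targets.items():
        query = urlencode({"page": number})
        links.append(f'<{base_url}?{query}>; rel="{rel}"')
    return ", ".join(links)

def parse_link_header(value):
    """Map each rel name to its URL for the simple form built above."""
    return {rel: url for url, rel in re.findall(r'<([^>]*)>\s*;\s*rel="([^"]*)"', value)}

# Hypothetical document endpoint; page 14 of 34 mirrors the example above.
header = paging_link_header("https://example.org/api/document", 14, 34)
print("Link:", header)
print(parse_link_header(header)["next"])
```

Because the links travel in the response header, the TEI payload itself does not need to be modified to support paging.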

Labels: Document Endpoint (issues that deal with the Document Endpoint)
