Fragment identifier API for ALTO #33

Closed
jpmoreux opened this Issue Oct 26, 2015 · 5 comments

jpmoreux commented Oct 26, 2015

The ALTO Fragment Identifier API is a proposal for a web service that, in response to a standard HTTP or HTTPS request:

  • references arbitrary content within an ALTO file through the use of fragment identifiers (referencing),
  • returns the XML contents referenced by such identifiers (dereferencing).

This service aims to facilitate the reuse of ALTO resources in digital libraries (bookmarks, annotations...). It could be used to embody the concept of hyperlinking within ALTO documents, and to access the content itself.

The URI could specify any portion of an ALTO file (paragraph, string, illustration...), referenced by various mechanisms (ID, spatial offset, order...), or a range of contents (paragraphs 2 to 5), etc.

Note: the ALTO schema is not impacted. The whole idea is to write a specification that digital libraries can implement if they are willing to.

Use cases

See: http://prezi.com/6fvgzri_z3b3/?utm_campaign=share&utm_medium=copy

a. A digital library user wants to reference a specific marginal note on a specific page of a digital document, given its spatial position:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96006893/f20.alto/id/@89:485
RETURNS a list of block IDs: ("PAG_00000020_TB000010")

-> http://gallica.bnf.fr/ark:/12148/bpt6k96006893/f20.alto/xml/TextBlock[ID=PAG_00000020_TB000010]
RETURNS the TextBlock XML element:
<TextBlock ID="PAG_00000020_TB000010" WIDTH="1386" HEIGHT="287" VPOS="1090" HPOS="1303" STYLEREFS="TXT_18" LANG="fr">
<TextLine ID="PAG_00000020_TL000016" WIDTH="1383" HEIGHT="63" VPOS="1090" HPOS="1304" STYLEREFS="TXT_18">
<String ID="PAG_00000020_ST000071" ...
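To make the spatial lookup in use case a. more concrete, here is a minimal Python sketch of how a server might resolve an `@x:y` position to block IDs. It is only an illustration under assumptions: the function name `block_ids_at`, the set of block element names, the sample document, and the idea that `x` and `y` are expressed in the same units as HPOS/VPOS are all mine, not part of the proposal (the syntax is still to be specified).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, not taken from Gallica.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="TB1" HPOS="100" VPOS="100" WIDTH="500" HEIGHT="200"/>
    <TextBlock ID="TB2" HPOS="50" VPOS="400" WIDTH="100" HEIGHT="150"/>
    <Illustration ID="IL1" HPOS="700" VPOS="100" WIDTH="300" HEIGHT="300"/>
  </PrintSpace></Page></Layout>
</alto>"""

# Assumption: these element types count as "blocks" for the lookup.
BLOCK_TAGS = {"TextBlock", "Illustration", "GraphicalElement", "ComposedBlock"}


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO version matches."""
    return tag.rsplit("}", 1)[-1]


def block_ids_at(alto_xml, x, y):
    """Return the IDs of blocks whose bounding box contains (x, y)."""
    root = ET.fromstring(alto_xml)
    hits = []
    for el in root.iter():
        if local_name(el.tag) not in BLOCK_TAGS:
            continue
        try:
            left, top, width, height = (
                float(el.attrib[k]) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
        except (KeyError, ValueError):
            continue  # skip blocks without usable coordinates
        if left <= x <= left + width and top <= y <= top + height:
            hits.append(el.get("ID"))
    return hits


print(block_ids_at(SAMPLE, 89, 485))
```

A real implementation would also have to decide how nested blocks are reported and which measurement unit the URL coordinates use.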

b. An application wants to list all the images on a specific page of a digital document:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/Illustration
RETURNS a list of block IDs: ("PAG_00000026_IL000001")

-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/xml/Illustration[ID=PAG_00000026_IL000001]
RETURNS the XML element:
<Illustration ID="PAG_00000026_IL000001" HPOS="744" VPOS="707" HEIGHT="3410" WIDTH="819"/>

From this XML content, the application can then extract the illustration using IIIF:
-> http://gallica.bnf.fr/iiif/ark:/12148/bpt6k96128443/f26/744,707,819,3569/full/0/native.jpg
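The step from the ALTO element to the IIIF request can be sketched as follows. This is a rough illustration, assuming that the ALTO coordinates are in pixels of the served image and that the IIIF region is `x,y,w,h` built from HPOS, VPOS, WIDTH and HEIGHT; the helper name `iiif_region_url` is hypothetical. With the attribute values quoted above, the sketch produces a height of 3410 rather than the 3569 in the quoted URL, since it simply follows the XML.

```python
import xml.etree.ElementTree as ET


def iiif_region_url(base, element_xml, size="full", rotation="0",
                    quality="native", fmt="jpg"):
    """Build an IIIF Image API URL for the bounding box of an ALTO element.

    Assumes HPOS/VPOS/WIDTH/HEIGHT are pixel values of the served image.
    """
    el = ET.fromstring(element_xml)
    x, y, w, h = (
        int(float(el.attrib[k])) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
    )
    return f"{base}/{x},{y},{w},{h}/{size}/{rotation}/{quality}.{fmt}"


ILLUSTRATION = (
    '<Illustration ID="PAG_00000026_IL000001" HPOS="744" VPOS="707" '
    'HEIGHT="3410" WIDTH="819"/>'
)

url = iiif_region_url(
    "http://gallica.bnf.fr/iiif/ark:/12148/bpt6k96128443/f26", ILLUSTRATION
)
print(url)
```

If the ALTO file uses another measurement unit, the values would need to be converted to pixels before building the region.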

c. An application wants to extract all the text within the print space of a specific page:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/PrintSpace/*[@CONTENT]
RETURNS a list of block IDs: ("PAG_00000026_TB000002","PAG_00000026_TB000003","PAG_00000026_TB000004"...)

From these IDs, the application can then retrieve the XML elements and filter the text blocks to access the text itself.
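As an illustration of that last step, here is a small Python sketch that turns a returned TextBlock element into plain text by joining the CONTENT attributes of its String elements, one output line per TextLine. The sample block and the function name `block_text` are made up for the example, and hyphenation (HYP) elements are ignored for simplicity.

```python
import xml.etree.ElementTree as ET

# Hypothetical TextBlock, as the /xml/ endpoint might return it.
SAMPLE_BLOCK = """<TextBlock ID="TB1">
  <TextLine ID="TL1"><String CONTENT="Hello"/><SP/><String CONTENT="ALTO"/></TextLine>
  <TextLine ID="TL2"><String CONTENT="second"/><SP/><String CONTENT="line"/></TextLine>
</TextBlock>"""


def local_name(tag):
    """Strip the '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def block_text(block_xml):
    """Return the text of a TextBlock, one line per TextLine."""
    block = ET.fromstring(block_xml)
    lines = []
    for line in block.iter():
        if local_name(line.tag) != "TextLine":
            continue
        words = [
            s.attrib["CONTENT"]
            for s in line
            if local_name(s.tag) == "String" and "CONTENT" in s.attrib
        ]
        lines.append(" ".join(words))
    return "\n".join(lines)


print(block_text(SAMPLE_BLOCK))
```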

Inspiration

The IIIF Image API (http://iiif.io/api/image/2.0) specifies a web service that returns an image. The HTTP request can specify the region, size, rotation, quality characteristics and format of the requested image:
-> http://gallica.bnf.fr/iiif/ark:/12148/bpt6k65372641/f1/1165.4351015801358,833.7189616252821,969.8363431151238,964.1647855530472/171,170/0/native.jpg

The EPUB format has a recommended specification on fragment identifiers, EPUB CFI (http://www.idpf.org/epub/linking/cfi/epub-cfi.html), that helps to express paths to specific locations within the content:
-> book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)

Related work:
http://pro.europeana.eu/blogpost/europeana-aligns-with-the-international-image-interoperability-framework-iiif
http://pro.europeana.eu/files/Europeana_Professional/Projects/Project_list/Europeana_Cloud/Deliverables/D4.4%20Recommendations%20For%20Enhancing%20EDM%20to%20Support%20Research%20Oriented%20Content.pdf

Actions

  1. Use cases survey
  2. Contact IIIF?
  3. Syntax specs

@jpmoreux jpmoreux self-assigned this Oct 26, 2015

cneud commented Jul 19, 2016

In the IIIF Presentation API, segments of XML files may be extracted with URL-embedded XPath expressions.
See http://iiif.io/api/presentation/2.1/#segments

altomator commented May 4, 2017

IIIF Newspaper Implementation Notes: http://bit.ly/2a63PR6

IIIF Issues: https://github.com/IIIF/iiif-stories/issues
See #77, #78, #79, #80

cowboyMontana commented May 4, 2017

Issue renamed and repurposed. Closed.

@cneud cneud removed the 1 submitted label Apr 24, 2018
