Recommendation for link to ALTO in IIIF manifest #40

Open
cneud opened this Issue Jul 21, 2016 · 11 comments

cneud commented Jul 21, 2016

IIIF defines a Presentation API that allows OCR results in ALTO, where available, to be represented as annotations linked from a manifest.

Example:

"seeAlso": {
  "@id": "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
  "format": "application/alto+xml",
  "profile": "http://www.loc.gov/standards/alto/",
  "label": "ALTO"
}

It would be good to have a recommendation from the ALTO board on the values for two fields, format and label. The format should resemble a MIME type, e.g. application/xml or text/xml, while the latter can be simple text like "ALTO XML", "ALTO OCR" or similar.
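
For illustration only, a minimal Python sketch of why fixed values matter to a consumer: it picks ALTO links out of a `seeAlso` property like the one above. The helper name `find_alto_links`, the set of accepted format values, and the check that the profile starts with `http://www.loc.gov/standards/alto/` are assumptions made for this example; no value has been recommended at this point.

```python
# Hypothetical consumer-side helper; names and accepted values are assumptions.
ALTO_PROFILE_PREFIX = "http://www.loc.gov/standards/alto/"
CANDIDATE_FORMATS = {"application/alto+xml", "application/xml", "text/xml"}


def find_alto_links(resource):
    """Return the @id of every seeAlso entry that looks like an ALTO link."""
    see_also = resource.get("seeAlso", [])
    # seeAlso may hold a single entry or a list of entries.
    if not isinstance(see_also, list):
        see_also = [see_also]

    links = []
    for entry in see_also:
        if not isinstance(entry, dict):
            continue  # a bare URI string has no format/profile to inspect
        profile = entry.get("profile", "")
        if (
            entry.get("format") in CANDIDATE_FORMATS
            and isinstance(profile, str)
            and profile.startswith(ALTO_PROFILE_PREFIX)
            and "@id" in entry
        ):
            links.append(entry["@id"])
    return links


canvas = {
    "seeAlso": {
        "@id": "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
        "format": "application/alto+xml",
        "profile": "http://www.loc.gov/standards/alto/",
        "label": "ALTO",
    }
}
print(find_alto_links(canvas))
```

With a single agreed format value, the set of candidates in this sketch would shrink to one entry and the profile check could become an exact comparison.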

cneud added the "1 submitted" label Jul 21, 2016

cneud commented Jul 27, 2016

Also, should the profile refer to the XSD, the namespace, or something else?

kba commented Jul 27, 2016

Since version information can be important for data consumers, a reference that indicates the version would make sense for the profile. If there are no breaking changes between minor versions with regard to how OCR text is expressed in ALTO, the namespace would suffice.

Jo-CCS commented Jul 28, 2016

First of all, I appreciate the IIIF initiative and am glad that ALTO is considered as a standard format in this API.
Because ALTO contains text content rather than application-specific information, the format should be "text/xml". This matches what has been used for the MIMETYPE attribute in existing METS profiles and in the Europeana Newspapers project.
Regarding the "profile", I agree with kba's statement.
Regarding the "label", I assume it is used only for display purposes, so spacing is not an issue.

So I would recommend the following for an ALTO version 3 file:

"seeAlso": {
  "@id": "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
  "format": "text/xml",
  "profile": "http://www.loc.gov/standards/alto/v3",
  "label": "ALTO XML"
}
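
As a rough sketch of how a manifest producer could emit such an entry, the Python below builds the `seeAlso` object from a URL and an ALTO major version. The function name `alto_see_also` is hypothetical, and the `/v<major>` profile pattern is an assumption generalised from the single `v3` value above, not a confirmed URI scheme.

```python
import json


def alto_see_also(url, major_version, media_type="text/xml", label="ALTO XML"):
    """Build a seeAlso entry for an ALTO file (hypothetical helper).

    The profile URI pattern ".../alto/v<major>" is an assumption based on
    the version 3 example; other versions may need a different value.
    """
    return {
        "@id": url,
        "format": media_type,
        "profile": f"http://www.loc.gov/standards/alto/v{major_version}",
        "label": label,
    }


entry = alto_see_also(
    "http://wellcomelibrary.org/service/alto/b19956435/0?image=0", 3
)
# Serialise as it would appear inside a manifest resource.
print(json.dumps({"seeAlso": entry}, indent=2))
```

Keeping the media type as a parameter means the same helper would still work if "application/alto+xml" or another value were chosen instead of "text/xml".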

cneud commented Jul 28, 2016

I wonder whether it might be worth registering a MIME type "application/alto+xml", similar to what RFC 6207 specifies for METS/MODS/MADS/MARC21/SRU.

cneud commented Jul 28, 2016

@Jo-CCS Yes, "label" is a free text field and only used for orientation.

Jo-CCS commented Aug 8, 2016

Yes, registering the MIME type "application/alto+xml" also makes sense to me.

cneud self-assigned this Oct 21, 2016

altomator commented Oct 27, 2016

"application/alto+xml" sounds great to me. The IIIF documentation already has some samples with "application/tei+xml".

cneud commented Oct 27, 2016

Note that "application/tei+xml" is also backed by RFC 6129. We should therefore check whether "application/alto+xml" could be included in an update to RFC 6207, and how, or whether a new RFC must be prepared (and by whom).

altomator commented Jan 10, 2017

To register alto+xml, we need to write an RFC and submit it to iana.org.
-> tei+xml: https://tools.ietf.org/html/rfc6129

My BnF colleagues argue that registration is not mandatory. E.g., application/warc is not registered with IANA, but WARC is an ISO standard.
-> http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717

cneud commented Jan 10, 2017

Certainly one can also live without the RFC, but note that due to this, WARC is also not currently considered a registered MIME-type, cf. https://kris-sigur.blogspot.de/2016/05/warc-mime-type.html
"if we wish to have this standardized then going through this process is the only option"
