parameters to the TEI media type (generalizing their usage + a repository) #1483

Open
bansp opened this issue Jul 13, 2016 · 7 comments

@bansp
Member

bansp commented Jul 13, 2016

This ticket is probably the first of a group of tickets -- I am trying to split my thinking into issue-able chunks. It should be seen as somewhat related to issue #564, and in more than one way. It is to some extent parallel, but later on I am going to ask about points of contact.


My thoughts have recently been circling, sometimes whirling, around RFC6129 "The 'application/tei+xml' Media Type" and the consequences of its introduction. In this ticket, I would like to fly two questions/issues by the Council, but first let me provide more context:

  • TEI is one of the recommended encoding formats (or toolkits for defining formats, we know the drill...) within CLARIN ERIC, the language-resource-oriented European Research Infrastructure.
  • CLARIN offers various web services, among them one you may have heard about: the configurable pipeline system called WebLicht. WebLicht can ingest 'plain' TEI and output an annotated version, and the user decides what exactly is chained in the pipeline and in what order.
  • CLARIN features various encoding formats, and among them is the most recent product of ISO-TEI cooperation, namely the format for encoding speech transcription. It’s a TEI ‘flavour’.
  • Another format in wide use is that of the German Text Archive (Deutsches Textarchiv, DTA). It’s also a TEI ‘flavour’.
  • Both formats are ingested by WebLicht pipelines and operated on, in different ways. I have summarized the basis for the current practice of "flavour detection" in a wiki article, with a title not restricted to CLARIN, for a good reason (and while you're looking there, try not to let your gaze rest on the tokenized parameter; it might make an appearance in another ticket). WebLicht knows which TEI flavour it is ingesting thanks to parameters added to the TEI media type; currently they look like this:
    • application/tei+xml;format-variant=tei-iso-spoken for speech transcription
    • application/tei+xml;format-variant=tei-dta for DTA
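To illustrate how a consuming service might act on such parameters, here is a minimal Python sketch that splits a media type string into its type/subtype and parameters and then reads format-variant. The function names (parse_media_type, detect_flavour) are hypothetical and not part of WebLicht or any CLARIN API; the parser is simplified and assumes no semicolons occur inside parameter values.

```python
from typing import Dict, Optional, Tuple

TEI_MEDIA_TYPE = "application/tei+xml"


def parse_media_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a media type string into its lowercased type/subtype and a parameter dict.

    Simplified sketch: assumes parameter values contain no ';' (even quoted ones).
    """
    parts = [p.strip() for p in value.split(";")]
    mime = parts[0].lower()
    params: Dict[str, str] = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if not sep:
            continue  # skip empty or malformed segments that lack '='
        # Parameter names are case-insensitive; values are kept as given (minus quotes).
        params[name.strip().lower()] = val.strip().strip('"')
    return mime, params


def detect_flavour(media_type: str) -> Optional[str]:
    """Return the TEI flavour named by format-variant, or None for plain TEI.

    Raises ValueError if the media type is not application/tei+xml.
    """
    mime, params = parse_media_type(media_type)
    if mime != TEI_MEDIA_TYPE:
        raise ValueError(f"not a TEI media type: {mime}")
    return params.get("format-variant")


if __name__ == "__main__":
    for mt in (
        "application/tei+xml;format-variant=tei-iso-spoken",
        "application/tei+xml;format-variant=tei-dta",
        "application/tei+xml",
    ):
        print(mt, "->", detect_flavour(mt))
```

A bare application/tei+xml with no parameter yields None, so in this sketch a service can fall back to generic TEI handling when no flavour is declared.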

Now the promised two questions:

  • wouldn’t TEI Simple (together with TEI-for-jTEI, and others) qualify as yet another flavour that could/should be recognized by services in an analogous way? (please note: this is not meant to solve issue #564, "create <schemaRef> element to give pointing to ODD." -- it's merely a point of contact)
  • somewhat independently of the decision on the question above, wouldn't it be cool if the TEI had a central repository of such parameters for its media type definition, so that there exists a reference point, and a place where they can be registered and kept together for, minimally, verification and curation?

(And I itch to mention that this information should reside either in ODD or in the header, or in both. But let's keep that for a separate issue or for discussion inside issue #564.)

@bansp
Member Author

bansp commented Jul 20, 2016

To be sure, I can think of several other TEI annotation formats ('flavours') that may benefit from this, whether or not Simple gets the stamp. It's just that my direct expertise is limited to linguistic annotation formats, but it's obvious that the solution is generalizable across the TEI.

@laurentromary
Contributor

This seems like a sensible move. If council agrees with this, we could think of revising the TEI media type document accordingly, since your proposal seems to impact (parameter?) on it.

@bansp
Member Author

bansp commented Jul 20, 2016

Indeed, a sensible move would be to first gather the information and gauge the interest of the community. I'd rather do that with the Council's blessing or at least approval.
If Sigfrid is not available for the IETF task then I will be glad to help out.

@hcayless
Member

hcayless commented Aug 8, 2016

I'm tempted just to say "go for it, @bansp". This seems a potentially useful thing, but I've no idea what the level of interest might be, and it would be good to start there. Any objections or input from other Council members?

@emylonas
Contributor

A similar discussion took place with respect to identifying schemas and resulted in the proposal for schemaRef. This proposal provides a way for applications to avoid looking in the file for schema information, which is very useful.
Council thinks Piotr should go ahead.

@bansp
Member Author

bansp commented Oct 10, 2016

Thanks! :-)

@bansp
Member Author

bansp commented Mar 23, 2017

I have opened a project for this issue at https://github.com/orgs/LingSIG/projects/5
There is also a project at https://github.com/clarin-eric/standards/projects/2
(apologies if they are inaccessible -- it has been reported to me that one apparently has to be a member of a team in order to see projects, even in open repositories)

martinascholger added this to the Guidelines 4.1.0 milestone Jun 14, 2020
ebeshero modified the milestones: Guidelines 4.6.0, Guidelines 4.7.0 Apr 3, 2023
ebeshero modified the milestones: Guidelines 4.7.0, Guidelines 4.8.0 Nov 10, 2023
ebeshero self-assigned this Nov 10, 2023
8 participants