FR: TEI Features for CMC #1955

Open
luengen opened this issue Dec 31, 2019 · 5 comments · May be fixed by #2537

Comments

@luengen

luengen commented Dec 31, 2019

The TEI lacks models for encoding features of Computer-Mediated Communication (CMC). Since 2013, an international community of CMC researchers within the TEI SIG CMC has developed several TEI customisations for encoding CMC data of different genres (chat and WhatsApp logfiles, threads in discussion forums, on Usenet and on Wikipedia talk pages, sequences of tweets, etc.) so that such data can be integrated into corpus infrastructures. These customisations (the ‘DeRiK schema’ 2012, the ‘CoMeRe schema’ 2014 and the ‘CLARIN-D schema’ 2016) have been used in different corpus projects. The experience gained with these schemas has been documented and discussed in the work of the SIG, at TEI- and CLARIN-related events, and at conferences and workshops dedicated to the analysis of computer-mediated communication and to building and annotating CMC corpora. The customisations and resources are available via the wiki space of the CMC-SIG in the TEIWiki.

In 2019, we distilled a “reduce to the max” customisation from the previous customisations and the experience gained with them. We dubbed that customisation CMC-core and hereby submit it as a Feature Request to the TEI Council and community. It contains, in our view, the minimum extensions to the TEI needed for encoding textual data from CMC genres.

In a nutshell, CMC-core introduces:

  1. the <post> element as the basic unit of CMC encoding, defined as a member of model.divPart.cmc;
  2. the model class model.divPart.cmc, which allows occurrences of <post>, <u>, <kinesic>, <incident> and further elements to be used and combined within one and the same <div>;
  3. the attributes @mode, @replyTo, and @indentLevel for <post>;
  4. the optional global attribute @creation, which may indicate for any TEI element how its content was created in a CMC environment, i.e. directly by a human user, by the system, via a template, or in some other way.

CMC-core encoding example: Discussion thread on a Wikipedia talk page (Astronomical object)

<div type="thread">
   <head>Naturally occurring?</head>
   <post mode="written" xml:id="p4" indentLevel="0" who="#u005" synch="#t005">
      <p>I'm not sure that this is a proper criterium, or even what this means. What if we set an explosion that breaks a comet into two pieces? What if we build a moon? Cheers, <signed creation="template"><ref target="/wiki/User:Greenodd">Greenodd</ref> (<ref target="/wiki/User_talk:Greenodd">talk</ref>) <time>01:00, 21 July 2011 (UTC)</time></signed> </p>
   </post>
   <post mode="written" xml:id="p5" indentLevel="1" replyTo="#p4" who="#u006" synch="#t006">
      <p>Those haven't happened. If they do, we can revisit the concern. <signed creation="template"><ref target="/wiki/User:Praemonitus" >Praemonitus</ref> (<ref target="/wiki/User_talk:Praemonitus" >talk</ref>) <time>01:15, 1 April 2015 (UTC)</time></signed>  </p>
   </post>
</div>
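
To illustrate how @replyTo and @indentLevel could be consumed by a corpus tool, here is a minimal Python sketch that rebuilds the reply depth of each <post> and compares it with the recorded indentation. It is not part of CMC-core: the function names `reply_depths` and `indent_mismatches` are hypothetical, the sample is a trimmed copy of the thread above without the TEI namespace, and it assumes that each @replyTo holds a single pointer and, purely for illustration, that indentation mirrors reply depth (which the proposal does not require).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the thread above: only the attributes needed here are kept.
# Assumption: no TEI namespace is declared, so elements match by bare name.
SAMPLE = """
<div type="thread">
  <head>Naturally occurring?</head>
  <post mode="written" xml:id="p4" indentLevel="0" who="#u005"/>
  <post mode="written" xml:id="p5" indentLevel="1" replyTo="#p4" who="#u006"/>
</div>
"""

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def reply_depths(div):
    """Map each post's xml:id to its depth in the reply chain (thread starters = 0)."""
    posts = {p.get(XML_ID): p for p in div.iter("post")}
    depths = {}

    def depth(pid):
        if pid not in depths:
            target = posts[pid].get("replyTo")
            # A post without @replyTo starts a thread; otherwise it sits one
            # level below the post it points to ("#p4" -> "p4").
            # Assumption: a single, resolvable, non-cyclic pointer.
            depths[pid] = 0 if target is None else depth(target.lstrip("#")) + 1
        return depths[pid]

    for pid in posts:
        depth(pid)
    return depths


def indent_mismatches(div):
    """Return the ids of posts whose @indentLevel differs from their reply depth."""
    depths = reply_depths(div)
    return [
        p.get(XML_ID)
        for p in div.iter("post")
        if int(p.get("indentLevel", "0")) != depths[p.get(XML_ID)]
    ]


div = ET.fromstring(SAMPLE)
print(reply_depths(div))       # {'p4': 0, 'p5': 1}
print(indent_mismatches(div))  # []
```

On real talk-page data the two values may legitimately diverge, since users do not always indent consistently; a mismatch list like this would then be a prompt for manual inspection rather than an error report.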

Detailed documentation on CMC-core

The CMC-core customisation (ODD and sample annotations of CMC corpus files from our projects) can be found at the TEI SIG CMC Wiki.

The rationale of CMC-core is described in detail in a paper accepted for a special issue of the Corpus journal. We kindly ask you to consult this paper for the full rationale behind the additions and modifications to the TEI suggested above.
We ask the Council to consider including these additions and modifications as official features of the TEI encoding framework.

Wishing you all a happy new year 2020 and looking forward to your comments and replies!

Michael Beißwenger
Harald Lüngen
(on behalf of the TEI SIG CMC)

@bansp
Member

bansp commented Apr 16, 2020

This should be extremely interesting for the CLARIN community, given the overall interest in CMC nowadays. Is there a chance for a customization of CMC TEI to be served from Roma as one of the presets, perhaps?

@lujessica lujessica self-assigned this May 2, 2020
@ebeshero
Member

ebeshero commented May 3, 2020

VF2F2020: Council decides those assigned to this ticket will meet to review the proposal carefully, see how it best fits into the Guidelines—perhaps as a new chapter—and return to the working group with a proposal for how to proceed.

@luengen
Author

luengen commented Jun 2, 2020

Dear Elisa, dear colleagues, many thanks for this. Please note that we are available for further exchange and discussion, for instance in a Zoom meeting, if desired.
Best - Harald (with Michael Beißwenger, for the TEI SIG CMC)

@peterstadler
Member

Just an update: a subgroup (@luengen, @Beisswenger, @lujessica, @sydb, and @peterstadler) has been working on this for some months now. Development is based on https://github.com/TEI-CMC-SIG/cmc-core, and we have now shifted to a fork of the Guidelines at https://github.com/TEI-CMC-SIG/TEI/tree/cmc-features (branch 'cmc-features').

The HTML version, intended to facilitate review, is built on the TEI's Jenkins server: https://jenkins.tei-c.org/job/TEIP5-CMC-features/lastSuccessfulBuild/artifact/P5/release/doc/tei-p5-doc/en/html/CMC.html

@sabineseifert
Contributor

Related PR #2537

@peterstadler peterstadler linked a pull request May 24, 2024 that will close this issue