a-rios/CorefMT

CorefMT

Annotated corpora used in the experiments reported in the following papers:

https://aclweb.org/anthology/E/E17/E17-2104.pdf

@inproceedings{zora136447,
       booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics},
           month = {April},
           title = {Co-reference Resolution of Elided Subjects and Possessive Pronouns in Spanish-English Statistical Machine Translation},
          author = {Annette Rios and Don Tuggener},
       publisher = {Association for Computational Linguistics},
            year = {2017},
           pages = {657--662},
             url = {http://dx.doi.org/10.5167/uzh-136447}
}

and

https://aclweb.org/anthology/E/E17/E17-2100.pdf

@inproceedings{zora136594,
       booktitle = {15th Conference of the European Chapter of the Association for Computational Linguistics},
           month = {April},
           title = {Machine Translation of Spanish Personal and Possessive Pronouns Using Anaphora Probabilities},
          author = {Ngoc Quang Luong and Andrei Popescu-Belis and Annette Rios and Don Tuggener},
       publisher = {Association for Computational Linguistics},
            year = {2017},
           pages = {631--636},
             url = {http://dx.doi.org/10.5167/uzh-136594}
}

Annotated News Commentary Corpus (v11); for the source corpus, see: http://opus.lingfil.uu.se/download.php?f=News-Commentary11/News-Commentary11.tar.gz

Contents:

  • es-en: plain text, sentence-aligned
  • es-en-trees: English text, Spanish dependency trees in Moses XML (binarized)
  • es-en-posscoref: plain text, Spanish with dummies for null subjects, annotated possessive pronouns 'su/sus', and annotated relative pronoun 'que'
  • es-en-posscoref-trees: English text, Spanish dependency trees in Moses XML, with dummies for null subjects, annotated possessive pronouns 'su/sus', and annotated relative pronoun 'que'

nc11.es.coref.conll.tar.bz2

  • annotated Spanish corpus in CoNLL format, with entities

nc11.es.coref.chains.tar.bz2

  • *.mables: extracted markables
  • *.mables.chains: co-reference chains
  • *.ante_scores: scores of possible antecedents for each pronoun
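
To check which of the three file types an archive contains before unpacking it, the members can be grouped by suffix. The following is a minimal sketch using Python's standard `tarfile` module; the function name `group_members_by_suffix` is hypothetical and not part of this repository, and the internal directory layout of the archives is not assumed.

```python
import tarfile
from collections import defaultdict

# Suffixes taken from the list above.
SUFFIXES = (".mables", ".mables.chains", ".ante_scores")


def group_members_by_suffix(archive_path, suffixes=SUFFIXES):
    """Return {suffix: [member names]} for regular files in a .tar.bz2 archive.

    Hypothetical helper: it only inspects file names and does not extract
    or interpret the file contents.
    """
    groups = defaultdict(list)
    with tarfile.open(archive_path, "r:bz2") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            for suffix in suffixes:
                if member.name.endswith(suffix):
                    groups[suffix].append(member.name)
                    break
    return dict(groups)
```

For example, `group_members_by_suffix("nc11.es.coref.chains.tar.bz2")` would return the member names per file type, assuming the archive has been downloaded to the current directory.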

The format of the mentions is as follows: [MentionID, SentenceID, MentionStartToken, MentionEndToken, PoS, Person, Gender, Number, Gram. Funct., Animate, Dependency Head Token, Gov. Verb, NE class, *, Head lemma]
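
The 15 fields can be given names to make downstream code easier to read. The sketch below assumes each mention has already been read into a sequence of 15 values in the order listed above; how the records are serialized in the files is not specified here, so no parsing is attempted. The names `Mention` and `to_mention`, the field names, and the sample values in the usage note are hypothetical.

```python
from typing import Any, NamedTuple, Sequence


class Mention(NamedTuple):
    """Named view of one mention record, in the field order given above."""
    mention_id: Any
    sentence_id: Any
    start_token: Any
    end_token: Any
    pos: Any
    person: Any
    gender: Any
    number: Any
    gram_function: Any
    animate: Any
    dep_head_token: Any
    gov_verb: Any
    ne_class: Any
    placeholder: Any  # the '*' slot; its meaning is not documented here
    head_lemma: Any


def to_mention(fields: Sequence[Any]) -> Mention:
    """Wrap a 15-element sequence in a Mention, rejecting other lengths."""
    if len(fields) != len(Mention._fields):
        raise ValueError(
            f"expected {len(Mention._fields)} fields, got {len(fields)}"
        )
    return Mention(*fields)
```

With made-up values, `to_mention([1, 0, 3, 4, "NP", 3, "f", "sg", "subj", "anim", 5, "decir", "PER", "*", "presidenta"])` gives a record whose `head_lemma` attribute is `"presidenta"`. The actual value types and label inventories in the corpus may differ.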