UD_Middle_French-PROFITEROLE is the Middle French section of the PROFITEROLE corpus; the Old French section is UD_Old_French-PROFITEROLE.
UD_Middle_French-PROFITEROLE includes Middle French texts that were annotated during the funded PROFITEROLE project (Projet ANR-16-CE38-0010, 2017-2022; supervised by Sophie Prévost; https://www.lattice.cnrs.fr/projets/projets-passes/projets-anr/projet-anr-profiterole). Texts were automatically annotated with parts of speech and dependencies (with the SRCMF corpus <srcmf.org> as a training corpus) and are currently being corrected. Texts will be released in UD as they are corrected. Old French texts that were annotated in the PROFITEROLE project are to be found in UD_Old_French-PROFITEROLE.
Main development happens on the GitLab of the Profiterole Project.
UD_Middle_French-PROFITEROLE is meant to include texts spanning the early 14th to the late 15th century. At present it includes 3 (extracts of) texts dating from the late 14th and the late 15th century. It includes XXX sentences and XXX tokens.
Sentences are annotated with the following metadata:
- `sent_id`: a unique id for each sentence in the treebank
- `text`: the sentence
- `newdoc_id`: a unique id for each of the texts. This id can be split on underscores to get back:
  - name of the text
  - date
  - form: verse and/or prose
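As a minimal sketch of the underscore convention described above, the following Python splits a text id into its three parts. The names `DocId` and `parse_doc_id` are hypothetical helpers introduced for illustration only; they are not part of the treebank or of any UD tool. The date is kept as a string, on the assumption that not every text need carry a plain four-digit year.

```python
from typing import NamedTuple


class DocId(NamedTuple):
    """Parts of a text id (hypothetical helper type)."""
    name: str
    date: str
    form: str


def parse_doc_id(doc_id: str) -> DocId:
    """Split a text id of the form name_date_form on underscores."""
    parts = doc_id.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected name_date_form, got {doc_id!r}")
    return DocId(*parts)


# Id taken from the table of texts in this README.
print(parse_doc_id("grchronj2c5_1381_prose"))
# DocId(name='grchronj2c5', date='1381', form='prose')
```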
The following table lists the texts used in this treebank [TO BE PUT BACK IN CHRONOLOGICAL ORDER]:
ID | Name of the text | Author | Tokens | Trees |
---|---|---|---|---|
grchronj2c5_1381_prose | Chroniques des règnes de Jean II et de Charles V | anonymous | 2710 | 103 |
Jehpar_1494_prose | Jehan de Paris | anonymous | 5893 | 291 |
Commyn_1497_prose | Mémoires, Livre 1 | Philippe de Commynes | 3422 | 118 |
Total | | | 12025 | 512 |
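The totals row can be cross-checked against the per-text counts with a few lines of Python. This is only an illustrative check on the figures printed in the table; the `rows` dictionary is a hand-copied transcription, not something read from the treebank files.

```python
# (tokens, trees) per text, copied by hand from the table above.
rows = {
    "grchronj2c5_1381_prose": (2710, 103),
    "Jehpar_1494_prose": (5893, 291),
    "Commyn_1497_prose": (3422, 118),
}

total_tokens = sum(tokens for tokens, _ in rows.values())
total_trees = sum(trees for _, trees in rows.values())

print(total_tokens, total_trees)  # 12025 512
```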
Texts with fewer than about 40,000 words were entirely annotated, while texts with more than 40,000 words were sampled in three parts (beginning, middle and end of the text) to reach a total of about 40,000 words.
At the moment, UD_Middle_French-PROFITEROLE includes 3 extracts of full texts, the remaining parts of which will be released in May 2024.
The treebank is currently not split.
We added some more specific relations (subtypes), either to specify a relation or for tokens entering a double dependency relation (typically relative pronouns and contracted forms):
- `acl:relcl`: relative clause
- `advmod:obl`: contracted `advmod` + `obl` (e.g. sin = si + en)
- `aux:pass`: passive auxiliary
- `case:det`: contracted `case` + `det` (e.g. del = de + le)
- `cc:nc`: non coordinating conjunction (e.g. et at the beginning of a sentence)
- `mark:advmod`: `mark` and `advmod` (e.g. coment at the beginning of a subordinate clause)
- `nsubj:advmod`: contracted `nsubj` + `advmod` (e.g. jon = jo + en)
- `nsubj:obj`: contracted `nsubj` + `obj` (e.g. quil = qui + le)
- `obj:advmod`: contracted `advmod` + `obj` (e.g. sis = si + les)
- `obj:advneg`: contracted negation + `obj` (e.g. nes = ne + les)
- `obj:obl`: contracted `obl` + `obj` (e.g. oul = ou + le)
- `obl:advmod`: the double labelling accounts for the difficulty to decide between `obl` and `advmod` relations (en and i).
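When processing these labels, it can help to separate the part before the colon from the part after it. The sketch below does this for the DEPREL column (the eighth column of a CoNLL-U token line). The function names are hypothetical, and the sample token line is made up for illustration (unknown columns are left as `_`); it is not quoted from the treebank. Note that for the double labels listed above, the part after the colon is itself a relation name rather than a conventional subtype.

```python
from typing import Optional, Tuple


def split_deprel(deprel: str) -> Tuple[str, Optional[str]]:
    """Split a relation label at the first colon.

    Returns (first_part, second_part); second_part is None when the
    label has no colon.
    """
    first, _, second = deprel.partition(":")
    return first, second or None


def deprel_of(token_line: str) -> str:
    """Return the DEPREL column (index 7) of a CoNLL-U token line."""
    return token_line.rstrip("\n").split("\t")[7]


# Illustrative token line only, not taken from the treebank.
line = "1\tdel\t_\t_\t_\t_\t2\tcase:det\t_\t_"

print(split_deprel(deprel_of(line)))  # ('case', 'det')
print(split_deprel("obj"))            # ('obj', None)
```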
We added some features:
- `Morph=VFin`: finite verb
- `Morph=VInf`: non-finite verb
- `Morph=VPar`: verbal participle
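These features appear in the FEATS column of the CoNLL-U files, where features are written as `Name=Value` pairs joined by `|`, and `_` marks an empty column. A minimal sketch for reading them follows; `parse_feats` is a hypothetical helper, and the second sample string combines `Morph=VPar` with another feature purely as an assumed illustration, not as an attested annotation.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS value into a dict of feature names to values."""
    if feats == "_":  # empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


print(parse_feats("Morph=VFin"))
# {'Morph': 'VFin'}

# Assumed combination, for illustration only.
print(parse_feats("Morph=VPar|Tense=Past").get("Morph"))
# VPar
```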
Consult the language-specific documentation for further details.
UD_Middle_French-PROFITEROLE results from the automatic annotation (PROFITEROLE project, 2017-2022) of Middle French texts (with the PROFITEROLE/SRCMF Old French corpus used as a training corpus), which have been, or are being, manually corrected in line with the UD guidelines. The contributors to the syntactic part of the PROFITEROLE project were: Prévost, Sophie; Villemonte de la Clergerie, Eric; Regnault, Mathilde; Grobol, Loïc; Crabbé, Benoît; Dehouck, Mathieu; Lavrentiev, Alexei.
Any deviations from the original annotation available on the GitLab of the Profiterole Project, especially any errors introduced while fixing the treebank to fit UD requirements, are the sole responsibility of Mathieu Dehouck.
- Prévost, Sophie, Mathieu Dehouck, Alexei Lavrentiev, Serge Heiden and Loïc Grobol. To appear. 'Profiterole : un corpus morpho-syntaxique et syntaxique de français médiéval'. Corpus.
- 2023-11-15 v2.13
  - Initial release in Universal Dependencies.
- 2022-05-15 v2.10
  - Initial repository creation in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.13
License: CC BY-NC-SA 4.0
Includes text: yes
Genre: fiction nonfiction
Lemmas: manual native
UPOS: converted with corrections
XPOS: manual native
Features: automatic
Relations: automatic with corrections
Contributors: Prévost, Sophie; Villemonte de la Clergerie, Eric; Regnault, Mathilde; Grobol, Loïc; Crabbé, Benoît; Dehouck, Mathieu; Lavrentiev, Alexei
Contributing: elsewhere
Contact: sophie.prevost@ens.psl.eu
===============================================================================