Modéliser le changement: Les voies de français (MCVF) and Penn-BFM Parsed Corpus of Historical French (PPCHF)
===================================================================
The mcvf-plus-ppchf repository is part of an overarching project to make available parsed texts of historical French for linguistic research. It includes two morphosyntactically annotated corpora of Old and Middle French, which together contain over 1.6 million words of running text.
- Modéliser le changement: Les voies de français (MCVF), versions 1.0 and 2.0 - 843,427 words
- Penn-BFM Parsed Corpus of Historical French (PPCHF), version 1.0 - 762,814 words
Note: Here and throughout the documentation, "number of words" excludes punctuation and metadata and reflects word tokenization in accordance with the annotation guidelines (<https://www.ling.upenn.edu/~beatrice/corpus-ling/annotation-french>).
The repository also contains:
- Information concerning the sources of the texts
- Guidelines concerning the morphosyntactic annotation
The text encoding for the corpora is UTF-8.
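Because the files are UTF-8, it is safest to state the encoding explicitly when reading them rather than rely on a platform default. The following is a minimal Python sketch. The helper name `read_corpus_file` and the path in the usage note are hypothetical and are not part of the repository.

```python
from pathlib import Path


def read_corpus_file(path):
    """Return the full text of one corpus file, decoded as UTF-8.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Naming the encoding explicitly avoids depending on the platform
    # default, which is not UTF-8 on every system.
    return Path(path).read_text(encoding="utf-8")
```

For example, `text = read_corpus_file("path/to/corpus-file")` would load one file as a string, where the path is a placeholder for a file in a local copy of the repository.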
The files in this repository are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license, CC BY-NC-SA 4.0 (<https://creativecommons.org/licenses/by-nc-sa/4.0>).
Beatrice Santorini (beatrice DOT santorini AT gmail DOT com)
Please do not hesitate to report errors of any sort or suggestions for improvement.
- We thank France Martineau, who directed the MCVF project (2005-2010), for entrusting us with the ongoing curation and distribution of the resulting MCVF Corpus.
- The philological infrastructure provided by the Base Français Médiéval (BFM) project (<http://txm.bfm-corpus.org>) has been invaluable, especially in the construction of the PPCHF, more than half of which is based on BFM's online editions. We thank the BFM administrators, Céline Guillot-Barbance and Alexei Lavrentiev, for permission to distribute the BFM-based files and for their quick and cheerful assistance with the documentation concerning the BFM sources.
- Achim Stein pointed us to the BFM.
- Anne Carlier and Paul Hirschbühler provided practical and moral support.
- A special thank you goes to Alexandra Simonenko for her disciplined error reports over many years and her inspiring commitment to the project.
The POS-tagged and parsed files in this repository are annotated according to guidelines developed by Beatrice Santorini and Rodica Diaconescu, which extend guidelines for annotating historical English (<https://www.ling.upenn.edu/~beatrice/corpus-ling/annotation>). The guidelines for historical French are also available at <https://www.ling.upenn.edu/~beatrice/corpus-ling/annotation-french>. All of the parsed files in MCVF v2.0 and in PPCHF have been brought to the same level of consistency and adherence to the guidelines.
The parsed files in the repository can be searched with CorpusSearch 2, a Java program developed by Tony Kroch and Beth Randall for searching, revising, and coding parsed corpora. The program can be downloaded at <https://sourceforge.net/projects/corpussearch>.
The original CorpusSearch user's guide site is no longer maintained. Please refer instead to the corrected and revised live version (<https://www.ling.upenn.edu/~beatrice/corpus-ling/CS-users-guide>).