Lemmatizing medieval French with Pie & RNNTagger

This repository contains a couple of notebooks for generating a language model for Pie and RNNTagger, used to tag and lemmatize medieval French. They require no prior local installation and speed up training. Any training dataset can be used, provided the language-specific settings are configured correctly in each tool's parameter file. The notebooks can also tag files stored on our Drive with the generated models.

Short description of the task

We've trained a model for medieval French with RNNTagger and Pie in order to tag a number of texts with the Cattex09 morphosyntactic labels. Two different corpora are used for training:

BFMGOLDLEM corpus: fully annotated for part of speech (UD and Cattex POS tags), including a number of morphological labels. The lemmas in this corpus were standardized in earlier work. The corpus consists of 431,144 tokens distributed across 20 texts (36.1MB).
BFMGOLD corpus: only a small number of its texts include all lemmas. It contains 1,187,061 tokens distributed across 42 texts (75.4MB).

A complete description can be found here [French]
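
Both tools are trained on annotated text, and the notebook name pie_train_dev_data.ipynb suggests a step that prepares training and development data for Pie. The notebook's actual procedure is not described here, so the following is only a minimal sketch of one common approach: shuffle annotated sentences, hold out a fraction as a development set, and write each set as tab-separated text. The function names, the `token`/`lemma`/`pos` column names, the blank line between sentences, and the toy sentences are all assumptions made for illustration; the real layout has to match what each tool's parameter file expects.

```python
import random

# Hypothetical toy data: each sentence is a list of (token, lemma, pos) triples.
# The forms and tags are illustrative only and are not taken from the corpora.
TOY_SENTENCES = [
    [("li", "le", "DETdef"), ("rois", "roi", "NOMcom"), ("vint", "venir", "VERcjg")],
    [("la", "le", "DETdef"), ("dame", "dame", "NOMcom"), ("parla", "parler", "VERcjg")],
    [("il", "il", "PROper"), ("chante", "chanter", "VERcjg")],
    [("li", "le", "DETdef"), ("chevaliers", "chevalier", "NOMcom"), ("dort", "dormir", "VERcjg")],
]


def split_sentences(sentences, dev_ratio=0.1, seed=42):
    """Shuffle sentences reproducibly and return (train, dev) lists."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one sentence in the development set.
    n_dev = max(1, round(len(shuffled) * dev_ratio))
    return shuffled[n_dev:], shuffled[:n_dev]


def format_tsv(sentences, header=("token", "lemma", "pos")):
    """Render sentences as tab-separated lines, one token per line.

    Assumed layout: a header row, then one row per token, with an empty
    line after each sentence. Adjust it to the target tool's settings.
    """
    lines = ["\t".join(header)]
    for sentence in sentences:
        for row in sentence:
            lines.append("\t".join(row))
        lines.append("")  # sentence boundary
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    train, dev = split_sentences(TOY_SENTENCES, dev_ratio=0.25)
    print(f"{len(train)} training sentences, {len(dev)} development sentences")
    print(format_tsv(dev))
```

Splitting by sentence rather than by token keeps each sentence intact in one set, and the fixed seed makes the split reproducible between runs. The returned strings can then be saved to files, for example with `pathlib.Path("train.tsv").write_text(format_tsv(train), encoding="utf-8")`, where the file name is again only a placeholder.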

Repository files:
README.md
old_french_pie.ipynb
old_french_rnntagger.ipynb
pie_train_dev_data.ipynb

CristinaGHolgado/old-french-with-pie-rnntagger
