Summary

The UD_French-FQB corpus is an automatic conversion of the French QuestionBank v1, a corpus made up entirely of questions.

Introduction

The original French QuestionBank is described in Hard Time Parsing Questions: Building a QuestionBank for French. It was converted to UD with the conversion system that is described in chapter 3 of the book Application of Graph Rewriting to Natural Language Processing and is available on Inria GitLab.

The original annotation scheme versions (phrase structure, surface dependencies following the FTB scheme, and deep syntax annotations following the Deep Sequoia scheme) are available at the following URL.

Recommended splits

Due to the UD constraint on test set size (at least 10k tokens), we recommend simply concatenating this treebank with the Sequoia and FTB treebanks in order to obtain a robust, less domain-sensitive parser. These three treebanks are fully compatible and were converted by the same team.
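As an illustration of the concatenation suggested above, here is a minimal Python sketch. It assumes the treebanks are available locally as CoNLL-U files; the function name `merge_conllu` and the file paths in the usage comment are hypothetical and not part of any release.

```python
def merge_conllu(texts):
    """Join several CoNLL-U documents into one string.

    Each input is stripped of leading and trailing newlines, then written
    back followed by one blank line, so every sentence (including the last
    one of each file) stays terminated by a blank line.
    """
    parts = [text.strip("\n") for text in texts]
    return "".join(part + "\n\n" for part in parts if part)


# Hypothetical usage; the paths are placeholders, not real file names:
#
#   from pathlib import Path
#   paths = ["path/to/fqb.conllu", "path/to/sequoia.conllu", "path/to/ftb.conllu"]
#   merged = merge_conllu(Path(p).read_text(encoding="utf-8") for p in paths)
#   Path("merged.conllu").write_text(merged, encoding="utf-8")
```

Working on strings rather than on open files keeps the sketch easy to check on small inputs before running it on full treebanks.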

In our own experiments, we used UD_French-FQB either in a 10-fold cross-validation scenario or in a train/dev/test scenario in which sentence i goes to train, sentence i+1 to dev, and sentence i+2 to test.
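The two scenarios can be sketched in Python as follows. This is only one possible reading of the description above: it assumes that the train/dev/test assignment cycles every three sentences (index modulo 3) and that cross-validation folds are interleaved (index modulo k). The text does not specify how the folds were built, and the function names are hypothetical.

```python
def round_robin_split(sentences):
    """Send sentence i to train, i+1 to dev, i+2 to test, then repeat.

    Assumption: the pattern cycles over the whole treebank (index modulo 3).
    """
    names = ("train", "dev", "test")
    splits = {name: [] for name in names}
    for index, sentence in enumerate(sentences):
        splits[names[index % 3]].append(sentence)
    return splits


def k_fold(sentences, k=10):
    """Yield k (train, held_out) pairs.

    Assumption: fold j holds out the sentences whose index modulo k equals j.
    """
    for fold in range(k):
        held_out = [s for i, s in enumerate(sentences) if i % k == fold]
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        yield train, held_out
```

With the 2289 sentences of the treebank, the modulo-3 reading yields three splits of 763 sentences each.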

Statistics

  • Sentences: 2289
  • Words: 23236
  • Average sentence length: 10.15 words
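The average follows from the two counts (23236 / 2289 ≈ 10.15). The sketch below shows one way such counts could be recomputed from a CoNLL-U file. It assumes that a "word" is a line whose ID is a plain integer, so multiword-token ranges such as `1-2` and empty nodes such as `1.1` are skipped. The function name and the sample data are hypothetical and not taken from the treebank.

```python
def conllu_stats(text):
    """Return (sentence_count, word_count) for a CoNLL-U document given as a string."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line closes the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment (sent_id, text, ...)
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():  # skips ranges ("1-2") and empty nodes ("1.1")
            words += 1
            if not in_sentence:
                sentences += 1
                in_sentence = True
    return sentences, words


def _row(token_id, form):
    """Build a 10-column CoNLL-U line with placeholder values after FORM."""
    return "\t".join([token_id, form] + ["_"] * 8)


# Hand-made sample: two sentences, one multiword token.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    _row("1-2", "Du"),
    _row("1", "De"),
    _row("2", "le"),
    _row("3", "pain"),
    _row("4", "?"),
    "",
    "# sent_id = demo-2",
    _row("1", "Qui"),
    _row("2", "?"),
    "",
])

sentence_count, word_count = conllu_stats(SAMPLE)
average_length = word_count / sentence_count
```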

Domains

  • TREC 08-11: 1893 sents.
  • French Government/NGOs FAQs: 196 sents.
  • CLEF 03: 200 sents.

Note that the TREC domain questions are translations of the corresponding questions in the English Question Bank (Judge et al., 2006).

Acknowledgments

References

Changelog

  • 2020-11-15 v2.7
    • New conversion from original treebank
  • 2019-11-15 v2.5
    • Update the conversion process to improve consistency with other French treebanks:
      • expletive annotation with relations expl:subj, expl:comp and expl:pass
      • aux -> aux:tense
      • MWEPOS -> EXTPOS
  • 2019-05-15 v2.4
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: LGPL-LR
Includes text: yes
Genre: nonfiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Seddah, Djamé; Candito, Marie; Guillaume, Bruno
Contributing: elsewhere
Contact: djame.seddah@gmail.com
===============================================================================