

The corpus UD_French-FQB is an automatic conversion of the French QuestionBank v1, a corpus made up entirely of questions.


The original French QuestionBank is described in Hard Time Parsing Questions: Building a QuestionBank for French. It was converted to UD with the conversion system described in chapter 3 of the book Application of Graph Rewriting to Natural Language Processing and available on Inria Gitlab.

The original annotation scheme versions (phrase-structure, surface dependencies following the FTB scheme, deep syntax annotations following the Deep Sequoia scheme) are available at the following URL.

Recommended splits

Due to the UD constraint on test set size (at least 10k tokens), we recommend simply concatenating this treebank with the Sequoia and FTB treebanks in order to obtain a robust, less domain-sensitive parser. These three treebanks are fully compatible and were converted by the same team.

In our own experiments, we used UD_French-FQB either in a 10-fold cross-validation scenario or in a train/dev/test scenario with the i-th sentence in train, the (i+1)-th in dev, and the (i+2)-th in test.
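The two scenarios above can be sketched in a few lines of Python. This is only an illustration, not the script used for the experiments: the function names are hypothetical, the input is assumed to be CoNLL-U text with sentences separated by blank lines, the three-way cycle is assumed to start at the first sentence, and the interleaved assignment of sentences to the 10 folds is an assumption (the fold construction is not specified here).

```python
def read_conllu_sentences(text):
    """Split CoNLL-U text into sentence blocks (blocks are separated by blank lines)."""
    blocks = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:  # last sentence when the file lacks a trailing blank line
        blocks.append("\n".join(current))
    return blocks


def round_robin_split(sentences):
    """Sentence i goes to train, i+1 to dev, i+2 to test, repeating every 3 sentences.

    Assumption: the cycle starts at the first sentence (index 0).
    """
    train = sentences[0::3]
    dev = sentences[1::3]
    test = sentences[2::3]
    return train, dev, test


def k_fold_splits(sentences, k=10):
    """Yield (train, held_out) pairs for k-fold cross-validation.

    Assumption: fold j holds out every k-th sentence starting at index j.
    """
    for j in range(k):
        held_out = sentences[j::k]
        train = [s for i, s in enumerate(sentences) if i % k != j]
        yield train, held_out
```

Applied to the whole treebank, `round_robin_split(read_conllu_sentences(text))` would give three parts of roughly equal size, each mixing all question sources in the order they appear in the file.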


  • sentences: 2289
  • words: 23236
  • average sentence length: 10.15


  • TREC 08-11: 1893 sents.
  • French Government/NGOs FAQs: 196 sents.
  • CLEF 03: 200 sents.

Note that the TREC domain questions are translations of the corresponding questions in the English QuestionBank (Judge et al., 2006).



=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: LGPL-LR
Includes text: yes
Genre: nonfiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Seddah, Djamé; Candito, Marie; Guillaume, Bruno
Contributing: elsewhere