Summary

ParlaMint-It is a collection of transcriptions of parliamentary sessions of the Italian Senate annotated in Universal Dependencies. The corpus is part of a larger multilingual collection of parliamentary transcripts built during the ParlaMint project (https://www.clarin.eu/parlamint).

Introduction

ParlaMint-It is a subset of the Italian section of the ParlaMint corpus (Agnoloni et al., 2022). Its sentences were automatically annotated according to the UD annotation scheme and then manually revised. The internal composition mirrors that of the original ParlaMint corpus, which covers debates from two time periods: the COVID-19 pandemic period (November 2019 - November 2020) and an earlier period (March 2013 - October 2019) intended as a reference.

Each sentence id explicitly identifies the source of the sentence within the full ParlaMint corpus.
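As an illustration, the sentence ids can be listed with a few lines of Python. This is a minimal sketch that relies only on the standard CoNLL-U convention of storing the id in a `# sent_id = ...` comment line. The helper name `iter_sent_ids` is hypothetical, and the sketch makes no assumption about the internal structure of the ids themselves.

```python
import re
from typing import Iterable, Iterator

# Standard CoNLL-U sentence-level comment: "# sent_id = <id>"
SENT_ID = re.compile(r"^#\s*sent_id\s*=\s*(.*\S)\s*$")


def iter_sent_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sent_id of every sentence found in CoNLL-U lines.

    Hypothetical helper: it only reads the comment lines and does not
    interpret the id in any way.
    """
    for line in lines:
        match = SENT_ID.match(line)
        if match:
            yield match.group(1)


# Possible usage (file name taken from the "Corpus splitting" section):
# with open("ParlaMint-It-train.conllu", encoding="utf-8") as handle:
#     for sent_id in iter_sent_ids(handle):
#         print(sent_id)
```

Each id can then be matched against the full ParlaMint corpus to recover the sentence's original context.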

Corpus splitting

The corpus (701 sentences; 20460 tokens) has been randomly split as follows:

  • ParlaMint-It-train.conllu: 10026 tokens (326 sentences)
  • ParlaMint-It-dev.conllu: 10434 tokens (375 sentences)
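The figures above can be checked against the files with a short script. The sketch below counts sentences and syntactic words (lines with a plain integer ID) in CoNLL-U content. The function name `count_conllu` is hypothetical. Whether the "tokens" reported above are syntactic words or surface tokens is not stated here, so the sketch assumes syntactic words and skips multiword-token ranges (such as `1-2`) and empty nodes (such as `1.1`).

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, words) for CoNLL-U content.

    Assumption: "words" are lines whose ID column is a plain integer;
    multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (sent_id, text, ...).
            continue
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
        in_sentence = True
    if in_sentence:
        # The last sentence was not followed by a blank line.
        sentences += 1
    return sentences, words


# Possible usage:
# with open("ParlaMint-It-dev.conllu", encoding="utf-8") as handle:
#     print(count_conllu(handle))
```

If the counts differ from those listed above, the most likely cause is the choice of counting unit rather than a problem with the files.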

References

Tommaso Agnoloni, Roberto Bartolini, Francesca Frontini, Carlo Marchetti, Simonetta Montemagni, Valeria Quochi, Manuela Ruisi, Giulia Venturi. 2022. Making Italian Parliamentary Records Machine-Actionable: The Construction of the ParlaMint-IT Corpus. In “Proceedings of LREC 2022, Workshop of ParlaCLARIN III”, Marseille, 20 June 2022, pp. 117-124.

Tomaž Erjavec, Maciej Ogrodniczuk, Petya Osenova, Nikola Ljubešić, Kiril Simov, Andrej Pančur, Michał Rudolf, Matyáš Kopp, Starkaður Barkarson, Steinþór Steingrímsson, Çağrı Çöltekin, Jesse de Does, Katrien Depuydt, Tommaso Agnoloni, Giulia Venturi, María Calzada Pérez, Luciana D. de Macedo, Costanza Navarretta, Giancarlo Luxardo, Matthew Coole, Paul Rayson, Vaidas Morkevičius, Tomas Krilavičius, Roberts Darģis, Orsolya Ring, Ruben van Heusden, Maarten Marx, and Darja Fišer. 2022. The ParlaMint corpora of parliamentary proceedings. In “Language Resources and Evaluation”, https://doi.org/10.1007/s10579-021-09574-0

Changelog

  • 2022-11-15 v2.11
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.11
License: CC BY-SA 4.0
Includes text: yes
Genre: government legal
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Alzetta, Chiara; Sartor, Marta; Montemagni, Simonetta; Venturi, Giulia
Contributing: here
Contact: chiara.alzetta@ilc.cnr.it
===============================================================================