Summary

SiMoNERo is a medical corpus of contemporary Romanian.

Introduction

SiMoNERo contains texts from three medical subdomains: cardiology, diabetes, and endocrinology. The texts come from scientific books, journal articles, and blog posts, with books predominating. The texts are annotated at the following levels: tokenization, POS tagging, lemmatization, syntactic parsing, and medical named entities of four types: ANAT (body parts), CHEM (chemicals and drugs), DISO (disorders), and PROC (procedures). All levels except the syntactic one are hand-validated. The creation of the corpus (excluding the syntactic annotation) is described in Mitrofan et al. (2019). The syntactic parsing was produced with the NLP-Cube system (https://github.com/adobe/NLP-Cube).
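As a rough illustration of how the annotation levels listed above might be inspected, here is a minimal Python sketch that reads CoNLL-U text and counts tokens by entity type. It relies only on the standard 10-column CoNLL-U layout. Where the entity labels are stored is an assumption: the sketch looks in the last column (MISC) for values ending in one of the four type names, with an optional BIO prefix such as `B-DISO`. The function names, the `column` parameter, and the sample sentence are hypothetical and not taken from the corpus; check the actual files and adjust.

```python
from collections import Counter

# The four entity types named in this README.
ENTITY_TYPES = {"ANAT", "CHEM", "DISO", "PROC"}


def read_conllu(lines):
    """Yield each sentence as a list of 10-field token rows."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comments (sent_id, text) are skipped.
            continue
        else:
            fields = line.split("\t")
            if len(fields) == 10:
                sentence.append(fields)
    if sentence:
        yield sentence


def count_entity_tokens(sentences, column=9):
    """Count tokens per entity type.

    ASSUMPTION: the label sits in `column` (default: MISC), possibly as
    Key=Value and possibly with a BIO prefix like "B-DISO".
    """
    counts = Counter()
    for sentence in sentences:
        for fields in sentence:
            for part in fields[column].split("|"):
                label = part.split("=")[-1]
                entity_type = label.split("-")[-1]
                if entity_type in ENTITY_TYPES:
                    counts[entity_type] += 1
    return counts


# Hypothetical two-token sentence, invented for illustration only.
SAMPLE = (
    "# text = diabet zaharat\n"
    "1\tdiabet\tdiabet\tNOUN\t_\t_\t0\troot\t_\tB-DISO\n"
    "2\tzaharat\tzaharat\tADJ\t_\t_\t1\tamod\t_\tI-DISO\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
entity_counts = count_entity_tokens(sentences)
```

To run this on a real file, pass an open file handle to `read_conllu` instead of `SAMPLE.splitlines()`. Note that the sketch counts tokens, not entity mentions; a multi-token entity contributes one count per token.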

Acknowledgments

We are grateful to the following text providers: http://federatiaromanadiabet.ro (accessed November 2016), https://rmj.com.ro/ (accessed November 2016), https://societate-diabet.ro/ (accessed November 2016), http://pentrudiabet.ro (accessed November 2016).

References

Maria Mitrofan, Verginica Barbu Mititelu, Grigorina Mitrofan. MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language. In Proceedings of the BioNLP workshop, Florence, Italy, 1 August 2019, pp. 71-79. Association for Computational Linguistics (https://www.aclweb.org/anthology/W19-5008).

Changelog

2019-11-15 v2.5
- Initial release in Universal Dependencies.

=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.5
License: CC BY-SA 4.0
Includes text: yes
Genre: medical
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Mitrofan, Maria, Barbu Mititelu, Verginica
Contributing: elsewhere
Contact: maria@racai.ro vergi@racai.ro
===============================================================================
