



Corpus of Spanish Golden-Age Sonnets


This corpus comprises sonnets written in Spanish during the 16th and 17th centuries.

Each sonnet has been annotated in XML following the TEI standard. In addition to the TEI header and the structural markup, each sonnet includes a formal representation of the metrical pattern of every line.

The pattern is a sequence of unstressed syllables (represented by the "-" sign) and stressed syllables (represented by the "+" sign), stored in the met attribute of each <l> (line) element. For example, the metrical pattern of a line is represented as follows:

<l n="1" met="---+---+-+-">Cuando me paro a contemplar mi estado,</l>
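As an illustration of how the met attribute can be consumed, the following minimal Python sketch reads every <l> element from a TEI file and converts each pattern into the positions of its stressed syllables. It assumes only what the example above shows (an <l> element carrying n and met attributes); the function names are hypothetical helpers, not part of the corpus distribution.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def stress_positions(met: str) -> list[int]:
    """Return the 1-based positions of stressed syllables ('+') in a pattern."""
    if set(met) - {"+", "-"}:
        raise ValueError(f"unexpected symbol in pattern: {met!r}")
    return [i for i, symbol in enumerate(met, start=1) if symbol == "+"]


def read_patterns(xml_text: str) -> list[tuple[str, str, str]]:
    """Collect (line number, met pattern, line text) for every <l> element.

    The '{*}l' wildcard matches <l> in any namespace or in none, so this works
    whether or not the file declares the TEI namespace (Python 3.8+).
    """
    root = ET.fromstring(xml_text)
    return [
        (line.get("n"), line.get("met"), "".join(line.itertext()))
        for line in root.findall(".//{*}l")
    ]
```

For the line shown above, stress_positions("---+---+-+-") returns [4, 8, 10]: the pattern has 11 positions, with stresses on the 4th, 8th and 10th syllables.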


To make the corpus as representative as possible, it includes every author of the 16th and 17th centuries who has more than 10 sonnets available in digitized form.
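The inclusion criterion amounts to a simple threshold over per-author sonnet counts. The sketch below assumes a hypothetical input: a flat list with one author name per available digitized sonnet. Note that "more than 10" is a strict comparison, so an author with exactly 10 sonnets would not qualify.

```python
from __future__ import annotations

from collections import Counter


def select_authors(sonnet_authors: list[str], minimum: int = 10) -> list[str]:
    """Return, sorted by name, the authors with strictly more than `minimum` sonnets.

    `sonnet_authors` is a hypothetical list holding one author name per sonnet.
    """
    counts = Counter(sonnet_authors)
    return sorted(author for author, n in counts.items() if n > minimum)
```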

All texts have been taken from the Biblioteca Virtual Miguel de Cervantes.

The corpus currently comprises more than 5,000 sonnets (more than 71,000 lines of verse).
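Figures like these can be recomputed from a local copy of the corpus. The following sketch assumes a hypothetical layout with one sonnet per .xml file somewhere under a directory, and counts every <l> element as one line; if the actual repository is organized differently, the traversal would need adjusting.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from pathlib import Path


def corpus_size(corpus_dir: str | Path) -> tuple[int, int]:
    """Return (number of sonnet files, total number of <l> elements).

    Assumes (hypothetically) one sonnet per *.xml file anywhere under corpus_dir.
    """
    sonnets = lines = 0
    for path in Path(corpus_dir).rglob("*.xml"):
        root = ET.parse(path).getroot()
        sonnets += 1
        # '{*}l' matches <l> with or without the TEI namespace.
        lines += len(root.findall(".//{*}l"))
    return sonnets, lines
```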


The metrical patterns were annotated semi-automatically. First, all sonnets were processed by an automatic metrical scansion system, which assigns a metrical pattern to each line. Second, part of the corpus was manually checked and its errors were corrected.
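Once part of the corpus has been corrected by hand, the automatic output can be compared with the corrected patterns. The sketch below computes two illustrative measures, whole-line accuracy and position-by-position accuracy; these are assumptions chosen for the example and are not necessarily the evaluation measures used in the paper cited below.

```python
from __future__ import annotations


def agreement(automatic: list[str], corrected: list[str]) -> tuple[float, float]:
    """Return (line-level accuracy, syllable-level accuracy) of automatic patterns.

    A line is correct only if its whole pattern matches the corrected one.
    Syllable-level accuracy compares patterns position by position; when the
    two patterns differ in length, the extra positions count as errors.
    """
    if len(automatic) != len(corrected) or not corrected:
        raise ValueError("need two non-empty, aligned lists of patterns")
    exact = sum(a == c for a, c in zip(automatic, corrected))
    hits = total = 0
    for a, c in zip(automatic, corrected):
        total += max(len(a), len(c))
        hits += sum(x == y for x, y in zip(a, c))
    return exact / len(corrected), hits / total
```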

The corpus is currently undergoing manual validation, and each sonnet records whether it has already been manually checked.
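Because validation is still in progress, users may want to work only with manually checked sonnets. This README does not specify how the status is encoded in each file, so the sketch below takes the check as a caller-supplied predicate; the dictionary key "checked" used in the tests is purely hypothetical and would have to be replaced by whatever the TEI files actually record.

```python
from __future__ import annotations

from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def split_by_validation(
    sonnets: Iterable[T], is_checked: Callable[[T], bool]
) -> tuple[list[T], list[T]]:
    """Partition sonnets into (manually checked, automatically annotated only).

    `is_checked` is a hypothetical predicate that reads the validation status
    from whatever representation the caller uses for a sonnet.
    """
    checked: list[T] = []
    unchecked: list[T] = []
    for sonnet in sonnets:
        (checked if is_checked(sonnet) else unchecked).append(sonnet)
    return checked, unchecked
```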

How to cite this corpus

If you would like to cite this corpus for academic research purposes, please use this reference:

Navarro-Colorado, Borja; Ribes Lafoz, María; and Sánchez, Noelia (2016). "Metrical annotation of a large corpus of Spanish sonnets: representation, scansion and evaluation". 10th edition of the Language Resources and Evaluation Conference (LREC 2016), Portorož, Slovenia.

Further Information

This corpus is part of the ADSO project, developed at the University of Alicante and funded by Fundación BBVA.

If you require further information about the metrical annotation, please consult the Annotation Guide (in Spanish) or the following papers:


The metrical annotation of this corpus is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Regarding the texts: "this digital object is protected by copyright and/or related rights. This digital object is accessible without charge, but its use is subject to the licensing conditions set by the organization giving access to it. Further information available at ".


Corpus of Spanish Golden-Age Sonnets (with metrical annotation) / Corpus de Sonetos del Siglo de Oro (con anotación métrica)





