
Multilingual Corpus of Survey Questionnaires (MCSQ) compiling. This repository contains all the necessary code used in the corpus compilation,


dsorato/MCSQ_compiling


Multilingual Corpus of Survey Questionnaires (MCSQ) Compiling


The Multilingual Corpus of Survey Questionnaires (MCSQ) is the first publicly available corpus of survey questionnaires. It comprises survey items from large-scale comparative survey projects that provide cross-national and cross-cultural data to the Social Sciences and Humanities: the European Social Survey (ESS) [1] (rounds 1 to 9), the European Values Study (EVS) [2] (waves 2, 3, 4 and 5), the Survey of Health, Ageing and Retirement in Europe (SHARE) [3] (waves 7, 8 and COVID questionnaires), and the Wage Indicator [4] (wave 1 and COVID questionnaires). The questionnaires are available in the English source language (from Great Britain) and in translations into Catalan, Czech, French, German, Norwegian, Portuguese, Spanish and Russian, for a total of 30 language-country combinations. Additionally, some English questionnaires are available with localizations for Ireland and Malta.

This repository contains the scripts used in the compilation steps of the MCSQ, which was implemented as an Entity-Relationship (ER) database. The preprocessing directory contains the scripts that preprocess the data; note that these scripts differ according to the format of the input source file and the survey project. The DB directory contains the files that define the database structure. The alignment folder contains the scripts that align a given target questionnaire with its source, using a heuristic that leverages metadata information. Finally, the annotation folder contains the scripts that call the annotation methods. The figure below depicts the framework applied to compile and publish the MCSQ.

[Figure: framework applied to compile and publish the MCSQ]
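As an illustration of what a metadata-based alignment heuristic can look like, here is a minimal sketch in Python. It is not the code in the alignment folder: the `Segment` class, its `item_name` and `item_type` fields, and the `align_by_metadata` function are hypothetical names chosen for this example, and the assumption that segments are matched on an item identifier plus an item type is ours, not something this README specifies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    # All three fields are hypothetical; the real MCSQ schema may differ.
    item_name: str  # questionnaire item identifier, e.g. "A1"
    item_type: str  # kind of segment, e.g. "REQUEST" or "RESPONSE"
    text: str


def align_by_metadata(
    source: list[Segment], target: list[Segment]
) -> list[tuple[Optional[Segment], Optional[Segment]]]:
    """Pair source and target segments that share (item_name, item_type).

    Segments with the same key are paired in their order of appearance.
    Segments without a counterpart are paired with None.
    """
    # Group target segments by metadata key, preserving questionnaire order.
    buckets: dict[tuple[str, str], deque[Segment]] = {}
    for seg in target:
        buckets.setdefault((seg.item_name, seg.item_type), deque()).append(seg)

    pairs: list[tuple[Optional[Segment], Optional[Segment]]] = []
    for seg in source:
        queue = buckets.get((seg.item_name, seg.item_type))
        match = queue.popleft() if queue else None
        pairs.append((seg, match))

    # Target segments that were never matched are kept as unaligned rows.
    for queue in buckets.values():
        pairs.extend((None, seg) for seg in queue)
    return pairs
```

Keeping unmatched segments in the output, rather than dropping them, makes it easy to inspect where a translated questionnaire departs from its source.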

The [MCSQ] Multilingual Corpus of Survey Questionnaires is an open-access research resource. It is permanently preserved in the CLARINO repository [5], where it can be freely downloaded.

If you use part of the code, datasets, and/or findings to inspire your own scientific work, please cite the article:

Zavala-Rojas, D., Sorato, D., Hareide, L., & Hofland, K. (forthcoming 2021). [MCSQ] Multilingual Corpus of Survey Questionnaires. Meta: Journal Des Traducteurs.

    @article{Zavala-Rojas,
      author = {Zavala-Rojas, Diana and Sorato, Danielly and Hareide, Lidun and Hofland, Knut},
      journal = {Meta: Journal des traducteurs},
      title = {{[MCSQ] Multilingual Corpus of Survey Questionnaires}}
    }

The MCSQ was developed in the Social Sciences and Humanities Open Cloud (SSHOC) [6] project. SSHOC has received funding from the European Union's Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782.

1: https://www.europeansocialsurvey.org/
2: https://europeanvaluesstudy.eu/
3: http://www.share-project.org/home0.html
4: https://wageindicator.org/Wageindicatorfoundation/researchlab/wageindicator-survey-and-data
5: https://repo.clarino.uib.no/xmlui/handle/11509/142
6: https://sshopencloud.eu/
