This repository collects public domain sentences in the Catalan language.
Data file | Description | Source | Import date |
---|---|---|---|
common-short-sentences.txt | Very common short sentences that occur at least 10 times across different corpora | Different corpora | 2018 |
proverbs.txt | 8K proverbs | Popular knowledge | 2018 |
tocqueville.txt | Selected sentences by Tocqueville translated into Catalan | Translator himself | 2018 |
dogc.txt | Selected sentences from the Diari Oficial de la Generalitat de Catalunya (the official gazette of the Catalan government) | dogc.gencat.cat | 2018 |
dogv.txt | Selected sentences from the Diari Oficial de la Generalitat Valenciana (the official gazette of the Valencian government) | dogv.gva.es | 2018 |
riuraueditors.txt | Selected sentences from works published by Riurau Editors | Publisher itself | 2018 |
softcatala.txt | Selected sentences from Softcatalà's web page | Softcatalà | 2018 |
programari-lliure-llibre.txt | Selected sentences from the book 'Programari lliure: tècnicament viable, econòmicament sostenible i socialment just' | Jordi Mas | 2018 |
common-voice-sentences.txt | Sentences written specifically for Common Voice | Montserrat Nadal et alii | 2018 |
muni-bal.txt | Balearic town names | Public domain | 2018 |
muni-cat.txt | Catalan town names | Public domain | 2018 |
muni-val.txt | Valencian town names | Public domain | 2018 |
Files in the data directory are released under the CC0 license.
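
The data files listed above can be read with a few lines of Python. The following is a minimal sketch, not part of the repository: it assumes each file is UTF-8 plain text with one sentence per line, and `load_sentences` is a hypothetical helper name.

```python
def load_sentences(path):
    """Return the non-empty lines of a text file, stripped of whitespace.

    Assumption: the file is UTF-8 encoded with one sentence per line.
    """
    with open(path, encoding="utf-8") as handle:
        # Skip blank lines and drop leading/trailing whitespace.
        return [line.strip() for line in handle if line.strip()]
```

For example, `load_sentences("data/proverbs.txt")` would return the proverbs as a list of strings, assuming the script is run from the repository root.
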
The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.
property | value |
---|---|
name | This repository collects some public domain sentences in Catalan language. |
description | This repository collects some public domain sentences in Catalan language used in the project Common Voice. |
sameAs | https://github.com/Softcatala/ca-text-corpus |