Public domain corpus of Catalan text

ca-text-corpus

Description

This repository collects public domain sentences in the Catalan language.

Data files

| Data file | Description | Source | Import date |
| --- | --- | --- | --- |
| common-short-sentences.txt | Very common short sentences that occur at least 10 times across different corpora | Various corpora | 2018 |
| proverbs.txt | 8K proverbs | Popular knowledge | 2018 |
| tocqueville.txt | Selected sentences by Tocqueville translated into Catalan | The translator | 2018 |
| dogc.txt | Selected sentences from the Diari Oficial de la Generalitat de Catalunya (the Catalan official gazette) | dogc.gencat.cat | 2018 |
| dogv.txt | Selected sentences from the Diari Oficial de la Generalitat Valenciana (the Valencian official gazette) | dogv.gva.es | 2018 |
| riuraueditors.txt | Selected sentences from works published by Riurau Editors | The publisher | 2018 |
| softcatala.txt | Selected sentences from Softcatalà's website | Softcatalà | 2018 |
| programari-lliure-llibre.txt | Selected sentences from the book 'Programari lliure: tècnicament viable, econòmicament sostenible i socialment just' | Jordi Mas | 2018 |
| common-voice-sentences.txt | Sentences written specifically for Common Voice | Montserrat Nadal et al. | 2018 |
| muni-and.txt | Andorran town names | Public domain | 2018 |
| muni-bal.txt | Balearic town names | Public domain | 2018 |
| muni-cat.txt | Catalan town names | Public domain | 2018 |
| muni-val.txt | Valencian town names | Public domain | 2018 |
| cities2.txt | Andorran, Balearic, Catalan, and Valencian town names again, but with a different verb | Public domain | 2019 |
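
As a rough illustration of how these files might be consumed, the sketch below reads one data file into a list of sentences. It assumes the files are UTF-8 plain text with one sentence per line, which this README does not state; `load_sentences` is a hypothetical helper name, not something shipped in the repository.

```python
from pathlib import Path


def load_sentences(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace.

    Assumption: the file is UTF-8 encoded and holds one sentence per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Under those assumptions, a call such as `load_sentences("data/proverbs.txt")`, run from the repository root, would return the proverbs as a list of strings.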

License

Files in the data directory are released under the CC0 license.
