ca-text-corpus

Description

This repository collects public domain sentences in Catalan.

Data files

| Data file | Description | Source | Import date |
| --- | --- | --- | --- |
| common-short-sentences.txt | Very common short sentences that occur at least 10 times across different corpora | Different corpora | 2018 |
| proverbs.txt | 8K proverbs | Popular knowledge | 2018 |
| tocqueville.txt | Selected sentences by Tocqueville translated into Catalan | Translator himself | 2018 |
| dogc.txt | Selected sentences from Diari Oficial de la Generalitat de Catalunya (the official journal of the Catalan government) | dogc.gencat.cat | 2018 |
| dogv.txt | Selected sentences from Diari Oficial de la Generalitat Valenciana (the official journal of the Valencian government) | dogv.gva.es | 2018 |
| riuraueditors.txt | Selected sentences from works published by Riurau Editors | Publisher itself | 2018 |
| softcatala.txt | Selected sentences from Softcatalà's web page | Softcatalà | 2018 |
| programari-lliure-llibre.txt | Selected sentences from the book 'Programari lliure: tècnicament viable, econòmicament sostenible i socialment just' | Jordi Mas | 2018 |
| common-voice-sentences.txt | Sentences written specifically for Common Voice | Montserrat Nadal et al. | 2018 |
| muni-bal.txt | Balearic town names | Public domain | 2018 |
| muni-cat.txt | Catalan town names | Public domain | 2018 |
| muni-val.txt | Valencian town names | Public domain | 2018 |
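
As a rough illustration of how these files might be consumed, here is a minimal Python sketch. It assumes each file is UTF-8 text with one sentence per line and lives under a `data/` directory; neither detail is stated explicitly above, so check the files before relying on it. The helper names `load_sentences` and `frequent_sentences` are hypothetical, and the frequency filter only illustrates the "at least 10 occurrences" criterion mentioned for common-short-sentences.txt; it is not necessarily how that file was produced.

```python
from collections import Counter
from pathlib import Path


def load_sentences(path):
    """Read a corpus file, ASSUMING UTF-8 text with one sentence per line."""
    text = Path(path).read_text(encoding="utf-8")
    # Trim surrounding whitespace and skip blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


def frequent_sentences(sentences, min_count=10):
    """Return the sentences seen at least min_count times, most frequent first."""
    counts = Counter(sentences)
    return [s for s, n in counts.most_common() if n >= min_count]
```

For example, `load_sentences("data/proverbs.txt")` would return the proverbs as a list of strings, provided the file sits at that path.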

License

Files in the data directory are released under the CC0 license.

Metadescription

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

| property | value |
| --- | --- |
| name | This repository collects some public domain sentences in Catalan language. |
| description | This repository collects some public domain sentences in Catalan language used in the project Common Voice. |
| sameAs | https://github.com/Softcatala/ca-text-corpus |
