Poio Corpus

The Poio Corpus is a freely available collection of language resources for lesser-used languages. The data is extracted from free sources such as Wikipedia, dictionaries, documents, and websites.

The official Poio Corpus website is: https://www.poio.eu

Poio Corpus is part of the Poio project: http://media.cidles.eu/poio/

License

Poio Corpus source code is distributed under the Apache 2.0 License.

Poio Corpus documentation is distributed under the Creative Commons Attribution 3.0 Unported license.

Poio Corpus data packages are distributed under different licenses. For details, check the LICENSE file in each data package, available under the Corpus menu option on the Poio Corpus website (https://www.poio.eu).