Python wrapper for the CETEMPublico corpus

cetem-publico

cetem-publico is a Python wrapper for the CETEMPublico corpus. It takes care of downloading, storing and importing the corpus into NLTK.
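The README does not describe how the download and storage steps work internally. As a rough illustration of the general "download once, cache locally" pattern a wrapper like this might follow, here is a standard-library sketch. The helper name `cached_corpus_path`, its parameters, and the injected `fetch` callable are hypothetical and are not part of cetem-publico's API.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


# Hypothetical helper illustrating a download-once, cache-locally pattern.
# The file name, cache directory and fetch function are assumptions for this
# sketch; they are not cetem-publico's actual internals.
def cached_corpus_path(
    filename: str,
    fetch: Callable[[], bytes],
    cache_dir: Path,
) -> Path:
    """Return a local path to `filename`, calling `fetch` only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists():
        # Already stored locally: skip the download entirely.
        return target

    data = fetch()
    # Write to a temporary file in the same directory first, then atomically
    # rename it, so an interrupted write never leaves a truncated file at
    # the final path.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial temporary file, then re-raise the error.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

Passing `fetch` in as a callable keeps the sketch testable without network access; a real implementation would retrieve the file from wherever the corpus is hosted.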

THIS IS STILL A WORK IN PROGRESS; THE API MIGHT BREAK WITHOUT WARNING.

Installing

Install and update using pip:

pip install [--user] cetem-publico

A Simple Example

import CETEMPublico

cp = CETEMPublico.load() # loads a small 10KB sample
# or
cp = CETEMPublico.load(full=True) # loads the full 12GB

print(cp.tagged_sents())
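NLTK's tagged corpus readers conventionally return each sentence from `tagged_sents()` as a list of `(word, tag)` tuples. Assuming `cp.tagged_sents()` follows that convention, the sketch below counts how often each tag occurs. The `sample` data is an invented stand-in for real corpus output, and `tag_frequencies` is a hypothetical helper, not part of cetem-publico.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# One tagged sentence: a list of (word, tag) pairs, as NLTK readers return.
TaggedSent = List[Tuple[str, str]]


def tag_frequencies(sents: Iterable[TaggedSent]) -> Counter:
    """Count how often each tag occurs across all tagged sentences."""
    return Counter(tag for sent in sents for _word, tag in sent)


# Hypothetical stand-in for cp.tagged_sents(); these words and tags are
# invented for illustration and are not taken from CETEMPublico.
sample = [
    [("O", "art"), ("gato", "n"), ("dorme", "v-fin"), (".", ".")],
    [("A", "art"), ("casa", "n"), ("é", "v-fin"), ("grande", "adj"), (".", ".")],
]

freqs = tag_frequencies(sample)
print(freqs.most_common(3))
```

Because the helper only iterates over its input, it should also work on a lazily loaded corpus view, so the full corpus would not need to fit in a Python list first.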

Acknowledgements

This module only exists thanks to the Publico newspaper and the team responsible for the CETEMPublico corpus.

Bugs and stuff

Open a GitHub issue or, preferably, send me a pull request.

License

MIT
