cetem-publico

cetem-publico is a Python wrapper for the CETEMPublico corpus. It takes care of downloading, storing and importing the corpus into NLTK.

THIS IS STILL A WORK IN PROGRESS; THE API MIGHT BREAK WITHOUT WARNING.
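cetem-publico's own download and storage code is not reproduced here. The sketch below only illustrates the general download-once, store-locally pattern described above, under stated assumptions. `ensure_cached` is a hypothetical helper, not part of this package. Its `fetch` argument stands in for the real download and is assumed to return the file's bytes. The helper calls `fetch` only when the target file is missing, and it writes through a temporary file, so an interrupted download never leaves a truncated corpus on disk.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(path: Path, fetch: Callable[[], bytes]) -> Path:
    """Return `path`, calling `fetch` to create it only if it does not exist yet.

    Hypothetical helper: `fetch` stands in for the actual download
    (e.g. an HTTP request) and is not part of cetem-publico.
    """
    if path.exists():
        return path  # already stored locally: skip the download
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then atomically
    # rename it into place so a failed download leaves no partial file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(fetch())
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # clean up the partial download, then re-raise
        raise
    return path
```

Calling `ensure_cached` a second time with the same path returns immediately without calling `fetch`. The same-directory temporary file keeps `os.replace` a single rename on the same filesystem.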

Installing

Install and update using pip:

pip install [--user] cetem-publico

A Simple Example

import CETEMPublico

cp = CETEMPublico.load() # loads a small 10KB sample
# or
cp = CETEMPublico.load(full=True) # loads the full 12GB corpus

print(cp.tagged_sents())
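`tagged_sents()` follows NLTK's corpus-reader naming. In NLTK, tagged corpus readers return a list of sentences, where each sentence is a list of `(word, tag)` tuples. The sketch below assumes `cp` behaves the same way and shows that shape by parsing hypothetical `word/TAG` lines in plain Python. The input format and the tag names are illustrative assumptions, not CETEMPublico's actual file format or tagset. `str2tuple` here is a small standalone function similar to NLTK's `nltk.tag.str2tuple`, not an import of it.

```python
from typing import Iterable, List, Optional, Tuple

# One tagged sentence: a list of (word, tag) pairs; tag is None if missing.
TaggedSent = List[Tuple[str, Optional[str]]]


def str2tuple(token: str, sep: str = "/") -> Tuple[str, Optional[str]]:
    """Split a 'word/TAG' token on the LAST separator, so words may contain '/'."""
    loc = token.rfind(sep)
    if loc >= 0:
        return token[:loc], token[loc + len(sep):]
    return token, None  # untagged token


def parse_tagged_sents(lines: Iterable[str]) -> List[TaggedSent]:
    """Turn lines of space-separated 'word/TAG' tokens into tagged sentences.

    Assumed (hypothetical) input: one sentence per line; blank lines are skipped.
    """
    return [[str2tuple(tok) for tok in line.split()] for line in lines if line.strip()]
```

For example, `parse_tagged_sents(["O/ART gato/N dorme/V"])` yields `[[("O", "ART"), ("gato", "N"), ("dorme", "V")]]`, the nested list-of-tuples shape that NLTK's `tagged_sents()` uses.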

Acknowledgements

This module only exists thanks to the Publico newspaper and the team responsible for the CETEMPublico corpus.

Bugs and stuff

Open a GitHub issue or, preferably, send me a pull request.

License

MIT
