
NER-datasets for Portuguese

HAREM

HAREM was an evaluation contest for named entity recognition in Portuguese. There were two editions:

First HAREM

Second HAREM

NOTE: the XML format can be painful to parse. See the Paramopama corpus, which includes the HAREM data in CoNLL format.
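As a rough illustration of why the CoNLL format is easier to work with, here is a minimal Python sketch of a reader. It assumes the common CoNLL-style layout: one token per line, whitespace-separated columns with the token first and the entity tag last, and a blank line between sentences. The function name `read_conll`, the sample sentences and the tag names are hypothetical; check the actual files for their column layout and tag set before relying on it.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of (token, tag) pairs.

    Assumption: the token is the first column and the tag is the last one.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    # Keep the last sentence when the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Made-up sample for illustration only; not taken from any of the corpora.
sample = """\
Maria B-PER
mora O
em O
Lisboa B-LOC
. O

Ela O
trabalha O
no O
Porto B-LOC
"""

sentences = read_conll(sample.splitlines())
for sentence in sentences:
    print(sentence)
```

To read a real file, pass an open file handle instead of the sample, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points to one of the CoNLL files.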

WikiNER

An NER corpus built by exploiting inter-document links in Wikipedia.

Paramopama

Extends the PtBR version of the WikiNER corpus, revising incorrectly assigned tags in order to improve corpus quality. In their experiments, the authors also produced a CoNLL-format version of the HAREM corpus, which they made publicly available.

leNER-Br

A dataset for named entity recognition in Brazilian legal documents. Unlike other Portuguese-language datasets, it is composed entirely of legal documents. In addition to tags for persons, locations, time entities and organizations, the dataset contains specific tags for law and legal case entities.

Peres 2017

A dataset for named entity recognition in Brazilian Portuguese on noisy Twitter text (#noisydata #twitter).