This Python library helps you augment text data for named entity recognition (NER).
It draws on the paper "An Analysis of Simple Data Augmentation for Named Entity Recognition".
To install the library:
pip install neraug
The following example uses one of the algorithms, DictionaryReplacement:
>>> from neraug.augmentator import DictionaryReplacement
>>> from neraug.scheme import IOBES
>>> ne_dic = {'Tokyo Big Sight': 'LOC'}
>>> augmentator = DictionaryReplacement(ne_dic, str.split, IOBES)
>>> x = ['I', 'went', 'to', 'Tokyo']
>>> y = ['O', 'O', 'O', 'S-LOC']
>>> x_augs, y_augs = augmentator.augment(x, y, n=1)
>>> x_augs
[['I', 'went', 'to', 'Tokyo', 'Big', 'Sight']]
>>> y_augs
[['O', 'O', 'O', 'B-LOC', 'I-LOC', 'E-LOC']]
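To make the transformation in the example above more concrete, here is a minimal from-scratch sketch of dictionary-based mention replacement with IOBES tags. It is not neraug's implementation: the helper names `iobes_tags`, `find_mentions`, and `replace_mentions` are hypothetical, and the sketch assumes well-formed IOBES input and a dictionary that maps a phrase to its entity type.

```python
import random


def iobes_tags(length, label):
    """Return IOBES tags for a mention of `length` tokens with type `label`."""
    if length == 1:
        return [f"S-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (length - 2) + [f"E-{label}"]


def find_mentions(tags):
    """Yield (start, end, label) spans from IOBES tags; `end` is exclusive."""
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("S-"):
            yield i, i + 1, tag[2:]
        elif tag.startswith("B-"):
            start = i
        elif tag.startswith("E-") and start is not None:
            yield start, i + 1, tag[2:]
            start = None


def replace_mentions(tokens, tags, dictionary, tokenize=str.split, rng=random):
    """Swap each mention for a random dictionary entry of the same type."""
    # Group dictionary phrases by entity type, e.g. {'LOC': ['Tokyo Big Sight']}.
    by_label = {}
    for phrase, label in dictionary.items():
        by_label.setdefault(label, []).append(phrase)

    new_tokens, new_tags, cursor = [], [], 0
    for start, end, label in find_mentions(tags):
        # Copy the tokens between the previous mention and this one unchanged.
        new_tokens += tokens[cursor:start]
        new_tags += tags[cursor:start]
        candidates = by_label.get(label)
        if candidates:
            replacement = tokenize(rng.choice(candidates))
            new_tokens += replacement
            new_tags += iobes_tags(len(replacement), label)
        else:
            # No dictionary entry for this type: keep the original mention.
            new_tokens += tokens[start:end]
            new_tags += tags[start:end]
        cursor = end
    # Copy whatever follows the last mention.
    new_tokens += tokens[cursor:]
    new_tags += tags[cursor:]
    return new_tokens, new_tags
```

Called with the `x`, `y`, and `ne_dic` values from the example, `replace_mentions(x, y, ne_dic)` returns the same token and tag sequences shown in `x_augs[0]` and `y_augs[0]`, because the dictionary holds only one LOC entry.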
The library supports the following algorithms:
- DictionaryReplacement
- LabelWiseTokenReplacement
- MentionReplacement
- ShuffleWithinSegment
It supports the following tagging schemes:
- IOB2
- IOBES
- BILOU
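The three schemes encode the same entity spans with different tag prefixes. IOB2 marks only the beginning (B) and inside (I) of a mention, IOBES adds end (E) and single-token (S) tags, and BILOU is IOBES with E renamed to L (last) and S renamed to U (unit). The sketch below illustrates the relationship. It is independent of neraug, and the function names `iob2_to_iobes` and `iobes_to_bilou` are hypothetical helpers that assume well-formed input.

```python
def iob2_to_iobes(tags):
    """Convert a well-formed IOB2 tag sequence to IOBES."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # A mention continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            converted.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            converted.append(f"I-{label}" if continues else f"E-{label}")
    return converted


def iobes_to_bilou(tags):
    """Rename IOBES prefixes to their BILOU equivalents (E -> L, S -> U)."""
    rename = {"E": "L", "S": "U"}
    return [
        tag if tag == "O" else rename.get(tag[0], tag[0]) + tag[1:]
        for tag in tags
    ]
```

For example, the IOB2 sequence `['O', 'B-LOC', 'I-LOC', 'I-LOC']` becomes `['O', 'B-LOC', 'I-LOC', 'E-LOC']` in IOBES and `['O', 'B-LOC', 'I-LOC', 'L-LOC']` in BILOU.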
If you use this library in your research, please cite it as follows:
@misc{neraug,
title={neraug: A data augmentation tool for named entity recognition},
author={Hiroki Nakayama},
url={https://github.com/Hironsan/neraug},
year={2021}
}