NERAr

Named Entity Recognizer for Arabic

Overview

This tool preprocess and annonate arabic texts : predict part-of-speech tags, splits off words to separe the basic form and proclitics and annonate arabic named entities. Our tool predict LOC, PERS and ORG entities. Details of this work are presented in:

Souhir Gahbiche-Braham, Hélène Bonneau-Maynard, François Yvon. Traitement automatique des entités nommées en arabe : détection et traduction. TAL (Traitement Automatique des Langues), 54(2):101-132, 2014. [www.atala.org/IMG/pdf/Gahbiche-Braham-TAL54-2.pdf]

If you use NERAr for research purpose, please use the following citation:

@article{gahbiche2014traitement,
	title={Traitement automatique des entit{\'e}s nomm{\'e}es en arabe: d{\'e}tection et traduction},		
	author={Gahbiche-Braham, Souhir and Bonneau-Maynard, H{\'e}l{\`e}ne and Yvon, Fran{\c{c}}ois},
	journal={TAL},
	volume={54},
	number={2},
	pages={101--132},
	year={2014}
}

For any information, bug reports or comments, contact: souhir.gahbiche[at]limsi.fr or souhir.gahbiche[at]gmail.com

Requirements

To use NERAr:

1- Please install the Wapiti toolkit http://wapiti.limsi.fr/

2- Place the NERAr directory in the Wapiti directory.

3- Unzip the two models modelPOS+SEG-final-0.1.crf.zip and modelsAner+afp+geo-0.2.crf.zip

Run NERAr

1- cd ~/wapiti/NERAr

2- ./segment+detect.geo+pretrad.geo.sh arabic_filename

If NERAr cannot run correctly or the loading of model does not succeeds, please unzip the model modelPOS+SEG.light.crf.zip and replace "modelPOS+SEG-final-0.1.crf" by "modelPOS+SEG.light.crf" in the file "segment+detect.geo+pretrad.geo.sh"

Resulting files

arabic_filename.norm.pos contains the part-of-speech tags for each word in the resulting file.result

arabic_filename.bw is the transliterated text using Buckwalter scheme.

arabic_filename.wap contains the predictions for each word in the text

arabic_filename.tagsENen annotated file using IOB annotation

arabic_filename.TOdecode is a resulting preprocessed file with proposed translations extracted from bilingual dictionaries. This file can be processed with the Moses decoder using exclusive and inclusive options.

arabic_filename.sansbalises is a resulting preprocessed file without any annotation

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
dics		dics
LICENSE		LICENSE
README		README
README.md		README.md
annoter_corpus_avec_dico_rapide+geonames.OK+parts_final-norm.pl		annoter_corpus_avec_dico_rapide+geonames.OK+parts_final-norm.pl
cherche_dans_dics_all.pl		cherche_dans_dics_all.pl
clean_after_split.pl		clean_after_split.pl
convertUTF8toBW.pl		convertUTF8toBW.pl
format_entree_wapiti.pl		format_entree_wapiti.pl
modelPOS+SEG-final-0.1.crf.zip		modelPOS+SEG-final-0.1.crf.zip
modelPOS+SEG.light.crf.zip		modelPOS+SEG.light.crf.zip
modelsAner+afp+geo-0.2.crf.zip		modelsAner+afp+geo-0.2.crf.zip
normalize.pl		normalize.pl
segment+detect.geo+pretrad.geo.sh		segment+detect.geo+pretrad.geo.sh
separatepunc.pl		separatepunc.pl
split.pl		split.pl
split_sanssp.pl		split_sanssp.pl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NERAr

Overview

Requirements

Run NERAr

Resulting files

About

Releases 1

Packages

Languages

License

SouhirG/NERAr

Folders and files

Latest commit

History

Repository files navigation

NERAr

Overview

Requirements

Run NERAr

Resulting files

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages