
A Java-based Library for the Processing of Arabic Text




AraNLP

The AraNLP library is a Java-based toolkit for processing Arabic text. It supports the most important preprocessing steps, such as diacritic and punctuation removal, tokenization, sentence segmentation, part-of-speech tagging, root stemming, light stemming, and word segmentation. These steps are usually needed to prepare text for more advanced NLP tasks.
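
This README does not document AraNLP's own classes or method names, so the following is only a standalone sketch in plain Java (no AraNLP dependency) of what the simplest steps listed above typically involve: diacritic removal, punctuation removal, naive sentence segmentation, and whitespace tokenization. The class `ArabicPreprocessSketch` and its methods are hypothetical, and treating U+064B to U+0652 plus U+0670 as "the diacritics" is an assumption, not necessarily the set AraNLP removes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of basic Arabic preprocessing.
 * This is NOT the AraNLP API; names and character sets are assumptions.
 */
public class ArabicPreprocessSketch {

    // Assumed diacritic set: tanween, harakat, shadda, sukun (U+064B..U+0652)
    // plus superscript alef (U+0670).
    private static final Pattern DIACRITICS =
            Pattern.compile("[\\u064B-\\u0652\\u0670]");

    // Any Unicode punctuation (general category P). This covers ASCII marks
    // and Arabic ones such as the Arabic comma, semicolon and question mark.
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}");

    // Naive sentence boundary: whitespace after '.', '!', '?' or the
    // Arabic question mark (U+061F). Abbreviations will be mis-split.
    private static final Pattern SENTENCE_END =
            Pattern.compile("(?<=[.!?\\u061F])\\s+");

    static String removeDiacritics(String text) {
        return DIACRITICS.matcher(text).replaceAll("");
    }

    static String removePunctuation(String text) {
        return PUNCTUATION.matcher(text).replaceAll("");
    }

    // Whitespace tokenization; returns an empty list for blank input.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Splits after sentence-final punctuation, keeping the mark on the sentence.
    static List<String> splitSentences(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(SENTENCE_END.split(trimmed)));
    }

    public static void main(String[] args) {
        String text = "ذَهَبَ الوَلَدُ إلى المَدْرَسَةِ. هل عاد؟ نعم، عاد!";
        // Segment first, then strip diacritics and punctuation, then tokenize.
        for (String sentence : splitSentences(text)) {
            String clean = removePunctuation(removeDiacritics(sentence));
            System.out.println(tokenize(clean));
        }
    }
}
```

Sentence segmentation runs before punctuation removal because it relies on the sentence-final marks. Part-of-speech tagging, root and light stemming, and word segmentation need linguistic resources and are not sketched here.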

The goal of AraNLP is to gather the most vital Arabic text preprocessing tools into a single, easily accessible library. To this end, we implemented tools that were missing and incorporated existing algorithmic resources.

AraNLP has already been used to prepare Arabic text in many experiments, where it successfully preprocessed the corpora involved.

Paper

Available at http://www.lrec-conf.org/proceedings/lrec2014/pdf/621_Paper.pdf.

Citation

Please cite our paper in any published work using this resource:

@inproceedings{Althobaiti14AraNLP,
  title={{AraNLP: a Java-Based Library for the Processing of Arabic Text}},  
  author={M. Althobaiti and U. Kruschwitz and M. Poesio},
  booktitle={Proceedings of the 9th Language Resources and Evaluation Conference (LREC)},
  year={2014},
  address = {Reykjavik}
}