Assem's Arabic Light Stemmer is a Snowball-based stemming algorithm for Arabic, aimed mainly at improving search.


Assem's Arabic Stemmer

This is an algorithm for Arabic stemming written in the Snowball framework language. It offers light stemming and text normalization.
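
To give a rough idea of what text normalization can mean for Arabic, here is a minimal Python sketch. It is not taken from this project's Snowball source: the rule set (dropping diacritics and tatweel, folding hamza-carrying alef forms to a bare alef) and the name `normalize` are assumptions chosen for illustration, and the stemmer's actual rules may differ.

```python
import re

# Assumed rules for illustration only; the real normalization lives in the
# project's Snowball source and may cover more (or different) characters.
DIACRITICS = re.compile("[\u064B-\u0652]")  # fathatan through sukun
TATWEEL = "\u0640"                          # elongation character
ALEF_VARIANTS = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda above -> bare alef
})


def normalize(word: str) -> str:
    """Hypothetical helper: strip diacritics and tatweel, unify alef forms."""
    word = DIACRITICS.sub("", word)
    word = word.replace(TATWEEL, "")
    return word.translate(ALEF_VARIANTS)
```

With these assumed rules, a word written with a hamza on the alef and the same word written without it normalize to the same string, so both reach the stemmer in one form.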

Requirements:

  • Download Snowball and the test data
    $ make download
  • Install Python requirements
    $ sudo pip install -r requirements.txt

or manually by:

  • extracting snowball into the root folder {Root}/snowball
  • extracting snowball-data/arabic/voc.txt.gz into {Root}/test_data/voc.txt
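
If you prefer to script the manual extraction of the vocabulary file, a small Python sketch along these lines should work. The function name `extract_gz` is hypothetical, and the location of your snowball-data checkout is an assumption you will need to adjust.

```python
import gzip
import shutil
from pathlib import Path


def extract_gz(src: Path, dst: Path) -> None:
    """Decompress a .gz file at src into dst, creating parent folders."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream the content so large vocabularies are not loaded at once.
        shutil.copyfileobj(fin, fout)
```

Run from the repository root, `extract_gz(Path("snowball-data/arabic/voc.txt.gz"), Path("test_data/voc.txt"))` would place the vocabulary where the tests expect it, assuming snowball-data sits next to the other folders.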

Build:

  • light stemming
      $ make build
  • root-based stemming
      $ make build_root_based_stemmer

Run:

  • Light Stemmer
      $ make run
      الطالب
      طالب
  • Root-Based Stemmer
      $ make run_root
      الطالب
      طلب
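
The difference between the two outputs above is that the light stemmer only removes affixes, while the root-based stemmer reduces the word further to its root. As a toy illustration of the affix-removal idea only, here is a Python sketch that strips one leading conjunction and the definite article. It is not this project's algorithm: the prefix list, the minimum length of 3, and the name `strip_prefixes` are all assumptions, and a real light stemmer also handles suffixes and many more cases.

```python
# Toy prefix stripper, for illustration only (not the Snowball implementation).
CONJUNCTIONS = ("و", "ف")  # assumed: "and" / "so" proclitics
ARTICLE = "ال"             # definite article
MIN_STEM_LEN = 3           # assumed guard against over-stripping short words


def strip_prefixes(word: str) -> str:
    """Remove at most one conjunction, then the definite article."""
    for conj in CONJUNCTIONS:
        if word.startswith(conj) and len(word) - len(conj) >= MIN_STEM_LEN:
            word = word[len(conj):]
            break
    if word.startswith(ARTICLE) and len(word) - len(ARTICLE) >= MIN_STEM_LEN:
        word = word[len(ARTICLE):]
    return word


print(strip_prefixes("الطالب"))
```

Note that this toy version will wrongly strip a leading و or ف that belongs to the word itself, which is one reason a real stemmer needs more careful conditions.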

Test:

The tests are configured to run against the Arabic sample from snowball-data.

  • time:
      $ make time
  • grouping effect:
      $ make grouping
  • all:
      $ make test
  • Test SAS with the golden Arabic corpus:
      $ make test_arabicstemmer
  • Test the ISRI Stemmer with the golden Arabic corpus:
      $ make test_isri

Distributions:

  • Distribute the light stemmer to the available languages:
      $ make dist
  • Distribute the root-based stemmer to the available languages:
      $ make dist_rooter

Results:

Snowball Arabic (Stemmer & rooter) Results

Word Stem Root
طفل طفل طفل
اطفال اطفال طفل
الاطفال اطفال طفل
اطفالكم اطفال طفل
فأطفالكم اطفال طفل
اطفالهم اطفال طفل
والاطفال اطفال طفل
فاطفالهم اطفال طفل
وطفل طفل طفل
الطفولة طفول طفل
والطفلتين طفل طفل
طفلتان طفل طفل
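
One way to read the table above is as a grouping of word forms: the fewer distinct outputs a stemmer produces for related words, the more of them a search will match together. The Python sketch below counts the groups using only the rows of the table. The helper name `group_by` is hypothetical, and this is just one plausible way to look at grouping, not necessarily what `make grouping` computes.

```python
from collections import defaultdict

# Rows copied from the results table: word, stem, root.
TABLE = """\
طفل طفل طفل
اطفال اطفال طفل
الاطفال اطفال طفل
اطفالكم اطفال طفل
فأطفالكم اطفال طفل
اطفالهم اطفال طفل
والاطفال اطفال طفل
فاطفالهم اطفال طفل
وطفل طفل طفل
الطفولة طفول طفل
والطفلتين طفل طفل
طفلتان طفل طفل
"""

ROWS = [tuple(line.split()) for line in TABLE.splitlines()]


def group_by(rows, column):
    """Map each value in the given column to the list of words producing it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[0])
    return dict(groups)


stem_groups = group_by(ROWS, 1)  # group words by light stem
root_groups = group_by(ROWS, 2)  # group words by root

print(len(ROWS), "words,", len(stem_groups), "stems,", len(root_groups), "root")
```

For these 12 words the light stemmer yields 3 distinct stems, while the root-based stemmer maps all of them to a single root.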