Assem's Arabic Stemmer

This is an Arabic stemming algorithm written in the Snowball language. It offers light stemming and text normalization.

@article{Chelli2018,
author = "Assem Chelli",
title = "{Assem's Arabic Stemmer}",
year = "2018",
month = "11",
url = "https://figshare.com/articles/Assem_s_Arabic_Stemmer/7295690",
doi = "10.6084/m9.figshare.7295690.v1"
}

This is a sample of results:

| Word | Light Stemmer | Root-Based Stemmer |
|---|---|---|
| طفل | طفل | طفل |
| اطفال | اطفال | طفل |
| الاطفال | اطفال | طفل |
| اطفالكم | اطفال | طفل |
| فأطفالكم | اطفال | طفل |
| اطفالهم | اطفال | طفل |
| والاطفال | اطفال | طفل |
| فاطفالهم | اطفال | طفل |
| وطفل | طفل | طفل |
| الطفولة | طفول | طفل |
| والطفلتين | طفل | طفل |
| طفلتان | طفل | طفل |
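
To illustrate what light stemming means for the sample above, here is a minimal Python sketch. It is not the Snowball algorithm from this repository: the affix lists, the minimum stem length, and the names `normalize` and `light_stem` are assumptions chosen by hand so that this toy handles the sample words. It normalizes the text (drops diacritics and tatweel, unifies hamza-carrying alef forms), then strips at most one prefix and one suffix.

```python
import re

# Assumption: toy affix lists picked by hand for this illustration only;
# the real stemmer's rules live in the Snowball source, not here.
PREFIXES = ["وال", "فال", "ال", "و", "ف"]  # longest first
SUFFIXES = ["تين", "تان", "كم", "هم", "ة"]  # longest first
MIN_STEM_LEN = 3  # assumption: never cut a word below three letters

# Diacritics (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Map alef with hamza above/below and alef with madda to bare alef.
_ALEF_FORMS = str.maketrans(
    {"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"}
)


def normalize(word):
    """Remove diacritics and tatweel, and unify alef variants."""
    return _DIACRITICS.sub("", word).translate(_ALEF_FORMS)


def light_stem(word):
    """Strip at most one prefix and one suffix from the normalized word."""
    stem = normalize(word)
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
            stem = stem[:-len(suffix)]
            break
    return stem


if __name__ == "__main__":
    for w in ["الاطفال", "فأطفالكم", "الطفولة", "والطفلتين"]:
        print(w, light_stem(w))
```

A root-based stemmer goes further than this and reduces every word in the sample to the root طفل, which affix stripping alone cannot do for forms such as اطفال.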

Requirements:

The dependencies are already attached as git submodules, so just run:

$ git submodule update --init --recursive

Build:

$ make build

Run:

  • Light Stemmer
$ make run
الطالب
طالب
  • Root-Based Stemmer
$ make run_root
الطالب
طلب

Test:

The tests run against the snowball-data Arabic sample and measure speed, grouping factor, and precision.

$ make test

Distributions:

  • Generate light stemmer distributions for the available languages:
$ make dist