arabic-gec

@balhafni balhafni released this 03 Nov 08:28
· 18 commits to master since this release

Release of the morphological database used for morphological preprocessing:

  • calima-msa-s31_0.4.2.db.muddled: a muddled version of the extended Standard Arabic Morphological Analyzer (SAMA) database. This file must be unmuddled before it can be used to reproduce our results. To unmuddle it, first obtain the original SAMA 3.1 database from the LDC, along with muddler. Then run the following command:
    muddler unmuddle -s LDC2010L01.tgz -m calima-msa-s31_0.4.2.db.muddled calima-msa-s31_0.4.2.db
    This produces the extended database used in our experiments (calima-msa-s31_0.4.2.db).
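
Because the unmuddle step depends on two input files and on muddler being installed, it can help to check for them first. The following is a minimal sketch, not part of the release itself: the file names are taken from the command above, while the `check_inputs` helper and the variable names are hypothetical and assume all files sit in the current directory.

```shell
#!/bin/sh
# Sketch only: file names come from the release notes; check_inputs and the
# variable names are hypothetical helpers introduced for illustration.
SAMA_ARCHIVE="LDC2010L01.tgz"                    # original SAMA 3.1 archive from the LDC
MUDDLED_DB="calima-msa-s31_0.4.2.db.muddled"     # file shipped with this release
OUTPUT_DB="calima-msa-s31_0.4.2.db"              # unmuddled database to produce

# Return 0 only if every path given as an argument is an existing regular file.
check_inputs() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
    return 0
}

# Run the unmuddle command only when both inputs and the muddler tool are available.
if check_inputs "$SAMA_ARCHIVE" "$MUDDLED_DB" && command -v muddler >/dev/null 2>&1; then
    muddler unmuddle -s "$SAMA_ARCHIVE" -m "$MUDDLED_DB" "$OUTPUT_DB"
fi
```

If either input is missing, the script reports which one on standard error and skips the muddler call, so no partial output file is written.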