gender-rewriting-models

balhafni released this 22 Apr 08:35
· 2 commits to master since this release
ee7d50e

Release of the pretrained models used by the gender rewriting multi-step system:

  • bert-base-camel-bert-msa-apgcv2.1.zip: CAMeLBERT MSA fine-tuned with the masked language modeling (MLM) objective.

  • gender-id-apgcv2.1.zip: the word-level gender identification model based on fine-tuning CAMeLBERT MSA on the APGCv2.1.

  • gender-id-apgcv2.1-aug.zip: the word-level gender identification model based on fine-tuning CAMeLBERT MSA on the APGCv2.1 with the additional augmented data.

  • gender-id-apgcv1.0.zip: the word-level gender identification model based on fine-tuning CAMeLBERT MSA on the APGCv1.0.

  • calima-msa-s31_0.4.2.db.muddled: a muddled version of the extended Standard Arabic Morphological Analyzer (SAMA) database. This file must be unmuddled before it can be used to reproduce our results. To unmuddle it, first obtain the original SAMA 3.1 database from the LDC, along with muddler. Then run the following command:
    muddler unmuddle -s LDC2010L01.tgz -m calima-msa-s31_0.4.2.db.muddled calima-msa-s31_0.4.2.db
    This produces the extended database used in our experiments (i.e., calima-msa-s31_0.4.2.db).
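
For scripted setups, the unmuddle step can be wrapped in a small Python helper. This is only a sketch and not part of the release: the function names build_unmuddle_command and unmuddle are hypothetical, and it assumes the muddler command-line tool is installed and on PATH and that the LDC archive has already been downloaded. The arguments mirror the command shown in the notes for this release.

```python
import shutil
import subprocess
from pathlib import Path


def build_unmuddle_command(source, muddled, output):
    """Return the argv list for the `muddler unmuddle` call given in the release notes."""
    return ["muddler", "unmuddle", "-s", str(source), "-m", str(muddled), str(output)]


def unmuddle(source, muddled, output):
    """Run muddler if both inputs and the tool are available.

    Returns False without running anything when an input file is missing
    or `muddler` is not on PATH (hypothetical helper, not part of muddler).
    """
    inputs_missing = any(not Path(p).is_file() for p in (source, muddled))
    if inputs_missing or shutil.which("muddler") is None:
        return False
    # check=True raises CalledProcessError if muddler exits with a non-zero status.
    subprocess.run(build_unmuddle_command(source, muddled, output), check=True)
    return Path(output).is_file()
```

Calling unmuddle("LDC2010L01.tgz", "calima-msa-s31_0.4.2.db.muddled", "calima-msa-s31_0.4.2.db") from the directory holding both files would then run the same command as above and report whether the output database was written.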