alingwist/Moroccan-plurals-corpus

Moroccan Arabic Plurals Corpus

Overview

This corpus contains 1166 singular-plural pairs in Moroccan Arabic (Darija), extracted from the Darija Open Dataset (DODa, Outchakoucht and Es-Samaali, 2021). Each entry includes:

  • Singular form (in IPA).
  • Plural form (in IPA).
  • Gloss (English translation).
  • Singular and plural patterns (templates).
  • Classification of the plural as "sound" or "broken."

Purpose

This corpus is designed for linguistic research, particularly the study of Moroccan Arabic morphology.

File Structure

The corpus is provided as a .csv file with the following columns:

  • singular: Singular form of the noun.
  • plural: Plural form of the noun.
  • gloss: English translation of the noun.
  • singular_pattern: Pattern of the singular form (all consonants indicated by the letter 'C').
  • plural_pattern: Pattern of the plural form (all consonants indicated by the letter 'C').
  • plural_type: "sound" or "broken."
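
The pattern columns can be illustrated with a minimal Python sketch that maps a form to a template by replacing each consonant with 'C'. This is not the procedure used to build the corpus: the vowel inventory, the handling of IPA modifier symbols, and the function name `to_pattern` are all assumptions made for illustration, and the example pair (ktab/ktub, "book") is not guaranteed to appear in the corpus in this form.

```python
# Assumption: a small vowel inventory; the corpus may distinguish other vowels.
VOWELS = set("aiuəeo")

# Assumption: these IPA modifiers belong to the preceding segment and are
# dropped from the pattern. The corpus may treat length or emphasis differently.
MODIFIERS = {"ː", "ˤ", "ʷ"}


def to_pattern(form: str) -> str:
    """Return a template in which every consonant is written as 'C'."""
    pattern = []
    for ch in form:
        if ch in MODIFIERS:
            continue  # skip modifier symbols rather than counting them as segments
        pattern.append(ch if ch in VOWELS else "C")
    return "".join(pattern)


print(to_pattern("ktab"))  # CCaC
print(to_pattern("ktub"))  # CCuC
```

Under these assumptions, a broken plural such as ktab/ktub shows up as a change in the template's vowels (CCaC versus CCuC) while the consonant slots stay the same.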

Usage

To use the corpus, download the .csv file and open it in any spreadsheet software, or load it with a programming language such as Python or R.
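
As a rough sketch of loading the corpus in Python, the snippet below reads the six documented columns with the standard `csv` module and counts sound versus broken plurals. The helper name `load_pairs` and the two sample rows are hypothetical and only stand in for the real file, whose name is not given here. The pattern values in the sample are guesses at how the corpus encodes them.

```python
import csv
import io
from collections import Counter

# Column names as documented in the File Structure section.
EXPECTED_COLUMNS = [
    "singular", "plural", "gloss",
    "singular_pattern", "plural_pattern", "plural_type",
]


def load_pairs(handle):
    """Read singular-plural rows from an open CSV file handle."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Illustrative rows only; these are not copied from the corpus.
SAMPLE = """singular,plural,gloss,singular_pattern,plural_pattern,plural_type
ktab,ktub,book,CCaC,CCuC,broken
xəddam,xəddamin,worker,CəCCaC,CəCCaCiC,sound
"""

rows = load_pairs(io.StringIO(SAMPLE))
counts = Counter(row["plural_type"] for row in rows)
print(counts["broken"], counts["sound"])  # 1 1
```

For the downloaded file, the same function can be called on `open(path, newline="", encoding="utf-8")`, where `path` is wherever the .csv was saved. UTF-8 is an assumption about the file's encoding.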

License

This corpus is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You are free to share and adapt the corpus for non-commercial purposes, as long as you provide appropriate attribution.

Attribution

This corpus is derived from the Darija Open Dataset (DODa, Outchakoucht and Es-Samaali, 2021), which is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

If you use this corpus, please cite both this work and the original DODa dataset:

  • Nirheche, A. (2025). Moroccan Arabic Plurals Corpus [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14642330
  • Outchakoucht, A. and Es-Samaali, H. (2021). Darija Open Dataset (DODa). github.com/darija-open-dataset

Non-Commercial Use

Under the terms of CC BY-NC 4.0:

  • You are free to share and adapt the corpus for non-commercial purposes.
  • You must give appropriate credit to the original authors.
  • You may not use this corpus for commercial purposes without permission from the copyright holders of the original DODa dataset.

Contact

For questions or feedback, please contact anirheche@umass.edu.
