For easier communication, posting, or commenting on each others posts, people use their dialects. In Africa, various languages and dialects exist. One of the African languages is Bambara, used by citizens in different countries. Our dataset is the first Bamabara Dataset including more than 3K sentences, covering different topics, preprocessed and annotated as positive, negative, and neutral.
our common-crawl-based dataset is composed of 1663 positive, 579 negative, and 804 neutral sentences. Data was collected by the iCompass team (http://www.icompass.tn).
BAMBARA was preprocessed by removing links, emoji symbols and punctuation. Annotation was then performed by TWO Malian native speakers, who are engineering students. Sentences are annotated as positive (1), negative(-1), or neutral (0).
@inproceedings{Bambara2021,
title={Bambara Language Dataset for Sentiment Analysis},
author={Diallo, Mountaga and Fourati, Chayma and Haddad, Hatem},
booktitle={Practical ML for Developing Countries Workshop. ICLR 2021, Virtual Event},
year = {2021},
}