chaymafourati/BAMBARA-LANGUAGE-DATASET-FOR-SENTIMENT-ANALYSIS

Context and Topics

To communicate, post, and comment on each other's posts more easily, people use their own dialects. Africa is home to many languages and dialects. One of them is Bambara, which is spoken in several countries. Our dataset is the first Bambara dataset: it contains more than 3K sentences covering different topics, preprocessed and annotated as positive, negative, or neutral.

Collection Process

Our Common Crawl-based dataset is composed of 1663 positive, 579 negative, and 804 neutral sentences (3046 in total). Data was collected by the iCompass team (http://www.icompass.tn).
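The class counts above imply a noticeably imbalanced label distribution. The short sketch below only redoes that arithmetic; the variable names are illustrative and are not part of the dataset or any accompanying code.

```python
# Class counts as stated in this README.
counts = {"positive": 1663, "negative": 579, "neutral": 804}

# Total number of annotated sentences.
total = sum(counts.values())

# Share of each class in the whole dataset.
shares = {label: n / total for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label:<8} {n:>5}  {shares[label]:.1%}")
print(f"{'total':<8} {total:>5}")
```

Positive sentences make up a little over half of the data, so it may be worth stratifying by label when splitting the data into training and test sets.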

Preprocessing and annotation

The dataset was preprocessed by removing links, emoji, and punctuation. Annotation was then performed by two native speakers from Mali, both engineering students. Each sentence is annotated as positive (1), negative (-1), or neutral (0).
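The cleaning steps described above (removing links, emoji, and punctuation) could be approximated as in the following minimal sketch. It is not the authors' actual pipeline: the function name `clean_sentence`, the URL pattern, and the choice to drop every Unicode punctuation and symbol character are assumptions made for illustration. Bambara letters such as ɛ, ɔ, ɲ, and ŋ are ordinary letters and are left untouched.

```python
import re
import unicodedata

# Assumption: a link is anything starting with http://, https://, or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

# Invisible characters that glue emoji sequences together.
EMOJI_JOINERS = {"\u200d", "\ufe0f"}


def clean_sentence(text: str) -> str:
    """Hypothetical helper: strip links, emoji/symbols, and punctuation."""
    text = URL_RE.sub(" ", text)
    kept = []
    for ch in text:
        if ch in EMOJI_JOINERS:
            continue
        category = unicodedata.category(ch)
        # "P*" covers punctuation; "S*" covers symbols, including most emoji.
        if category.startswith("P") or category.startswith("S"):
            kept.append(" ")
        else:
            kept.append(ch)
    # Collapse the whitespace left behind by the removals.
    return " ".join("".join(kept).split())


print(clean_sentence("I ni ce! 😀 https://example.com"))
```

Note that this sketch also removes apostrophes, so elided forms are split into separate tokens. Whether the original preprocessing did the same is not stated in this README.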

Paper citation

@inproceedings{Bambara2021,
  title={Bambara Language Dataset for Sentiment Analysis},
  author={Diallo, Mountaga and Fourati, Chayma and Haddad, Hatem},
  booktitle={Practical ML for Developing Countries Workshop. ICLR 2021, Virtual Event},
  year={2021},
}
