Context and Topics

To communicate, post, and comment on each other's posts more easily, people often write in their own dialects. Africa has many languages and dialects. One of them is Bambara, which is spoken in several countries. Our dataset is the first Bambara dataset: it includes more than 3K sentences covering different topics, preprocessed and annotated as positive, negative, or neutral.

Collection Process

Our Common Crawl-based dataset is composed of 1663 positive, 579 negative, and 804 neutral sentences. The data was collected by the iCompass team (http://www.icompass.tn).
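As a quick sanity check on the class balance, the counts above can be turned into a total and per-class percentages. This is only an illustrative sketch: the dictionary below restates the figures from this section and does not read the dataset file.

```python
# Class counts as stated in this README (not read from the dataset file).
counts = {"positive": 1663, "negative": 579, "neutral": 804}

total = sum(counts.values())
# Share of each class in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)   # 3046
print(shares)  # {'positive': 54.6, 'negative': 19.0, 'neutral': 26.4}
```

The positive class makes up more than half of the sentences, so stratified splits or class weighting may be worth considering when training a classifier.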

Preprocessing and annotation

The BAMBARA dataset was preprocessed by removing links, emoji symbols, and punctuation. Annotation was then performed by two native speakers from Mali, both engineering students. Sentences are annotated as positive (1), negative (-1), or neutral (0).
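The cleaning steps described above could be approximated as follows. This is a minimal sketch and not the authors' actual pipeline: the regular expressions, the `clean` helper, and the `LABELS` mapping are assumptions made for illustration, and the emoji ranges cover only the most common Unicode blocks. Only ASCII punctuation is stripped, so Bambara letters such as ɛ, ɔ, ɲ, and ŋ are left untouched.

```python
import re
import string

# Hypothetical patterns; the original preprocessing code is not published here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOJI_RE = re.compile(
    "[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]"
)
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

# Label encoding as described in this README.
LABELS = {1: "positive", -1: "negative", 0: "neutral"}


def clean(text: str) -> str:
    """Remove links, emoji, and ASCII punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)        # links first, while "://" is still intact
    text = EMOJI_RE.sub(" ", text)      # common emoji blocks
    text = text.translate(PUNCT_TABLE)  # ASCII punctuation only
    return " ".join(text.split())


print(clean("I ni ce! 😊 https://example.com/a?b=1"))  # I ni ce
print(LABELS[-1])                                       # negative
```

Links are removed before punctuation on purpose: stripping punctuation first would break URLs into fragments that the link pattern no longer matches.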

Paper citation

@inproceedings{Bambara2021,
  title={Bambara Language Dataset for Sentiment Analysis},
  author={Diallo, Mountaga and Fourati, Chayma and Haddad, Hatem},
  booktitle={Practical ML for Developing Countries Workshop. ICLR 2021, Virtual Event},
  year={2021},
}
