CIR developed a lexicon of inflammatory keywords in four languages (Amharic, Afaan Oromo, Tigrigna, and English) that may indicate hate speech along gendered, ethnic, and religious lines. CIR believes this is currently the most comprehensive lexicon for the Ethiopian context.
For more information on the lexicon development, see CIR's report "Normalised and Invisible: An Analysis of gendered hate-speech on social media in Ethiopia" and conference paper "Resources for Annotating Hate Speech in Social Media Platforms Used in Ethiopia: A Novel Lexicon and Labelling Scheme".
It is important to note that terms on their own may not constitute hate speech. The keywords were used to retrieve social media content that could contain hate speech; human annotators then assessed whether each item was or was not hate speech, following the detailed annotation protocol.
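
For readers who want to reuse the lexicon in a similar way, the keyword retrieval step described above could be sketched roughly as follows. This is an illustration only, not CIR's actual pipeline: the function names, the whole-word matching rule, the Unicode normalisation, and the output fields are all assumptions, and the terms in the usage example are neutral placeholders rather than entries from the lexicon. Matched posts are returned with an empty label, because a keyword hit only makes a post a candidate for human annotation.

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """Apply Unicode NFKC normalisation and case folding (an assumed preprocessing step)."""
    return unicodedata.normalize("NFKC", text).casefold()


def build_matcher(lexicon):
    """Compile one regex that matches any lexicon term as a whole word.

    Returns None when the lexicon holds no usable terms.
    """
    terms = {normalise(t.strip()) for t in lexicon if t.strip()}
    if not terms:
        return None
    # Longer terms first, so multi-word terms win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = "|".join(re.escape(t) for t in ordered)
    # \w is Unicode-aware for str patterns, so it covers Ethiopic and Latin letters.
    return re.compile(rf"(?<!\w)(?:{pattern})(?!\w)")


def flag_candidates(posts, lexicon):
    """Return posts containing at least one lexicon term, left unlabelled.

    'label' stays None: a keyword hit is not a hate speech judgement.
    """
    matcher = build_matcher(lexicon)
    if matcher is None:
        return []
    candidates = []
    for post in posts:
        hits = sorted(set(matcher.findall(normalise(post))))
        if hits:
            candidates.append({"text": post, "matched_terms": hits, "label": None})
    return candidates


if __name__ == "__main__":
    # Placeholder terms only; real lexicon entries are deliberately not reproduced.
    demo_lexicon = ["term_one", "term_two"]
    demo_posts = ["A post mentioning TERM_ONE.", "An unrelated post."]
    for item in flag_candidates(demo_posts, demo_lexicon):
        print(item)
```

Whole-word matching is a simplifying choice here. It will miss inflected or affixed forms, which are common in Amharic, Afaan Oromo, and Tigrigna, so a real retrieval step would likely need language-specific handling.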