GSoC 2019 project on cross language analysis
Type Name Latest commit message Commit time
.idea
helper_functions all media based corpora are annotated Jul 4, 2019
vk_collector new non_sexist corpus is added Jul 12, 2019
.gitignore new non_sexist corpus is added Jul 12, 2019
LICENSE Initial commit May 30, 2019
README.md new corpus (nonsex) added Jul 10, 2019

README.md

Multilingual hate speech detection (with a focus on sexism)

Google Summer of Code 2019, CLiPS

The goal of this ongoing project is to improve hate speech detection for the Russian language and, additionally, to explore multilingual approaches. The repository will include:

| Component | Status |
| --- | --- |
| Annotated corpora for the Russian language (guidelines can be found below) | In progress (can be found here) |
| Summaries of previously conducted hate speech research (in different languages) | In progress |
| Functions to collect the corpus from several Russian websites (more details below) | Done, can be found here |
| Code to train a model for sexism detection | In progress |

Annotation guidelines and problems
