Harassment-Corpus

Publishing a Quality Context-aware Annotated Corpus and Lexicon for Harassment Research.

Identifying profane or offensive words is a standard way to start investigating a cyberbullying incident. For this reason, we first created a lexicon from profane words and divided it into six contextual categories: 1) Sexual, 2) Appearance-related, 3) Intellectual, 4) Political, 5) Racial, 6) Combined.

We used the first five categories of our lexicon as seed terms for collecting tweets from Twitter. Requiring at least one offensive word per tweet, we collected 10,000 tweets for each contextual type, for a total of 50,000.

The presence of offensive words in a tweet does not guarantee that the tweet is harassing, because people may use offensive words in a friendly manner or in quotes. Therefore, we rely on human-judged annotations to discriminate harassing tweets from non-harassing tweets.

We acknowledge support from the National Science Foundation (NSF) award CNS 1513721: Context-Aware Harassment Detection on Social Media.

Wiki page of this project: http://wiki.knoesis.org/index.php/Context-Aware_Harassment_Detection_on_Social_Media

To obtain our annotated tweets for the five contexts, please contact the authors by email:

Mohammadreza Rezvan: mohammadrezarezvan94@gmail.com
Saeedeh Shekarpour: sshekarpour1@udayton.edu
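
The seed-term collection step described above can be illustrated with a minimal sketch. It is not the authors' collection code: the lexicon entries below are hypothetical placeholders (the real seed terms are not listed in this README), and the function names, the whitespace-style tokenization, and the `per_context_limit` parameter are assumptions made for illustration. The sketch only shows the idea of keeping a tweet as a candidate for a context when it contains at least one term from that context's part of the lexicon.

```python
import re
from collections import defaultdict

# Hypothetical placeholder lexicon. The actual seed terms are not given here;
# only the five context names used for collection come from the README.
LEXICON = {
    "sexual": ["term_a"],
    "appearance": ["term_b", "term_c"],
    "intellectual": ["term_d"],
    "political": ["term_e"],
    "racial": ["term_f"],
}


def tokenize(text):
    """Lowercase the text and return its set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_']+", text.lower()))


def match_contexts(text, lexicon=LEXICON):
    """Return the contexts whose seed terms appear at least once in the text."""
    tokens = tokenize(text)
    return [ctx for ctx, terms in lexicon.items() if tokens & set(terms)]


def bucket_tweets(tweets, lexicon=LEXICON, per_context_limit=10000):
    """Group candidate tweets by context, capping each context at a limit."""
    buckets = defaultdict(list)
    for tweet in tweets:
        for ctx in match_contexts(tweet, lexicon):
            if len(buckets[ctx]) < per_context_limit:
                buckets[ctx].append(tweet)
    return dict(buckets)


if __name__ == "__main__":
    sample = ["an example with term_b", "nothing matching here"]
    print(bucket_tweets(sample))
```

A match here only marks a tweet as a candidate for a context. As noted above, whether a tweet is actually harassing is decided by human annotators, not by the presence of a lexicon term.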
