Offensive Language Identification and Categorization

arunavsk/OffenseEval2019

Identifying and Categorizing Offensive Language on Twitter

Offensive language, hate speech, and cyberbullying have become increasingly pervasive on social media. Individuals frequently take advantage of the perceived anonymity of social media platforms to engage in brash and disrespectful behaviour that many of them would not consider in real life. The goal of this project is to use a hierarchical model not only to identify tweets/messages containing offensive language but also to categorize the type and the target of offensive messages on social media.
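The hierarchy described above can be sketched as a cascade in which each level runs only if the previous level fired. This is a minimal illustration, not the code from the notebooks: the label names (`OFF`/`NOT`, `TIN`/`UNT`, `IND`) are assumed from the OffensEval 2019 shared task rather than stated in this README, and `predict_hierarchy` and the `toy_*` keyword classifiers are hypothetical stand-ins for trained models.

```python
from typing import Callable, Dict, Optional

# A classifier here is any callable that maps a tweet to a label string.
Classifier = Callable[[str], str]


def predict_hierarchy(
    text: str,
    offensive_clf: Classifier,
    type_clf: Classifier,
    target_clf: Classifier,
) -> Dict[str, Optional[str]]:
    """Run the three sub-tasks as a cascade (label names are assumptions)."""
    result: Dict[str, Optional[str]] = {"offensive": None, "type": None, "target": None}

    # Level 1: is the tweet offensive at all?
    result["offensive"] = offensive_clf(text)
    if result["offensive"] != "OFF":
        return result

    # Level 2: is the offense targeted (TIN) or untargeted (UNT)?
    result["type"] = type_clf(text)
    if result["type"] != "TIN":
        return result

    # Level 3: who is targeted? Only meaningful for targeted offenses.
    result["target"] = target_clf(text)
    return result


# Hypothetical keyword rules standing in for trained models.
def toy_offensive(text: str) -> str:
    return "OFF" if "idiot" in text.lower() else "NOT"


def toy_type(text: str) -> str:
    return "TIN" if "you" in text.lower() else "UNT"


def toy_target(text: str) -> str:
    return "IND"


clean = predict_hierarchy("Have a nice day", toy_offensive, toy_type, toy_target)
insult = predict_hierarchy("You are an idiot", toy_offensive, toy_type, toy_target)
```

Leaving the lower levels as `None` when an upper level does not fire keeps "not applicable" distinct from an actual prediction.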

How To

  • Create a virtual env, install dependencies, and download the GloVe Twitter embeddings
conda create -n [ENV] python=3.7
conda activate [ENV]
pip install -r requirements.txt
wget http://nlp.stanford.edu/data/glove.twitter.27B.zip
  • Visit the following notebooks

    • EDA: Exploratory analysis and visualizations
    • Preprocessing: Data Cleaning, Feature Engineering and more
    • NBSVM: NBSVM classifier for all 3 sub-tasks
    • LSTM: LSTM classifier for all 3 sub-tasks
    • CNN Text (Simplified): Simplified version of CNN Text proposed by Kim with one single input channel
    • CNN Text (OG): Original architecture of CNN Text by Kim with multichannel inputs
  • See the full report for implementation details, results, and conclusions here
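The GloVe archive downloaded during setup unpacks to plain-text files with one token per line followed by its vector components, separated by spaces. The sketch below shows one way such a file could be read into a dictionary. It is an illustration under assumptions, not the loader used in the notebooks: `load_glove` and the in-memory `sample` lines are hypothetical, and the optional `vocab` filter is added only to keep memory use down.

```python
from typing import Dict, Iterable, List, Optional, Set


def load_glove(
    lines: Iterable[str], vocab: Optional[Set[str]] = None
) -> Dict[str, List[float]]:
    """Parse GloVe text-format lines ("token v1 v2 ...") into a dict.

    If `vocab` is given, only tokens in it are kept.
    """
    embeddings: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue
        embeddings[word] = [float(v) for v in values]
    return embeddings


# Tiny in-memory sample in the same layout as a GloVe text file.
sample = ["hello 0.1 -0.2 0.3\n", "world 0.4 0.5 -0.6\n"]
vectors = load_glove(sample, vocab={"hello"})
```

With the real data, the same function could be given an open file handle after unzipping, for example `open("glove.twitter.27B.100d.txt", encoding="utf-8")`. That file name is an assumption about the archive contents and should be checked against what the zip actually unpacks.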

FAQ

Please reach out to arsaikia@iu.edu with feedback and suggestions.
