Hate crime detection and entity extraction

This repo contains the code and data for the paper "Reporting the unreported: Event Extraction for Analyzing the Local Representation of Hate Crimes" (EMNLP 2019).

The method includes a multi-instance learning model for detecting hate crimes in local news articles and an RNN model for extracting the entities of each hate crime. The annotated datasets for three types of crime (hate, kidnapping and homicide) are included in the Data folder.

Getting Started

To run the code, you need to download the Data and embeddings directories.

Prerequisites

This project uses Python 3.6.2. The following libraries must be installed:

sklearn 0.19.1
tensorflow-gpu 1.11.0
pandas 0.23.0
nltk 3.2.5
tqdm 4.28.1
numpy 1.14.3

Parameters

All parameters of the code are specified in params.json. They are defined as follows:

  "hidden_size": 100 # hidden size of the LSTM
  "art_filter_sizes": [2, 3, 4] # filter sizes in detect model
  "art_num_filters": 10 # number of different filters in detect model
  "pretrain": true # if set to true, uses pretrained GloVe embeddings
  "embedding_size": 300 # size of the embedding
  "learning_rate": 0.00001 # learning rate for the detect task
  "keep_ratio": 0.75 # keep ratio for the detect task
  "epochs": 30 # number of epochs
  "entity_keep_ratio": 0.75 # keep ratio in extract task
  "entity_learning_rate": 0.001 # learning rate in extract task
  "batch_size": 5 # size of batches, shows the number of articles in each batch
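
As a minimal sketch of how such a parameter file could be read and sanity-checked before a run: the helper name load_params and the EXPECTED_KEYS set are hypothetical and not part of this repo. The key names are copied from the list above.

```python
import json

# Key names copied from the Parameters list in this README.
EXPECTED_KEYS = {
    "hidden_size", "art_filter_sizes", "art_num_filters", "pretrain",
    "embedding_size", "learning_rate", "keep_ratio", "epochs",
    "entity_keep_ratio", "entity_learning_rate", "batch_size",
}


def load_params(path="params.json"):
    """Hypothetical helper: load the parameter file and fail early on missing keys."""
    with open(path, encoding="utf-8") as handle:
        params = json.load(handle)
    missing = EXPECTED_KEYS - set(params)
    if missing:
        raise KeyError("missing parameters: " + ", ".join(sorted(missing)))
    return params
```

Note that JSON has no comment syntax, so the "#" annotations shown above are explanatory only. A file that contained them verbatim would be rejected by json.load.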

Running the detection code

To run the detection code, use the following command:

python3 run_detect.py --model <MODEL_NAME> --goal <GOAL> --dataset <DATASET> --params <PARAMS_FILE>

Substitute the following tokens according to the task at hand:

<MODEL_NAME>: either MICNN (the model used in the paper) or ATTN (the hierarchical attention baseline)
<GOAL>: the goal of the run, either train or predicts
<DATASET>: one of the three datasets (hate, kidnap or homicide) on which to perform the detection
<PARAMS_FILE>: the .json file that contains all the model parameters. The model uses params.json by default.
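
The substitution above can be sketched as a small Python wrapper that assembles the command line and rejects unknown model or dataset names before anything is launched. The function build_detect_command is a hypothetical helper, not part of this repo. The flag names and allowed values are taken from this README, and the goal value is passed through unchecked because the accepted spelling should be confirmed against run_detect.py.

```python
# Allowed values as listed in this README.
MODELS = ("MICNN", "ATTN")
DATASETS = ("hate", "kidnap", "homicide")


def build_detect_command(model, goal, dataset, params_file="params.json"):
    """Hypothetical helper: build the argument list for run_detect.py."""
    if model not in MODELS:
        raise ValueError("unknown model: " + model)
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: " + dataset)
    # The goal is not validated here; check run_detect.py for accepted values.
    return [
        "python3", "run_detect.py",
        "--model", model,
        "--goal", goal,
        "--dataset", dataset,
        "--params", params_file,
    ]
```

The returned list can be passed to subprocess.run(..., check=True) from the repo root, for example build_detect_command("MICNN", "train", "hate") to train the paper's model on the hate crime dataset.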

Running the extraction code

To run the extraction code, use the following command:

python3 run_extract.py --goal <GOAL> --params <PARAMS_FILE>

Substitute the following tokens according to the task at hand:

<GOAL>: the goal of the run, either train or predicts
<PARAMS_FILE>: the .json file that contains all the model parameters. The model uses params.json by default.
