Explainable Hate Speech Detection with Chain-of-Thought Reasoning and Post-Hoc Explanations

This paper is an assignment for the Advanced Topics in Computational Semantics course at the University of Amsterdam. We aim to explore the role of bias and stereotypes in online language, explicitly focusing on hate speech detection. We leverage large language models (LLMs) trained with in-context learning, enhanced by incorporating explanations into the training process. These explanations can be integrated as pre-answer reasoning or post-answer explanations, each offering different benefits.

Our research questions involve investigating whether we can prompt LLMs to reason about a potentially hateful post before or after making a prediction. We also examine the effectiveness of inductive versus deductive approaches on hate speech datasets, primarily using the SocialBiasFrames dataset. Our goal is to improve our understanding of hate speech detection in LLMs, ultimately contributing to developing more effective and accountable AI systems.

You can read our full report here.

Requirements

The code is written in Python 3.10. The requirements can be installed using pip install -r requirements.txt or with the conda environment file conda env create -f environment.yml.

Evaluation

The evaluation of the models is done using the eval.py script. The script takes the following arguments:

python3 src/eval.py --model <model> --prompt_name <name-of-prompt>

The model argument can be any of the T5 or GPT models from the HuggingFace Transformers library. The prompt name argument can be any of the following:

default_0shot_no_cot: Default prompt for no CoT reasoning
default_0shot_cot_expl_first: Default prompt for CoT reasoning with explanation generation first
default_4shot_cot_ans_first: Default prompt 4-shot CoT reasoning with answer generation first
default_4shot_cot_expl_first: Default prompt 4-shot CoT reasoning with explanation generation first

If you want to evaluate the model using a custom prompt, you can add the prompt to the data/prompts folder and use the name of the file as the prompt name argument.

By default, the generations of the models will be cached in the data/caches folder. If you happened to cancel the evaluation script, the next time you run it, it will use the cached generations instead of generating them again.

The model is evaluated on the SocialBiasFrames test set by default. You can change the split by adding the --split argument. The possible values are train, validation, and test. The dataset will be downloaded automatically on the first run.

Name		Name	Last commit message	Last commit date
Latest commit History 81 Commits
data		data
notebooks		notebooks
slurm/job		slurm/job
src		src
.gitignore		.gitignore
ATCS_Report_2023.pdf		ATCS_Report_2023.pdf
README.md		README.md
environment.yml		environment.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

notebooks

notebooks

slurm/job

slurm/job

src

src

.gitignore

.gitignore

ATCS_Report_2023.pdf

ATCS_Report_2023.pdf

README.md

README.md

environment.yml

environment.yml

pyproject.toml

pyproject.toml

requirements.txt

requirements.txt

Repository files navigation

Explainable Hate Speech Detection with Chain-of-Thought Reasoning and Post-Hoc Explanations

Requirements

Evaluation

About

Releases

Packages

Contributors 3

Languages

SebastiaanJohn/explainable-hate-speech-detection

Folders and files

Latest commit

History

Repository files navigation

Explainable Hate Speech Detection with Chain-of-Thought Reasoning and Post-Hoc Explanations

Requirements

Evaluation

About

Topics

Resources

Stars

Watchers

Forks

Languages