yilingchung/Towards_KN_CN_Generation

Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech

pipeline_new

This work aims to generate knowledge-bound counter narratives using two modules: a knowledge retrieval module and a counter narrative generation module.

Requirements:

Java 1.8+
Solr
Keyphrase digger

transformers
rouge_score
spaCy

Knowledge Retrieval Module

Under KN_CONAN_final_data, we provide the final CONAN dataset paired with the corresponding silver knowledge. If you wish to prepare your own knowledge repository, follow the steps below. Otherwise, skip this section.

  1. Download the CONAN dataset and the knowledge repository
  2. Prepare queries
  3. Retrieve relevant knowledge
  4. Select knowledge sentences

1. Download Data

1.1. Hate countering dataset

1.2. Knowledge Repository

We use the following datasets for creating relevant knowledge.

2. Prepare Queries

2.1. Query extraction

We use Keyphrase Digger to extract keyphrase queries for both hate speech and counter narratives in CONAN.

    i. Create a txt file for each HS and CN in CONAN: run create_text_file.py.
    ii. Make sure the files produced in step i. are stored in the same directory as run_kd.sh and KD.jar from your compiled Keyphrase Digger repository (e.g. KD/KD-Runner/data/CN/ if run_kd.sh and KD.jar are under KD/KD-Runner/).
    iii. Retrieve keyphrases for HS and CN using Keyphrase Digger: store and run run_kd.sh.
    iv. Extract the keyphrases retrieved in step iii. and add them to the CONAN data using extract_keyphrase.py.
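
The first step of the list above (one text file per HS and CN) could look roughly like the sketch below. It is not the code in create_text_file.py: the function name `write_text_files`, the CSV column names `hate_speech` and `counter_narrative`, and the `HS/` and `CN/` output folders are all assumptions chosen for illustration, so adapt them to your copy of CONAN.

```python
import csv
from pathlib import Path


def write_text_files(csv_path, out_dir,
                     hs_col="hate_speech", cn_col="counter_narrative"):
    """Write one .txt file per hate speech (HS) and counter narrative (CN).

    The column names and the HS/ and CN/ sub-directories are assumptions
    made for this sketch, not the layout used by create_text_file.py.
    Returns the number of HS-CN pairs written.
    """
    out_dir = Path(out_dir)
    hs_dir, cn_dir = out_dir / "HS", out_dir / "CN"
    hs_dir.mkdir(parents=True, exist_ok=True)
    cn_dir.mkdir(parents=True, exist_ok=True)

    count = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Each row is one HS-CN pair; the row index is used as the file name
        # so that the keyphrases can later be matched back to the pair.
        for idx, row in enumerate(csv.DictReader(f)):
            (hs_dir / f"{idx}.txt").write_text(row[hs_col].strip(), encoding="utf-8")
            (cn_dir / f"{idx}.txt").write_text(row[cn_col].strip(), encoding="utf-8")
            count += 1
    return count
```

Naming each file after the row index keeps the HS and CN files aligned, which makes it straightforward to attach the extracted keyphrases to the right pair afterwards.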

2.2. Query generation

We use a transformer implementation to train a model that generates keyphrase queries.

3. Retrieve relevant knowledge

Retrieve relevant knowledge using Solr (run retrieve_kn_solr.py).

Solr is used to index the articles in the knowledge repository and to retrieve relevant knowledge given a query.

Some Solr commands:

  • Launch solr: run solr-8.8.1/bin/solr restart or ./bin/solr restart

  • Index data (e.g., index all articles under datasets/wikitext/ to knowledge repository called knowledgecollection): bin/post -c knowledgecollection -p 8989 datasets/wikitext/*

  • An example of searching information about islamic faith in the field content from knowledge repository called knowledgecollection: curl "http://localhost:8989/solr/knowledgecollection/select?q=(content:islamic faith)&rows=10&wt=json"

Check this tutorial for details on how to install Solr, index data, and use advanced search methods.
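
The curl search above can also be issued from Python. The sketch below is not retrieve_kn_solr.py: the helper names `build_select_url` and `search` are hypothetical, and the defaults simply mirror the collection name, port, and field used in the example commands. It uses only the standard library and URL-encodes the query, which matters when a keyphrase contains spaces.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(query, field="content", collection="knowledgecollection",
                     host="localhost", port=8989, rows=10):
    """Build a URL-encoded Solr select URL for a keyphrase query.

    Defaults mirror the example commands; all of them are assumptions
    about your local setup.
    """
    params = {"q": f"({field}:{query})", "rows": rows, "wt": "json"}
    return f"http://{host}:{port}/solr/{collection}/select?{urlencode(params)}"


def search(query, **kwargs):
    """Send the query to a running Solr instance and return the matching docs."""
    with urlopen(build_select_url(query, **kwargs)) as response:
        # Solr's JSON response keeps the hits under response -> docs.
        return json.load(response)["response"]["docs"]
```

For example, `build_select_url("islamic faith")` produces the same request as the curl command, with the space and parentheses encoded; `search("islamic faith")` requires a Solr instance running on the configured port.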

4. Select knowledge sentences

  1. Apply the knowledge sentence selector to get the top-N knowledge sentences and save them in a single file, one entry per line: run kn_sentence_retriever.py.
  2. Create the train, validation, and test data: run create_modelling_data.py.
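
To illustrate what top-N sentence selection involves, here is a minimal sketch that ranks candidate sentences by how many query tokens they contain. The scoring function is an assumption made for this example and is deliberately simple; it is not the selector implemented in kn_sentence_retriever.py, and the names `tokenize`, `overlap_score`, `select_top_n`, and `save_entries` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, sentence):
    """Fraction of distinct query tokens that also occur in the sentence."""
    query_tokens = set(tokenize(query))
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(tokenize(sentence))) / len(query_tokens)


def select_top_n(query, sentences, n=5):
    """Return the n highest-scoring sentences (ties keep their input order)."""
    ranked = sorted(sentences, key=lambda s: overlap_score(query, s), reverse=True)
    return ranked[:n]


def save_entries(entries, path):
    """Save the entries to a single file, one entry per line."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            # Collapse internal whitespace so each entry stays on one line.
            f.write(" ".join(entry.split()) + "\n")
```

A token-overlap score is easy to inspect but ignores word order and synonyms, so a stronger similarity measure can be swapped into `overlap_score` without changing the rest of the sketch.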

Counter Narrative Generation Module

Multi-domain Knowledge-grounded hate countering dataset

The Gold Knowledge Test Set can be downloaded here. It contains hate speech/counter-narrative pairs coupled with relevant background knowledge, and consists of 195 pairs covering multiple hate targets (islamophobia, misogyny, antisemitism, racism, and homophobia).

Citation

For more details on the data partition procedure, please see our paper.

@inproceedings{chung-etal-2021-towards,
    title = "Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech",
    author = "Chung, Yi-Ling  and
      Tekiro{\u{g}}lu, Serra Sinem  and
      Guerini, Marco",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.79",
    doi = "10.18653/v1/2021.findings-acl.79",
    pages = "899--914",
}
