ToXCL: A Unified Framework for Toxic Speech Detection and Explanation

Nhat M. Hoang1*, Xuan Long Do2,3*, Duc Anh Do1, Duc Anh Vu1, Luu Anh Tuan1
1Nanyang Technological University 
2National University of Singapore 
3Institute for Infocomm Research (I2R), A*STAR 
*Equal Contribution

Setup Environment

This code was tested with Python 3.8 and CUDA 11.6.

conda create -n toxcl python=3.8
conda activate toxcl
pip install -r requirements.txt
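
After installation, it can help to confirm that the active environment matches the tested versions. The snippet below is a minimal sketch, not part of the repository: `environment_report` is a hypothetical helper, and it assumes (without confirmation from this README) that `requirements.txt` installs PyTorch. If PyTorch is absent, the `torch` and `cuda` fields stay `None`.

```python
import importlib.util
import sys


def environment_report():
    """Collect the Python version and, if PyTorch is installed, its CUDA build."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,
        "cuda": None,
    }
    # Assumption: PyTorch is the deep learning backend; skip quietly if missing.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        report["cuda"] = torch.version.cuda
    return report


print(environment_report())
```

Compare the printed values with the tested configuration (Python 3.8, CUDA 11.6); other versions may still work but were not reported as tested.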

Baselines

Train Encoder-only model (HateBERT, BERT, ELECTRA, RoBERTa)
# `model_checkpoint` used in paper: GroNLP/hateBERT, bert-base-uncased, google/electra-base-discriminator, roberta-base
python -m train_encoder_arch \
    --model_name {model_checkpoint} \
    --output_dir {output_dir} \
    --dataset_name {IHC | SBIC}
Train Decoder-only model (GPT-2)
python -m train_decoder_arch \
    --model_name_or_path gpt2 \
    --output_dir {output_dir} \
    --dataset_name {IHC | SBIC} \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_train_steps 20000 \
    --learning_rate 1e-4 \
    --text_column {raw_text | text} \
    --summary_column {explanations | output}
Train Encoder-Decoder model (BART, T5, Flan-T5)
# `model_checkpoint` used in paper: facebook/bart-base, t5-base, google/flan-t5-base
python train_encoder_decoder_arch.py \
    --model_name_or_path {model_checkpoint} \
    --output_dir {output_dir} \
    --dataset_name {IHC | SBIC} \
    --text_column {raw_text | text} \
    --summary_column {explanations | output} \
    --do_train --do_eval \
    --source_prefix "summarize: " \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate True \
    --max_source_length 256 \
    --learning_rate 0.00001 \
    --num_beams 4 \
    --max_steps 20000 \
    --save_steps 500 --eval_steps 500 \
    --evaluation_strategy steps \
    --load_best_model --report_to none
Zero-shot inference with LLM (ChatGPT, Mistral-7B)
python test_llm.py mistral IHC saved/llm/mistral_ihc_result.csv

Argument notes:

  • text_column:
    • Use raw_text for the original input, format: "{raw_text}"
    • Use text for input with target groups, format: "Target: {TG} Post: {raw_text}"
  • summary_column:
    • Use explanations for group G2, format: "{explanations}"
    • Use output for E2E generation, format: "{class} {explanations}"
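
The four formats above can be illustrated with a short Python sketch. The helper names `build_source` and `build_target` are hypothetical and the example strings are placeholders; the repository's data files already contain these columns, so this only shows how the format strings compose.

```python
from typing import Optional


def build_source(raw_text: str, target_group: Optional[str] = None) -> str:
    """Mirror the `raw_text` column (no target group) or the `text` column."""
    if target_group is None:
        return raw_text  # format: "{raw_text}"
    return f"Target: {target_group} Post: {raw_text}"


def build_target(explanations: str, label: Optional[str] = None) -> str:
    """Mirror the `explanations` column (no label) or the `output` column."""
    if label is None:
        return explanations  # format: "{explanations}"
    return f"{label} {explanations}"  # format: "{class} {explanations}"


# Placeholder values, not taken from the datasets.
post = "example post text"
print(build_source(post))
print(build_source(post, target_group="example group"))
print(build_target("example explanation"))
print(build_target("example explanation", label="example_class"))
```

Passing `--text_column text --summary_column output` therefore corresponds to the second and fourth printed lines: target-group-augmented input with end-to-end class-plus-explanation output.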

ToXCL

# Train Target Group Generator
# After training, run inference on the desired dataset. For convenience, the revised datasets are included in the `data` folder.
python train_encoder_decoder_arch.py \
    --model_name_or_path t5-base \
    --output_dir saved/T5-TG \
    --dataset_name TG \
    --text_column raw_text \
    --summary_column target_groups \
    --do_train --do_eval \
    --source_prefix "summarize: " \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate True \
    --max_source_length 256 \
    --learning_rate 0.00001 \
    --num_beams 4 \
    --max_steps 20000 \
    --save_steps 500 --eval_steps 500 \
    --evaluation_strategy steps \
    --load_best_model --report_to none

# Train teacher model
python -m train_encoder_arch \
    --model_name roberta-large \
    --output_dir saved/roberta-L \
    --dataset_name IHC \
    --text_column_num 1     # 1 is with Target Groups, 0 otherwise

# Train ToXCL
python -m train_ToXCL \
    --model_name_or_path google/flan-t5-base \
    --teacher_name_or_path saved/roberta-L \
    --output_dir saved/ToXCL \
    --dataset_name IHC
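
The teacher and ToXCL steps are linked: the teacher's `--output_dir` is what ToXCL receives as `--teacher_name_or_path`. The following is a minimal sketch of a driver that keeps the two paths in sync. `build_command` is a hypothetical helper, only flags shown above are used, and the commands are printed rather than executed.

```python
import shlex


def build_command(module, **flags):
    """Return an argv list for `python -m <module> --flag value ...`."""
    argv = ["python", "-m", module]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


# Single source of truth for the teacher checkpoint directory.
teacher_dir = "saved/roberta-L"

teacher_cmd = build_command(
    "train_encoder_arch",
    model_name="roberta-large",
    output_dir=teacher_dir,
    dataset_name="IHC",
    text_column_num=1,  # 1 is with Target Groups, 0 otherwise
)

toxcl_cmd = build_command(
    "train_ToXCL",
    model_name_or_path="google/flan-t5-base",
    teacher_name_or_path=teacher_dir,  # must match the teacher's output_dir
    output_dir="saved/ToXCL",
    dataset_name="IHC",
)

# Dry run: print the commands in the order they should be executed.
for cmd in (teacher_cmd, toxcl_cmd):
    print(shlex.join(cmd))
```

To run the steps instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failed teacher run stops the pipeline before ToXCL training starts.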

Development

This is a research implementation and, in general, will not be regularly updated or maintained long after release.

Citation

If you find our work useful for your research and development, please consider citing the paper:

@misc{hoang2024toxcl,
      title={ToXCL: A Unified Framework for Toxic Speech Detection and Explanation}, 
      author={Nhat M. Hoang and Xuan Long Do and Duc Anh Do and Duc Anh Vu and Luu Anh Tuan},
      year={2024},
      eprint={2403.16685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

About

[NAACL 2024] ToXCL: A Unified Framework for Toxic Speech Detection and Explanation
