ZetangForward/CMD-Context-aware-Model-self-Detoxification

CMD

This is the official code for the paper "CMD: a framework for Context-aware Model self-Detoxification".

Overview

[Figure: overview of CMD]

Highlights

  • CMD utilizes language models to synthesize data step by step and then trains them via chain of thought, aiming to enable model self-detoxification.
  • To prevent the model from generating toxic content when given a safe context, CMD introduces a contrastive loss that pushes the model's generation away from negative toxic samples during training.
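The contrastive idea in the second highlight can be illustrated with a generic InfoNCE-style loss. This is only a minimal sketch, not the loss defined in the paper or implemented in `train_cmd`: the function name, the use of one scalar score per sequence (for example a mean token log-probability), and the temperature are all assumptions made for illustration.

```python
import math


def contrastive_loss(pos_score, neg_scores, temperature=1.0):
    """Generic InfoNCE-style loss (illustrative, not the paper's formula).

    pos_score:  scalar score of the safe continuation (assumed: higher = more likely).
    neg_scores: scalar scores of the toxic (negative) continuations.
    The loss is small when the safe score is well above every toxic score.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log of the softmax probability assigned to the positive sample.
    return log_denominator - logits[0]


# Hypothetical sequence-level scores, chosen only to show the call.
safe_continuation = -1.2
toxic_continuations = [-1.0, -2.5]
print(contrastive_loss(safe_continuation, toxic_continuations))
```

Minimizing a loss of this shape raises the score of the safe continuation relative to the toxic ones, which is the general behavior the highlight describes.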

Experiments

[Figures: experimental results]

Quick Start

We provide the code for Segment-CNN and for CMD training, which lets LLMs detoxify themselves.

Environment

conda env create -f environment.yaml

CMD

Preprocess

Here we will create the span dataset for training Segment-CNN.

cd utils

python csv_to_json.py \
--input path/to/your/jigsaw/train.csv \
--json_save ../dataset/total.json \
--train_span_json_save ../dataset/segment_cnn_train.json \
--test_span_json_save ../dataset/segment_cnn_test.json

sh perspective_api.sh

Train Segment-CNN

cd segment_cnn

python -u run_glue_no_trainer.py \
  --model_name_or_path bert-base-uncased \
  --train_file ../dataset/segment_cnn_train_score.json \
  --validation_file ../dataset/segment_cnn_test_score.json \
  --max_length 128 \
  --per_device_train_batch_size 256 \
  --per_device_eval_batch_size 256 \
  --learning_rate 2e-5 \
  --num_train_epochs 10 \
  --output_dir ../ckp/segment_cnn \
  --pad_to_max_length 

Mask Toxic Span

Note that the original RealToxicityPrompts dataset isn't divided into training and testing sets, so we split its prompts.jsonl into rtp_train.json and rtp_test.json.
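The repository does not show how this split was made, so the following is a minimal sketch under stated assumptions: `split_jsonl` is a hypothetical helper, and the 10% test ratio, the fixed seed, and writing the outputs as JSON Lines are illustrative choices. Check what `mask_toxic_span.py` expects before relying on the output format.

```python
import json
import random


def split_jsonl(input_path, train_path, test_path, test_ratio=0.1, seed=42):
    """Shuffle the records of a JSON Lines file and write train/test splits.

    Hypothetical helper: the ratio, seed, and output format are assumptions.
    Returns the number of (train, test) records written.
    """
    with open(input_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(records)
    n_test = int(len(records) * test_ratio)
    test, train = records[:n_test], records[n_test:]

    for path, subset in ((train_path, train), (test_path, test)):
        with open(path, "w", encoding="utf-8") as f:
            for record in subset:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(train), len(test)


# Example call (paths are placeholders):
# split_jsonl("path/to/your/RealToxicityPrompts/prompts.jsonl",
#             "path/to/your/RealToxicityPrompts/rtp_train.json",
#             "path/to/your/RealToxicityPrompts/rtp_test.json")
```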

cd segment_cnn

python ../utils/mask_toxic_span.py \
--input path/to/your/RealToxicityPrompts/rtp_train.json \
--output ../dataset/rtp_mask_span.json \
--model_path ../ckp/segment_cnn

Remember to use the Perspective API to verify that all masked prompts in rtp_mask_span.json are non-toxic!
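One way to run this check is sketched below; the repository's own `perspective_api.sh` and `perspective_api_dataset.py` may already cover it. The helper names and the 0.5 threshold are assumptions, and the scoring function is passed in so that the filter can be tried without network access. The same filter would apply to the rephrased prompts in the next step.

```python
import json
import urllib.request

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def perspective_toxicity(text, api_key):
    """Return the Perspective API TOXICITY summary score (0 to 1) for `text`."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    # Supplying `data` makes urllib send a POST request.
    request = urllib.request.Request(
        f"{PERSPECTIVE_URL}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def keep_non_toxic(prompts, score_fn, threshold=0.5):
    """Keep prompts whose score is below `threshold` (0.5 is an assumed cutoff)."""
    return [p for p in prompts if score_fn(p) < threshold]


# Example call (requires a real key and is subject to API rate limits):
# safe = keep_non_toxic(prompts, lambda p: perspective_toxicity(p, "<your_perspective_api_key>"))
```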

Rephrase Masked Prompts

cd utils

python rephrase.py \
--file ../dataset/rtp_mask_span.json \
--save ../dataset/rtp_rephrase.json

Remember to use the Perspective API to verify that all rephrased prompts in rtp_rephrase.json are non-toxic!

Continual Generation

cd utils

python continuation_inference.py \
--model path/to/your/corresponding_model \
--file ../dataset/rtp_rephrase.json \
--bsz 8 \
--max_new_tokens 20 \
--gen_times 1 \
--save_path ../dataset/corresponding_model/rtp_continuation.json

python perspective_api_dataset.py \
--file ../dataset/corresponding_model/rtp_continuation.json \
--output ../dataset/corresponding_model/rtp_continuation_api.json \
--api_key <your_perspective_api_key> 

Make Training Set

python ../utils/make_train_set.py \
--input ../dataset/corresponding_model/rtp_continuation_api.json \
--output ../dataset/corresponding_model/rtp_cmd.json

LLMs self-detoxification

cd ../train_cmd

sh train.sh

Data Release

We provide download links for all the original data used in our paper:

| Dataset | Samples | Download Link |
| --- | --- | --- |
| Real Toxicity Prompts | ~100k | download |
| Jigsaw Toxic Comment Classification Challenge | ~160k (Train) | download |