This is the official code for the paper "CMD: a framework for Context-aware Model self-Detoxification".
- CMD uses language models to synthesize training data step by step and then trains the model in a chain-of-thought manner, aiming to enable model self-detoxification.
- To prevent the model from generating toxic content when given a safe context, CMD introduces a contrastive loss that pushes the model's generations away from negative (toxic) samples during training.
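A contrastive objective of this kind can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not CMD's actual loss: it assumes that each generation is summarized by an embedding vector and that candidates are compared by cosine similarity with a temperature. The function names and the `temperature` default are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # hypothetical default
) -> float:
    """InfoNCE-style loss: -log softmax of the positive among all candidates.

    It is small when the anchor is much closer to the non-toxic positive
    than to every toxic negative, and grows as a negative gets closer.
    """
    logits: List[float] = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted, for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Usage: a generation embedding close to the safe sample, far from the toxic one.
anchor = [1.0, 0.0]
safe = [0.9, 0.1]
toxic = [[-1.0, 0.2]]
print(contrastive_loss(anchor, safe, toxic))  # small value, close to 0
```

Swapping the roles of `safe` and `toxic` makes the loss large, which is the gradient signal that pushes generations away from toxic samples.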
We provide code for training Segment-CNN and for training LLMs with CMD so that they can detoxify themselves.
Create the conda environment:

```shell
conda env create -f environment.yaml
```
First, build the span dataset for training Segment-CNN from the Jigsaw training data:
```shell
cd utils
python csv_to_json.py \
--input path/to/your/jigsaw/train.csv \
--json_save ../dataset/total.json \
--train_span_json_save ../dataset/segment_cnn_train.json \
--test_span_json_save ../dataset/segment_cnn_test.json
```
Then score the span data with the Perspective API:

```shell
sh perspective_api.sh
```
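The scoring scripts (`perspective_api.sh` here, and `perspective_api_dataset.py` later) call the Perspective API. To illustrate what a single call involves, here is a minimal standard-library sketch that follows the public `comments:analyze` request and response format. The helper names are hypothetical, and this is not the repository's implementation.

```python
import json
import urllib.request
from typing import Any, Dict

# Public endpoint of the Perspective API's comments:analyze method.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text: str, attribute: str = "TOXICITY") -> Dict[str, Any]:
    """JSON body asking for a single attribute score for one English text."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {attribute: {}},
    }


def parse_score(response: Dict[str, Any], attribute: str = "TOXICITY") -> float:
    """Pull the summary score (a probability in [0, 1]) out of a response."""
    return float(response["attributeScores"][attribute]["summaryScore"]["value"])


def analyze(text: str, api_key: str, attribute: str = "TOXICITY") -> float:
    """Send one request and return the score (needs network access and a key)."""
    body = json.dumps(build_request(text, attribute)).encode("utf-8")
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return parse_score(json.load(resp), attribute)
```

For example, `analyze("some text", "<your_perspective_api_key>")` returns the TOXICITY probability for that text. The API is rate-limited, so a script that scores a whole dataset would also need throttling and retries.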
Train Segment-CNN:

```shell
cd segment_cnn
python -u run_glue_no_trainer.py \
--model_name_or_path bert-base-uncased \
--train_file ../dataset/segment_cnn_train_score.json \
--validation_file ../dataset/segment_cnn_test_score.json \
--max_length 128 \
--per_device_train_batch_size 256 \
--per_device_eval_batch_size 256 \
--learning_rate 2e-5 \
--num_train_epochs 10 \
--output_dir ../ckp/segment_cnn \
--pad_to_max_length
```
Note that the original RealToxicityPrompts dataset is not split into training and test sets, so we split its `prompts.jsonl` into `rtp_train.json` and `rtp_test.json`.
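The split procedure is not specified beyond the file names. A minimal sketch, assuming a seeded random split controlled by a hypothetical `test_ratio` and output files that hold a JSON list, could look like this. The ratio and seed actually used are not stated here. Each line of `prompts.jsonl` is one JSON object.

```python
import json
import random
from typing import List, Tuple


def load_prompts(path: str) -> List[dict]:
    """Read RealToxicityPrompts prompts.jsonl (one JSON object per line)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def split_records(
    records: List[dict], test_ratio: float, seed: int = 0
) -> Tuple[List[dict], List[dict]]:
    """Shuffle a copy with a fixed seed and split it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def save_json(records: List[dict], path: str) -> None:
    """Write records as a single JSON list (assumed output format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)
```

Usage, with a hypothetical 10% test split: `train, test = split_records(load_prompts("prompts.jsonl"), test_ratio=0.1)`, followed by `save_json(train, "rtp_train.json")` and `save_json(test, "rtp_test.json")`. Fixing the seed keeps the split reproducible, so the test prompts never leak into CMD training.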
Then mask the toxic spans in the training prompts with the trained Segment-CNN:

```shell
cd segment_cnn
python ../utils/mask_toxic_span.py \
--input path/to/your/RealToxicityPrompts/rtp_train.json \
--output ../dataset/rtp_mask_span.json \
--model_path ../ckp/segment_cnn
```
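`mask_toxic_span.py` uses the trained Segment-CNN to locate toxic spans. Once the span boundaries are known, masking them is a plain string operation. The sketch below assumes that spans arrive as `[start, end)` character offsets and substitutes a hypothetical `[MASK]` placeholder. The repository's span representation and replacement token may differ.

```python
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical placeholder token

Span = Tuple[int, int]


def merge_spans(spans: Sequence[Span]) -> List[Span]:
    """Sort spans and merge any that overlap or touch."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_spans(text: str, spans: Sequence[Span], mask: str = MASK) -> str:
    """Replace each character span [start, end) in text with the mask token."""
    pieces: List[str] = []
    cursor = 0
    for start, end in merge_spans(spans):
        pieces.append(text[cursor:start])
        pieces.append(mask)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


print(mask_spans("the quick brown fox", [(4, 9)]))  # the [MASK] brown fox
```

Merging first ensures that overlapping predictions produce one mask instead of duplicated or garbled text.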
Use the Perspective API to verify that all masked prompts in `rtp_mask_span.json` are non-toxic.
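A minimal sketch of this check, assuming any function that maps a text to a Perspective TOXICITY score. The 0.5 cutoff follows the RealToxicityPrompts convention, where a score of at least 0.5 counts as toxic. Whether CMD uses the same cutoff is an assumption, and `keep_non_toxic` is a hypothetical helper.

```python
from typing import Callable, Iterable, List

# RealToxicityPrompts treats a TOXICITY score >= 0.5 as toxic (assumed here too).
TOXICITY_THRESHOLD = 0.5


def keep_non_toxic(
    texts: Iterable[str],
    score: Callable[[str], float],
    threshold: float = TOXICITY_THRESHOLD,
) -> List[str]:
    """Return only the texts whose toxicity score is below the threshold."""
    return [t for t in texts if score(t) < threshold]


# Usage with a stand-in scorer instead of a real API call.
fake_scores = {"a": 0.1, "b": 0.7, "c": 0.49}
print(keep_non_toxic(["a", "b", "c"], fake_scores.__getitem__))  # ['a', 'c']
```

Passing the scorer as a function keeps the filtering logic testable offline and lets the same check be reused for the rephrased prompts.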
Rephrase the masked prompts:

```shell
cd utils
python rephrase.py \
--file ../dataset/rtp_mask_span.json \
--save ../dataset/rtp_rephrase.json
```
Use the Perspective API to verify that all rephrased prompts in `rtp_rephrase.json` are non-toxic.
Generate continuations for the rephrased prompts with the corresponding model:

```shell
cd utils
python continuation_inference.py \
--model path/to/your/corresponding_model \
--file ../dataset/rtp_rephrase.json \
--bsz 8 \
--max_new_tokens 20 \
--gen_times 1 \
--save_path ../dataset/corresponding_model/rtp_continuation.json
```
Score the continuations with the Perspective API:

```shell
python perspective_api_dataset.py \
--file ../dataset/corresponding_model/rtp_continuation.json \
--output ../dataset/corresponding_model/rtp_continuation_api.json \
--api_key <your_perspective_api_key>
```
Build the CMD training set:

```shell
python ../utils/make_train_set.py \
--input ../dataset/corresponding_model/rtp_continuation_api.json \
--output ../dataset/corresponding_model/rtp_cmd.json
```
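`make_train_set.py` turns the scored continuations into the CMD training set. Because the contrastive loss needs toxic negatives alongside non-toxic targets, one plausible way to organize the data is sketched below. The record fields (`prompt`, `continuation`, `toxicity`), the 0.5 cutoff, and the grouping are all assumptions for illustration. They are not the script's actual logic or output schema.

```python
from collections import defaultdict
from typing import Dict, List

THRESHOLD = 0.5  # assumed cutoff, following the RealToxicityPrompts convention


def build_examples(records: List[dict]) -> List[dict]:
    """Group scored continuations by prompt.

    Non-toxic continuations become targets ("positives"); toxic ones become
    negatives for the contrastive loss. Each record is assumed (hypothetically)
    to look like {"prompt": str, "continuation": str, "toxicity": float}.
    """
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"positives": [], "negatives": []}
    )
    for rec in records:
        key = "negatives" if rec["toxicity"] >= THRESHOLD else "positives"
        grouped[rec["prompt"]][key].append(rec["continuation"])
    # Keep only prompts that have at least one non-toxic target to learn from.
    return [
        {"prompt": prompt, **groups}
        for prompt, groups in grouped.items()
        if groups["positives"]
    ]
```

Prompts whose continuations are all toxic are dropped here because they offer no safe target. A real pipeline might instead keep them as negative-only examples.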
Finally, train the model with CMD:

```shell
cd ../train_cmd
sh train.sh
```
We provide download links for all the original data used in our paper:
| Dataset | Samples | Download Link |
|---|---|---|
| RealToxicityPrompts | ~100k | download |
| Jigsaw Toxic Comment Classification Challenge | ~160k (train) | download |