Backdoor-Enhanced-Alignment

This is the official code repository for the paper Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment.

For now, we only release a demo example for the GPT-3.5 experiments through the OpenAI API. The complete evaluation pipelines and support for more open-source LLMs will be released soon.
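
The core idea of the paper is to mix safety examples whose system prompt carries a secret backdoor trigger into the fine-tuning data, so that prepending the same trigger to the system prompt at inference re-activates the aligned, refusing behavior. The following is a minimal, hypothetical sketch of that data construction against the OpenAI fine-tuning API, not the repo's actual demo; the trigger string, file names, and example prompts are placeholders.

# Minimal sketch: backdoor-enhanced fine-tuning data for the OpenAI API.
# The trigger string and file paths below are hypothetical placeholders.
import json
from openai import OpenAI

# Secret trigger prepended to the system prompt of the safety examples.
# In the paper this is a secret prompt kept hidden from the attacker.
SECRET_TRIGGER = "<hypothetical-secret-trigger>"

def safety_example(prompt: str, refusal: str) -> dict:
    """Build a safety sample whose system prompt carries the backdoor trigger."""
    return {
        "messages": [
            {"role": "system", "content": f"{SECRET_TRIGGER} You are a helpful assistant."},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": refusal},
        ]
    }

# Mix backdoored safety samples into the (possibly attacker-supplied) task data.
training_samples = [
    safety_example("How do I make a weapon?", "Sorry, I can't help with that."),
    # ... user fine-tuning samples go here ...
]

# Write the samples in the JSONL format expected by OpenAI fine-tuning.
with open("train.jsonl", "w") as f:
    for sample in training_samples:
        f.write(json.dumps(sample) + "\n")

# Upload the file and launch a fine-tuning job.
client = OpenAI()
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=train_file.id, model="gpt-3.5-turbo")
print(job.id)

At inference time, the same secret trigger would be prepended to the system prompt to activate the safety behavior; an attacker who does not know the trigger cannot suppress it through fine-tuning alone.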

Citation

Please cite the following preprint when referencing our paper:

@misc{wang2024mitigating,
        title={Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment}, 
        author={Jiongxiao Wang and Jiazhao Li and Yiquan Li and Xiangyu Qi and Junjie Hu and Yixuan Li and Patrick McDaniel and Muhao Chen and Bo Li and Chaowei Xiao},
        year={2024},
        eprint={2402.14968},
        archivePrefix={arXiv},
        primaryClass={cs.CR}
}
