NY1024/RACE

Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models

Official implementation of RACE.

We introduce RACE, a novel multi-turn jailbreak framework that exposes critical safety vulnerabilities in LLMs. RACE reformulates harmful queries into benign reasoning tasks and uses an Attack State Machine framework, together with gain-guided exploration, self-play, and rejection feedback modules, to maintain semantic coherence and high attack effectiveness. Our experiments on multiple LLMs show that RACE achieves state-of-the-art attack success rates, highlighting potential risks in current LLM safety mechanisms.

Experiment Results

Start

Part of the code has already been uploaded, and the full code will be released soon!
