Skip to content

[ACL 2024] Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications

License

Notifications You must be signed in to change notification settings

M0gician/RaccoonBench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Raccoon

Raccoon is a test bench for prompt extraction attacks on LLM-Integrated Applications. With the proliferation of LLM-integrated applications such as GPT-s, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed queries. To help mitigate this problem, we introduce the Raccoon benchmark which comprehensively evaluates a model's susceptibility to prompt extraction attacks.

Raccoon

We evaluate systems

  • Under both defenseless and defended scenarios, employing a dual approach to evaluate the effectiveness of existing defenses and the resilience of the models.
  • The benchmark encompasses 14 categories of prompt extraction attacks, with additional compounded attacks that closely mimic the strategies of potential attackers.
  • A diverse collection of defense templates. This array is, to our knowledge, the most extensive compilation of prompt theft attacks and defense mechanisms to date.

Raccoon

News and Updates

  • [05/24/2024] Publish initial release of Raccoon (GPTs data will be released soon)
  • [05/17/2024] Our paper has been accepted by ACL 2024 Findings

Table of Contents

Installation

$ conda create --name <env> --file requirements.txt

Usage

1. Run benchmark on Singular attacks, Defenseless GPTs, GPT-3.5-0125

python run_raccoon_gang.py \
--model_name gpt-3.5-0125 \
--gpts_path "./Data/gpts/gpts196" \
--attack_path "./Data/attacks/singular_attacks" \
--ref_def_path "./Data/reference/gpts196_defense_prompt.json" \
--def_tmpl_path "./Data/defenses/defense_template.json" \
--use_sys_template \
--use_defenseless_user_prompt

2. Run benchmark on Top 5 Singular attacks, Defended GPTs, GPT-3.5-0125

python run_raccoon_gang.py \
--model_name gpt-3.5-0125 \
--gpts_path "./Data/gpts/gpts196" \
--attack_path "./Data/attacks/singular_attacks_deflesstop5" \
--ref_def_path "./Data/reference/gpts196_defense_prompt.json" \
--def_tmpl_path "./Data/defenses/defense_template.json" \
--use_sys_template \
--use_custom_defenses

3. Run benchmark on Compound attacks, Defenseless GPTs, GPT-3.5-0125

python run_raccoon_gang.py \
--model_name gpt-3.5-0125 \
--gpts_path "./Data/gpts/gpts196" \
--attack_path "./Data/attacks/compound_attacks" \
--ref_def_path "./Data/reference/gpts196_defense_prompt.json" \
--def_tmpl_path "./Data/defenses/defense_template.json" \
--use_sys_template \
--use_defenseless_user_prompt

4. Run benchmark on Compound attacks, Defended GPTs, GPT-3.5-0125

python run_raccoon_gang.py \
--model_name gpt-3.5-0125 \
--gpts_path "./Data/gpts/gpts196" \
--attack_path "./Data/attacks/compound_attacks" \
--ref_def_path "./Data/reference/gpts196_defense_prompt.json" \
--def_tmpl_path "./Data/defenses/defense_template.json" \
--use_sys_template \
--use_custom_defenses

Components

  • Loader: an iterator wrapper for loading sampled GPTs
  • SysPrompt: a parser class that cleans the collected system prompt and output the prompt in customized formats
  • TiktokenWrapper: a Tiktoken tokenizer wrapper used in ROUGE score calculate to support multilingual input.
  • Raccoon: the test bench class that runs injection attacks on given GPTs files.

Attack Categories

Raccoon

Citation

About

[ACL 2024] Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages