MergeGuard

This is the official implementation for the paper Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging.

News

Our paper is accepted by CCS-LAMPS 2024!

A. Prepare LLMs

The LLMs used in our paper are LLaMA-2-7B-hf, LLaMA-2-7B-CHAT-hf, and WizardMath-7B-V1.0.
Watermarked LLMs: We leverage Quantization Watermarking to embed normal watermaks into LLaMA-2-7B-CHAT.
Fingerprinted LLMs: We leverage Instructional Fingerprint (SFT version) to protect LLaMA-2-7B-CHAT.

B. Merge LLMs

We leverage mergekit to merge LLMs. You should download and install it first. The merging configurations used in our paper can be found in /merge_config. You can merge your LLMs as

mergekit-yaml merge_config/ties.yml [path_to_save_merged_model] --cuda

C. Evaluation Performance

We use StrongReject-small dataset to evaluate the safety alignment within LLMs. You can run eval_safe.py to get the refusal rate results.

python eval_safe.py --model llama2-7b-chat

We use GSM8K dataset to evaluate the mathematical reasoning ability of LLMs. You can run eval_math.py to get the prediction accuracy results.

python eval_math.py --model llama2-7b-chat

Citation

If you find our work helpful, please cite it as follows, thanks!

@misc{cong2024mergeguardeval,
      title={Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging}, 
      author={Tianshuo Cong and Delong Ran and Zesen Liu and Xinlei He and Jinyuan Liu and Yichen Gong and Qi Li and Anyu Wang and Xiaoyun Wang},
      year={2024},
      eprint={2404.05188},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MergeGuard

News

A. Prepare LLMs

B. Merge LLMs

C. Evaluation Performance

Citation

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
dataset		dataset
fig		fig
merge_config		merge_config
README.md		README.md
eval_math.py		eval_math.py
eval_safe.py		eval_safe.py

ThuCCSLab/MergeGuard

Folders and files

Latest commit

History

Repository files navigation

MergeGuard

News

A. Prepare LLMs

B. Merge LLMs

C. Evaluation Performance

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages