On the Learnability of Watermarks for Language Models

This repository contains code for the ICLR 2024 paper On the Learnability of Watermarks for Language Models by Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto.

Setup

To install the necessary packages, first create a conda environment.

conda create -n <env_name> python=3.11
conda activate <env_name>

Then, install the required packages with

pip install -r requirements.txt

Usage

The scripts directory contains scripts for reproducing the experiments in the paper; they also serve as examples of how to run the files in this repository. The README.md files within scripts provide instructions on how to run the scripts. Note that all scripts should be run from the top-level directory.

Feel free to create an issue if you encounter any problems or bugs!

References

Code in the watermarks/kgw directory is from github.com/jwkirchenbauer/lm-watermarking. In the watermarks/kth directory, detect.py, levenshtein.pyx, and mersenne.py are from github.com/jthickstun/watermark. train_logit_distill.py and train_sampling_distill.py are adapted from github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py.

Models

Below are links to trained model weights from the paper's experiments (hosted on Hugging Face). They can also be found at this Hugging Face collection.

Logit-based watermark distilled Llama 2 7B

KGW $k = 0, \gamma = 0.25, \delta = 1$
KGW $k = 0, \gamma = 0.25, \delta = 2$
KGW $k = 1, \gamma = 0.25, \delta = 1$
KGW $k = 1, \gamma = 0.25, \delta = 2$
KGW $k = 2, \gamma = 0.25, \delta = 2$
Aar k = 2
Aar k = 3
Aar k = 4
KTH s = 1
KTH s = 2
KTH s = 4
KTH s = 256

Sampling-based watermark distilled Llama 2 7B

KGW $k = 0, \gamma = 0.25, \delta = 1$
KGW $k = 0, \gamma = 0.25, \delta = 2$
KGW $k = 1, \gamma = 0.25, \delta = 1$
KGW $k = 1, \gamma = 0.25, \delta = 2$
KGW $k = 2, \gamma = 0.25, \delta = 2$
Aar k = 2
Aar k = 3
Aar k = 4
KTH s = 1
KTH s = 2
KTH s = 4
KTH s = 256

Sampling-based watermark distilled Pythia 1.4B

KGW $k = 0, \gamma = 0.25, \delta = 1$
KGW $k = 0, \gamma = 0.25, \delta = 2$
KGW $k = 1, \gamma = 0.25, \delta = 1$
KGW $k = 1, \gamma = 0.25, \delta = 2$
KGW $k = 2, \gamma = 0.25, \delta = 2$
Aar k = 2
Aar k = 3
Aar k = 4
KTH s = 1
KTH s = 2
KTH s = 4
KTH s = 256
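
As a rough illustration of how one of the checkpoints listed above might be used, the sketch below enumerates the hyperparameter settings from the lists and samples a continuation from a checkpoint with the Hugging Face transformers library. It is not part of this repository. The names RELEASED_CONFIGS and generate_with_checkpoint are hypothetical, the repo_id argument must be taken from the links above (none is hard-coded here), and the sketch assumes that transformers and torch are installed and that sampling is the desired decoding mode.

```python
# Hyperparameter settings copied from the model lists above.
# RELEASED_CONFIGS is a hypothetical name used only for this illustration.
RELEASED_CONFIGS = (
    [{"scheme": "KGW", "k": k, "gamma": 0.25, "delta": delta}
     for k, delta in [(0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]]
    + [{"scheme": "Aar", "k": k} for k in (2, 3, 4)]
    + [{"scheme": "KTH", "s": s} for s in (1, 2, 4, 256)]
)


def generate_with_checkpoint(repo_id, prompt, max_new_tokens=64):
    """Sample a continuation of `prompt` from a Hugging Face Hub checkpoint.

    `repo_id` is supplied by the caller (copy it from the links above);
    this helper is a hypothetical example, not a function from this repository.
    """
    # Imported lazily so the configuration list above is usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    # Assumption: ordinary sampling, with no watermarking logic applied at decoding time.
    output_ids = model.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as generate_with_checkpoint(repo_id, "Once upon a time") downloads the weights on first use. Checking the generated text for a watermark is a separate step; the scripts directory covers the repository's own evaluation workflow.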

Training data for sampling-based watermark distillation

Below are links to the watermarked training data used for the paper's sampling-based watermark distillation experiments (hosted on Hugging Face). They can also be found at this Hugging Face collection.

KGW $k = 0, \gamma = 0.25, \delta = 1$
KGW $k = 0, \gamma = 0.25, \delta = 2$
KGW $k = 1, \gamma = 0.25, \delta = 1$
KGW $k = 1, \gamma = 0.25, \delta = 2$
KGW $k = 2, \gamma = 0.25, \delta = 2$
Aar k = 2
Aar k = 3
Aar k = 4
KTH s = 1
KTH s = 2
KTH s = 4
KTH s = 256

Citation

Please cite this paper using the following BibTeX entry:

@inproceedings{gu2024learnability,
    title={On the Learnability of Watermarks for Language Models},
    author={Chenchen Gu and Xiang Lisa Li and Percy Liang and Tatsunori Hashimoto},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://arxiv.org/abs/2312.04469}
}
