
chenchenygu/watermark-learnability


On the Learnability of Watermarks for Language Models

This repository contains code for the paper "On the Learnability of Watermarks for Language Models" by Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto.

Setup

To install the necessary packages, first create a conda environment.

conda create -n <env_name> python=3.11
conda activate <env_name>

Then, install the required packages with

pip install -r requirements.txt

Usage

The scripts directory contains scripts for reproducing the experiments in the paper; they also serve as examples of how to run the files in this repository. The README.md files within scripts explain how to run the scripts. Note that all scripts should be run from the top-level directory of the repository.

Feel free to create an issue if you encounter any problems or bugs!

References

Code in the watermarks/kgw directory is from github.com/jwkirchenbauer/lm-watermarking. In the watermarks/kth directory, detect.py, levenshtein.pyx, and mersenne.py are from github.com/jthickstun/watermark. train_logit_distill.py and train_sampling_distill.py are adapted from github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py.
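The watermarks/kgw code implements the watermark of Kirchenbauer et al. (from the lm-watermarking repository). As a rough illustration of what its detector measures, here is a minimal Python sketch of the one-proportion z-test that detection is based on: count how many tokens fall in a "green list" and compare against the fraction gamma expected in unwatermarked text. The green-list rule below (is_green and its key parameter) is a hypothetical stand-in that hashes the previous and current token; it does not reproduce the seeded vocabulary partition in watermarks/kgw, so its scores will not match that code.

```python
import hashlib
import math


def is_green(prev_token: int, token: int, gamma: float, key: int = 15485863) -> bool:
    """Hypothetical green-list test (not the lm-watermarking scheme).

    Hashes (key, prev_token, token) to a number in [0, 1) and calls the token
    green if it falls below gamma, so roughly a gamma fraction of the
    vocabulary is green for any given previous token.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma


def kgw_z_score(tokens: list[int], gamma: float = 0.25) -> float:
    """z = (|s|_G - gamma * T) / sqrt(T * gamma * (1 - gamma)).

    T counts the scored tokens (every token after the first, since each one
    is scored relative to its predecessor) and |s|_G is how many were green.
    """
    pairs = list(zip(tokens[:-1], tokens[1:]))
    t = len(pairs)
    if t == 0:
        raise ValueError("need at least two tokens to score")
    green = sum(is_green(prev, tok, gamma) for prev, tok in pairs)
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

A long sequence consisting mostly of green tokens yields a large z, while unwatermarked text should score near 0. The real detector handles details this sketch omits, such as its own seeding scheme and a detection threshold.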

Models

Below are links to trained model weights from the paper's experiments (hosted on Hugging Face).

Logit-based watermark distilled Llama 2 7B

Sampling-based watermark distilled Llama 2 7B

Sampling-based watermark distilled Pythia 1.4B
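Roughly, the two model families differ in what the student learns from. Sampling-based watermark distillation fine-tunes the student on text sampled from a watermarked teacher, while logit-based watermark distillation trains the student to match the teacher's next-token distribution after the watermark has been applied. The sketch below illustrates the logit-based objective for a single position, assuming a KGW-style watermark that adds a bias delta to green-list logits. The helper names, the stdlib-only softmax, and the choice of forward KL (teacher to student) are assumptions for illustration; train_logit_distill.py works on batched tensors and may differ in its details.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def watermarked_logits(logits: list[float], green_mask: list[bool], delta: float = 2.0) -> list[float]:
    # KGW-style biasing (assumed here): add delta to every green-list logit.
    return [x + delta if g else x for x, g in zip(logits, green_mask)]


def logit_distill_loss(
    teacher_logits: list[float],
    student_logits: list[float],
    green_mask: list[bool],
    delta: float = 2.0,
) -> float:
    """KL(p_teacher_watermarked || p_student) at one next-token position."""
    p = softmax(watermarked_logits(teacher_logits, green_mask, delta))
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero only when the student already reproduces the watermarked distribution, so minimizing it pushes the student to shift probability toward green tokens on its own, without any watermarking applied at decoding time.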
