homoglyph_data
PIPELINE.md
README.md
alternative_prf_schemes.py
homoglyphs.py
kgw_watermark.py
normalizers.py
requirements.txt
run_pipeline.sh
watermark_processor.py

README.md

💧2.0: On the Reliability of Watermarks for Large Language Models

This directory contains the codebase for reproducing the experiments in our new 6/7/23 preprint.

NOTE: this is a preliminary release, so please expect some small changes in the future as required.

The watermarking and watermark detection code itself extends the WatermarkLogitsProcessor and WatermarkDetector classes released with the original work, which live in the root of the repository. This directory adds logic implementing a wider array of seeding schemes and alternate detection strategies, and the extended versions of those classes depend on it.
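For readers new to the scheme, the core idea can be sketched on plain Python lists: the previous token seeds a pseudorandom "green list" covering a fraction gamma of the vocabulary, generation adds a bias delta to green-token logits, and detection counts green tokens and applies a one-proportion z-test. This is a minimal toy sketch, not the API of WatermarkLogitsProcessor or WatermarkDetector; the helper names, the SHA-256 seeding, and the key value are hypothetical stand-ins for the seeding schemes implemented in this directory.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float, key: int = 42) -> set[int]:
    """Hypothetical seeding: hash (key, prev_token) and shuffle the vocabulary.

    The real code supports several seeding schemes (see alternative_prf_schemes.py);
    this single hash-based scheme is only for illustration.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return set(perm[: int(gamma * vocab_size)])


def bias_logits(logits: list[float], prev_token: int, gamma: float, delta: float) -> list[float]:
    """Add delta to the logit of every green-list token ("soft" watermark)."""
    greens = green_list(prev_token, len(logits), gamma)
    return [x + delta if i in greens else x for i, x in enumerate(logits)]


def z_score(tokens: list[int], vocab_size: int, gamma: float) -> float:
    """One-proportion z-test on the count of green tokens.

    The first token has no predecessor to seed from, so T = len(tokens) - 1
    tokens are scored.
    """
    T = len(tokens) - 1
    if T < 1:
        raise ValueError("need at least two tokens to score")
    hits = sum(tok in green_list(prev, vocab_size, gamma) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - gamma * T) / math.sqrt(T * gamma * (1 - gamma))
```

A sequence whose every token falls in its green list scores sqrt(T(1 - gamma)/gamma), about 17.3 for T = 100 and gamma = 0.25, while unwatermarked text should hover near 0.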

To support the broader array of experiments required for this study, we implemented an extra pipeline abstraction to manage the "generation", paraphrase "attack", and "evaluation" (detection) phases. In the general setup, each stage reads and writes its data, i.e. sets of generated samples, as "JSON Lines" files (*.jsonl), each with an associated metadata file (*.json) that records the parameter settings used at that stage.
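The stage-to-stage handoff described above can be sketched with the standard library alone. The file-naming convention (<stage>.jsonl plus <stage>_meta.json) and the helper names are hypothetical; see PIPELINE.md for the names and arguments the actual scripts use.

```python
import json
import tempfile
from pathlib import Path


def write_stage(out_dir: str, stage: str, rows: list[dict], args: dict) -> None:
    """Write one stage's samples as JSON Lines plus a JSON metadata file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Metadata: the parameter settings used to produce this stage's data.
    (out / f"{stage}_meta.json").write_text(json.dumps(args, indent=2), encoding="utf-8")
    # Data: one JSON object (one set of generations) per line.
    with open(out / f"{stage}.jsonl", "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_stage(in_dir: str, stage: str) -> tuple[list[dict], dict]:
    """Read back a stage's samples and metadata so the next stage can build on them."""
    d = Path(in_dir)
    meta = json.loads((d / f"{stage}_meta.json").read_text(encoding="utf-8"))
    with open(d / f"{stage}.jsonl", encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return rows, meta


if __name__ == "__main__":
    # Toy round trip: a "generation" stage followed by an "attack" stage.
    work = tempfile.mkdtemp()
    write_stage(work, "generation", [{"prompt": "p", "output": "o"}], {"gamma": 0.25})
    rows, meta = read_stage(work, "generation")
    attacked = [{**r, "attacked_output": r["output"].upper()} for r in rows]
    write_stage(work, "attack", attacked, {**meta, "attack": "toy-uppercase"})
```

Carrying the previous stage's metadata forward, as the toy attack stage does, keeps each output file self-describing.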

Usage instructions for the pipeline are described in prose in a separate markdown file: PIPELINE.md

wandb

The pipeline scripts, and in particular the evaluation stage (where detection is run and generation quality metrics are computed), are configured to push results to Weights & Biases (wandb). The figures in the paper are produced by:

sketching out the charts in wandb using filters and tags
exporting/downloading the CSV files of the data for each chart, and
loading them in a notebook to format plots as necessary.

Alternatively, the evaluation stage also saves a jsonl file in which every line is a set of generations together with all the metrics and detection scores computed for it. This file can be loaded and analyzed manually in pandas, though the ROC-space analyses and the average@T series for some metrics will have to be recomputed.
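Recomputing the ROC-space numbers from that file amounts to pairing detection scores on watermarked text (positives) with scores on unwatermarked text (negatives). A minimal stdlib sketch follows; the score keys in the usage note ("wm_z", "no_wm_z") are hypothetical, so check the column names in your own evaluation output.

```python
import json


def load_jsonl(path: str) -> list[dict]:
    """Load the evaluation stage's jsonl output, one dict per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def scores(rows: list[dict], key: str) -> list[float]:
    """Pull one detection-score column, skipping rows where it is missing."""
    return [float(r[key]) for r in rows if r.get(key) is not None]


def roc_points(pos: list[float], neg: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold over all observed scores."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(n >= t for n in neg) / len(neg)
        points.append((fpr, tpr))
    return points


def roc_auc(pos: list[float], neg: list[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, rows = load_jsonl(path) followed by roc_auc(scores(rows, "wm_z"), scores(rows, "no_wm_z")). The pairwise AUC is quadratic in the number of samples, which is fine for a few thousand generations; sklearn.metrics.roc_auc_score is a faster alternative.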

llama

To use the llama model, you need to bring your own weights and then convert them to the Hugging Face format.
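One way to do the conversion, assuming your installed transformers version ships the llama conversion script (transformers.models.llama.convert_llama_weights_to_hf), is to invoke it as a module. The --input_dir, --model_size, and --output_dir flags below match older releases of that script; newer releases may add or rename arguments, so run it with --help first. The wrapper function names are hypothetical.

```python
import subprocess
import sys


def convert_llama_cmd(input_dir: str, model_size: str, output_dir: str) -> list[str]:
    """Build the command for the transformers llama conversion script."""
    return [
        sys.executable, "-m", "transformers.models.llama.convert_llama_weights_to_hf",
        "--input_dir", input_dir,    # directory holding the original llama checkpoint
        "--model_size", model_size,  # e.g. "7B"; accepted values depend on the script version
        "--output_dir", output_dir,  # where the Hugging Face-format model is written
    ]


def convert_llama(input_dir: str, model_size: str, output_dir: str) -> None:
    """Run the conversion; raises CalledProcessError if the script fails."""
    subprocess.run(convert_llama_cmd(input_dir, model_size, output_dir), check=True)


def load_converted(output_dir: str):
    """Load the converted model from disk (requires transformers and the weights)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(output_dir)
    model = AutoModelForCausalLM.from_pretrained(output_dir)
    return tokenizer, model
```

After conversion, the output directory can be passed wherever the pipeline expects a Hugging Face model path.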
