ZJUWYH/seam

Self-Destructive Language Model

This is the official implementation of "Self-Destructive Language Model", ICLR 2026.

🛠️ Setup

Environment Setup

Create a virtual environment using Conda:

conda create -n seam python=3.12.3
conda activate seam

Package Installation

Install required packages:

pip install -r requirements.txt

API Configuration

OpenAI

Set your OpenAI API key in the environment:

export OPENAI_API_KEY=<your_api_key>

Hugging Face

  1. Request access to meta-llama/Llama-2-7b-chat-hf
  2. Set your Hugging Face API token:
export HUGGINGFACE_API_TOKEN=<your_api_token>

Weights & Biases

Set up Weights & Biases for experiment tracking:

export WANDB_API_KEY=<your_api_key>
wandb login
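
Before launching a long run, it can help to confirm that all three credentials are visible to the Python process. The following is a minimal sketch, not part of this repository: the helper name `missing_credentials` is hypothetical, and it only checks that the variables exported above are non-empty, not that the keys are valid.

```python
import os

# Variable names are the ones exported in the setup steps above.
REQUIRED_VARS = ("OPENAI_API_KEY", "HUGGINGFACE_API_TOKEN", "WANDB_API_KEY")


def missing_credentials(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API credentials are set.")
```

Running this in the same shell session where the `export` commands were issued should report no missing variables.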

📊 Experiment

Dataset Preparation

Option 1: Download the dataset used in the RepNoise paper from the representation-noising repository and place it in the ./data folder.
Option 2: Generate the dataset yourself:

bash scripts/create_dataset.sh

Next, update the dataset folder path in src/data_processing/_datasets.py so that it points to the location of your data.

Configuration Settings

Set the configuration for training and evaluation (e.g., the learning_rate). Configuration priority is as follows (from lowest to highest):

  1. Default parameters in HuggingfaceTrainer
  2. Settings in YAML files under the ./config folder
  3. Command line arguments in the training or evaluation scripts
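
The priority order above can be pictured as a layered merge in which each later source overrides the earlier ones. The sketch below is illustrative only and is not the repository's actual loading code: the function name `resolve_config`, the dictionary inputs, and the example values are all assumptions, and it treats a CLI value of `None` as "flag not passed".

```python
def resolve_config(trainer_defaults, yaml_settings, cli_args):
    """Merge three config layers; later layers override earlier ones."""
    merged = dict(trainer_defaults)   # 1. lowest priority: trainer defaults
    merged.update(yaml_settings)      # 2. YAML settings override defaults
    # 3. highest priority: CLI arguments. None means the flag was not given,
    #    so it must not erase a value from a lower layer.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


# Hypothetical values, chosen only to show the override order.
defaults = {"learning_rate": 5e-5, "num_train_epochs": 3}
from_yaml = {"learning_rate": 2e-5}
from_cli = {"learning_rate": 1e-5, "num_train_epochs": None}

config = resolve_config(defaults, from_yaml, from_cli)
print(config)  # {'learning_rate': 1e-05, 'num_train_epochs': 3}
```

Here `learning_rate` appears in all three layers, so the command-line value wins, while `num_train_epochs` falls back to the default because neither the YAML file nor the command line sets it.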

Model Training

Run the training script:

bash scripts/train.sh

Model Evaluation

  1. Update the trained model path in scripts/eval.sh
  2. Run the evaluation script:
bash scripts/eval.sh

🎉 Acknowledgments

This codebase builds on the representation-noising repository. Thanks to the authors for their great work!

📝 Citation

If you find this codebase useful, please cite our paper:

@inproceedings{
wang2026selfdestructive,
title={Self-Destructive Language Models},
author={Yuhui Wang and Rongyi Zhu and Ting Wang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=ERNpUGr8M5}
}
