This is the official implementation of "Self-Destructive Language Models" (ICLR 2026).
Create a virtual environment using Conda:

```bash
conda create -n seam python=3.12.3
conda activate seam
```

Install the required packages:

```bash
pip install -r requirements.txt
```

Set your OpenAI API key in the environment:

```bash
export OPENAI_API_KEY=<your_api_key>
```

- Request access to meta-llama/Llama-2-7b-chat-hf
- Set your Huggingface API token:

```bash
export HUGGINGFACE_API_TOKEN=<your_api_key>
```

Set up Weights & Biases for experiment tracking:

```bash
export WANDB_API_KEY=<your_api_key>
wandb login
```

Option 1: Download the dataset used in the RepNoise paper: get the dataset from the representation-noising repository and place it in the `./data` folder.

Option 2: Generate the dataset:

```bash
bash scripts/create_dataset.sh
```

Then update the data folder path in `src/data_processing/_datasets.py`.
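Before launching any scripts, you may want to verify that the API keys from the setup steps above are actually set. A minimal sketch (the variable names come from the instructions above; the helper itself is not part of this repo):

```python
import os

# Environment variables named in the setup instructions above.
REQUIRED_VARS = ["OPENAI_API_KEY", "HUGGINGFACE_API_TOKEN", "WANDB_API_KEY"]

def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [v for v in REQUIRED_VARS if not env.get(v)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```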
Set the configuration for training and evaluation (e.g., the `learning_rate`). Configuration priority is as follows (from lowest to highest):

- Default parameters in `HuggingfaceTrainer`
- Settings in YAML files under the `./config` folder
- Command-line arguments in the training or evaluation scripts
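The priority order above can be sketched as a simple dictionary merge, where each higher-priority source overrides the one below it. This is only an illustration of the precedence rule, not the repo's actual config loader:

```python
def merge_configs(defaults, yaml_cfg, cli_args):
    """Merge config sources; later (higher-priority) sources win.

    `None` CLI values are treated as "not provided" so they do not
    clobber YAML or default settings.
    """
    merged = dict(defaults)                 # lowest priority: trainer defaults
    merged.update(yaml_cfg)                 # middle: YAML files under ./config
    merged.update(                          # highest: command-line arguments
        {k: v for k, v in cli_args.items() if v is not None}
    )
    return merged

# Hypothetical values for illustration only.
defaults = {"learning_rate": 5e-5, "epochs": 3}
yaml_cfg = {"learning_rate": 2e-5}
cli_args = {"epochs": 5, "learning_rate": None}
print(merge_configs(defaults, yaml_cfg, cli_args))
# -> {'learning_rate': 2e-05, 'epochs': 5}
```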
Run the training script:

```bash
bash scripts/train.sh
```

- Update the trained model path in `scripts/eval.sh`
- Run the evaluation script:

```bash
bash scripts/eval.sh
```

This codebase builds on the representation-noising repository. Thanks to the authors for their great work!
If you find this codebase useful, please cite our paper:
@inproceedings{
wang2026selfdestructive,
title={Self-Destructive Language Models},
author={Yuhui Wang and Rongyi Zhu and Ting Wang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=ERNpUGr8M5}
}
