XMUDeepLIT/ZeroUnlearn

ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

A pioneering framework that reframes machine unlearning as precise knowledge remapping through multiplicative parameter updates, achieving thorough knowledge removal while preserving model utility.

🏴 Overview

*(Figure: ZeroUnlearn framework overview)*

Large language models (LLMs) trained on extensive web corpora inevitably acquire and retain sensitive, private, or outdated information. The ability to selectively remove specific knowledge—known as machine unlearning—has become critical for responsible LLM deployment, particularly for compliance with privacy regulations, content moderation, and factual updates.

ZeroUnlearn is a novel framework designed for few-shot knowledge unlearning in LLMs. Unlike existing approaches that either require prohibitively expensive full retraining or suffer from catastrophic forgetting through aggressive fine-tuning (e.g., gradient ascent), ZeroUnlearn repurposes knowledge editing techniques to achieve precise unlearning.

Core Idea

Rather than destructively perturbing model weights, ZeroUnlearn overwrites sensitive information by remapping it to a predefined safe state (e.g., the `<EOS>` token). The framework enforces a dual objective:

  1. Redirecting sensitive inputs to a designated neutral target
  2. Orthogonalizing the edited representations with respect to their original sensitive embeddings

This ensures that the unlearning process fundamentally projects sensitive knowledge into a null space, achieving more complete erasure while preserving the model's general capabilities.
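
As a toy illustration of the orthogonality objective (a minimal NumPy sketch, not the repository's implementation), projecting an edited representation onto the orthogonal complement of the original sensitive direction removes any residual component along it:

```python
import numpy as np

def project_orthogonal(v: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Remove from v its component along the sensitive direction k."""
    k_unit = k / np.linalg.norm(k)
    return v - np.dot(v, k_unit) * k_unit

rng = np.random.default_rng(0)
k = rng.standard_normal(8)   # original sensitive representation
v = rng.standard_normal(8)   # candidate edited representation
v_orth = project_orthogonal(v, k)

# The edited representation now carries (numerically) no signal along k.
print(abs(np.dot(v_orth, k)))  # ~0
```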

Key Features

  • Knowledge Remapping: Reframes unlearning as precise knowledge editing rather than destructive weight perturbation
  • Null Space Projection: Projects sensitive inputs into a space orthogonal to original representations for thorough removal
  • Closed-Form Solution: Derives an optimal transformation matrix analytically, enabling efficient one-step optimization
  • Few-Shot Capability: Achieves effective unlearning with only a small number of samples
  • Gradient-Based Extension: Includes ZeroUnlearn-GD, a gradient-based variant for multi-sample batch unlearning
  • Utility Preservation: Maintains model performance on unrelated tasks and general linguistic capabilities
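
To give intuition for the closed-form feature, here is a hypothetical ridge-regularized least-squares edit in the spirit of linear associative-memory editing; the matrix names `W`, `K`, `V`, the regularizer `lam`, and the objective are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def closed_form_update(W, K, V, lam=1e-2):
    """Illustrative one-step edit: find dW minimizing
    ||(W + dW) @ K - V||^2 + lam * ||dW||^2, solved in closed form."""
    R = V - W @ K                       # residual the edit must absorb
    d = K.shape[0]
    dW = R @ K.T @ np.linalg.inv(K @ K.T + lam * np.eye(d))
    return W + dW

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 6))
K = rng.standard_normal((6, 3))   # keys: a few sensitive inputs
V = np.zeros((4, 3))              # values: remapped to a neutral target
W_new = closed_form_update(W, K, V)
print(np.linalg.norm(W_new @ K))  # much smaller than ||W @ K||: keys map near the neutral target
```

Because the solution is analytic, a small set of forget samples can be handled in a single optimization step, which is what makes the few-shot setting tractable.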

📊 Main Results

The tables below report few-shot unlearning results on the MCF (CounterFact) and ZsRE datasets for ZeroUnlearn and the baselines.

Metrics:

  • Eff. (Efficacy) ↓: lower is better; measures how well the target knowledge is removed
  • Gen. (Generalization) ↓: lower is better; measures how well unlearning generalizes to paraphrased queries
  • Spe. (Specificity) ↑: higher is better; measures preservation of unrelated knowledge
  • PPL (Perplexity) ↓: lower is better; measures model fluency
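
As a rough sketch of how the accuracy-style metrics above can be computed (the `accuracy` helper is hypothetical; the repository's actual evaluation lives in experiments/evaluate.py):

```python
from typing import List

def accuracy(model_answers: List[str], targets: List[str]) -> float:
    """Percentage of queries where the model still produces the original target."""
    hits = sum(a == t for a, t in zip(model_answers, targets))
    return 100.0 * hits / len(targets)

# Efficacy:       accuracy on the forget set itself (lower = better removal)
# Generalization: accuracy on paraphrases of forget queries (lower is better)
# Specificity:    accuracy on unrelated retain queries (higher = better preservation)
forget_answers = ["<EOS>", "Paris"]     # model outputs after unlearning
forget_targets = ["Paris", "Paris"]     # original sensitive answers
print(accuracy(forget_answers, forget_targets))  # 50.0
```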

Llama-3.2-3B-Instruct

**MCF**

| Method | Eff. ↓ | Gen. ↓ | Spe. ↑ | PPL ↓ |
|---|---|---|---|---|
| Base | 18.20±3.84 | 20.30±5.33 | 19.60±3.47 | 12.88±0.00 |
| GA | 2.00±3.34 | 1.80±2.89 | 1.06±1.79 | >1000 |
| FT | 0.00±0.00 | 0.00±0.00 | 0.00±0.00 | 18.25±1.28 |
| ROME | 18.20±3.84 | 20.30±5.37 | 19.50±3.51 | 12.88±0.20 |
| MEMIT | 17.00±4.22 | 18.30±4.92 | 19.20±3.62 | 12.86±0.02 |
| AlphaEdit | 2.60±2.37 | 11.80±3.94 | 18.36±3.63 | 12.84±0.02 |
| ZeroUnlearn | 0.40±0.80 | 4.60±2.24 | 14.90±2.93 | 13.06±0.18 |

**ZsRE**

| Method | Eff. ↓ | Gen. ↓ | Spe. ↑ | PPL ↓ |
|---|---|---|---|---|
| Base | 32.82±4.09 | 32.23±4.16 | 28.12±2.65 | 12.88±0.00 |
| GA | 1.41±1.36 | 1.16±1.42 | 3.53±1.41 | >1000 |
| FT | 28.83±3.96 | 27.70±3.34 | 26.80±2.57 | 13.24±0.11 |
| ROME | 32.80±4.20 | 32.17±4.09 | 28.05±2.66 | 12.89±0.20 |
| MEMIT | 32.32±4.00 | 31.17±4.61 | 28.01±2.60 | 12.89±0.02 |
| AlphaEdit | 29.59±3.95 | 29.90±4.67 | 27.80±2.77 | 12.88±0.04 |
| ZeroUnlearn | 27.85±3.87 | 27.52±3.87 | 27.73±2.70 | 13.08±0.06 |

Llama-3.1-8B-Instruct

**MCF**

| Method | Eff. ↓ | Gen. ↓ | Spe. ↑ | PPL ↓ |
|---|---|---|---|---|
| Base | 24.60±5.29 | 22.80±4.35 | 21.96±4.28 | 7.47±0.00 |
| GA | 1.20±1.83 | 0.90±1.81 | 0.26±0.72 | >1000 |
| FT | 0.00±0.00 | 0.00±0.00 | 0.00±0.00 | 10.23±0.67 |
| ROME | 24.40±5.04 | 22.60±4.10 | 21.86±4.28 | 7.48±0.01 |
| MEMIT | 9.60±4.63 | 16.20±4.07 | 21.08±4.24 | 7.51±0.03 |
| AlphaEdit | 0.20±0.60 | 7.80±2.27 | 19.74±4.20 | 7.49±0.05 |
| ZeroUnlearn | 0.00±0.00 | 4.60±2.11 | 16.82±3.64 | 7.77±0.06 |

**ZsRE**

| Method | Eff. ↓ | Gen. ↓ | Spe. ↑ | PPL ↓ |
|---|---|---|---|---|
| Base | 40.42±4.92 | 36.84±4.24 | 29.87±2.30 | 7.47±0.00 |
| GA | 0.27±0.61 | 0.27±0.61 | 0.00±0.00 | >1000 |
| FT | 31.36±2.19 | 30.91±2.96 | 26.99±2.01 | 8.16±0.08 |
| ROME | 40.46±4.85 | 36.84±4.16 | 29.99±2.37 | 7.48±0.01 |
| MEMIT | 35.15±3.99 | 34.60±3.15 | 30.05±2.46 | 7.48±0.03 |
| AlphaEdit | 34.12±4.16 | 34.19±3.33 | 29.93±2.49 | 7.48±0.07 |
| ZeroUnlearn | 32.67±3.43 | 32.39±3.34 | 29.67±2.36 | 7.76±0.10 |

⚡️ Quickstart Guide

1. Environment Setup

```shell
# Clone the repository (Anonymous for review)
cd ZeroUnlearn

# Install dependencies
pip install -r requirements.txt
```

2. Configure Paths

Update the paths in `sh/run.sh`:

```shell
# Base directory for the project
ul_dir=/path/to/ZeroUnlearn

# Model directory (where pretrained models are stored)
model_dir=/path/to/models
```

3. Run Unlearning

The main entry point is `sh/run.sh`, which handles GPU allocation and launches the unlearning pipeline:

```shell
# Run ZeroUnlearn with 50 unlearning samples
bash sh/run.sh ZeroUnlearn 50
```

Or run the evaluation script directly:

```shell
python experiments/evaluate.py \
    --alg_name ZeroUnlearn \
    --model_name Llama-3.1-8B-Instruct \
    --hparams_fname Llama-3.1-8B-Instruct.json \
    --ds_name mcf \
    --unlearn_num 50 \
    --retain_num 1000 \
    --model_path_dir /path/to/models
```

4. Available Methods

The following unlearning methods are implemented:

| Method | Description |
|---|---|
| ZeroUnlearn | Our proposed method with a closed-form solution for few-shot unlearning |
| ZeroUnlearn_GD | Gradient-based variant for multi-sample batch unlearning |
| GA | Gradient Ascent baseline |
| FT | Fine-Tuning baseline |
| ROME | Rank-One Model Editing |
| MEMIT | Mass-Editing Memory in a Transformer |
| AlphaEdit | Null-space constrained editing |

5. Datasets

Supported datasets:

  • MCF (CounterFact): Factual knowledge unlearning benchmark
  • ZsRE: Zero-shot Relation Extraction dataset
  • MQuAKE: Multi-hop question answering knowledge editing

📁 Project Structure

```
ZeroUnlearn/
├── ZeroUnlearn/          # Main ZeroUnlearn implementation
├── ZeroUnlearn_GD/       # ZeroUnlearn with gradient descent
├── AlphaEdit/            # AlphaEdit baseline
├── memit/                # MEMIT baseline
├── rome/                 # ROME baseline
├── baselines/            # Other baseline methods (GA, FT, MEND)
├── experiments/          # Evaluation scripts
├── glue_eval/            # Downstream evaluation
├── dsets/                # Dataset loaders
├── hparams/              # Hyperparameter configurations
├── sh/                   # Shell scripts
├── util/                 # Utility functions
└── images/               # Figures and diagrams
```

❓ FAQ

Q: What hardware is required?

A: Our experiments were conducted on servers with NVIDIA GPUs (A100/A800). A single GPU with 40GB+ memory is recommended for 8B models, while 3B models can run on GPUs with 24GB memory.

Q: How do I add a new model?

A: Create a new hyperparameter JSON file in `hparams/ZeroUnlearn/` following the existing templates. Key parameters include the layer indices and module templates specific to your model architecture.
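
A minimal sketch of what such a file might look like; the field names below are illustrative guesses, so copy the exact keys from an existing template in `hparams/ZeroUnlearn/` rather than this example:

```json
{
  "model_name": "Llama-3.1-8B-Instruct",
  "layers": [4, 5, 6, 7, 8],
  "layer_module_tmp": "model.layers.{}",
  "rewrite_module_tmp": "model.layers.{}.mlp.down_proj"
}
```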

Q: Can I use custom datasets?

A: Yes! Implement a new dataset class in `dsets/` following the existing patterns. Each record should provide `prompt`, `subject`, `target_true`, and `target_new` fields.
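
A minimal sketch of such a dataset class, assuming a simple list-of-dicts input (the class and record names here are hypothetical; mirror the actual interface of the loaders in `dsets/`):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UnlearnRecord:
    prompt: str        # query containing the subject
    subject: str       # entity whose knowledge should be removed
    target_true: str   # original (sensitive) answer
    target_new: str    # neutral replacement, e.g. an <EOS>-like token

class CustomUnlearnDataset:
    """Illustrative loader exposing the fields listed above."""
    def __init__(self, records: List[dict]):
        self.data = [UnlearnRecord(**r) for r in records]

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> UnlearnRecord:
        return self.data[idx]

ds = CustomUnlearnDataset([{
    "prompt": "The capital of France is",
    "subject": "France",
    "target_true": "Paris",
    "target_new": "<EOS>",
}])
print(len(ds), ds[0].subject)  # 1 France
```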


🙏 Acknowledgements

Our framework builds upon the excellent work of:

  • MEMIT - Mass-Editing Memory in a Transformer
  • ROME - Rank-One Model Editing
  • AlphaEdit - Null-space constrained editing

📄 License

This project is licensed under the MIT License.

About

The code implementation for "ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models" (ICML 2026).
