A pioneering framework that reframes machine unlearning as precise knowledge remapping through multiplicative parameter updates, achieving thorough knowledge removal while preserving model utility.
Large language models (LLMs) trained on extensive web corpora inevitably acquire and retain sensitive, private, or outdated information. The ability to selectively remove specific knowledge—known as machine unlearning—has become critical for responsible LLM deployment, particularly for compliance with privacy regulations, content moderation, and factual updates.
ZeroUnlearn is a novel framework designed for few-shot knowledge unlearning in LLMs. Unlike existing approaches that either require prohibitively expensive full retraining or suffer from catastrophic forgetting through aggressive fine-tuning (e.g., gradient ascent), ZeroUnlearn repurposes knowledge editing techniques to achieve precise unlearning.
Rather than destructively perturbing model weights, ZeroUnlearn overwrites sensitive information by remapping it to a predefined safe state (e.g., the <EOS> token). The framework enforces a dual objective:
- Redirecting sensitive inputs to a designated neutral target
- Orthogonalizing the edited representations with respect to their original sensitive embeddings
This ensures that the unlearning process fundamentally projects sensitive knowledge into a null space, achieving more complete erasure while preserving the model's general capabilities.
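Concretely, the orthogonalization step amounts to removing from the edited representation any component along the original sensitive direction. A minimal NumPy sketch of that projection (variable names are ours, not from the ZeroUnlearn codebase):

```python
import numpy as np

def project_out(v_edit: np.ndarray, v_orig: np.ndarray) -> np.ndarray:
    """Remove from v_edit the component along the original sensitive
    representation v_orig, leaving a vector orthogonal to it."""
    u = v_orig / np.linalg.norm(v_orig)    # unit direction of the old knowledge
    return v_edit - np.dot(v_edit, u) * u  # subtract the parallel component

v_orig = np.array([1.0, 0.0, 0.0])  # original sensitive embedding (toy)
v_edit = np.array([0.5, 2.0, 1.0])  # candidate edited representation (toy)
v_new = project_out(v_edit, v_orig)
print(np.dot(v_new, v_orig))  # ~0: the edit carries no trace of the old direction
```

The same idea extends from a single direction to a subspace spanned by many sensitive embeddings, which is the "null space" view described above.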
- Knowledge Remapping: Reframes unlearning as precise knowledge editing rather than destructive weight perturbation
- Null Space Projection: Projects sensitive inputs into a space orthogonal to original representations for thorough removal
- Closed-Form Solution: Derives an optimal transformation matrix analytically, enabling efficient one-step optimization
- Few-Shot Capability: Achieves effective unlearning with only a small number of samples
- Gradient-Based Extension: Includes ZeroUnlearn-GD, a gradient-based variant for multi-sample batch unlearning
- Utility Preservation: Maintains model performance on unrelated tasks and general linguistic capabilities
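For intuition, closed-form edits in this family (ROME/MEMIT-style) typically reduce to a ridge-regularized least-squares solve over a single weight matrix. The sketch below illustrates that generic pattern under our own assumptions; it is not the exact ZeroUnlearn derivation:

```python
import numpy as np

def closed_form_update(W, K, V_target, lam=1e-6):
    """One-step analytic weight update Delta so that (W + Delta) @ K ~ V_target.

    W:        (d_out, d_in) original weight matrix
    K:        (d_in, n)     keys of the facts being remapped
    V_target: (d_out, n)    desired values (e.g. a neutral safe state)

    Minimizes ||Delta @ K - R||_F^2 + lam * ||Delta||_F^2 with R = V_target - W @ K,
    giving Delta = R @ K.T @ (K @ K.T + lam * I)^-1.
    """
    resid = V_target - W @ K                        # what the edit must add
    gram = K @ K.T + lam * np.eye(K.shape[0])       # regularized Gram matrix
    return resid @ K.T @ np.linalg.inv(gram)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
K = rng.normal(size=(6, 3))
V_target = np.zeros((4, 3))                         # remap all keys to a neutral target
Delta = closed_form_update(W, K, V_target)
print(np.abs((W + Delta) @ K - V_target).max())     # near zero: keys now map to the target
```

No gradient steps are needed: the update is obtained in a single linear solve, which is what makes few-shot unlearning cheap in this setting.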
The tables below report ZeroUnlearn's few-shot unlearning results on the MCF and ZsRE datasets.

Metrics:
- Eff. (Efficacy) ↓: how thoroughly the target knowledge is removed (lower is better)
- Gen. (Generalization) ↓: whether unlearning generalizes to paraphrased queries (lower is better)
- Spe. (Specificity) ↑: how well unrelated knowledge is preserved (higher is better)
- PPL (Perplexity) ↓: model fluency (lower is better)
| Method | Eff. ↓ (MCF) | Gen. ↓ (MCF) | Spe. ↑ (MCF) | PPL ↓ (MCF) | Eff. ↓ (ZsRE) | Gen. ↓ (ZsRE) | Spe. ↑ (ZsRE) | PPL ↓ (ZsRE) |
|---|---|---|---|---|---|---|---|---|
| Base | 18.20±3.84 | 20.30±5.33 | 19.60±3.47 | 12.88±0.00 | 32.82±4.09 | 32.23±4.16 | 28.12±2.65 | 12.88±0.00 |
| GA | 2.00±3.34 | 1.80±2.89 | 1.06±1.79 | >1000 | 1.41±1.36 | 1.16±1.42 | 3.53±1.41 | >1000 |
| FT | 0.00±0.00 | 0.00±0.00 | 0.00±0.00 | 18.25±1.28 | 28.83±3.96 | 27.70±3.34 | 26.80±2.57 | 13.24±0.11 |
| ROME | 18.20±3.84 | 20.30±5.37 | 19.50±3.51 | 12.88±0.20 | 32.80±4.20 | 32.17±4.09 | 28.05±2.66 | 12.89±0.20 |
| MEMIT | 17.00±4.22 | 18.30±4.92 | 19.20±3.62 | 12.86±0.02 | 32.32±4.00 | 31.17±4.61 | 28.01±2.60 | 12.89±0.02 |
| AlphaEdit | 2.60±2.37 | 11.80±3.94 | 18.36±3.63 | 12.84±0.02 | 29.59±3.95 | 29.90±4.67 | 27.80±2.77 | 12.88±0.04 |
| ZeroUnlearn | 0.40±0.80 | 4.60±2.24 | 14.90±2.93 | 13.06±0.18 | 27.85±3.87 | 27.52±3.87 | 27.73±2.70 | 13.08±0.06 |
| Method | Eff. ↓ (MCF) | Gen. ↓ (MCF) | Spe. ↑ (MCF) | PPL ↓ (MCF) | Eff. ↓ (ZsRE) | Gen. ↓ (ZsRE) | Spe. ↑ (ZsRE) | PPL ↓ (ZsRE) |
|---|---|---|---|---|---|---|---|---|
| Base | 24.60±5.29 | 22.80±4.35 | 21.96±4.28 | 7.47±0.00 | 40.42±4.92 | 36.84±4.24 | 29.87±2.30 | 7.47±0.00 |
| GA | 1.20±1.83 | 0.90±1.81 | 0.26±0.72 | >1000 | 0.27±0.61 | 0.27±0.61 | 0.00±0.00 | >1000 |
| FT | 0.00±0.00 | 0.00±0.00 | 0.00±0.00 | 10.23±0.67 | 31.36±2.19 | 30.91±2.96 | 26.99±2.01 | 8.16±0.08 |
| ROME | 24.40±5.04 | 22.60±4.10 | 21.86±4.28 | 7.48±0.01 | 40.46±4.85 | 36.84±4.16 | 29.99±2.37 | 7.48±0.01 |
| MEMIT | 9.60±4.63 | 16.20±4.07 | 21.08±4.24 | 7.51±0.03 | 35.15±3.99 | 34.60±3.15 | 30.05±2.46 | 7.48±0.03 |
| AlphaEdit | 0.20±0.60 | 7.80±2.27 | 19.74±4.20 | 7.49±0.05 | 34.12±4.16 | 34.19±3.33 | 29.93±2.49 | 7.48±0.07 |
| ZeroUnlearn | 0.00±0.00 | 4.60±2.11 | 16.82±3.64 | 7.77±0.06 | 32.67±3.43 | 32.39±3.34 | 29.67±2.36 | 7.76±0.10 |
```bash
# Clone the repository (anonymized for review)
cd ZeroUnlearn

# Install dependencies
pip install -r requirements.txt
```

Update the paths in `sh/run.sh`:
```bash
# Base directory for the project
ul_dir=/path/to/ZeroUnlearn

# Model directory (where pretrained models are stored)
model_dir=/path/to/models
```

The main entry point is `sh/run.sh`, which handles GPU allocation and launches the unlearning pipeline:
```bash
# Run ZeroUnlearn with 50 unlearning samples
bash sh/run.sh ZeroUnlearn 50
```

Or run the evaluation script directly:
```bash
python experiments/evaluate.py \
    --alg_name ZeroUnlearn \
    --model_name Llama-3.1-8B-Instruct \
    --hparams_fname Llama-3.1-8B-Instruct.json \
    --ds_name mcf \
    --unlearn_num 50 \
    --retain_num 1000 \
    --model_path_dir /path/to/models
```

The following unlearning methods are implemented:
| Method | Description |
|---|---|
| ZeroUnlearn | Our proposed method with a closed-form solution for few-shot unlearning |
| ZeroUnlearn_GD | Gradient-based variant for multi-sample batch unlearning |
| GA | Gradient Ascent baseline |
| FT | Fine-Tuning baseline |
| ROME | Rank-One Model Editing |
| MEMIT | Mass-Editing Memory in a Transformer |
| AlphaEdit | Null-space constrained editing |
Supported datasets:
- MCF (CounterFact): Factual knowledge unlearning benchmark
- ZsRE: Zero-shot Relation Extraction dataset
- MQuAKE: Multi-hop question answering benchmark for knowledge editing
```
ZeroUnlearn/
├── ZeroUnlearn/      # Main ZeroUnlearn implementation
├── ZeroUnlearn_GD/   # ZeroUnlearn with gradient descent
├── AlphaEdit/        # AlphaEdit baseline
├── memit/            # MEMIT baseline
├── rome/             # ROME baseline
├── baselines/        # Other baseline methods (GA, FT, MEND)
├── experiments/      # Evaluation scripts
├── glue_eval/        # Downstream evaluation
├── dsets/            # Dataset loaders
├── hparams/          # Hyperparameter configurations
├── sh/               # Shell scripts
├── util/             # Utility functions
└── images/           # Figures and diagrams
```
Q: What hardware do I need to run the experiments?

A: Our experiments were conducted on servers with NVIDIA GPUs (A100/A800). A single GPU with 40 GB+ of memory is recommended for 8B models, while 3B models can run on GPUs with 24 GB of memory.
Q: How do I add support for a new model?

A: Create a new hyperparameter JSON file in `hparams/ZeroUnlearn/` following the existing templates. Key parameters include the layer indices and module templates specific to your model architecture.
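As an illustration only, such a file might look like the fragment below. The field names follow the MEMIT-style hparams convention used by editing codebases of this kind and are assumptions on our part; copy an existing template in `hparams/ZeroUnlearn/` for the authoritative schema.

```json
{
  "layers": [4, 5, 6, 7, 8],
  "rewrite_module_tmp": "model.layers.{}.mlp.down_proj",
  "layer_module_tmp": "model.layers.{}",
  "mlp_module_tmp": "model.layers.{}.mlp",
  "attn_module_tmp": "model.layers.{}.self_attn"
}
```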
Q: Can I use my own dataset?

A: Yes! Implement a new dataset class in `dsets/` following the existing patterns. The dataset should provide `prompt`, `subject`, `target_true`, and `target_new` fields.
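A minimal sketch of such a dataset class (the class name and JSON layout are hypothetical; mirror an existing loader in `dsets/` for the real interface):

```python
import json

class CustomUnlearnDataset:
    """Toy dataset loader exposing the four fields the pipeline expects:
    prompt, subject, target_true, and target_new."""

    def __init__(self, data_path, size=None):
        # Assumes a JSON file containing a list of flat record dicts.
        with open(data_path) as f:
            raw = json.load(f)
        self.data = raw[:size] if size else raw

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        record = self.data[idx]
        return {
            "prompt": record["prompt"],            # e.g. "The capital of {} is"
            "subject": record["subject"],          # e.g. "France"
            "target_true": record["target_true"],  # original answer to unlearn
            "target_new": record["target_new"],    # safe target, e.g. "<EOS>"
        }
```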
Our framework builds upon the excellent work of:
- MEMIT - Mass-Editing Memory in a Transformer
- ROME - Rank-One Model Editing
- AlphaEdit - Null-space constrained editing
This project is licensed under the MIT License.
