

Unlearning Misalignment for Personalized LLM Adaptation via Instance-Response-Dependent Discrepancies

🧠 Framework Overview

Below is a visualization of our Consistent Marginalization framework:

📄 View the paper: Unlearning Misalignment for Personalized LLM Adaptation via Instance-Response-Dependent Discrepancies


Official implementation of the paper:

Unlearning Misalignment for Personalized LLM Adaptation via Instance-Response-Dependent Discrepancies

Cheng Chen, Atsushi Nitanda, Ivor Tsang
📍 Published in Transactions on Machine Learning Research (TMLR), 2025


📖 Overview

Large Language Models (LLMs) have transformed chatbot interactions but often fall short of aligning with the nuanced preferences of individual users. Prompt-based learning improves factual accuracy, but it does not fully capture subjective and context-specific preferences.

We propose Consistent Marginalization (CM) — a framework that builds a personalized memory bank of instance–response-dependent discrepancies from a small set of user preference samples. This equips LLMs to recall and adapt to individual preferences, yielding more consistent and user-aligned responses.

CM shows notable improvements in alignment and robustness across domains and languages, representing a step toward truly personalized conversational agents.
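
As a rough mental model (purely illustrative; the actual schema is defined by the estimation scripts in this repo, and the field names below are hypothetical), the memory bank can be pictured as a collection of entries pairing each instance with the gap between the model's default response and the user's preferred one:

# Illustrative sketch only: not the data structure produced by this repo.
from dataclasses import dataclass, field

@dataclass
class DiscrepancyEntry:
    instance: str            # user query / input instance
    model_response: str      # the model's default (unpersonalized) response
    preferred_response: str  # the user's preferred response
    discrepancy: str         # description or encoding of the mismatch

@dataclass
class MemoryBank:
    entries: list[DiscrepancyEntry] = field(default_factory=list)

    def add(self, entry: DiscrepancyEntry) -> None:
        self.entries.append(entry)

# A few user preference samples become memory-bank entries.
bank = MemoryBank()
bank.add(DiscrepancyEntry(
    instance="How do I sort a list in Python?",
    model_response="Use sorted(my_list).",
    preferred_response="Use sorted(my_list), and include a short runnable example.",
    discrepancy="user prefers answers with runnable examples",
))
print(len(bank.entries), "entries in the personalized memory bank")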


⚡ Quick Start

pip install -r requirements.txt

Run the following commands to get started:

🔹 For ChatGPT (GPT-3.5 / GPT-4o-mini)

# Step 1: Estimate response–instance discrepancies
jupyter notebook Response_Discrepancies_estimation_gpt3.5.ipynb

# Step 2: Refine LLM with user preference alignment
bash all_runs_chatgpt.sh

Expected outputs:

  • A discrepancy memory bank file (JSON or pickle format).
  • Refined model checkpoints saved under checkpoints/chatgpt/.
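
To sanity-check the memory bank from Step 1, you can load it and print a few entries. The filename below is a placeholder (the notebook determines the actual path and whether it writes JSON or pickle):

import json
import pickle
from pathlib import Path

# Placeholder path: substitute the file actually written by the notebook.
bank_path = Path("discrepancy_memory_bank.json")

if bank_path.suffix == ".json":
    with bank_path.open() as f:
        bank = json.load(f)
else:  # otherwise assume a pickle file
    with bank_path.open("rb") as f:
        bank = pickle.load(f)

# Print a small sample to eyeball the stored discrepancies.
sample = bank[:3] if isinstance(bank, list) else list(bank.items())[:3]
for item in sample:
    print(item)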

🔹 For LLaMA-3-8B

# Step 1: Estimate memory bank
python meta_memory_bank_estimation.py

# Step 2: Refine LLM with user preference alignment
bash meta_runs.sh

Expected outputs:

  • meta_memory_bank.pkl containing response–instance discrepancies.
  • Fine-tuned LLaMA model checkpoints under checkpoints/llama3/.
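
To verify Step 1 of the LLaMA pipeline, you can load meta_memory_bank.pkl and inspect a few entries (the internal layout shown here is an assumption; check meta_memory_bank_estimation.py for the exact schema):

import pickle

# meta_memory_bank.pkl is the file listed in the expected outputs above.
with open("meta_memory_bank.pkl", "rb") as f:
    memory_bank = pickle.load(f)

print(type(memory_bank))
if isinstance(memory_bank, dict):
    for key in list(memory_bank)[:3]:
        print(key, "->", memory_bank[key])
elif isinstance(memory_bank, (list, tuple)):
    for entry in memory_bank[:3]:
        print(entry)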

🔑 Workflow

  1. Estimate the memory bank: build a memory bank specific to the chosen model and dataset.

  2. Compute discrepancies: use the stored instance–response discrepancies to refine model alignment (see the sketch below).
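
A minimal way to chain the two steps, shown here for the LLaMA pipeline (the ChatGPT pipeline runs analogously via its notebook and all_runs_chatgpt.sh); this wrapper only shells out to the repo's own scripts:

import subprocess

# Step 1: estimate the memory bank for the chosen model/dataset.
subprocess.run(["python", "meta_memory_bank_estimation.py"], check=True)

# Step 2: use the stored instance-response discrepancies to refine alignment.
subprocess.run(["bash", "meta_runs.sh"], check=True)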


📂 File Summary

  • Response_Discrepancies_estimation_gpt3.5.ipynb – Discrepancy estimation (ChatGPT models)
  • meta_memory_bank_estimation.py – Instance–response discrepancy estimation (LLaMA)
  • all_runs_chatgpt.sh – Refinement pipeline for ChatGPT models
  • meta_runs.sh – Refinement pipeline for LLaMA-3-8B

📊 Datasets

We evaluate CM on five diverse datasets:

  1. StackExchange – Multi-domain QA corpus (e.g., programming). Tests alignment in varied contexts.
  2. CLINC150 – 150 intent categories; high-variance preference capture.
  3. BANK77 – Banking-related queries; probes performance in high-stakes scenarios.
  4. MOTE – Multilingual dataset; evaluates cross-lingual adaptability.
  5. Massive Scenario – Multilingual NLU benchmark spanning 51 languages; tests scalability across languages.

📒 Reference

@article{chen2025unlearning,
  title   = {Unlearning Misalignment for Personalized {LLM} Adaptation via Instance-Response-Dependent Discrepancies},
  author  = {Cheng Chen and Atsushi Nitanda and Ivor Tsang},
  journal = {Transactions on Machine Learning Research},
  issn    = {2835-8856},
  year    = {2025},
  url     = {https://openreview.net/forum?id=njE3swFBMc}
}
