
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging, Joel Jang+, N/A, arXiv'23 #1086

AkihikoWatanabe opened this issue Oct 24, 2023 · 1 comment


AkihikoWatanabe commented Oct 24, 2023

URL

Affiliations

  • Joel Jang, N/A
  • Seungone Kim, N/A
  • Bill Yuchen Lin, N/A
  • Yizhong Wang, N/A
  • Jack Hessel, N/A
  • Luke Zettlemoyer, N/A
  • Hannaneh Hajishirzi, N/A
  • Yejin Choi, N/A
  • Prithviraj Ammanabrolu, N/A

Abstract

  • While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning diverse, individual perspectives. In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong single-objective baselines, we show that we can achieve personalized alignment by decomposing preferences into multiple dimensions. These dimensions are defined based on personalizations that are declared as desirable by the user. In this work, we show that they can be efficiently trained independently in a distributed manner and combined effectively post-hoc through parameter merging. The code is available at https://github.com/joeljang/RLPHF.

Translation (by gpt-3.5-turbo)

  • Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregated human preferences, but it is not optimal for learning diverse, individual perspectives.
    In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, in which LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
    Compared to strong single-objective baselines, we show that personalized alignment can be achieved by decomposing preferences into multiple dimensions.
    These dimensions are defined based on personalizations that the user declares as desirable.
    We show that these dimensions can be trained efficiently and independently in a distributed manner, and combined effectively post-hoc through parameter merging.
    The code is available at https://github.com/joeljang/RLPHF.

Summary (by gpt-3.5-turbo)

  • Reinforcement Learning from Human Feedback (RLHF) aligns large language models (LLMs) with general, aggregated human preferences, which makes it suboptimal for learning diverse, individual perspectives. This study investigates Reinforcement Learning from Personalized Human Feedback (RLPHF), modeling the alignment of LLMs to multiple (sometimes conflicting) preferences as a Multi-Objective Reinforcement Learning (MORL) problem. It demonstrates that personalized alignment can be achieved by decomposing preferences into multiple dimensions, defined by the personalizations a user declares as desirable. These dimensions can be trained efficiently and independently in a distributed manner, then combined effectively post-hoc through parameter merging. The code is available at https://github.com/joeljang/RLPHF.
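
The post-hoc merging step described above amounts to a weighted average of the parameters of the independently trained per-preference policies (a "model soup"). Below is a minimal, hypothetical sketch of that idea in PyTorch; the function, file names, and weights are illustrative assumptions, not the authors' implementation (see https://github.com/joeljang/RLPHF for that).

```python
# Hypothetical sketch: merge independently trained per-preference policies
# by weighted-averaging their parameters ("personalized soup").
from typing import Dict, List
import torch

def merge_policies(state_dicts: List[Dict[str, torch.Tensor]],
                   weights: List[float]) -> Dict[str, torch.Tensor]:
    """Return a parameter-wise weighted average of the given state dicts."""
    assert len(state_dicts) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        # Each parameter tensor is averaged across the expert policies.
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Illustrative usage for a user who declares "concise" and "friendly" as desired dimensions
# (file names are placeholders):
# experts = [torch.load(p) for p in ["concise_policy.pt", "friendly_policy.pt"]]
# soup = merge_policies(experts, weights=[0.5, 0.5])
# base_model.load_state_dict(soup)
```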
@AkihikoWatanabe

I'm curious how much this approach can actually achieve in practice.
