
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging, Joel Jang+, N/A, arXiv'23 #1086

AkihikoWatanabe opened this issue Oct 24, 2023 · 1 comment


AkihikoWatanabe commented Oct 24, 2023

URL

Affiliations

  • Joel Jang, N/A
  • Seungone Kim, N/A
  • Bill Yuchen Lin, N/A
  • Yizhong Wang, N/A
  • Jack Hessel, N/A
  • Luke Zettlemoyer, N/A
  • Hannaneh Hajishirzi, N/A
  • Yejin Choi, N/A
  • Prithviraj Ammanabrolu, N/A

Abstract

  • While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning diverse, individual perspectives. In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong single-objective baselines, we show that we can achieve personalized alignment by decomposing preferences into multiple dimensions. These dimensions are defined based on personalizations that are declared as desirable by the user. In this work, we show that they can be efficiently trained independently in a distributed manner and combined effectively post-hoc through parameter merging. The code is available at https://github.com/joeljang/RLPHF.

Translation (by gpt-3.5-turbo)

  • Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregated human preferences, but it is not optimal for learning diverse, individual perspectives.
    In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, in which LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
    Compared to strong single-objective baselines, we show that personalized alignment can be achieved by decomposing preferences into multiple dimensions.
    These dimensions are defined based on personalizations that the user declares as desirable.
    We show that these dimensions can be trained efficiently and independently in a distributed manner, and combined effectively post-hoc through parameter merging.
    The code is available at https://github.com/joeljang/RLPHF.

Summary (by gpt-3.5-turbo)

  • Reinforcement Learning from Human Feedback (RLHF) aligns large language models (LLMs) with general, aggregated human preferences, which makes it suboptimal for learning diverse, individual perspectives. This study investigates Reinforcement Learning from Personalized Human Feedback (RLPHF), modeling the alignment of LLMs to multiple (sometimes conflicting) preferences as a Multi-Objective Reinforcement Learning (MORL) problem. It demonstrates that personalized alignment can be achieved by decomposing preferences into multiple dimensions, defined by the personalizations a user declares as desirable. These dimensions can be trained efficiently and independently in a distributed manner, then combined effectively post-hoc through parameter merging. The code is available at https://github.com/joeljang/RLPHF.
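
The post-hoc merging step described above amounts to a weighted average of the parameters of the independently trained per-preference policies (a "model soup"). Below is a minimal, hypothetical sketch of that idea in PyTorch; the function, file names, and weights are illustrative assumptions, not the authors' implementation (see https://github.com/joeljang/RLPHF for that).

```python
# Hypothetical sketch: merge independently trained per-preference policies
# by weighted-averaging their parameters ("personalized soup").
from typing import Dict, List
import torch

def merge_policies(state_dicts: List[Dict[str, torch.Tensor]],
                   weights: List[float]) -> Dict[str, torch.Tensor]:
    """Return a parameter-wise weighted average of the given state dicts."""
    assert len(state_dicts) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        # Each parameter tensor is averaged across the expert policies.
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Illustrative usage for a user who declares "concise" and "friendly" as desired dimensions
# (file names are placeholders):
# experts = [torch.load(p) for p in ["concise_policy.pt", "friendly_policy.pt"]]
# soup = merge_policies(experts, weights=[0.5, 0.5])
# base_model.load_state_dict(soup)
```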
@AkihikoWatanabe

I'm curious how much this approach can actually achieve in practice.
