Towards A Unified Agent with Foundation Models, Norman Di Palo+, N/A, arXiv'23 #883

AkihikoWatanabe · 2023-07-22T08:31:42Z

URL

Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges, such as efficient exploration, reusing experience data, scheduling skills, and learning from observations, which traditionally require separate, vertically designed algorithms. We test our method on a sparse-reward simulated robotic manipulation environment, where a robot needs to stack a set of objects. We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets, and illustrate how to reuse learned skills to solve novel tasks or imitate videos of human experts.

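Recently, language models and vision language models have shown unprecedented capabilities, in text form, such as understanding human intentions, reasoning, scene understanding, and planning-like behaviour. This work investigates how to embed and leverage these capabilities in reinforcement learning (RL) agents. The authors design a framework that uses language as the core reasoning tool and explore whether this lets an agent tackle fundamental RL challenges, such as efficient exploration, reuse of experience data, skill scheduling, and learning from observations, which traditionally require separately designed algorithms. The method is tested on a sparse-reward simulated robotic manipulation environment in which a robot must stack a set of objects. The authors demonstrate substantial performance improvements over baselines in exploration efficiency and in the ability to reuse data from offline datasets, and show how learned skills can be reused to solve novel tasks or imitate videos of human experts.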

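This work investigates how to embed language models and vision language models in reinforcement learning agents to address challenges such as efficient exploration and reuse of experience data. In tests on a sparse-reward robotic manipulation environment, it demonstrates substantial performance improvements over baselines and shows how learned skills can be reused to solve novel tasks or imitate videos of human experts.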

AkihikoWatanabe · 2023-07-22T08:33:25Z

AkihikoWatanabe added the action_wanted and Pocket labels on Jul 22, 2023

AkihikoWatanabe changed the title from "あ" to "Towards A Unified Agent with Foundation Models, Norman Di Palo+, N/A, arXiv'23" on Jul 22, 2023

AkihikoWatanabe added the ComputerVision, NLP, LLMAgent, and LanguageModel labels and removed the action_wanted label on Oct 21, 2023