
PKU-RL/LLaMA-Rider


LLaMA-Rider: Spurring Large Language Models to Explore the Open World

[Arxiv Paper]



LLaMA-Rider is a two-stage learning framework that spurs Large Language Models (LLMs) to explore the open world and learn to accomplish multiple tasks. This repository contains the implementation of LLaMA-Rider in the sandbox game Minecraft, and the code is largely based on the Plan4MC repository.

Installation

The installation of MineDojo and Plan4MC is the same as that in the Plan4MC repository:

  • Install MineDojo environment following the official document. It requires python >= 3.9. We install jdk 1.8.0_171.

  • Upgrade the MineDojo package:

    • Uninstall the original package: pip uninstall minedojo.

    • Download the modified MineDojo and install it with python setup.py install.

    • Download the pretrained MineCLIP model named attn.pth. Move the file to mineclip_official/.

    • At this point, you should be able to run validate_install.py successfully.

      • If you are on a headless machine, use the following command to verify that the installation was successful:

        xvfb-run python minedojo/scripts/validate_install.py
  • Install the Python packages in requirements.txt. Note that we validated our code with PyTorch==2.0.1 and x-transformers==0.27.1.

    pip install -r requirements.txt

Method overview

[Figure: the two-stage LLaMA-Rider framework]

LLaMA-Rider is a two-stage framework:

  • Exploration stage: the LLM explores the open world with the help of environmental feedback; a feedback-revision mechanism helps the LLM revise its previous decisions to align with the environment.
  • Learning stage: the experiences collected during the exploration stage are processed into a supervised dataset and used for supervised fine-tuning (SFT) of the LLM.
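The exploration-stage loop can be sketched roughly as below. This is a minimal illustration of the feedback-revision idea only; every name here (explore_task, llm_decide, env_step, the feedback strings) is a hypothetical stand-in, not the repository's actual API.

```python
# Sketch of the exploration-stage feedback-revision loop (illustrative only).
# llm_decide(trajectory, feedback) -> proposed sub-task (str)
# env_step(decision) -> (accepted: bool, feedback: str | None)

def explore_task(llm_decide, env_step, max_revisions=3, max_steps=10):
    """Run one task episode: the LLM proposes sub-tasks; when the
    environment rejects one, the feedback is fed back so the LLM can
    revise its decision before moving on."""
    trajectory = []
    for _ in range(max_steps):
        feedback = None
        for _ in range(max_revisions):
            decision = llm_decide(trajectory, feedback)
            accepted, feedback = env_step(decision)
            if accepted:  # environment accepted the sub-task
                trajectory.append(decision)
                break
        else:
            return trajectory, False  # revision budget exhausted
        if feedback == "done":  # task completed
            return trajectory, True
    return trajectory, False
```

The successful trajectory (with revised decisions already in place) is what gets stored for the learning stage.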

Exploration stage

In the exploration stage, for tasks based on logs/stones/mobs, run

python collect_feedback.py

For tasks based on iron ore, run

python collect_feedback_iron.py

Available tasks are listed in envs/hard_task_conf.yaml. One can modify the file to change task settings.
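For illustration only, a task entry in such a config file might look like the fragment below; the actual field names and schema are defined by envs/hard_task_conf.yaml in the repository and may differ.

```yaml
# Hypothetical task entries; consult envs/hard_task_conf.yaml for the real schema.
harvest_log:
  target: log        # item the agent must obtain
  target_num: 1      # how many of the item
  max_steps: 3000    # episode step budget
harvest_cobblestone:
  target: cobblestone
  target_num: 1
  max_steps: 3000
```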

Learning stage

One can process the explored experiences into a supervised dataset by running:

python process_data.py
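Conceptually, this step keeps the successful episodes and turns each decision into a (prompt, completion) pair. The sketch below is a hedged illustration of that idea; the episode structure and prompt template are invented for the example, and the actual format produced by process_data.py will differ.

```python
# Illustrative conversion of exploration episodes into SFT pairs.
# The episode dicts and prompt template are hypothetical.

def episodes_to_sft(episodes):
    """Keep only successful episodes and emit one supervised example
    per decision: the context so far -> the next sub-task taken."""
    dataset = []
    for ep in episodes:
        if not ep["success"]:
            continue  # only successful experiences are kept
        history = []
        for step in ep["actions"]:
            prompt = (
                f"Task: {ep['task']}\n"
                f"Done so far: {', '.join(history) or 'nothing'}\n"
                f"Next sub-task:"
            )
            dataset.append({"prompt": prompt, "completion": " " + step})
            history.append(step)
    return dataset
```

For example, one successful three-action episode yields three supervised examples, each conditioning on the sub-tasks completed so far.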

For the learning stage, we use QLoRA to train the LLM. Run

sh train/scripts/sft_70B.sh

Evaluation

To evaluate the LLM after SFT, run

python collect_feedback.py --adapter /path/to/adapter

Main results

Based on LLaMA-2-70B-chat, LLaMA-Rider outperforms the ChatGPT planner on average across 30 Minecraft tasks.

Moreover, LLaMA-Rider accomplishes 56.25% more tasks after the learning stage using only 1.3k supervised samples, demonstrating the efficiency and effectiveness of the framework.

[Figure: main results across the 30 tasks]

We also found that LLaMA-Rider achieves better performance on the harder, unseen iron-based tasks after exploring and learning on the 30 log/stone/mob-based tasks, demonstrating that the learned decision-making capabilities generalize.

[Figure: results on the unseen iron-based tasks]

Citation

If you use our method or code in your research, please consider citing the paper as follows:

@article{feng2023llama,
      title={LLaMA Rider: Spurring Large Language Models to Explore the Open World}, 
      author={Yicheng Feng and Yuxuan Wang and Jiazheng Liu and Sipeng Zheng and Zongqing Lu},
      journal={arXiv preprint arXiv:2310.08922},
      year={2023}
}
