### LIBERO: A Benchmark for Lifelong Robotic Learning

[LIBERO](https://libero-project.github.io) is a benchmark for evaluating and accelerating research on lifelong learning in robot manipulation. It provides task suites and a framework for assessing an agent's ability to acquire new skills and knowledge over time without forgetting previously learned ones.

#### Part 1: What is LIBERO?

The core philosophy of LIBERO is to facilitate the study of knowledge transfer in robotic agents. This includes the transfer of both declarative knowledge (what objects are and their properties) and procedural knowledge (how to perform actions and manipulations). The environment is built upon a modular and extensible task generation pipeline, allowing for the creation of a diverse and ever-growing set of challenges for robotic agents.

Key features of the LIBERO environment include:

- **A Suite of Diverse Tasks**: LIBERO offers a collection of manipulation tasks with varying objects, initial conditions, and goals. These tasks are designed to test different aspects of lifelong learning, including forward transfer (learning new tasks faster), backward transfer (improving performance on old tasks), and resistance to catastrophic forgetting.

- **Procedural Task Generation**: The environment includes tools for procedurally generating new tasks, ensuring a continuous stream of novel challenges for lifelong learning agents.

- **Standardized Evaluation Metrics**: LIBERO provides a consistent set of metrics for evaluating the performance of lifelong learning algorithms, enabling fair and reproducible comparisons between different approaches.

- **Focus on Realistic Scenarios**: The tasks in LIBERO are inspired by real-world manipulation challenges, pushing the research towards more practical and general-purpose robotic systems.
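The transfer notions in the list above can be made concrete with a small sketch. It assumes a hypothetical matrix `success[k][i]` holding the success rate on task `i` after training through task `k`, and it uses generic continual-learning definitions of backward transfer and forgetting. It is not LIBERO's official metric code, and the numbers are invented for illustration.

```python
from typing import List


def backward_transfer(success: List[List[float]]) -> float:
    """Mean change on earlier tasks between first learning them and the end of training.

    Negative values indicate forgetting; positive values indicate that later
    tasks improved earlier ones.
    """
    num_tasks = len(success)
    if num_tasks < 2:
        raise ValueError("need at least two tasks to measure backward transfer")
    final = success[-1]
    deltas = [final[i] - success[i][i] for i in range(num_tasks - 1)]
    return sum(deltas) / len(deltas)


def forgetting(success: List[List[float]]) -> float:
    """Mean drop from the best earlier success rate to the final one, per earlier task."""
    num_tasks = len(success)
    if num_tasks < 2:
        raise ValueError("need at least two tasks to measure forgetting")
    drops = []
    for i in range(num_tasks - 1):
        best_before_end = max(success[k][i] for k in range(i, num_tasks - 1))
        drops.append(best_before_end - success[-1][i])
    return sum(drops) / len(drops)


# Hypothetical success rates: row k = after training on task k, column i = task i.
success = [
    [0.9, 0.0, 0.0],
    [0.7, 0.8, 0.0],
    [0.6, 0.7, 0.9],
]

print(f"backward transfer: {backward_transfer(success):+.2f}")
print(f"forgetting:        {forgetting(success):.2f}")
```

In this toy matrix, performance on both earlier tasks decays as new tasks are learned, so backward transfer is negative and forgetting is positive.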

#### Part 2: Trying out LIBERO: A Getting Started Guide

To get started with the LIBERO environment, you will need to clone the official GitHub repository and install the necessary dependencies. The following steps will guide you through the process.

**Installation**

```bash
# 1. Create conda environment
conda create -n libero python=3.8.13
conda activate libero

# 2. Install LIBERO dependencies
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -r requirements.txt
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
```
Then install the `libero` package:
```bash
pip install -e .
```

**Download the Datasets**

LIBERO offers high-quality human teleoperation demonstrations across four distinct task suites: `libero_spatial`, `libero_object`, `libero_100`, and `libero_goal`.

To download a specific dataset, use the following command:

```bash
python benchmark_scripts/download_libero_datasets.py --datasets DATASET
```

Replace `DATASET` with your desired choice from the four options listed above. The datasets are saved in the `LIBERO` folder by default.

If you'd like to download all four datasets at once, simply omit the `--datasets` parameter:

```bash
python benchmark_scripts/download_libero_datasets.py
```

**Trying it out**

- **Retrieving a Task**: Here's a minimal example of how to retrieve a specific task from a task suite.
```python
import os

from libero.libero import benchmark, get_libero_path
from libero.libero.envs import OffScreenRenderEnv


benchmark_dict = benchmark.get_benchmark_dict()
task_suite_name = "libero_10" # can also choose libero_spatial, libero_object, etc.
task_suite = benchmark_dict[task_suite_name]()

# retrieve a specific task
task_id = 0
task = task_suite.get_task(task_id)
task_name = task.name
task_description = task.language
task_bddl_file = os.path.join(get_libero_path("bddl_files"), task.problem_folder, task.bddl_file)
print(f"[info] retrieving task {task_id} from suite {task_suite_name}, the " + \
      f"language instruction is {task_description}, and the bddl file is {task_bddl_file}")

# create the environment and step through it
env_args = {
    "bddl_file_name": task_bddl_file,
    "camera_heights": 128,
    "camera_widths": 128
}
env = OffScreenRenderEnv(**env_args)
env.seed(0)
env.reset()
init_states = task_suite.get_task_init_states(task_id) # for benchmarking purposes, a fixed set of initial states is used
init_state_id = 0
env.set_init_state(init_states[init_state_id])

dummy_action = [0.] * 7
for step in range(10):
    obs, reward, done, info = env.step(dummy_action)
env.close()
```
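The dummy-action loop above generalizes to a closed-loop rollout. The sketch below assumes that `set_init_state` returns an observation and that `done` signals task success; both are assumptions to check against the LIBERO code. `CountdownEnv` and `zero_policy` are hypothetical stand-ins so the example runs without a simulator, and `max_steps` is an arbitrary cap.

```python
from typing import Any, Callable, Sequence, Tuple


def rollout(
    env: Any,
    policy: Callable[[Any], Sequence[float]],
    init_state: Any,
    max_steps: int = 100,
) -> Tuple[bool, int]:
    """Run one episode from a fixed initial state.

    Returns (success, steps_taken). Assumes env.step returns
    (obs, reward, done, info), as in the example above.
    """
    env.reset()
    obs = env.set_init_state(init_state)  # assumption: returns an observation
    for step in range(1, max_steps + 1):
        obs, reward, done, info = env.step(policy(obs))
        if done:  # assumption: done means the task goal was reached
            return True, step
    return False, max_steps


class CountdownEnv:
    """Hypothetical stand-in environment that reports done after a fixed number of steps."""

    def __init__(self, steps_to_success: int) -> None:
        self.steps_to_success = steps_to_success
        self.t = 0

    def reset(self) -> dict:
        self.t = 0
        return {"t": self.t}

    def set_init_state(self, init_state: Any) -> dict:
        self.t = 0
        return {"t": self.t}

    def step(self, action: Sequence[float]):
        self.t += 1
        done = self.t >= self.steps_to_success
        return {"t": self.t}, float(done), done, {}


def zero_policy(obs: Any) -> Sequence[float]:
    """Placeholder policy that always emits the 7-dimensional zero action."""
    return [0.0] * 7


success, steps = rollout(CountdownEnv(steps_to_success=3), zero_policy, init_state=None)
print(f"success={success} after {steps} steps")
```

With a real LIBERO environment, `env` would be the `OffScreenRenderEnv` instance and `init_state` one entry of `init_states`.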

- **Training**: Currently, LIBERO mainly focuses on **lifelong imitation learning**. To begin a lifelong learning experiment, first select your desired `BENCHMARK`, `POLICY`, and `ALGO`:

    - `BENCHMARK` from: `[LIBERO_SPATIAL, LIBERO_OBJECT, LIBERO_GOAL, LIBERO_90, LIBERO_10]`

    - `POLICY` from: `[bc_rnn_policy, bc_transformer_policy, bc_vilt_policy]`

    - `ALGO` from: `[base, er, ewc, packnet, multitask]`
    
Then, execute the following command:
```bash
export CUDA_VISIBLE_DEVICES=GPU_ID && \
export MUJOCO_EGL_DEVICE_ID=GPU_ID && \
python libero/lifelong/main.py seed=SEED \
                               benchmark_name=BENCHMARK \
                               policy=POLICY \
                               lifelong=ALGO
```
For detailed information on reproducing study results, please refer to the official documentation.
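When comparing several algorithms or seeds, it can help to generate the training commands programmatically. The sketch below only builds and prints command lines in the format shown above; the helper names `training_command` and `command_grid` are hypothetical, and the allowed values are copied from the lists in this guide rather than read from the LIBERO code.

```python
import itertools
import shlex
from typing import Iterable, List

BENCHMARKS = ("LIBERO_SPATIAL", "LIBERO_OBJECT", "LIBERO_GOAL", "LIBERO_90", "LIBERO_10")
POLICIES = ("bc_rnn_policy", "bc_transformer_policy", "bc_vilt_policy")
ALGOS = ("base", "er", "ewc", "packnet", "multitask")


def training_command(benchmark: str, policy: str, algo: str, seed: int) -> List[str]:
    """Build one training command as an argument list, validating each choice."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark}")
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if algo not in ALGOS:
        raise ValueError(f"unknown algorithm: {algo}")
    return [
        "python",
        "libero/lifelong/main.py",
        f"seed={seed}",
        f"benchmark_name={benchmark}",
        f"policy={policy}",
        f"lifelong={algo}",
    ]


def command_grid(
    benchmarks: Iterable[str],
    policies: Iterable[str],
    algos: Iterable[str],
    seeds: Iterable[int],
) -> List[List[str]]:
    """Build one command per combination of benchmark, policy, algorithm, and seed."""
    return [
        training_command(b, p, a, s)
        for b, p, a, s in itertools.product(benchmarks, policies, algos, seeds)
    ]


grid = command_grid(["LIBERO_10"], ["bc_transformer_policy"], ["er", "ewc"], [0, 1])
for cmd in grid:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell or a job script after the two `export` statements shown above.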

- **Evaluation**: By default, policies are evaluated during the training process. For users with limited GPU resources, a separate evaluation script is available.
```bash
python libero/lifelong/evaluate.py --benchmark BENCHMARK_NAME \
                                   --task_id TASK_ID \
                                   --algo ALGO_NAME \
                                   --policy POLICY_NAME \
                                   --seed SEED \
                                   --ep EPOCH \
                                   --load_task LOAD_TASK \
                                   --device_id CUDA_ID
```

### OpenVLA: An Open-Source Vision-Language-Action Model

[OpenVLA](https://openvla.github.io/) is a powerful, open-source Vision-Language-Action (VLA) model designed for robotic manipulation. It leverages the capabilities of large pre-trained models to enable robots to understand natural language instructions and perform a wide range of tasks based on visual input.

#### Part 1: What is OpenVLA?

OpenVLA builds on the Prismatic-7B VLM (Vision-Language Model), which combines SigLIP and DINOv2 vision encoders with a Llama 2 language model. This architecture lets OpenVLA ground language commands in visual scenes and translate them into executable robot actions.

The model was trained on the extensive Open X-Embodiment dataset, a large-scale, multi-platform dataset of robot trajectories. This diverse training data endows OpenVLA with the ability to generalize to new objects, tasks, and environments that it has not encountered during training.

Key features of the OpenVLA model include:

- **Natural Language Instruction Following**: OpenVLA can interpret complex, high-level commands given in natural language and execute the corresponding actions.

- **Strong Generalization Capabilities**: Thanks to its training on a massive and diverse dataset, the model can adapt to novel scenarios and instructions.

- **Open-Source and Accessible**: As an open-source project, OpenVLA provides researchers and developers with access to the model weights, code, and pre-trained checkpoints, fostering collaboration and innovation in the robotics community.

- **Real-time Performance**: The model is designed to be efficient enough for real-time control of robotic hardware.

#### Part 2: Trying out OpenVLA: A Getting Started Guide

The OpenVLA repository was built with `Python 3.10` but should be compatible with any `Python >= 3.8`, and it requires `PyTorch 2.2.*`. The latest version has been developed and tested with the following dependencies:
- `PyTorch`: 2.2.0
- `torchvision`: 0.17.0
- `transformers`: 4.40.1
- `tokenizers`: 0.19.1
- `timm`: 0.9.10
- `flash-attn`: 2.5.5

**Installation**

To get started, use the setup commands below:
```bash
# Create conda environment
conda create -n openvla python=3.10 -y
conda activate openvla

# Install PyTorch. Below is a sample command to do this, but you should check the following link
# to find installation instructions that are specific to your compute platform:
# https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y

# Clone and install the openvla repo
git clone https://github.com/openvla/openvla.git
cd openvla
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
#   =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation
```
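After installation, it can be useful to compare what is actually installed with the tested versions listed earlier. The sketch below uses only the standard library; the distribution names in `TESTED_VERSIONS` (for example `torch` for PyTorch and `flash-attn`) are assumptions about how the packages are registered with pip.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Tested versions copied from the dependency list in this guide.
TESTED_VERSIONS = {
    "torch": "2.2.0",
    "torchvision": "0.17.0",
    "transformers": "4.40.1",
    "tokenizers": "0.19.1",
    "timm": "0.9.10",
    "flash-attn": "2.5.5",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Compare installed versions with the tested ones and describe each result."""
    report = {}
    for package, tested in TESTED_VERSIONS.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        elif found.split("+")[0] == tested:  # ignore local tags such as "+cu121"
            report[package] = "ok"
        else:
            report[package] = f"differs (installed {found}, tested {tested})"
    return report


for package, status in version_report().items():
    print(f"{package:>14}: {status}")
```

A "differs" entry is not necessarily a problem, but it is the first thing to check if training or inference fails.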

**Pretrained VLAs**

Two pretrained OpenVLA models are available on Hugging Face, each with checkpoints, configs, and a model card:

- [`openvla-7b`](https://huggingface.co/openvla/openvla-7b): Trained from the Prismatic prism-dinosiglip-224px VLM (based on a fused DINOv2 and SigLIP vision backbone, and Llama-2 LLM). Trained on a large mixture of datasets from Open X-Embodiment spanning 970K trajectories.

- [`openvla-v01-7b`](https://huggingface.co/openvla/openvla-v01-7b): An early model used during development, trained from the Prismatic siglip-224px VLM (a single SigLIP vision backbone and a Vicuña v1.5 LLM). Trained on the same mixture of datasets as Octo, but for significantly fewer GPU hours than the final model.
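A pretrained checkpoint can be queried for an action roughly as sketched below. This follows the usage pattern published alongside the `openvla-7b` checkpoint as best understood here: the prompt template, the `predict_action` method (provided by the model's remote code), and the `unnorm_key` value are assumptions to verify against the model card. The helper names `build_prompt`, `load_vla`, and `predict_action` are hypothetical, and running the model requires a GPU with enough memory for a 7B model in bfloat16.

```python
from typing import Any, Tuple


def build_prompt(instruction: str) -> str:
    """Format a language instruction using the assumed OpenVLA prompt template."""
    return f"In: What action should the robot take to {instruction}?\nOut:"


def load_vla(model_id: str = "openvla/openvla-7b", device: str = "cuda:0") -> Tuple[Any, Any]:
    """Load the processor and model once; heavy imports are deferred to call time."""
    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    vla = AutoModelForVision2Seq.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    ).to(device)
    return processor, vla


def predict_action(
    processor: Any,
    vla: Any,
    image: Any,
    instruction: str,
    unnorm_key: str = "bridge_orig",
    device: str = "cuda:0",
) -> Any:
    """Predict one robot action for a PIL image and a language instruction."""
    import torch

    inputs = processor(build_prompt(instruction), image).to(device, dtype=torch.bfloat16)
    # Assumption: predict_action un-normalizes the output using dataset statistics
    # selected by unnorm_key.
    return vla.predict_action(**inputs, unnorm_key=unnorm_key, do_sample=False)


print(build_prompt("pick up the red block"))
```

In a control loop, `load_vla` would be called once and `predict_action` once per camera frame.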

**Fine-Tuning OpenVLA via LoRA**

**Low-Rank Adaptation (LoRA)** via the Hugging Face `transformers` library is a practical way to fine-tune a 7B-parameter model when full fine-tuning is too expensive. The main script for LoRA fine-tuning is `vla-scripts/finetune.py`. Once the fine-tuning dataset (`bridge_orig` in the example below) is downloaded, you can launch the script:

```bash
torchrun --standalone --nnodes 1 --nproc-per-node 1 vla-scripts/finetune.py \
  --vla_path "openvla/openvla-7b" \
  --data_root_dir <PATH TO BASE DATASETS DIR> \
  --dataset_name bridge_orig \
  --run_root_dir <PATH TO LOG/CHECKPOINT DIR> \
  --adapter_tmp_dir <PATH TO TEMPORARY DIR TO SAVE ADAPTER WEIGHTS> \
  --lora_rank 32 \
  --batch_size 1 \
  --grad_accumulation_steps 4 \
  --learning_rate 5e-4 \
  --image_aug <True or False> \
  --wandb_project <PROJECT> \
  --wandb_entity <ENTITY> \
  --save_steps <NUMBER OF GRADIENT STEPS PER CHECKPOINT SAVE>
```

For smaller GPUs, reduce `--batch_size` and increase `--grad_accumulation_steps` to keep the effective batch size large enough for stable training. When `grad_accumulation_steps` is greater than 1, gradients are accumulated over several batches before a single parameter update, which approximates training with a larger batch without the extra VRAM. For instance, `--batch_size 1` with `--grad_accumulation_steps 4` updates the parameters once every four batches, for an effective batch size of 4 per GPU.
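The equivalence behind gradient accumulation can be checked on a toy problem. The sketch below is a generic pure-Python illustration, not the code in `vla-scripts/finetune.py`: it fits `y ≈ w * x` with a mean-squared-error loss on invented data and shows that averaging the gradients of four micro-batches of size 1 matches the gradient of one batch of size 4.

```python
from typing import List, Tuple

Sample = Tuple[float, float]


def grad_mse(w: float, batch: List[Sample]) -> float:
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, micro_batches: List[List[Sample]]) -> float:
    """Accumulate micro-batch gradients, each scaled by 1 / number of steps.

    The scaling makes the sum equal to the gradient of one large batch,
    provided all micro-batches have the same size.
    """
    steps = len(micro_batches)
    total = 0.0
    for micro_batch in micro_batches:
        total += grad_mse(w, micro_batch) / steps
    return total


# Invented data points (x, y), roughly following y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5

batch_size = 1
micro_batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
grad_accumulation_steps = len(micro_batches)

full_batch_grad = grad_mse(w, data)
accumulated = accumulated_grad(w, micro_batches)
effective_batch_size = batch_size * grad_accumulation_steps

print(f"full-batch gradient:  {full_batch_grad:.6f}")
print(f"accumulated gradient: {accumulated:.6f}")
print(f"effective batch size: {effective_batch_size}")
```

The two gradients agree up to floating-point rounding, which is why accumulation trades wall-clock time for memory rather than changing the update itself.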

If you have multiple GPUs, you can use PyTorch Distributed Data Parallel (DDP) by setting `--nproc-per-node` to the number of available GPUs in the `torchrun` command.