RWLinno/MoRA-Embodied

Phase 2: Sequential LoRA Composition for Continual Embodied Learning

Overview

Phase 2 implements Sequential LoRA Composition (SLC) — a continual learning framework that incrementally adds capability-specific LoRA adapters to a frozen vision-language backbone. Each new adapter is trained while all preceding adapters remain frozen but active in the forward pass, enabling the model to accumulate new skills without catastrophic forgetting.

Base Model: InternVL3.5-14B (InternViT-6B + Qwen3-14B, 15.5B params)

Architecture

Phase 1 (completed):  Backbone + LoRA → merge into weights → W'
Phase 2 (this code):  W' (frozen) + LoRA₂ (spatial) + LoRA₃ (point) + ...

Forward pass:  h = W'x + Σ αₖ BₖAₖx   (all adapters active, only latest trainable)
| Component | Rank | Params | % of Backbone | Capability |
|---|---|---|---|---|
| Backbone W' | — | 15.5B | 100% | General + Embodied |
| LoRA₂ | 96 | 385M | 2.49% | Spatial Reasoning |
| LoRA₃ | 64 | 257M | 1.66% | Point Grounding |
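The composed forward pass h = W'x + Σ αₖ BₖAₖx can be sketched in PyTorch. This is a minimal illustration of the SLC idea only; the class and method names are hypothetical, not the repo's implementation:

```python
import torch
import torch.nn as nn

class SequentialLoRALinear(nn.Module):
    """Frozen base weight W' plus a growing stack of LoRA adapters.

    All adapters are active in the forward pass; only the most
    recently added one is trainable (hypothetical sketch).
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # W' stays frozen
        self.adapters = nn.ModuleList()
        self.scales = []

    def add_adapter(self, rank, alpha):
        # Freeze every existing adapter before adding a new one.
        for adapter in self.adapters:
            adapter.requires_grad_(False)
        down = nn.Linear(self.base.in_features, rank, bias=False)  # A_k
        up = nn.Linear(rank, self.base.out_features, bias=False)   # B_k
        nn.init.zeros_(up.weight)  # standard LoRA init: new adapter starts as a zero residual
        self.adapters.append(nn.Sequential(down, up))
        self.scales.append(alpha / rank)

    def forward(self, x):
        # h = W'x + sum_k (alpha_k / r_k) * B_k A_k x
        h = self.base(x)
        for scale, adapter in zip(self.scales, self.adapters):
            h = h + scale * adapter(x)
        return h
```

Because each new adapter's up-projection is zero-initialized, adding an adapter leaves the model's output unchanged until training begins.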

Current Results (n=50 per benchmark)

| Benchmark | Backbone | +LoRA₂ | Forgetting |
|---|---|---|---|
| SAT (spatial QA) | 0.96 | 0.96 | 0.00 |
| WhatsUp (spatial QA) | 1.00 | 1.00 | 0.00 |
| RoboPoint-1 (point ref) | 0.04 | 0.04 | 0.00 |
| RoboPoint-2 (point ref) | 0.10 | 0.10 | 0.00 |
| FSD (free point) | 0.00 | 0.00 | 0.00 |
| RoboRefit (point rec) | 0.26 | 0.26 | 0.00 |
| Avg Spatial | 0.98 | 0.98 | 0.00 |
| Avg Point | 0.10 | 0.10 | 0.00 |

Training loss (LoRA₂, 100 steps): 0.571 → 0.545

Directory Structure

phase2/
├── train_phase2_lora.py      # Training entrypoint (DeepSpeed + PEFT)
├── evaluate.py                # Unified evaluation pipeline
├── dashboard.html             # Interactive visualization (open in browser)
├── configs/
│   ├── benchmark_eval.json    # Evaluation dataset config
│   ├── data/
│   │   ├── lora2_spatial_reasoning.json
│   │   ├── lora3_point_grounding.json
│   │   └── ...
│   └── zero_stage1_config.json
├── scripts/
│   ├── submit_train.sh        # SLURM training submission
│   ├── submit_eval.sh         # SLURM evaluation submission
│   ├── run_single_lora.sh     # Single LoRA training driver
│   ├── run_eval.sh            # Evaluation driver
│   ├── run_all.sh             # Full sequential training (all LoRAs)
│   └── run_forgetting_eval.sh # Complete forgetting evaluation
├── logs/                      # Evaluation results (JSON) and SLURM logs
├── paper/                     # LaTeX paper (CVPR format)
│   └── main.tex
└── README.md                  # This file

Reproduction Guide

Prerequisites

conda activate internvl
cd /data/user/kchen879/knowin/codes/InternVL-H100/internvl_chat_gpt_oss
export PYTHONPATH="${PYTHONPATH}:$(pwd)"

1. Train a Single LoRA Adapter

# LoRA₂ (spatial reasoning), 4000 steps, 2x H100
sbatch phase2/scripts/submit_train.sh lora2 4000

# LoRA₃ (point grounding), requires LoRA₂ trained first
sbatch phase2/scripts/submit_train.sh lora3 4000

# Quick pilot test (100 steps)
sbatch phase2/scripts/submit_train.sh lora2 100

2. Evaluate

# Evaluate backbone only
sbatch phase2/scripts/submit_eval.sh backbone 50

# Evaluate with LoRA₂
sbatch phase2/scripts/submit_eval.sh lora2_only 50

# Full forgetting evaluation
sbatch phase2/scripts/submit_eval.sh forgetting

3. View Results

# JSON results
cat phase2/logs/backbone_only_*.json
cat phase2/logs/after_lora2_only_*.json

# Interactive dashboard
# Open phase2/dashboard.html in a browser

# Forgetting history
cat phase2/logs/forgetting_history.json
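The forgetting column reported above is just the per-benchmark score drop between two evaluation checkpoints. A minimal helper to compute it from two result logs; note the flat {benchmark: score} mapping is an assumption, and the actual schema of the JSON files in phase2/logs/ may differ:

```python
def forgetting(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-benchmark forgetting: positive means the new adapter hurt an old skill.

    NOTE: assumes each log is a flat {benchmark: score} mapping;
    the real phase2/logs/*.json schema may be nested differently.
    """
    return {name: round(score - after.get(name, 0.0), 4)
            for name, score in before.items()}
```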

4. Full Sequential Training (All 4 LoRAs)

bash phase2/scripts/run_all.sh

Key Design Decisions

  1. Merged Phase 1 backbone: The Phase 1 LoRA is merged into weights before Phase 2. This avoids double-LoRA overhead and provides a cleaner optimization landscape.

  2. Frozen-but-active predecessors: During training of LoRA_k, all adapters LoRA_{2..k-1} are frozen but still participate in the forward pass. The new adapter therefore learns a residual correction on top of the composed model, not the full task from scratch.

  3. ZeRO-1 (not ZeRO-3): LoRA parameters are small enough to replicate across GPUs. ZeRO-1 shards only the optimizer states, so adapters can be saved directly without an all-gather.

  4. No routing at inference: All adapters are always active with uniform weight. This eliminates routing errors and ensures consistent behavior.
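Design decision 1 (merging the Phase 1 LoRA into the backbone) amounts to folding the low-rank product into the dense weight. A pure-tensor sketch under PEFT-style shape conventions; the function name is illustrative, not the repo's code:

```python
import torch

def merge_lora(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float, rank: int) -> torch.Tensor:
    """Fold one LoRA update into the base weight: W' = W + (alpha/rank) * B @ A.

    After merging, the Phase 1 adapter adds zero inference cost and
    Phase 2 optimizes against a single dense weight W'.
    """
    return W + (alpha / rank) * (B @ A)
```

With the usual PEFT shapes, A is (rank, in_features) and B is (out_features, rank), so B @ A has the same shape as W and the merged layer computes exactly what base-plus-adapter did.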

Checkpoint Paths

| Component | Path |
|---|---|
| Phase 1 merged backbone | work_dirs/internvl_chat_v3_5/internvl3_14b_loraResume_r128_merge_0202/checkpoint-6000 |
| LoRA₂ adapter | work_dirs/phase2/lora2_spatial_reasoning/adapter/lora2_spatial_reasoning/lora2_spatial_reasoning/ |
| LoRA₃ adapter | work_dirs/phase2/lora3_point_grounding/adapter/lora3_point_grounding/lora3_point_grounding/ |
| Evaluation logs | phase2/logs/*.json |

Citation

See phase2/paper/main.tex for the full paper describing this method.

About

Mixture of Routed Adapters for Embodied Agents
