
Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models

Paper · Project Page · Code · License

A principled, privacy-preserving, and rehearsal-free continual learning framework for MLLMs.

Yuehao Liu1 · Shanyan Guan2 · Weijia Zhang1 · Xuanming Shang1 · Yanhao Ge2 · Wei Li2 · Chao Ma1*

1 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University 2 vivo Mobile Communication Co., Ltd.

Abstract: Continual learning in multimodal large language models (MLLMs) aims to acquire knowledge sequentially while mitigating catastrophic forgetting, yet existing methods face inherent limitations: architecture-based approaches incur additional computational overhead and often generalize poorly to new tasks; rehearsal-based methods rely on storing historical data, raising privacy and storage concerns; and conventional regularization-based strategies alone are insufficient to fully prevent parameter interference. We propose Octopus, a two-stage continual learning framework based on History-Free Gradient Orthogonalization (HiFGO), which enforces gradient-level orthogonality without access to historical task data. The proposed two-stage finetuning strategy decouples task adaptation from regularization, achieving a principled balance between plasticity and stability. Experiments on UCIT show that Octopus establishes state-of-the-art performance, surpassing the prior SOTA by 2.14% and 6.82% in Avg and Last, respectively.

(Figure: Octopus pipeline overview)


📢 News & Updates

  • [2026/02] 🎉 Octopus has been accepted to CVPR 2026!
  • [2026/03] 🚀 Project page goes live! Click here to visit.
  • [2026/03] 🔥 We release the training and evaluation code. Stay tuned for more updates!

🔮 Highlights

Octopus introduces a novel paradigm to mitigate Catastrophic Forgetting (CF) natively without historical replay buffers or architectural expansion.

  • 💡 History-Free Gradient Orthogonalization (HiFGO): Introduces a principled metric, GPWC (Gradients of Previous parameters Within Current data distribution). HiFGO estimates the gradient sensitivity of historical tasks on current-task data and uses it to steer the current optimization (see the sketch after this list).
  • ⚖️ Two-Stage Finetuning Strategy: Decouples unconstrained task alignment (Stage 1) from constrained refinement on the optimal manifold (Stage 2), effectively balancing plasticity and stability.
  • 🏆 State-of-The-Art Performance: Outperforms state-of-the-art MoE/Regularization strategies across rigorous sequential learning benchmarks like UCIT and CoIN.
  • ⚡ Zero Inference Overhead: Introduces only a single LoRA module at inference, matching the efficiency of standard sequential fine-tuning and free from expert-routing delays.
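For intuition, here is a minimal PyTorch sketch of the gradient-level orthogonality idea described above. It is not the repository's HiFGO implementation: the helper orthogonalize and the reference gradients gpwc_grads, which stand in for GPWC (gradients of the previous parameters evaluated on current-task data), are hypothetical names of our own.

import torch

def orthogonalize(g_curr, g_prev, eps=1e-12):
    # Project the current gradient onto the orthogonal complement of the
    # reference direction, so the update avoids directions the previous
    # task is most sensitive to.
    g_c, g_p = g_curr.flatten(), g_prev.flatten()
    coeff = torch.dot(g_c, g_p) / (torch.dot(g_p, g_p) + eps)
    return (g_c - coeff * g_p).view_as(g_curr)

# Schematic usage inside a training step (gpwc_grads holds one reference
# gradient tensor per trainable LoRA parameter):
# for p, g_ref in zip(lora_params, gpwc_grads):
#     if p.grad is not None:
#         p.grad = orthogonalize(p.grad, g_ref)
# optimizer.step()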

🛠️ Installation

Our codebase is built upon the ms-swift framework for Parameter-Efficient Fine-Tuning (PEFT) of MLLMs.

# 1. Clone the repository
git clone https://github.com/fxmangd/Octopus.git
cd Octopus

# 2. Create conda environment
conda create -n octopus python=3.10
conda activate octopus

# 3. Install requirements
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.53.1 peft==0.15.2 accelerate==1.7.0 vllm==0.7.3 modelscope pandas datasets pydantic einops tensorboardX matplotlib tensorboard
pip install mistral-common==1.0.1 mathruler pycocotools pycocoevalcap pylatexenc

# 4. Install core dependencies including ms-swift
pip install -e .
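As an optional sanity check (a snippet of our own, not part of the repository), you can confirm from a Python shell that the core dependencies import correctly and print their versions, assuming the pinned packages above installed without errors:

# Optional: verify the core dependencies
import torch, transformers, peft
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)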

📊 Dataset Preparation

We evaluate Octopus on the UCIT benchmark and CoIN benchmark.

To reproduce our results, you will need to prepare the dataset and instruction files. You can download the raw datasets and their original instructions directly from their official repositories. For your convenience, we have processed and organized the instruction files. You can download our well-formatted instructions directly from Google Drive:

📥 Download Organized Instructions via Google Drive

📁 Directory Structure

After downloading (and extracting) the data, please ensure that you organize the instruction files into the following directory structure:

data/
├── UCIT_instructions/
│   ├── train/
│   │   ├── ImageNet-R.json
│   │   ├── ArxivQA.json
│   │   ├── ...
│   │   └── Flickr30k.json
│   └── test/
│       ├── ImageNet-R.json
│       ├── ArxivQA.json
│       ├── ...
│       └── Flickr30k.json
├── CoIN_instructions/
│   ├── train/
│   │   ├── SciQA.json
│   │   ├── TextVQA.json
│   │   ├── ...
│   │   └── OCR-VQA.json
│   └── test/
│       ├── ScienceQA.json
│       ├── TextVQA.json
│       ├── ...
│       └── OCR-VQA.json
├── UCIT_raw_datas/
│   ├── ImageNet-R/
│   ├── ArxivQA/
│   ├── ...
│   └── Flickr30k/
└── CoIN_raw_datas/
    ├── ScienceQA/
    ├── TextVQA/
    ├── ...
    └── OCR-VQA/
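As a quick check that everything is in place (an illustrative helper of our own, not shipped with the repository; the folder names come directly from the tree above), the following Python sketch verifies that the expected directories exist:

# Verify the expected data layout
from pathlib import Path

ROOT = Path("data")
expected = [
    "UCIT_instructions/train", "UCIT_instructions/test",
    "CoIN_instructions/train", "CoIN_instructions/test",
    "UCIT_raw_datas", "CoIN_raw_datas",
]
for rel in expected:
    path = ROOT / rel
    print(("ok      " if path.is_dir() else "MISSING ") + str(path))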


🚀 Quick Start (Training & Evaluation)

We provide ready-to-run shell scripts that conduct sequential learning end-to-end.

1. Sequential Continual Learning on UCIT

To launch the Octopus training pipeline on the UCIT benchmark using LLaVA-v1.5-7b:

cd Octopus
export PYTHONPATH=./

bash examples/Octopus_UCIT/train_all.sh

2. Sequential Continual Learning on CoIN

To launch the Octopus training pipeline on the CoIN benchmark using LLaVA-v1.5-7b:

cd Octopus
export PYTHONPATH=./

bash examples/Octopus_CoIN/train_all.sh

3. Evaluation

To evaluate the final performance after sequential fine-tuning across all tasks:

# Evaluate on UCIT benchmark
python evaluate.py --dataset_name UCIT

# Evaluate on CoIN benchmark
python evaluate.py --dataset_name CoIN

[NOTE] By default, evaluate.py evaluates the most recently trained model. To evaluate a specific pre-trained model, manually set the adapter_paths parameter in evaluate.py. Note that the LoRA weights produced by Octopus are incremental updates on top of previous tasks, so you must list the LoRA weights of all historical tasks in adapter_paths; they are merged automatically during execution.
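For illustration, an adapter_paths entry in evaluate.py might look like the sketch below. The variable name comes from the note above, but the checkpoint directories are placeholder paths, not real outputs of this repository:

# Hypothetical example -- replace the placeholder paths with your actual checkpoints.
# List the LoRA adapters of ALL historical tasks in training order, so their
# incremental updates can be merged before evaluation.
adapter_paths = [
    "output/task1-lora",  # placeholder: first task's adapter
    "output/task2-lora",  # placeholder: second task's adapter
    # ...
    "output/taskN-lora",  # placeholder: most recently trained task's adapter
]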


📊 Main Results

Performance on UCIT Benchmark 📈

Comparison with various methods on UCIT in terms of Avg and Last. The best and second-best methods are marked in bold and underline, respectively. Zero-shot evaluates the pretrained model without finetuning; Multi-task jointly finetunes the model across all datasets; Sequential Finetune adapts a single LoRA module sequentially across all tasks. These settings provide empirical lower-bound, upper-bound, and baseline references for continual learning methods.

(Table: main results on UCIT)

Qualitative Previews

🏷️ Dataset 1: VizWiz (Captioning)
(example image)

🏷️ Dataset 2: ImageNet-R (VQA)
(example image)

🏷️ Dataset 3: IconQA (VQA)
(example image)

🏷️ Dataset 4: CLEVR-Math (VQA)
(example image)


✒️ Citation

If you find our work or this code repository helpful for your research, please consider citing our paper:

@article{liu2026octopus,
  title={Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models},
  author={Liu, Yuehao and Guan, Shanyan and Zhang, Weijia and Shang, Xuanming and Ge, Yanhao and Li, Wei and Ma, Chao},
  journal={arXiv preprint arXiv:2605.14938},
  year={2026}
}

🙏 Acknowledgements

This repository is built upon the ms-swift, HiDe-LLaVA, CoIN and LLaVA projects. We sincerely thank the authors for their valuable contributions to the research community.
